Jaehyun Jun

btjhjeon

https://btjhjeon.github.io/

AI & ML interests

Multimodal

Recent Activity

updated a collection 3 days ago

Multimodal Alignment

upvoted a paper 3 days ago

MLLM-as-a-Judge for Image Safety without Human Labeling

updated a collection 3 days ago

Multimodal Dataset

btjhjeon's activity

upvoted 2 papers 3 days ago

MLLM-as-a-Judge for Image Safety without Human Labeling

Paper • 2501.00192 • Published 6 days ago • 22

2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining

Paper • 2501.00958 • Published 4 days ago • 75

upvoted 3 papers 6 days ago

Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey

Paper • 2412.18619 • Published 21 days ago • 49

On the Compositional Generalization of Multimodal LLMs for Medical Imaging

Paper • 2412.20070 • Published 9 days ago • 40

Explanatory Instructions: Towards Unified Vision Tasks Understanding and Zero-shot Generalization

Paper • 2412.18525 • Published 13 days ago • 62

upvoted 4 papers 10 days ago

MMFactory: A Universal Solution Search Engine for Vision-Language Tasks

Paper • 2412.18072 • Published 13 days ago • 14

Molar: Multimodal LLMs with Collaborative Filtering Alignment for Enhanced Sequential Recommendation

Paper • 2412.18176 • Published 13 days ago • 15

Video-Panda: Parameter-efficient Alignment for Encoder-free Video-Language Models

Paper • 2412.18609 • Published 12 days ago • 13

Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search

Paper • 2412.18319 • Published 13 days ago • 34

upvoted 3 papers 13 days ago

upvoted 5 papers 17 days ago

FastVLM: Efficient Vision Encoding for Vision Language Models

Paper • 2412.13303 • Published 19 days ago • 13

Descriptive Caption Enhancement with Visual Specialists for Multimodal Perception

Paper • 2412.14233 • Published 18 days ago • 6

Qwen2.5 Technical Report

Paper • 2412.15115 • Published 18 days ago • 335

LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer

Paper • 2412.13871 • Published 19 days ago • 17

Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces

Paper • 2412.14171 • Published 18 days ago • 23

upvoted 3 papers 19 days ago

Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning

Paper • 2412.11974 • Published 21 days ago • 9

OmniEval: An Omnidirectional and Automatic RAG Evaluation Benchmark in Financial Domain

Paper • 2412.13018 • Published 20 days ago • 41

Multi-Dimensional Insights: Benchmarking Real-World Personalization in Large Multimodal Models

Paper • 2412.12606 • Published 20 days ago • 41