- MLLM-as-a-Judge for Image Safety without Human Labeling
  Paper • 2501.00192 • Published • 22
- 2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
  Paper • 2501.00958 • Published • 75
- Xmodel-2 Technical Report
  Paper • 2412.19638 • Published • 23
- HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
  Paper • 2412.18925 • Published • 86
Collections including paper arxiv:2412.18925

- 2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
  Paper • 2501.00958 • Published • 75
- Are Vision-Language Models Truly Understanding Multi-vision Sensor?
  Paper • 2412.20750 • Published • 17
- Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs
  Paper • 2412.21187 • Published • 25
- HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
  Paper • 2412.18925 • Published • 86

- ProcessBench: Identifying Process Errors in Mathematical Reasoning
  Paper • 2412.06559 • Published • 72
- Maya: An Instruction Finetuned Multilingual Multimodal Model
  Paper • 2412.07112 • Published • 25
- OpenAI o1 System Card
  Paper • 2412.16720 • Published • 29
- Diving into Self-Evolving Training for Multimodal Reasoning
  Paper • 2412.17451 • Published • 41

- Video Creation by Demonstration
  Paper • 2412.09551 • Published • 8
- DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation
  Paper • 2412.07589 • Published • 46
- Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation
  Paper • 2412.06531 • Published • 71
- APOLLO: SGD-like Memory, AdamW-level Performance
  Paper • 2412.05270 • Published • 38

- GenEx: Generating an Explorable World
  Paper • 2412.09624 • Published • 87
- Multimodal Music Generation with Explicit Bridges and Retrieval Augmentation
  Paper • 2412.09428 • Published • 7
- BrushEdit: All-In-One Image Inpainting and Editing
  Paper • 2412.10316 • Published • 33
- FashionComposer: Compositional Fashion Image Generation
  Paper • 2412.14168 • Published • 16

- gradientai/Llama-3-8B-Instruct-Gradient-1048k
  Text Generation • Updated • 3.37k • 678
- Are Your LLMs Capable of Stable Reasoning?
  Paper • 2412.13147 • Published • 91
- RetroLLM: Empowering Large Language Models to Retrieve Fine-grained Evidence within Generation
  Paper • 2412.11919 • Published • 33
- HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
  Paper • 2412.18925 • Published • 86

- Interactive Medical Image Segmentation: A Benchmark Dataset and Baseline
  Paper • 2411.12814 • Published • 21
- SegBook: A Simple Baseline and Cookbook for Volumetric Medical Image Segmentation
  Paper • 2411.14525 • Published • 19
- MRGen: Diffusion-based Controllable Data Engine for MRI Segmentation towards Unannotated Modalities
  Paper • 2412.04106 • Published • 5
- PepTune: De Novo Generation of Therapeutic Peptides with Multi-Objective-Guided Discrete Diffusion
  Paper • 2412.17780 • Published • 3