Dandandooo's Collections
Mamba-YOLO-World: Marrying YOLO-World with Mamba for Open-Vocabulary Detection
Paper
•
2409.08513
•
Published
•
11
Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale
Paper
•
2409.08264
•
Published
•
43
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution
Paper
•
2409.12191
•
Published
•
75
LLMs + Persona-Plug = Personalized LLMs
Paper
•
2409.11901
•
Published
•
31
InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning
Paper
•
2409.12568
•
Published
•
47
Language Models Learn to Mislead Humans via RLHF
Paper
•
2409.12822
•
Published
•
9
Imagine yourself: Tuning-Free Personalized Image Generation
Paper
•
2409.13346
•
Published
•
68
YesBut: A High-Quality Annotated Multimodal Dataset for evaluating Satire Comprehension capability of Vision-Language Models
Paper
•
2409.13592
•
Published
•
49
A Case Study of Web App Coding with OpenAI Reasoning Models
Paper
•
2409.13773
•
Published
•
6
Emu3: Next-Token Prediction is All You Need
Paper
•
2409.18869
•
Published
•
94
MIO: A Foundation Model on Multimodal Tokens
Paper
•
2409.17692
•
Published
•
53
Paper
•
2410.05258
•
Published
•
168
LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations
Paper
•
2410.02707
•
Published
•
47
Personalized Visual Instruction Tuning
Paper
•
2410.07113
•
Published
•
69
What Matters in Transformers? Not All Attention is Needed
Paper
•
2406.15786
•
Published
•
30
SemiEvol: Semi-supervised Fine-tuning for LLM Adaptation
Paper
•
2410.14745
•
Published
•
47
PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction
Paper
•
2410.17247
•
Published
•
45
Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss
Paper
•
2410.17243
•
Published
•
89
Denoising Diffusion Probabilistic Models
Paper
•
2006.11239
•
Published
•
3
Denoising Diffusion Implicit Models
Paper
•
2010.02502
•
Published
•
3
Learning to (Learn at Test Time): RNNs with Expressive Hidden States
Paper
•
2407.04620
•
Published
•
27
o1-Coder: an o1 Replication for Coding
Paper
•
2412.00154
•
Published
•
41
Byte Latent Transformer: Patches Scale Better Than Tokens
Paper
•
2412.09871
•
Published
•
80
Are Your LLMs Capable of Stable Reasoning?
Paper
•
2412.13147
•
Published
•
91
Motion Prompting: Controlling Video Generation with Motion Trajectories
Paper
•
2412.02700
•
Published
•
15