ryanafufu's Collections
my_read_book
updated
MambaVision: A Hybrid Mamba-Transformer Vision Backbone
Paper
•
2407.08083
•
Published
•
28
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
Paper
•
2408.11039
•
Published
•
58
The Mamba in the Llama: Distilling and Accelerating Hybrid Models
Paper
•
2408.15237
•
Published
•
37
Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think
Paper
•
2409.11355
•
Published
•
28
OmniGen: Unified Image Generation
Paper
•
2409.11340
•
Published
•
108
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning
Paper
•
2409.12183
•
Published
•
36
InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning
Paper
•
2409.12568
•
Published
•
48
Imagine yourself: Tuning-Free Personalized Image Generation
Paper
•
2409.13346
•
Published
•
68
Training Language Models to Self-Correct via Reinforcement Learning
Paper
•
2409.12917
•
Published
•
136
MaskBit: Embedding-free Image Generation via Bit Tokens
Paper
•
2409.16211
•
Published
•
16
Emu3: Next-Token Prediction is All You Need
Paper
•
2409.18869
•
Published
•
94
FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion
Paper
•
2412.09626
•
Published
•
19
Byte Latent Transformer: Patches Scale Better Than Tokens
Paper
•
2412.09871
•
Published
•
80
ColorFlow: Retrieval-Augmented Image Sequence Colorization
Paper
•
2412.11815
•
Published
•
26
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
Paper
•
2412.18319
•
Published
•
33