-
LinFusion: 1 GPU, 1 Minute, 16K Image
Paper • 2409.02097 • Published • 33 -
Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion
Paper • 2409.11406 • Published • 26 -
Diffusion Models Are Real-Time Game Engines
Paper • 2408.14837 • Published • 123 -
Segment Anything with Multiple Modalities
Paper • 2408.09085 • Published • 22
Collections
Discover the best community collections!
Collections including paper arxiv:2411.04965
-
1.58-bit FLUX
Paper • 2412.18653 • Published • 72 -
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Paper • 2402.17764 • Published • 607 -
BitNet a4.8: 4-bit Activations for 1-bit LLMs
Paper • 2411.04965 • Published • 64 -
BitNet: Scaling 1-bit Transformers for Large Language Models
Paper • 2310.11453 • Published • 96
-
Video Creation by Demonstration
Paper • 2412.09551 • Published • 8 -
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation
Paper • 2412.07589 • Published • 45 -
Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation
Paper • 2412.06531 • Published • 71 -
APOLLO: SGD-like Memory, AdamW-level Performance
Paper • 2412.05270 • Published • 38
-
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models
Paper • 2411.04905 • Published • 113 -
LLaVA-o1: Let Vision Language Models Reason Step-by-Step
Paper • 2411.10440 • Published • 113 -
Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions
Paper • 2411.14405 • Published • 58 -
ROICtrl: Boosting Instance Control for Visual Generation
Paper • 2411.17949 • Published • 82
-
BitNet a4.8: 4-bit Activations for 1-bit LLMs
Paper • 2411.04965 • Published • 64 -
"Give Me BF16 or Give Me Death"? Accuracy-Performance Trade-Offs in LLM Quantization
Paper • 2411.02355 • Published • 47 -
Ultra-Sparse Memory Network
Paper • 2411.12364 • Published • 19 -
VisionZip: Longer is Better but Not Necessary in Vision Language Models
Paper • 2412.04467 • Published • 105
-
MinerU: An Open-Source Solution for Precise Document Content Extraction
Paper • 2409.18839 • Published • 27 -
FAN: Fourier Analysis Networks
Paper • 2410.02675 • Published • 25 -
Differential Transformer
Paper • 2410.05258 • Published • 169 -
UniMuMo: Unified Text, Music and Motion Generation
Paper • 2410.04534 • Published • 19
-
QLoRA: Efficient Finetuning of Quantized LLMs
Paper • 2305.14314 • Published • 48 -
EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
Paper • 2407.11062 • Published • 8 -
Spectra: A Comprehensive Study of Ternary, Quantized, and FP16 Language Models
Paper • 2407.12327 • Published • 78 -
BitNet a4.8: 4-bit Activations for 1-bit LLMs
Paper • 2411.04965 • Published • 64
-
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
Paper • 2404.08801 • Published • 65 -
RecurrentGemma: Moving Past Transformers for Efficient Open Language Models
Paper • 2404.07839 • Published • 44 -
Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence
Paper • 2404.05892 • Published • 33 -
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Paper • 2312.00752 • Published • 139