Collections
Collections including paper arxiv:2403.03507
Collection 1
- Jamba: A Hybrid Transformer-Mamba Language Model
  Paper • 2403.19887 • Published • 106
- sDPO: Don't Use Your Data All at Once
  Paper • 2403.19270 • Published • 41
- ViTAR: Vision Transformer with Any Resolution
  Paper • 2403.18361 • Published • 53
- Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models
  Paper • 2403.18814 • Published • 46

Collection 2
- Adding NVMe SSDs to Enable and Accelerate 100B Model Fine-tuning on a Single GPU
  Paper • 2403.06504 • Published • 53
- Yi: Open Foundation Models by 01.AI
  Paper • 2403.04652 • Published • 62
- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
  Paper • 2403.03507 • Published • 184

Collection 3
- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
  Paper • 2403.03507 • Published • 184
- Flora: Low-Rank Adapters Are Secretly Gradient Compressors
  Paper • 2402.03293 • Published • 6
- PRILoRA: Pruned and Rank-Increasing Low-Rank Adaptation
  Paper • 2401.11316 • Published • 1
- MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning
  Paper • 2405.12130 • Published • 47

Collection 4
- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
  Paper • 2403.03507 • Published • 184
- Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning
  Paper • 2205.05638 • Published • 3
- The Power of Scale for Parameter-Efficient Prompt Tuning
  Paper • 2104.08691 • Published • 10
- In-Context Learning Demonstration Selection via Influence Analysis
  Paper • 2402.11750 • Published • 2

Collection 5
- Scaling Instruction-Finetuned Language Models
  Paper • 2210.11416 • Published • 7
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces
  Paper • 2312.00752 • Published • 139
- Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
  Paper • 2403.05530 • Published • 62
- Yi: Open Foundation Models by 01.AI
  Paper • 2403.04652 • Published • 62

Collection 6
- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
  Paper • 2403.03507 • Published • 184
- Mixture-of-Subspaces in Low-Rank Adaptation
  Paper • 2406.11909 • Published • 3
- Grass: Compute Efficient Low-Memory LLM Training with Structured Sparse Gradients
  Paper • 2406.17660 • Published • 5
- From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients
  Paper • 2407.11239 • Published • 8