- Shortened LLaMA: A Simple Depth Pruning for Large Language Models
  Paper • 2402.02834 • Published • 14
- BiLLM: Pushing the Limit of Post-Training Quantization for LLMs
  Paper • 2402.04291 • Published • 48
- PB-LLM: Partially Binarized Large Language Models
  Paper • 2310.00034 • Published • 1
- ShortGPT: Layers in Large Language Models are More Redundant Than You Expect
  Paper • 2403.03853 • Published • 61
Collections including paper arxiv:2402.04291

- BiTA: Bi-Directional Tuning for Lossless Acceleration in Large Language Models
  Paper • 2401.12522 • Published • 11
- Hydragen: High-Throughput LLM Inference with Shared Prefixes
  Paper • 2402.05099 • Published • 19
- BiLLM: Pushing the Limit of Post-Training Quantization for LLMs
  Paper • 2402.04291 • Published • 48
- Shortened LLaMA: A Simple Depth Pruning for Large Language Models
  Paper • 2402.02834 • Published • 14

- Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
  Paper • 2402.19427 • Published • 52
- Simple linear attention language models balance the recall-throughput tradeoff
  Paper • 2402.18668 • Published • 18
- ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition
  Paper • 2402.15220 • Published • 19
- Linear Transformers are Versatile In-Context Learners
  Paper • 2402.14180 • Published • 6

- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 145
- ReFT: Reasoning with Reinforced Fine-Tuning
  Paper • 2401.08967 • Published • 29
- Tuning Language Models by Proxy
  Paper • 2401.08565 • Published • 21
- TrustLLM: Trustworthiness in Large Language Models
  Paper • 2401.05561 • Published • 66

- Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation
  Paper • 2401.08417 • Published • 34
- PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models
  Paper • 2401.05252 • Published • 47
- BiLLM: Pushing the Limit of Post-Training Quantization for LLMs
  Paper • 2402.04291 • Published • 48
- FiT: Flexible Vision Transformer for Diffusion Model
  Paper • 2402.12376 • Published • 48

- SqueezeLLM: Dense-and-Sparse Quantization
  Paper • 2306.07629 • Published • 4
- Norm Tweaking: High-performance Low-bit Quantization of Large Language Models
  Paper • 2309.02784 • Published • 1
- Extreme Compression of Large Language Models via Additive Quantization
  Paper • 2401.06118 • Published • 12
- BiLLM: Pushing the Limit of Post-Training Quantization for LLMs
  Paper • 2402.04291 • Published • 48

- QuIP: 2-Bit Quantization of Large Language Models With Guarantees
  Paper • 2307.13304 • Published • 2
- SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression
  Paper • 2306.03078 • Published • 3
- OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models
  Paper • 2308.13137 • Published • 17
- AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
  Paper • 2306.00978 • Published • 9

- Exponentially Faster Language Modelling
  Paper • 2311.10770 • Published • 117
- stabilityai/stable-video-diffusion-img2vid-xt
  Image-to-Video • Updated • 154k • 2.8k
- LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes
  Paper • 2311.13384 • Published • 50
- HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis
  Paper • 2311.12454 • Published • 30

- DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models
  Paper • 2309.14509 • Published • 17
- LLM Augmented LLMs: Expanding Capabilities through Composition
  Paper • 2401.02412 • Published • 36
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
  Paper • 2401.06066 • Published • 44
- Tuning Language Models by Proxy
  Paper • 2401.08565 • Published • 21