-
Universal Language Model Fine-tuning for Text Classification
Paper • 1801.06146 • Published • 6 -
Exploiting Similarities among Languages for Machine Translation
Paper • 1309.4168 • Published -
Theory, Analysis, and Best Practices for Sigmoid Self-Attention
Paper • 2409.04431 • Published • 1 -
Kolmogorov-Arnold Transformer
Paper • 2409.10594 • Published • 39
Collections
Discover the best community collections!
Collections including paper arxiv:2409.10594
-
SciLitLLM: How to Adapt LLMs for Scientific Literature Understanding
Paper • 2408.15545 • Published • 34 -
Controllable Text Generation for Large Language Models: A Survey
Paper • 2408.12599 • Published • 64 -
To Code, or Not To Code? Exploring Impact of Code in Pre-training
Paper • 2408.10914 • Published • 41 -
Automated Design of Agentic Systems
Paper • 2408.08435 • Published • 39
-
Associative Recurrent Memory Transformer
Paper • 2407.04841 • Published • 32 -
Mixture-of-Agents Enhances Large Language Model Capabilities
Paper • 2406.04692 • Published • 55 -
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
Paper • 2405.21060 • Published • 64 -
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Paper • 2404.14219 • Published • 253
-
RLHF Workflow: From Reward Modeling to Online RLHF
Paper • 2405.07863 • Published • 66 -
Chameleon: Mixed-Modal Early-Fusion Foundation Models
Paper • 2405.09818 • Published • 126 -
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
Paper • 2405.15574 • Published • 53 -
An Introduction to Vision-Language Modeling
Paper • 2405.17247 • Published • 87
-
Nemotron-4 15B Technical Report
Paper • 2402.16819 • Published • 42 -
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
Paper • 2402.19427 • Published • 52 -
RWKV: Reinventing RNNs for the Transformer Era
Paper • 2305.13048 • Published • 15 -
Reformer: The Efficient Transformer
Paper • 2001.04451 • Published
-
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
Paper • 2401.01335 • Published • 64 -
Lumiere: A Space-Time Diffusion Model for Video Generation
Paper • 2401.12945 • Published • 86 -
Adding NVMe SSDs to Enable and Accelerate 100B Model Fine-tuning on a Single GPU
Paper • 2403.06504 • Published • 53 -
Transformer-Lite: High-efficiency Deployment of Large Language Models on Mobile Phone GPUs
Paper • 2403.20041 • Published • 34