-
Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset
Paper • 2403.09029 • Published • 55 -
LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression
Paper • 2403.12968 • Published • 25 -
RAFT: Adapting Language Model to Domain Specific RAG
Paper • 2403.10131 • Published • 69 -
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
Paper • 2403.09629 • Published • 76
Collections
Discover the best community collections!
Collections including paper arxiv:2403.05812
-
ShortGPT: Layers in Large Language Models are More Redundant Than You Expect
Paper • 2403.03853 • Published • 62 -
SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks
Paper • 2402.09025 • Published • 6 -
Shortened LLaMA: A Simple Depth Pruning for Large Language Models
Paper • 2402.02834 • Published • 15 -
Algorithmic progress in language models
Paper • 2403.05812 • Published • 18
-
Measuring the Effects of Data Parallelism on Neural Network Training
Paper • 1811.03600 • Published • 2 -
Adafactor: Adaptive Learning Rates with Sublinear Memory Cost
Paper • 1804.04235 • Published • 2 -
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
Paper • 1905.11946 • Published • 3 -
Yi: Open Foundation Models by 01.AI
Paper • 2403.04652 • Published • 62
-
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
Paper • 2402.19427 • Published • 53 -
Beyond Language Models: Byte Models are Digital World Simulators
Paper • 2402.19155 • Published • 50 -
StarCoder 2 and The Stack v2: The Next Generation
Paper • 2402.19173 • Published • 137 -
Simple linear attention language models balance the recall-throughput tradeoff
Paper • 2402.18668 • Published • 19
-
MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs
Paper • 2402.15627 • Published • 35 -
Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts
Paper • 2402.16822 • Published • 16 -
FuseChat: Knowledge Fusion of Chat Models
Paper • 2402.16107 • Published • 37 -
Multi-LoRA Composition for Image Generation
Paper • 2402.16843 • Published • 29
-
Large-Scale Automatic Audiobook Creation
Paper • 2309.03926 • Published • 54 -
Agents: An Open-source Framework for Autonomous Language Agents
Paper • 2309.07870 • Published • 42 -
PDFTriage: Question Answering over Long, Structured Documents
Paper • 2309.08872 • Published • 53 -
StarCoder: may the source be with you!
Paper • 2305.06161 • Published • 30