Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2403.05812

Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset

Paper • 2403.09029 • Published Mar 14, 2024 • 55
LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression

Paper • 2403.12968 • Published Mar 19, 2024 • 25
RAFT: Adapting Language Model to Domain Specific RAG

Paper • 2403.10131 • Published Mar 15, 2024 • 69
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking

Paper • 2403.09629 • Published Mar 14, 2024 • 76

Papers - Performance Trends in AI

Algorithmic progress in language models

Paper • 2403.05812 • Published Mar 9, 2024 • 18

ShortGPT: Layers in Large Language Models are More Redundant Than You Expect

Paper • 2403.03853 • Published Mar 6, 2024 • 62
SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks

Paper • 2402.09025 • Published Feb 14, 2024 • 6
Shortened LLaMA: A Simple Depth Pruning for Large Language Models

Paper • 2402.02834 • Published Feb 5, 2024 • 15
Algorithmic progress in language models

Paper • 2403.05812 • Published Mar 9, 2024 • 18

Papers - Training Research

Measuring the Effects of Data Parallelism on Neural Network Training

Paper • 1811.03600 • Published Nov 8, 2018 • 2
Adafactor: Adaptive Learning Rates with Sublinear Memory Cost

Paper • 1804.04235 • Published Apr 11, 2018 • 2
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks

Paper • 1905.11946 • Published May 28, 2019 • 3
Yi: Open Foundation Models by 01.AI

Paper • 2403.04652 • Published Mar 7, 2024 • 62

daily_paper_coll

Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models

Paper • 2402.19427 • Published Feb 29, 2024 • 53
Beyond Language Models: Byte Models are Digital World Simulators

Paper • 2402.19155 • Published Feb 29, 2024 • 50
StarCoder 2 and The Stack v2: The Next Generation

Paper • 2402.19173 • Published Feb 29, 2024 • 137
Simple linear attention language models balance the recall-throughput tradeoff

Paper • 2402.18668 • Published Feb 28, 2024 • 19

MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs

Paper • 2402.15627 • Published Feb 23, 2024 • 35
Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts

Paper • 2402.16822 • Published Feb 26, 2024 • 16
FuseChat: Knowledge Fusion of Chat Models

Paper • 2402.16107 • Published Feb 25, 2024 • 37
Multi-LoRA Composition for Image Generation

Paper • 2402.16843 • Published Feb 26, 2024 • 29

Large-Scale Automatic Audiobook Creation

Paper • 2309.03926 • Published Sep 7, 2023 • 54
Agents: An Open-source Framework for Autonomous Language Agents

Paper • 2309.07870 • Published Sep 14, 2023 • 42
PDFTriage: Question Answering over Long, Structured Documents

Paper • 2309.08872 • Published Sep 16, 2023 • 53
StarCoder: may the source be with you!

Paper • 2305.06161 • Published May 9, 2023 • 30

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs