- Instruction Pre-Training: Language Models are Supervised Multitask Learners
  Paper • 2406.14491 • Published • 86
- Pre-training Small Base LMs with Fewer Tokens
  Paper • 2404.08634 • Published • 34
- Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training
  Paper • 2405.15319 • Published • 25
- Can LLMs Learn by Teaching? A Preliminary Study
  Paper • 2406.14629 • Published • 19

Collections including paper arxiv:2404.08634
- Rho-1: Not All Tokens Are What You Need
  Paper • 2404.07965 • Published • 88
- VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time
  Paper • 2404.10667 • Published • 18
- Instruction-tuned Language Models are Better Knowledge Learners
  Paper • 2402.12847 • Published • 25
- DoRA: Weight-Decomposed Low-Rank Adaptation
  Paper • 2402.09353 • Published • 26

- Pre-training Small Base LMs with Fewer Tokens
  Paper • 2404.08634 • Published • 34
- Ziya2: Data-centric Learning is All LLMs Need
  Paper • 2311.03301 • Published • 16
- How to Train Data-Efficient LLMs
  Paper • 2402.09668 • Published • 40
- MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies
  Paper • 2404.06395 • Published • 21

- LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders
  Paper • 2404.05961 • Published • 64
- Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
  Paper • 2404.07143 • Published • 104
- Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies
  Paper • 2404.08197 • Published • 27
- Pre-training Small Base LMs with Fewer Tokens
  Paper • 2404.08634 • Published • 34

- Beyond Language Models: Byte Models are Digital World Simulators
  Paper • 2402.19155 • Published • 49
- Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
  Paper • 2402.19427 • Published • 52
- VisionLLaMA: A Unified LLaMA Interface for Vision Tasks
  Paper • 2403.00522 • Published • 44
- Resonance RoPE: Improving Context Length Generalization of Large Language Models
  Paper • 2403.00071 • Published • 22

- Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
  Paper • 2402.19427 • Published • 52
- Beyond Language Models: Byte Models are Digital World Simulators
  Paper • 2402.19155 • Published • 49
- StarCoder 2 and The Stack v2: The Next Generation
  Paper • 2402.19173 • Published • 136
- Simple linear attention language models balance the recall-throughput tradeoff
  Paper • 2402.18668 • Published • 18

- Effective pruning of web-scale datasets based on complexity of concept clusters
  Paper • 2401.04578 • Published
- How to Train Data-Efficient LLMs
  Paper • 2402.09668 • Published • 40
- A Survey on Data Selection for LLM Instruction Tuning
  Paper • 2402.05123 • Published • 3
- LESS: Selecting Influential Data for Targeted Instruction Tuning
  Paper • 2402.04333 • Published • 3