- ByT5: Towards a token-free future with pre-trained byte-to-byte models
  Paper • 2105.13626 • Published • 3
- Beyond Language Models: Byte Models are Digital World Simulators
  Paper • 2402.19155 • Published • 49
- MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers
  Paper • 2305.07185 • Published • 9
- Byte-Level Recursive Convolutional Auto-Encoder for Text
  Paper • 1802.01817 • Published
Collections including paper arxiv:2402.19155
- A Survey on Data Selection for Language Models
  Paper • 2402.16827 • Published • 4
- Instruction Tuning with Human Curriculum
  Paper • 2310.09518 • Published • 3
- Fine-Tuning or Retrieval? Comparing Knowledge Injection in LLMs
  Paper • 2312.05934 • Published • 1
- Language Models as Agent Models
  Paper • 2212.01681 • Published
- Beyond Language Models: Byte Models are Digital World Simulators
  Paper • 2402.19155 • Published • 49
- Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
  Paper • 2402.19427 • Published • 52
- VisionLLaMA: A Unified LLaMA Interface for Vision Tasks
  Paper • 2403.00522 • Published • 44
- Resonance RoPE: Improving Context Length Generalization of Large Language Models
  Paper • 2403.00071 • Published • 22