Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2412.09871

Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping

Paper • 2402.14083 • Published Feb 21, 2024 • 47
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Paper • 2402.17764 • Published Feb 27, 2024 • 607
Genie: Generative Interactive Environments

Paper • 2402.15391 • Published Feb 23, 2024 • 70
Humanoid Locomotion as Next Token Prediction

Paper • 2402.19469 • Published Feb 29, 2024 • 27

Papers - Attention

Linear Transformers with Learnable Kernel Functions are Better In-Context Models

Paper • 2402.10644 • Published Feb 16, 2024 • 80
GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints

Paper • 2305.13245 • Published May 22, 2023 • 5
ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition

Paper • 2402.15220 • Published Feb 23, 2024 • 19
Sequence Parallelism: Long Sequence Training from System Perspective

Paper • 2105.13120 • Published May 26, 2021 • 5

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 26
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 13
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 41
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 22

new architecture

Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM

Paper • 2401.02994 • Published Jan 4, 2024 • 49
MambaByte: Token-free Selective State Space Model

Paper • 2401.13660 • Published Jan 24, 2024 • 53
Repeat After Me: Transformers are Better than State Space Models at Copying

Paper • 2402.01032 • Published Feb 1, 2024 • 23
BlackMamba: Mixture of Experts for State-Space Models

Paper • 2402.01771 • Published Feb 1, 2024 • 24

interesting stuff

Chain-of-Verification Reduces Hallucination in Large Language Models

Paper • 2309.11495 • Published Sep 20, 2023 • 37
Adapting Large Language Models via Reading Comprehension

Paper • 2309.09530 • Published Sep 18, 2023 • 77
CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages

Paper • 2309.09400 • Published Sep 17, 2023 • 84
Language Modeling Is Compression

Paper • 2309.10668 • Published Sep 19, 2023 • 83

Previous
1
...
7
8
9
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs