Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2402.04248

BlockFusion: Expandable 3D Scene Generation using Latent Tri-plane Extrapolation

Paper • 2401.17053 • Published Jan 30, 2024 • 31
Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks

Paper • 2402.04248 • Published Feb 6, 2024 • 30
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Paper • 2402.03300 • Published Feb 5, 2024 • 76
WebLINX: Real-World Website Navigation with Multi-Turn Dialogue

Paper • 2402.05930 • Published Feb 8, 2024 • 38

Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models

Paper • 2402.19427 • Published Feb 29, 2024 • 52
Simple linear attention language models balance the recall-throughput tradeoff

Paper • 2402.18668 • Published Feb 28, 2024 • 18
ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition

Paper • 2402.15220 • Published Feb 23, 2024 • 19
Linear Transformers are Versatile In-Context Learners

Paper • 2402.14180 • Published Feb 21, 2024 • 6

Self-Rewarding Language Models

Paper • 2401.10020 • Published Jan 18, 2024 • 145
ReFT: Reasoning with Reinforced Fine-Tuning

Paper • 2401.08967 • Published Jan 17, 2024 • 29
Tuning Language Models by Proxy

Paper • 2401.08565 • Published Jan 16, 2024 • 21
TrustLLM: Trustworthiness in Large Language Models

Paper • 2401.05561 • Published Jan 10, 2024 • 66

new architecture

Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM

Paper • 2401.02994 • Published Jan 4, 2024 • 49
MambaByte: Token-free Selective State Space Model

Paper • 2401.13660 • Published Jan 24, 2024 • 53
Repeat After Me: Transformers are Better than State Space Models at Copying

Paper • 2402.01032 • Published Feb 1, 2024 • 22
BlackMamba: Mixture of Experts for State-Space Models

Paper • 2402.01771 • Published Feb 1, 2024 • 23

A Picture is Worth a Thousand Words: Principled Recaptioning Improves Image Generation

Paper • 2310.16656 • Published Oct 25, 2023 • 40
Unsupervised Universal Image Segmentation

Paper • 2312.17243 • Published Dec 28, 2023 • 19
Self-Discover: Large Language Models Self-Compose Reasoning Structures

Paper • 2402.03620 • Published Feb 6, 2024 • 114
Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks

Paper • 2402.04248 • Published Feb 6, 2024 • 30

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

Paper • 2312.00752 • Published Dec 1, 2023 • 138
SparQ Attention: Bandwidth-Efficient LLM Inference

Paper • 2312.04985 • Published Dec 8, 2023 • 38
Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models

Paper • 2401.04658 • Published Jan 9, 2024 • 25
E^2-LLM: Efficient and Extreme Length Extension of Large Language Models

Paper • 2401.06951 • Published Jan 13, 2024 • 25

Previous
1
2
3
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs