Tempo14's Collections: new architecture
Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM
Paper • 2401.02994 • Published • 49
MambaByte: Token-free Selective State Space Model
Paper • 2401.13660 • Published • 52
Repeat After Me: Transformers are Better than State Space Models at Copying
Paper • 2402.01032 • Published • 22
BlackMamba: Mixture of Experts for State-Space Models
Paper • 2402.01771 • Published • 23
Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks
Paper • 2402.04248 • Published • 30
KAN: Kolmogorov-Arnold Networks
Paper • 2404.19756 • Published • 108
Zamba: A Compact 7B SSM Hybrid Model
Paper • 2405.16712 • Published • 22
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
Paper • 2405.21060 • Published • 63
Block Transformer: Global-to-Local Language Modeling for Fast Inference
Paper • 2406.02657 • Published • 37
Breaking the Attention Bottleneck
Paper • 2406.10906 • Published • 4
Learning to (Learn at Test Time): RNNs with Expressive Hidden States
Paper • 2407.04620 • Published • 27
Jamba-1.5: Hybrid Transformer-Mamba Models at Scale
Paper • 2408.12570 • Published • 30
A Comprehensive Survey of Mamba Architectures for Medical Image Analysis: Classification, Segmentation, Restoration and Beyond
Paper • 2410.02362 • Published • 17
Differential Transformer
Paper • 2410.05258 • Published • 168
GPT or BERT: why not both?
Paper • 2410.24159 • Published • 14
Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA
Paper • 2410.20672 • Published • 6
SambaMixer: State of Health Prediction of Li-ion Batteries using Mamba State Space Models
Paper • 2411.00233 • Published • 7
Hymba: A Hybrid-head Architecture for Small Language Models
Paper • 2411.13676 • Published • 39
Gated Delta Networks: Improving Mamba2 with Delta Rule
Paper • 2412.06464 • Published • 9
Byte Latent Transformer: Patches Scale Better Than Tokens
Paper • 2412.09871 • Published • 80