Collections


StableSSM: Alleviating the Curse of Memory in State-space Models through Stable Reparameterization

Paper • 2311.14495 • Published Nov 24, 2023 • 1
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model

Paper • 2401.09417 • Published Jan 17, 2024 • 59
SegMamba: Long-range Sequential Modeling Mamba For 3D Medical Image Segmentation

Paper • 2401.13560 • Published Jan 24, 2024 • 1
Graph-Mamba: Towards Long-Range Graph Sequence Modeling with Selective State Spaces

Paper • 2402.00789 • Published Feb 1, 2024 • 2



Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model

VMamba: Visual State Space Model

SegMamba: Long-range Sequential Modeling Mamba For 3D Medical Image Segmentation

Graph-Mamba: Towards Long-Range Graph Sequence Modeling with Selective State Spaces

The Impact of Depth and Width on Transformer Language Model Generalization

Retentive Network: A Successor to Transformer for Large Language Models

RWKV: Reinventing RNNs for the Transformer Era

Attention Is All You Need

QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models

Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference

Towards MoE Deployment: Mitigating Inefficiencies in Mixture-of-Expert (MoE) Inference

EvoMoE: An Evolutional Mixture-of-Experts Training Framework via Dense-To-Sparse Gate

StableSSM: Alleviating the Curse of Memory in State-space Models through Stable Reparameterization

Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model

SegMamba: Long-range Sequential Modeling Mamba For 3D Medical Image Segmentation

Graph-Mamba: Towards Long-Range Graph Sequence Modeling with Selective State Spaces

BlackMamba: Mixture of Experts for State-Space Models

OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models

MoE-LLaVA: Mixture of Experts for Large Vision-Language Models

DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models

Simple linear attention language models balance the recall-throughput tradeoff

Linear Transformers with Learnable Kernel Functions are Better In-Context Models

Repeat After Me: Transformers are Better than State Space Models at Copying

Zoology: Measuring and Improving Recall in Efficient Language Models

BlackMamba: Mixture of Experts for State-Space Models

OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models

DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models

LongAlign: A Recipe for Long Context Alignment of Large Language Models

Efficient Tool Use with Chain-of-Abstraction Reasoning

Scavenging Hyena: Distilling Transformers into Long Convolution Models

Rethinking Interpretability in the Era of Large Language Models

DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models

GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding

OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models

BlackMamba: Mixture of Experts for State-Space Models