Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2110.03742

Papers - MoE - Multilingual

Beyond Distillation: Task-level Mixture-of-Experts for Efficient Inference

Paper • 2110.03742 • Published Sep 24, 2021 • 4

Papers - MoE - Router - Task

Beyond Distillation: Task-level Mixture-of-Experts for Efficient Inference

Paper • 2110.03742 • Published Sep 24, 2021 • 4

Papers - MoE - Router

Turn Waste into Worth: Rectifying Top-k Router of MoE

Paper • 2402.12399 • Published Feb 17, 2024 • 2
CompeteSMoE -- Effective Training of Sparse Mixture of Experts via Competition

Paper • 2402.02526 • Published Feb 4, 2024 • 3
Buffer Overflow in Mixture of Experts

Paper • 2402.05526 • Published Feb 8, 2024 • 8
OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models

Paper • 2402.01739 • Published Jan 29, 2024 • 27

about 1 month ago

Non-asymptotic oracle inequalities for the Lasso in high-dimensional mixture of experts

Paper • 2009.10622 • Published Sep 22, 2020 • 1
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models

Paper • 2401.15947 • Published Jan 29, 2024 • 50
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts

Paper • 2401.04081 • Published Jan 8, 2024 • 70
MoE-Infinity: Activation-Aware Expert Offloading for Efficient MoE Serving

Paper • 2401.14361 • Published Jan 25, 2024 • 2

Foundation Models and Tools

FinTral: A Family of GPT-4 Level Multimodal Financial Large Language Models

Paper • 2402.10986 • Published Feb 16, 2024 • 78
bigcode/starcoder2-15b

Text Generation • Updated Jun 5, 2024 • 29.9k • • 578
Zephyr: Direct Distillation of LM Alignment

Paper • 2310.16944 • Published Oct 25, 2023 • 123
mixedbread-ai/mxbai-rerank-large-v1

Text Classification • Updated Jul 22, 2024 • 26.4k • 120

Foundation AI Papers

Curated List of Must-Reads on LLM reasoning at Temus AI team

Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models

Paper • 2310.04406 • Published Oct 6, 2023 • 8
Chain-of-Thought Reasoning Without Prompting

Paper • 2402.10200 • Published Feb 15, 2024 • 105
ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference Optimization

Paper • 2402.09320 • Published Feb 14, 2024 • 6
Self-Discover: Large Language Models Self-Compose Reasoning Structures

Paper • 2402.03620 • Published Feb 6, 2024 • 115

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs