RichardForests's Collections
Transformers & MoE
SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention
Paper • 2312.07987 • 41 upvotes
Interfacing Foundation Models' Embeddings
Paper • 2312.07532 • 10 upvotes
Point Transformer V3: Simpler, Faster, Stronger
Paper • 2312.10035 • 17 upvotes
TheBloke/quantum-v0.01-GPTQ
Text Generation • 22 downloads • 2 likes
TheBloke/PiVoT-MoE-GPTQ
Text Generation • 22 downloads • 1 like
mobiuslabsgmbh/Mixtral-8x7B-Instruct-v0.1-hf-attn-4bit-moe-2bit-HQQ
Text Generation • 16 downloads • 38 likes
Denoising Vision Transformers
Paper • 2401.02957 • 28 upvotes
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
Paper • 2401.06066 • 44 upvotes
Buffer Overflow in Mixture of Experts
Paper • 2402.05526 • 8 upvotes
Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory
Paper • 2405.08707 • 27 upvotes