Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts Paper • 2408.15664 • Published Aug 28, 2024 • 11
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence Paper • 2406.11931 • Published Jun 17, 2024 • 58
Calibrating Factual Knowledge in Pretrained Language Models Paper • 2210.03329 • Published Oct 7, 2022 • 1
Why Can GPT Learn In-Context? Language Models Implicitly Perform Gradient Descent as Meta-Optimizers Paper • 2212.10559 • Published Dec 20, 2022
Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning Paper • 2305.14160 • Published May 23, 2023 • 1
Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations Paper • 2312.08935 • Published Dec 14, 2023 • 4
On the Representation Collapse of Sparse Mixture of Experts Paper • 2204.09179 • Published Apr 20, 2022 • 1
StableMoE: Stable Routing Strategy for Mixture of Experts Paper • 2204.08396 • Published Apr 18, 2022 • 1
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model Paper • 2405.04434 • Published May 7, 2024 • 14
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models Paper • 2401.06066 • Published Jan 11, 2024 • 44
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism Paper • 2401.02954 • Published Jan 5, 2024 • 41