DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Abstract
We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrates remarkable reasoning capabilities. Through RL, DeepSeek-R1-Zero naturally develops numerous powerful and intriguing reasoning behaviors. However, it encounters challenges such as poor readability and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates multi-stage training and cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1-1217 on reasoning tasks. To support the research community, we open-source DeepSeek-R1-Zero, DeepSeek-R1, and six dense models (1.5B, 7B, 8B, 14B, 32B, 70B) distilled from DeepSeek-R1 based on Qwen and Llama.
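Since the distilled checkpoints are open-sourced on Hugging Face, the sketch below shows one way to sample a reasoning trace from one of them with the transformers library. The repo id deepseek-ai/DeepSeek-R1-Distill-Qwen-7B and the sampling settings are assumptions based on the model release, not details from the paper itself; this is a minimal illustration, not the authors' evaluation setup.

```python
# Minimal sketch: sampling a chain-of-thought answer from one of the six
# distilled checkpoints. The repo id below is an assumption; any of the
# released sizes (1.5B-70B) should work by swapping model_id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "What is 17 * 24? Reason step by step."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Illustrative settings: the release notes suggest sampling with a nonzero
# temperature rather than greedy decoding for these models.
outputs = model.generate(inputs, max_new_tokens=512, temperature=0.6, do_sample=True)
print(tokenizer.decode(outputs[0, inputs.shape[-1]:], skip_special_tokens=True))
```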
Community
Librarian Bot (automated message): I found the following papers similar to this paper, recommended by the Semantic Scholar API:
- Improving Multi-Step Reasoning Abilities of Large Language Models with Direct Advantage Policy Optimization (2024)
- Offline Reinforcement Learning for LLM Multi-Step Reasoning (2024)
- Reasoning Language Models: A Blueprint (2025)
- Learning to Generate Research Idea with Dynamic Control (2024)
- Search-o1: Agentic Search-Enhanced Large Reasoning Models (2025)
- Skill-Enhanced Reinforcement Learning Acceleration from Demonstrations (2024)
- Enhancing LLMs for Physics Problem-Solving using Reinforcement Learning with Human-AI Feedback (2024)