
chansung park (chansung)


posted an update about 5 hours ago



A Simple Summary of DeepSeek-R1 from DeepSeek AI

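Reinforcement learning (RL) is the key stage: DeepSeek-R1-Zero, trained with large-scale RL and no supervised fine-tuning (SFT), develops strong reasoning on its own.
↳ However, RL alone does not produce a model that is genuinely helpful to people; R1-Zero suffers from poor readability and language mixing.
↳ So DeepSeek-R1 uses a four-stage pipeline: cold-start SFT on a small set of long chain-of-thought examples (a good starting point), reasoning-oriented RL, SFT on data collected by rejection sampling from the RL checkpoint, and a final RL stage for helpfulness and harmlessness (safety). The result performs comparably to OpenAI o1 on reasoning tasks.
↳ Simply fine-tuning other open models (Qwen and Llama) on data generated by DeepSeek-R1 (distillation) yields performance comparable to o1-mini.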
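DeepSeek's paper describes the RL stage as GRPO (Group Relative Policy Optimization) with rule-based rewards: an accuracy reward for the final answer and a format reward for keeping the reasoning between `<think>` and `</think>` tags. The sketch below shows only the reward and group-relative advantage step, under stated assumptions: the exact-match answer check, the 0.5/1.0 reward weights, and the function names are hypothetical, and the policy-gradient update, ratio clipping, and KL penalty are omitted.

```python
import re
import statistics
from typing import List

# Assumed output layout: "<think>reasoning</think> final answer"
THINK_RE = re.compile(r"^<think>.*?</think>\s*(.*)$", re.DOTALL)


def rule_based_reward(completion: str, reference_answer: str) -> float:
    """Hypothetical rule-based reward: format bonus plus accuracy bonus.

    The 0.5 and 1.0 weights are illustrative assumptions, not values from the paper.
    """
    match = THINK_RE.match(completion.strip())
    if match is None:
        return 0.0  # no well-formed <think> block: no format or accuracy credit
    reward = 0.5  # format reward (assumed weight)
    if match.group(1).strip() == reference_answer.strip():
        reward += 1.0  # accuracy reward via exact match (assumed check)
    return reward


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Score each sampled completion against its own group: (r - mean) / std."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # One prompt, a group of four sampled completions, reference answer "4".
    completions = [
        "<think>2 + 2 is 4</think> 4",
        "<think>maybe 5</think> 5",
        "4",
        "<think>sum is 4</think>4",
    ]
    rewards = [rule_based_reward(c, "4") for c in completions]
    print(rewards)  # [1.5, 0.5, 0.0, 1.5]
    print(group_relative_advantages(rewards))
```

Because each completion is compared only with other samples for the same prompt, GRPO needs no separate learned value model: completions above the group mean get a positive advantage and are reinforced, and those below it are discouraged.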

Of course, this is only a brief overview and may not be of much help on its own. All models are available on Hugging Face, and the paper can be found in the GitHub repository.

Model: https://huggingface.co/deepseek-ai
Paper: https://github.com/deepseek-ai/DeepSeek-R1
