DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning (paper explained)
DeepSeek R1 is the latest model from DeepSeek. It is the first work to show that reasoning capability can be incentivized by training directly with Reinforcement Learning (RL), without the Supervised Fine-Tuning (SFT) step typically used when training LLMs.
In this video, we read through the paper and cover the model architecture, the training approach, and the results.
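For context before watching: the core training technique is Group Relative Policy Optimization (GRPO, covered at 5:54 and introduced in the DeepSeekMath paper linked below). Here is a minimal PyTorch sketch of the group-relative advantage and clipped objective; the function names are illustrative, not from any DeepSeek code, and the per-token KL penalty toward the reference model is left out:

import torch

def grpo_advantages(rewards):
    # Group-relative advantage: normalize each sampled output's reward
    # against the group mean and std, replacing PPO's learned value baseline.
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

def grpo_loss(logprobs, old_logprobs, rewards, clip_eps=0.2):
    # Clipped surrogate objective over one group of G sampled outputs for a
    # single prompt (KL penalty term omitted for brevity).
    adv = grpo_advantages(rewards)
    ratio = torch.exp(logprobs - old_logprobs)  # importance-sampling ratios
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps)
    return -torch.min(ratio * adv, clipped * adv).mean()

The point of scoring each output against its group's mean is that it removes the need for a separate value (critic) model, which is what makes this RL setup comparatively cheap.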
RELATED LINKS
DeepSeek R1 release - https://api-docs.deepseek.com/news/news250120
Try DeepSeek - https://chat.deepseek.com
DeepSeek API docs - https://api-docs.deepseek.com
ArXiv paper - https://arxiv.org/pdf/2501.12948
DeepSeekMath - https://arxiv.org/pdf/2402.03300
⌚️ ⌚️ ⌚️ TIMESTAMPS ⌚️ ⌚️ ⌚️
0:00 - Intro
2:38 - Training LLMs
5:05 - DeepSeek R1 Zero Training
5:54 - Group Relative Policy Optimization
8:45 - Reward Modelling
10:21 - Training Performance
11:33 - Self-evolution
13:03 - DeepSeek R1
17:20 - Results
AI BITES KEY LINKS
Website: https://www.ai-bites.net
YouTube: https://www.youtube.com/@AIBites
Twitter: https://twitter.com/ai_bites
Patreon: https://www.patreon.com/ai_bites
GitHub: https://github.com/ai-bites