DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning (paper explained)
DeepSeek R1 is the latest model from DeepSeek. It is the first work to show that reasoning capability can be incentivized by training directly with Reinforcement Learning (RL), without the Supervised Fine-Tuning (SFT) step typically used when training LLMs.
In this video, we read through the paper and cover the model architecture, the training approach, and the results.
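For context before watching: the core training technique is Group Relative Policy Optimization (GRPO, covered at 5:54 and introduced in the DeepSeekMath paper linked below). Here is a minimal PyTorch sketch of the group-relative advantage and clipped objective; the function names are illustrative, not from any DeepSeek code, and the per-token KL penalty toward the reference model is left out:

import torch

def grpo_advantages(rewards):
    # Group-relative advantage: normalize each sampled output's reward
    # against the group mean and std, replacing PPO's learned value baseline.
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

def grpo_loss(logprobs, old_logprobs, rewards, clip_eps=0.2):
    # Clipped surrogate objective over one group of G sampled outputs for a
    # single prompt (KL penalty term omitted for brevity).
    adv = grpo_advantages(rewards)
    ratio = torch.exp(logprobs - old_logprobs)  # importance-sampling ratios
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps)
    return -torch.min(ratio * adv, clipped * adv).mean()

The point of scoring each output against its group's mean is that it removes the need for a separate value (critic) model, which is what makes this RL setup comparatively cheap.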
RELATED LINKS
DeepSeek R1 release - https://api-docs.deepseek.com/news/news250120
Try DeepSeek - https://chat.deepseek.com
DeepSeek API docs - https://api-docs.deepseek.com
ArXiv paper - https://arxiv.org/pdf/2501.12948
DeepSeekMath - https://arxiv.org/pdf/2402.03300
⌚️ ⌚️ ⌚️ TIMESTAMPS ⌚️ ⌚️ ⌚️
0:00 - Intro
2:38 - Training LLMs
5:05 - DeepSeek R1 Zero Training
5:54 - Group Relative Policy Optimization
8:45 - Reward Modelling
10:21 - Training Performance
11:33 - Self-evolution
13:03 - DeepSeek R1
17:20 - Results
AI BITES KEY LINKS
Website: https://www.ai-bites.net
YouTube: https://www.youtube.com/@AIBites
Twitter: https://twitter.com/ai_bites
Patreon: https://www.patreon.com/ai_bites
GitHub: https://github.com/ai-bites