Speaker: Today, as part of our weekly paper reading, we are going to cover the topic "DeepSeek-R1: A Deep Dive into Next-Gen LLM Reasoning".
- Reasoning in LLMs: Explored the evolution of reasoning techniques in AI models, from instruction-based prompting to autonomous system-2 thinking.
- DeepSeek’s Model Innovations: Introduced DeepSeek’s approach to efficient AI training, including cold-start fine-tuning and reasoning-oriented reinforcement learning.
- Mixture of Experts (MoE): Discussed MoE’s evolution, DeepSeek’s shared-experts approach, and future scalability to thousands of experts (see the routing sketch after this list).
- Group Relative Policy Optimization (GRPO): Explained DeepSeek’s reinforcement learning method, which scores groups of sampled responses against each other using rule-based rewards, so reasoning improves without a learned critic or human preference labels (see the advantage sketch after this list).
- Fine-Tuning with LoRA: Covered fine-tuning techniques like LoRA, which allow cost-effective model adaptation with minimal resources (see the adapter sketch after this list).
- DeepSeek R1 Model: Detailed DeepSeek-R1’s 671B-parameter MoE architecture (roughly 37B parameters activated per token), its energy efficiency, and its replacement of conventional RLHF with Group Relative Policy Optimization (GRPO).
- Training Methodology: Highlighted DeepSeek’s 14.8T-token pretraining corpus, fine-tuning on roughly 150K chain-of-thought (CoT) examples, and self-improving training methods such as rejection sampling of the model’s own reasoning traces.
- Evaluation & Reasoning Scoring: Analyzed reasoning chains, reward functions, and DeepSeek-Math’s potential for evaluating mathematical reasoning (see the rule-based reward sketch after this list).
- Cost & Performance Comparison: Compared DeepSeek’s efficiency, cost-effectiveness ($2 per million tokens vs. OpenAI’s $60), and benchmark performance (see the cost sketch after this list).
- Future Sessions: The next session will cover DeepSeek’s internal structure, advanced fine-tuning strategies, and inference optimizations.
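
For the Mixture of Experts item above, here is a minimal sketch of a MoE layer with one always-active shared expert plus top-k routed experts, in the spirit of DeepSeek’s shared-experts design. All names and sizes (SharedExpertMoE, num_routed, top_k, hidden widths) are illustrative assumptions, not the production implementation.

```python
# Minimal sketch: shared expert handles every token, a router sends each token
# to its top-k routed experts. Sizes and names are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Expert(nn.Module):
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.ff = nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                                nn.Linear(d_hidden, d_model))

    def forward(self, x):
        return self.ff(x)

class SharedExpertMoE(nn.Module):
    def __init__(self, d_model=512, d_hidden=1024, num_routed=8, top_k=2):
        super().__init__()
        self.shared = Expert(d_model, d_hidden)          # processes every token
        self.routed = nn.ModuleList([Expert(d_model, d_hidden) for _ in range(num_routed)])
        self.router = nn.Linear(d_model, num_routed)     # token -> expert scores
        self.top_k = top_k

    def forward(self, x):                                # x: (tokens, d_model)
        out = self.shared(x)                             # shared expert is always active
        scores = F.softmax(self.router(x), dim=-1)       # routing probabilities
        weights, idx = scores.topk(self.top_k, dim=-1)   # top-k experts per token
        for k in range(self.top_k):
            for e, expert in enumerate(self.routed):
                mask = idx[:, k] == e                    # tokens sent to expert e in slot k
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

x = torch.randn(4, 512)
print(SharedExpertMoE()(x).shape)  # torch.Size([4, 512])
```

The point of the shared expert is that common capacity is never gated away, while the routed experts specialize; scaling to thousands of experts mainly grows the routed pool.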
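
For the GRPO item, a minimal sketch of the group-relative advantage step: a group of responses is sampled per prompt, each response gets a reward, and rewards are standardized within the group instead of being baselined by a separate value/critic model. The reward numbers and group size below are made up for illustration.

```python
# Minimal sketch of GRPO's group-relative advantage: standardize rewards
# within each prompt's group of sampled responses. Numbers are illustrative.
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """rewards: (num_prompts, group_size) reward for each sampled response."""
    mean = rewards.mean(dim=-1, keepdim=True)   # per-prompt group mean
    std = rewards.std(dim=-1, keepdim=True)     # per-prompt group spread
    return (rewards - mean) / (std + eps)       # advantage of each response

# One prompt, a group of 4 sampled answers scored by a rule-based reward.
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0]])
print(group_relative_advantages(rewards))  # correct answers positive, wrong ones negative
```

These advantages then weight a clipped policy-gradient objective with a KL penalty toward a reference model, analogous to PPO but without the value network.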
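
For the LoRA item, a minimal sketch of a low-rank adapter wrapped around a frozen linear layer. The rank, scaling factor, and layer sizes are placeholder choices, not recommendations.

```python
# Minimal LoRA sketch: the pretrained weight is frozen and the update is
# factored as B @ A with small rank r, so only r*(d_in + d_out) parameters
# train. Sizes and the alpha scaling are illustrative.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)   # freeze the pretrained weight
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: starts identical to base
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(512, 512))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 8192 trainable parameters instead of 512*512 = 262144
```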
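
For the evaluation and reward-function item, a minimal sketch of a rule-based reasoning reward: a format component checks that the model wrapped its work and final answer in the expected tags, and an accuracy component compares the extracted answer with a reference. The tag names and weights are assumptions for illustration, not the exact scoring used by DeepSeek.

```python
# Minimal rule-based reward sketch: format check plus exact-match accuracy.
# Tag names, weights, and the weighting scheme are illustrative assumptions.
import re

def reasoning_reward(completion: str, reference_answer: str) -> float:
    format_ok = bool(re.search(r"<think>.*?</think>", completion, re.DOTALL))
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    answer = match.group(1).strip() if match else ""
    accuracy = 1.0 if answer == reference_answer.strip() else 0.0
    return 0.2 * float(format_ok) + 0.8 * accuracy   # weighted combination

completion = "<think>2 and 3 are consecutive primes; 2*3 = 6.</think><answer>6</answer>"
print(reasoning_reward(completion, "6"))   # 1.0
print(reasoning_reward("just 6", "6"))     # 0.0: no tags, no extractable answer
```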
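
For the cost comparison item, a back-of-the-envelope calculation using the per-million-token prices quoted in the session ($2 vs. $60). The token volume is an arbitrary example, not a measured workload.

```python
# Simple cost arithmetic at the quoted per-million-token prices.
def cost(tokens: int, price_per_million: float) -> float:
    return tokens / 1_000_000 * price_per_million

tokens = 500_000_000                       # e.g. 500M generated tokens per month
deepseek = cost(tokens, 2.0)               # $1,000
openai = cost(tokens, 60.0)                # $30,000
print(deepseek, openai, openai / deepseek) # 30x cheaper at these quoted rates
```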