Speaker: Today, as part of our weekly paper reading, we are going to cover the topic "DeepSeek-R1: A Deep Dive into Next-Gen LLM Reasoning".
- Reasoning in LLMs: Explored the evolution of reasoning techniques in AI models, from instruction-based prompting to autonomous system-2 thinking.
- DeepSeek’s Model Innovations: Introduced DeepSeek’s approach to efficient AI training, including cold-start fine-tuning and reasoning-oriented reinforcement learning.
- Mixture of Experts (MoE): Discussed MoE’s evolution, DeepSeek’s shared-experts approach, and future scalability to thousands of experts (see the routing sketch after this list).
- Group Relative Policy Optimization (GRPO): Explained DeepSeek’s reinforcement learning method, which scores groups of sampled responses against each other using rule-based rewards, so reasoning improves without a learned critic or human preference labels (see the advantage sketch after this list).
- Fine-Tuning with LoRA: Covered fine-tuning techniques like LoRA, which allow cost-effective model adaptation with minimal resources (see the adapter sketch after this list).
- DeepSeek R1 Model: Detailed DeepSeek-R1’s 671B-parameter MoE architecture (roughly 37B parameters activated per token), its energy efficiency, and its replacement of conventional RLHF with Group Relative Policy Optimization (GRPO).
- Training Methodology: Highlighted DeepSeek’s 14.8T-token pretraining corpus, fine-tuning on roughly 150K chain-of-thought (CoT) examples, and self-improving training methods such as rejection sampling of the model’s own reasoning traces.
- Evaluation & Reasoning Scoring: Analyzed reasoning chains, reward functions, and DeepSeek-Math’s potential for evaluating mathematical reasoning (see the rule-based reward sketch after this list).
- Cost & Performance Comparison: Compared DeepSeek’s efficiency, cost-effectiveness ($2 per million tokens vs. OpenAI’s $60), and benchmark performance (see the cost sketch after this list).
- Future Sessions: The next session will cover DeepSeek’s internal structure, advanced fine-tuning strategies, and inference optimizations.
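
For the Mixture of Experts item above, here is a minimal sketch of a MoE layer with one always-active shared expert plus top-k routed experts, in the spirit of DeepSeek’s shared-experts design. All names and sizes (SharedExpertMoE, num_routed, top_k, hidden widths) are illustrative assumptions, not the production implementation.

```python
# Minimal sketch: shared expert handles every token, a router sends each token
# to its top-k routed experts. Sizes and names are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Expert(nn.Module):
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.ff = nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                                nn.Linear(d_hidden, d_model))

    def forward(self, x):
        return self.ff(x)

class SharedExpertMoE(nn.Module):
    def __init__(self, d_model=512, d_hidden=1024, num_routed=8, top_k=2):
        super().__init__()
        self.shared = Expert(d_model, d_hidden)          # processes every token
        self.routed = nn.ModuleList([Expert(d_model, d_hidden) for _ in range(num_routed)])
        self.router = nn.Linear(d_model, num_routed)     # token -> expert scores
        self.top_k = top_k

    def forward(self, x):                                # x: (tokens, d_model)
        out = self.shared(x)                             # shared expert is always active
        scores = F.softmax(self.router(x), dim=-1)       # routing probabilities
        weights, idx = scores.topk(self.top_k, dim=-1)   # top-k experts per token
        for k in range(self.top_k):
            for e, expert in enumerate(self.routed):
                mask = idx[:, k] == e                    # tokens sent to expert e in slot k
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

x = torch.randn(4, 512)
print(SharedExpertMoE()(x).shape)  # torch.Size([4, 512])
```

The point of the shared expert is that common capacity is never gated away, while the routed experts specialize; scaling to thousands of experts mainly grows the routed pool.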
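
For the GRPO item, a minimal sketch of the group-relative advantage step: a group of responses is sampled per prompt, each response gets a reward, and rewards are standardized within the group instead of being baselined by a separate value/critic model. The reward numbers and group size below are made up for illustration.

```python
# Minimal sketch of GRPO's group-relative advantage: standardize rewards
# within each prompt's group of sampled responses. Numbers are illustrative.
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """rewards: (num_prompts, group_size) reward for each sampled response."""
    mean = rewards.mean(dim=-1, keepdim=True)   # per-prompt group mean
    std = rewards.std(dim=-1, keepdim=True)     # per-prompt group spread
    return (rewards - mean) / (std + eps)       # advantage of each response

# One prompt, a group of 4 sampled answers scored by a rule-based reward.
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0]])
print(group_relative_advantages(rewards))  # correct answers positive, wrong ones negative
```

These advantages then weight a clipped policy-gradient objective with a KL penalty toward a reference model, analogous to PPO but without the value network.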
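
For the LoRA item, a minimal sketch of a low-rank adapter wrapped around a frozen linear layer. The rank, scaling factor, and layer sizes are placeholder choices, not recommendations.

```python
# Minimal LoRA sketch: the pretrained weight is frozen and the update is
# factored as B @ A with small rank r, so only r*(d_in + d_out) parameters
# train. Sizes and the alpha scaling are illustrative.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)   # freeze the pretrained weight
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: starts identical to base
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(512, 512))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 8192 trainable parameters instead of 512*512 = 262144
```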
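
For the evaluation and reward-function item, a minimal sketch of a rule-based reasoning reward: a format component checks that the model wrapped its work and final answer in the expected tags, and an accuracy component compares the extracted answer with a reference. The tag names and weights are assumptions for illustration, not the exact scoring used by DeepSeek.

```python
# Minimal rule-based reward sketch: format check plus exact-match accuracy.
# Tag names, weights, and the weighting scheme are illustrative assumptions.
import re

def reasoning_reward(completion: str, reference_answer: str) -> float:
    format_ok = bool(re.search(r"<think>.*?</think>", completion, re.DOTALL))
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    answer = match.group(1).strip() if match else ""
    accuracy = 1.0 if answer == reference_answer.strip() else 0.0
    return 0.2 * float(format_ok) + 0.8 * accuracy   # weighted combination

completion = "<think>2 and 3 are consecutive primes; 2*3 = 6.</think><answer>6</answer>"
print(reasoning_reward(completion, "6"))   # 1.0
print(reasoning_reward("just 6", "6"))     # 0.0: no tags, no extractable answer
```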
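
For the cost comparison item, a back-of-the-envelope calculation using the per-million-token prices quoted in the session ($2 vs. $60). The token volume is an arbitrary example, not a measured workload.

```python
# Simple cost arithmetic at the quoted per-million-token prices.
def cost(tokens: int, price_per_million: float) -> float:
    return tokens / 1_000_000 * price_per_million

tokens = 500_000_000                       # e.g. 500M generated tokens per month
deepseek = cost(tokens, 2.0)               # $1,000
openai = cost(tokens, 60.0)                # $30,000
print(deepseek, openai, openai / deepseek) # 30x cheaper at these quoted rates
```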