DeepSeek-R1: Let us understand it in depth

SupportVectors 610 3 months ago

Speaker: Asif Qamar, Technology Leader | AI/Data Scientist | Computer Scientist | Educator | Theoretical Particle Physicist
LinkedIn: https://www.linkedin.com/in/asifqamar/

Paper Title: DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper Link: https://arxiv.org/abs/2501.12948

Highlights from Today's Session:
- Reinforcement Learning Without Supervised Fine-Tuning: DeepSeek-R1-Zero skips traditional supervised fine-tuning and relies solely on reinforcement learning (RL) for reasoning improvements; DeepSeek-R1 adds a small cold-start fine-tuning stage before RL.
- Mixture-of-Experts (MoE) for Efficiency: DeepSeek models use MoE architectures that activate only a subset of parameters per token, reducing computational cost while maintaining high performance.
- Competitive Reasoning Performance: Despite using only 37B active parameters, DeepSeek-R1 rivals GPT-4 in reasoning tasks and outperforms it on math benchmarks such as AIME and MATH.
- Cost-Effective AI Development: Training DeepSeek-R1 reportedly cost around $6M, significantly less than GPT-4’s estimated $100M, demonstrating efficient model scaling strategies.
- Group-Based Reward System to Prevent Reward Hacking: Instead of training a separate reward model, DeepSeek samples multiple responses to the same prompt and scores them relative to one another, using external verifiers to check correctness in math and code.
- Majority Voting for Accuracy: DeepSeek improves reasoning accuracy by generating multiple answers and selecting the most consistent one.
- Challenges in General-Purpose Queries: The early RL-only version (DeepSeek-R1-Zero) struggled with readability, mixed languages, and format consistency, which led to the refinements in DeepSeek-R1.
- Distillation into Open-Source Models: DeepSeek-R1’s knowledge is distilled into models like LLaMA and Qwen, making high-performance AI more accessible to the open-source community.
- Shift in AI Focus (2024-2025): The AI industry is moving from pre-training toward inference and optimization techniques, with DeepSeek’s reinforcement-learning-first approach setting a new paradigm.
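
The group-based reward and majority-voting highlights above can be made concrete with a small sketch. It is not the DeepSeek implementation: the `verify` function is a hypothetical stand-in for an external math/code verifier, the sample answers are made up for illustration, and normalizing each reward by the group's mean and standard deviation is an assumption about how "comparing responses within a group" might be scored.

```python
from collections import Counter
from statistics import mean, pstdev


def verify(answer, expected):
    """Hypothetical rule-based verifier: 1.0 for an exact match, else 0.0."""
    return 1.0 if answer.strip() == expected.strip() else 0.0


def group_relative_advantages(rewards, eps=1e-8):
    """Score each response relative to the other responses in its group.

    Assumption: the advantage is the reward minus the group mean, divided by
    the group standard deviation (eps avoids division by zero when every
    response receives the same reward).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def majority_vote(answers):
    """Return the most frequent final answer among the sampled responses."""
    return Counter(answers).most_common(1)[0][0]


# Illustrative data: four sampled final answers to one math prompt.
sampled_answers = ["42", "41", "42", "42"]
rewards = [verify(a, "42") for a in sampled_answers]

advantages = group_relative_advantages(rewards)
consensus = majority_vote(sampled_answers)
```

In this sketch the one incorrect answer receives a negative advantage and the correct ones a positive advantage, so no learned reward model is needed to rank the group; the majority vote is applied only at inference time to pick a final answer.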
----
Join over 2000 professionals who have developed expertise in AI/ML.
Connect with us on LinkedIn and stay updated: https://www.linkedin.com/company/support-vectors/
Become part of SupportVectors to build in-depth technical skills and further your career: https://supportvectors.ai
