Timestamps:
00:00 - Intro
01:12 - How It Works
08:27 - Performance Monitoring
10:27 - Setup Steps
20:55 - Running R1
22:34 - First Output
23:50 - Live Output
24:57 - Comparing Test Results
26:55 - Testing Output
28:50 - Closing Thoughts
Can you really run the full 671B-parameter DeepSeek R1 model locally? In this video, we take on the challenge of running this massive model offline on local hardware, made possible by Unsloth AI's dynamic quantization technique, which shrinks the model's size by up to 80%.
We start by explaining how the dynamic quantization process works and how it allows us to run DeepSeek R1 on an enthusiast-grade system with at least 80GB of combined memory (system RAM plus GPU VRAM). Next, we monitor performance, analyzing how the model uses system RAM and VRAM. Then, we walk through the full setup process using llama.cpp, covering key configuration settings that can be tricky for first-time users.
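If you want to sanity-check your own machine before following along, here is a minimal Python sketch of the memory math. It is not taken from the video: the function names are our own, the 4-layer reserve is an assumed safety margin, and the example numbers (a 24GB GPU, a 131GB model file, 61 layers) are illustrative assumptions you should replace with the values for your hardware and the quantized file you download. The resulting layer count is the kind of value you would pass to llama.cpp's --n-gpu-layers option.

```python
def fits_in_memory(ram_gb: float, vram_gb: float, required_gb: float = 80.0) -> bool:
    """Return True when combined RAM + VRAM meets the stated 80GB minimum."""
    return ram_gb + vram_gb >= required_gb


def estimate_gpu_layers(vram_gb: float, model_file_gb: float,
                        total_layers: int, reserve_layers: int = 4) -> int:
    """Rough estimate of how many layers to offload to the GPU.

    Assumes layers are roughly equal in size, so the share of layers that
    fits is proportional to VRAM / model file size. A few layers are held
    back (reserve_layers, an assumed margin) to leave room for the context.
    """
    if model_file_gb <= 0 or total_layers <= 0:
        raise ValueError("model_file_gb and total_layers must be positive")
    raw = int(vram_gb / model_file_gb * total_layers) - reserve_layers
    # Clamp to the valid range: never negative, never more than the model has.
    return max(0, min(total_layers, raw))


if __name__ == "__main__":
    # Illustrative values only; substitute your own hardware and model file.
    ram_gb, vram_gb = 64.0, 24.0
    print("Meets 80GB combined minimum:", fits_in_memory(ram_gb, vram_gb))
    print("Suggested GPU layers:", estimate_gpu_layers(vram_gb, 131.0, 61))
```

Treat the layer count as a starting point: if llama.cpp runs out of VRAM, lower it; if VRAM sits mostly idle in your monitoring, raise it.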
Once everything is ready, we put the model to the test: running it live, comparing outputs, and measuring how well DeepSeek R1 performs locally.