Train Your Own Reasoning Model (DeepSeek Clone) Fast & With Only 7Gb Of VRAM

Machine Learning With Hamza 9,885 2 months ago


Hello everyone, I hope you're doing well! In this video, I show you how to fine-tune LLMs locally for reasoning tasks, using the reinforcement learning algorithm GRPO. Fine-tuning runs on a GPU with at least 7 GB of VRAM, using Unsloth, a fast fine-tuning Python library.

Material used:
GitHub repo: https://github.com/Hmzbo/Fine-tune-LLMS-with-grp
Hugging Face post: https://huggingface.co/learn/cookbook/en/fine_tuning_llm_grpo_trl
Unsloth notebooks: https://docs.unsloth.ai/get-started/unsloth-notebooks

Let's connect:
LinkedIn: https://bit.ly/3roXgQ2
GitHub: https://bit.ly/3CrfRRP
Kaggle: https://bit.ly/3C1mqZD
Twitter: https://bit.ly/3UR06e3

♪ Song: Memories
Artist: Owl Nest
Music by: CreatorMix.com
Video: https://youtu.be/mBVBmnNM-Cc

If you have any questions, suggestions, or remarks, feel free to leave them in a comment below! Until next time, stay safe! #mlwh

00:00 Intro
01:02 Explaining GRPO
08:03 Environment setup guidelines
10:20 Data, model & reward functions
17:57 Training
21:24 Training results
23:47 Testing
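To give a feel for the "Explaining GRPO" and "reward functions" chapters, here is a minimal sketch in plain Python of the core idea: several completions are sampled for the same prompt, each one is scored by a reward function, and each reward is then normalized against its own group. This is not the code from the video or the repo. The `<think>`/`<answer>` tag format, the function names, and the use of the population standard deviation are all assumptions made for illustration.

```python
import re
from statistics import mean, pstdev

# Hypothetical output format (an assumption, not taken from the video):
# reasoning inside <think> tags, followed by the final answer inside <answer> tags.
FORMAT_PATTERN = re.compile(r"^<think>.+?</think>\s*<answer>.+?</answer>$", re.DOTALL)


def format_reward(completion: str) -> float:
    """Return 1.0 if the completion follows the assumed tag format, else 0.0."""
    return 1.0 if FORMAT_PATTERN.match(completion.strip()) else 0.0


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward by the mean and std of its own group.

    Completions that score above the group average get a positive advantage,
    those below get a negative one. eps avoids division by zero when every
    completion in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std, chosen here for simplicity
    return [(r - mu) / (sigma + eps) for r in rewards]


# One group: four sampled completions for the same prompt.
completions = [
    "<think>2 + 2 is 4</think><answer>4</answer>",
    "The answer is 4",
    "<think>adding</think> <answer>4</answer>",
    "4",
]
rewards = [format_reward(c) for c in completions]
advantages = group_relative_advantages(rewards)
```

In an actual training run, a library such as TRL or Unsloth handles the sampling and the policy update; the sketch only shows how rewards within one group are turned into relative advantages.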
