Hello everyone, I hope you're doing well!
In this video, I show you how to fine-tune LLMs locally for reasoning tasks using the reinforcement learning algorithm GRPO (Group Relative Policy Optimization). You can run the fine-tuning on a GPU with at least 7 GB of VRAM using Unsloth, a fast fine-tuning Python library.
Resources used:
Github Repo: https://github.com/Hmzbo/Fine-tune-LLMS-with-grp
Hugging Face post: https://huggingface.co/learn/cookbook/en/fine_tuning_llm_grpo_trl
Unsloth notebooks: https://docs.unsloth.ai/get-started/unsloth-notebooks
Let's connect:
LinkedIn: https://bit.ly/3roXgQ2
GitHub: https://bit.ly/3CrfRRP
Kaggle: https://bit.ly/3C1mqZD
Twitter: https://bit.ly/3UR06e3
--------------------------------------------------------------
♪ Song: Memories
Artist: Owl Nest
Music by: CreatorMix.com
Video: https://youtu.be/mBVBmnNM-Cc
--------------------------------------------------------------
If you have any questions, suggestions, or remarks, feel free to leave them in a comment below!
Until next time, stay safe!
#mlwh
00:00 Intro
01:02 Explaining GRPO
08:03 Environment Setup guidelines
10:20 Data, Model & Reward functions
17:57 Training
21:24 Training results
23:47 Testing