This course is designed to help beginners learn how to train a language model from start to finish. Imad Saddik guides you through the whole process, using Moroccan Darija as the example language.
In this course, you will learn:
- How to load text data
- How to train a tokenizer from scratch using the Byte Pair Encoding (BPE) method
- How to use the tokenizer to encode text data
- How the Transformer architecture works in language models
- How to pre-train a model
- How to create a supervised fine-tuning dataset
- How to fine-tune the model and build an AI assistant that you can chat with
You can find the slides, notebook, and scripts in this GitHub repository:
https://github.com/ImadSaddik/Train_Your_Language_Model_Course
The supervised fine-tuning dataset is available here:
https://github.com/ImadSaddik/BoDmaghDataset
https://huggingface.co/datasets/ImadSaddik/BoDmaghDataset
The tokenizers trained on AtlaSet can be found here:
https://github.com/ImadSaddik/DarijaTokenizers
The AtlaSet dataset is available on Hugging Face here:
https://huggingface.co/datasets/atlasia/Atlaset
To connect with Imad Saddik, check out his social accounts:
- LinkedIn: https://www.linkedin.com/in/imadsaddik/
- YouTube: https://www.youtube.com/@3CodeCampers
- Discord: imad_saddik
⭐️ Course Contents ⭐️
(0:00:00) About the Course
(0:03:03) Introduction
(0:07:24) Training Data
(0:15:33) Tokenization
(0:29:00) The Transformer Architecture
(0:52:21) Pre-training
(1:24:46) Fine-tuning Dataset
(1:33:05) Instruction Fine-tuning
(2:06:17) Fine-tuning with LoRA
(2:20:39) Let's Scale Everything
(3:09:40) Bonus
(3:27:10) Conclusion