Train Your Own LLM – Tutorial

freeCodeCamp.org | 84,108 views | 2 weeks ago

This course is designed to help beginners learn how to train a language model from start to finish. Imad will guide you through the whole process, using Moroccan Darija as an example.

In this course, you will learn:
- How to load text data
- How to train a tokenizer from scratch using the Byte Pair Encoding (BPE) method
- How to use the tokenizer to encode text data
- How the Transformer architecture works in language models
- How to pre-train a model
- How to create a supervised fine-tuning dataset
- How to fine-tune the model and build an AI assistant that you can chat with

You can find the slides, notebook, and scripts in this GitHub repository: https://github.com/ImadSaddik/Train_Your_Language_Model_Course

The supervised fine-tuning dataset is available here:
https://github.com/ImadSaddik/BoDmaghDataset
https://huggingface.co/datasets/ImadSaddik/BoDmaghDataset

The tokenizers trained on AtlaSet can be found here: https://github.com/ImadSaddik/DarijaTokenizers

You can access the AtlaSet on HuggingFace here: https://huggingface.co/datasets/atlasia/Atlaset

To connect with Imad Saddik, check out his social accounts:
- LinkedIn: https://www.linkedin.com/in/imadsaddik/
- YouTube: https://www.youtube.com/@3CodeCampers
- Discord: imad_saddik

❤️ Support for this channel comes from our friends at Scrimba, the coding platform that's reinvented interactive learning: https://scrimba.com/freecodecamp

⭐️ Course Contents ⭐️
(0:00:00) About the Course
(0:03:03) Introduction
(0:07:24) Training Data
(0:15:33) Tokenization
(0:29:00) The Transformer Architecture
(0:52:21) Pre-training
(1:24:46) Fine-tuning Dataset
(1:33:05) Instruction Fine-tuning
(2:06:17) Fine-tuning with LoRA
(2:20:39) Let's Scale Everything
(3:09:40) Bonus
(3:27:10) Conclusion

🎉 Thanks to our Champion and Sponsor supporters:
👾 Drake Milly
👾 Ulises Moralez
👾 Goddard Tan
👾 David MG
👾 Matthew Springman
👾 Claudio
👾 Oscar R.
👾 jedi-or-sith
👾 Nattira Maneerat
👾 Justin Hual

--

Learn to code for free and get a developer job: https://www.freecodecamp.org

Read hundreds of articles on programming: https://freecodecamp.org/news
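
The tokenizer step listed in the description above (training a tokenizer from scratch with Byte Pair Encoding) can be sketched roughly as follows. This is a minimal byte-level illustration, not the course's actual code: the function names `train_bpe`, `merge`, `encode`, and `decode` are hypothetical, and the notebook in the linked repository may structure things differently (for example, with pre-tokenization or special tokens).

```python
from collections import Counter


def merge(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` in `ids` with `new_id`."""
    out = []
    i = 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text, num_merges):
    """Learn up to `num_merges` merge rules, starting from raw UTF-8 bytes."""
    ids = list(text.encode("utf-8"))
    merges = {}  # (left_id, right_id) -> new_id, kept in training order
    next_id = 256  # ids 0..255 are reserved for single bytes
    for _ in range(num_merges):
        pair_counts = Counter(zip(ids, ids[1:]))
        if not pair_counts:
            break
        pair, count = pair_counts.most_common(1)[0]
        if count < 2:
            break  # nothing repeats, so further merges would not compress
        ids = merge(ids, pair, next_id)
        merges[pair] = next_id
        next_id += 1
    return merges


def encode(text, merges):
    """Turn text into token ids by applying the merges in training order."""
    ids = list(text.encode("utf-8"))
    for pair, new_id in merges.items():
        ids = merge(ids, pair, new_id)
    return ids


def decode(ids, merges):
    """Turn token ids back into text by expanding each id to its bytes."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (left, right), new_id in merges.items():
        vocab[new_id] = vocab[left] + vocab[right]
    return b"".join(vocab[i] for i in ids).decode("utf-8", errors="replace")
```

Starting from bytes means any text, including Arabic-script Darija, can be encoded without an "unknown" token. A typical use would be `merges = train_bpe(corpus, 1000)` followed by `decode(encode(sentence, merges), merges)` to confirm the round trip.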