
How to Build an LLM from Scratch | An Overview

Shaw Talebi · 306,533 views · 1 year ago

🗞️ Get exclusive access to AI resources and project ideas: https://the-data-entrepreneurs.kit.com/shaw
🧑‍🎓 Learn AI in 6 weeks by building it: https://maven.com/shaw-talebi/ai-builders-bootcamp

This is the 6th video in a series on using large language models (LLMs) in practice. Here, I review key aspects of developing a foundation LLM, based on the development of models such as GPT-3, Llama, Falcon, and beyond.

More Resources:
▶️ Series Playlist: https://www.youtube.com/playlist?list=PLz-ep5RbHosU2hnz5ejezwaYpdMutMVB0
📰 Read more: https://towardsdatascience.com/how-to-build-an-llm-from-scratch-8c477768f1f9?sk=18c351c5cae9ac89df682dd14736a9f3

References:
[1] BloombergGPT: https://arxiv.org/pdf/2303.17564.pdf
[2] Llama 2: https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/
[3] LLM Energy Costs: https://www.statista.com/statistics/1384401/energy-use-when-training-llm-models/
[4] arXiv:2005.14165 [cs.CL]
[5] Falcon 180B Blog: https://huggingface.co/blog/falcon-180b
[6] arXiv:2101.00027 [cs.CL]
[7] Alpaca Repo: https://github.com/gururise/AlpacaDataCleaned
[8] arXiv:2303.18223 [cs.CL]
[9] arXiv:2112.11446 [cs.CL]
[10] arXiv:1508.07909 [cs.CL]
[11] SentencePiece: https://github.com/google/sentencepiece/tree/master
[12] Tokenizers Doc: https://huggingface.co/docs/tokenizers/quicktour
[13] arXiv:1706.03762 [cs.CL]
[14] Andrej Karpathy Lecture: https://www.youtube.com/watch?v=kCc8FmEb1nY&t=5307s
[15] Hugging Face NLP Course: https://huggingface.co/learn/nlp-course/chapter1/7?fw=pt
[16] arXiv:1810.04805 [cs.CL]
[17] arXiv:1910.13461 [cs.CL]
[18] arXiv:1603.05027 [cs.CV]
[19] arXiv:1607.06450 [stat.ML]
[20] arXiv:1803.02155 [cs.CL]
[21] arXiv:2203.15556 [cs.CL]
[22] Mixed Precision Training (NVIDIA): https://docs.nvidia.com/deeplearning/performance/mixed-precision-training/index.html
[23] DeepSpeed Doc: https://www.deepspeed.ai/training/
[24] Weight Decay: https://paperswithcode.com/method/weight-decay
[25] Gradient Clipping: https://towardsdatascience.com/what-is-gradient-clipping-b8e815cdfb48
[26] arXiv:2001.08361 [cs.LG]
[27] arXiv:1803.05457 [cs.AI]
[28] arXiv:1905.07830 [cs.CL]
[29] arXiv:2009.03300 [cs.CY]
[30] arXiv:2109.07958 [cs.CL]
[31] Evaluating MMLU: https://huggingface.co/blog/evaluating-mmlu-leaderboard
[32] Dropout (Srivastava et al., JMLR): https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf

Homepage: https://shawhintalebi.com/
Book a call: https://calendly.com/shawhintalebi

Chapters:
Intro - 0:00
How much does it cost? - 1:30
4 Key Steps - 3:55
Step 1: Data Curation - 4:19
1.1: Data Sources - 5:31
1.2: Data Diversity - 7:45
1.3: Data Preparation - 9:06
Step 2: Model Architecture (Transformers) - 13:17
2.1: 3 Types of Transformers - 15:13
2.2: Other Design Choices - 18:27
2.3: How big do I make it? - 22:45
Step 3: Training at Scale - 24:20
3.1: Training Stability - 26:52
3.2: Hyperparameters - 28:06
Step 4: Evaluation - 29:14
4.1: Multiple-choice Tasks - 30:22
4.2: Open-ended Tasks - 32:59
What's next? - 34:31
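
If you want to get hands-on with the data-preparation step (1.3), the Hugging Face Tokenizers quicktour [12] shows how to train a BPE tokenizer [10] in a few lines. A minimal sketch along those lines, assuming you have a plain-text corpus file (corpus.txt here is a placeholder):

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.trainers import BpeTrainer
from tokenizers.pre_tokenizers import Whitespace

# Byte-pair-encoding tokenizer with an unknown-token fallback.
tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()

# Reserve the special tokens the model will need later.
trainer = BpeTrainer(special_tokens=["[UNK]", "[CLS]", "[SEP]", "[PAD]", "[MASK]"])

# Learn merge rules from your (placeholder) training corpus.
tokenizer.train(files=["corpus.txt"], trainer=trainer)

output = tokenizer.encode("Hello, y'all! How are you?")
print(output.tokens)  # subword pieces the model will actually see
```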

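For step 2.3 (How big do I make it?), the Chinchilla paper [21] found a compute-optimal rule of thumb of roughly 20 training tokens per model parameter, and total training compute is commonly approximated as C ≈ 6·N·D FLOPs for N parameters and D tokens. A quick back-of-the-envelope sketch (the 10B-parameter figure is just an example):

```python
# Rough compute-optimal sizing per the Chinchilla heuristic [21]:
# train on roughly 20 tokens per model parameter.
n_params = 10e9           # example: a 10B-parameter model
n_tokens = 20 * n_params  # ≈ 200B training tokens

# Common approximation for total training compute: C ≈ 6 * N * D FLOPs.
train_flops = 6 * n_params * n_tokens

print(f"tokens:      {n_tokens:.2e}")
print(f"train FLOPs: {train_flops:.2e}")
```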
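For step 4.1 (multiple-choice benchmarks such as ARC [27], HellaSwag [28], and MMLU [29]), one common scoring strategy discussed in the MMLU leaderboard post [31] is to compare the likelihood the model assigns to each answer choice and pick the highest. A minimal sketch using Hugging Face Transformers, with GPT-2 as a stand-in model and a made-up question for illustration:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "Q: Which planet is known as the Red Planet?\nA:"
choices = [" Mars", " Venus", " Jupiter", " Saturn"]

def choice_logprob(prompt: str, choice: str) -> float:
    """Sum of log-probabilities of the choice tokens given the prompt.
    Simplification: assumes tokenizing prompt + choice splits cleanly
    at the prompt boundary (usually true with a leading space)."""
    n_prompt = tok(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tok(prompt + choice, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # logits at position i predict token i+1, so drop the last position.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = full_ids[0, n_prompt:]                        # choice tokens
    rows = torch.arange(n_prompt - 1, full_ids.shape[1] - 1)  # their predictors
    return log_probs[rows, targets].sum().item()

best = max(choices, key=lambda c: choice_logprob(prompt, c))
print(best)  # the choice the model considers most likely
```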