This lesson walks through an end-to-end example of fine-tuning a model with Axolotl to understand a domain-specific query language. Guest speakers include Wing Lian, creator of Axolotl, and Zach Mueller, lead developer on HuggingFace Accelerate.
Notes, slides, and additional resources: https://parlance-labs.com/education/fine_tuning_course/workshop_2.html
This is lesson 2 of a 4-part course on applied fine-tuning:
1. When & Why to Fine-Tune: https://youtu.be/cPn0nHFsvFg
2. Fine-Tuning w/Axolotl: https://youtu.be/mmsa4wDsiy0
3. Instrumenting & Evaluating LLMs: https://youtu.be/SnbGD677_u0
4. Deploying Fine-Tuned LLMs: https://youtu.be/GzEcyBykkdo
Chapter Summaries:
*0:00 Overview*
*0:51 Small vs. Larger LLMs*
*3:47 Model Family*
*5:45 LoRA vs. Fine-tuning*
*9:54 QLoRA*
*14:35 Improving Data vs. Hyperparameters*
*15:47 What is Axolotl*
*21:45 Axolotl Config Files Walkthrough*
*27:23 Fine-tuning with Axolotl via CLI*
*30:37 Alpaca Dataset Template and Debugging Tools*
*36:06 Gradio App Demo*
*37:14 Honeycomb Case Study*
*39:51 Honeycomb Prompt Notebook*
*43:10 Writing Level 1 Evaluations*
*46:14 Generating Synthetic Data*
*49:45 Data and Config Files for Fine-tuning*
*53:40 Viewing Data After Preprocessing*
*57:31 Training with Axolotl*
*1:00:24 Model Sanity Checks*
*1:02:44 Level 2 Evaluations*
*1:07:17 Curating Data*
*1:11:09 Debugging Axolotl*
*1:13:37 Predicting Fine-tuning Time*
*1:16:34 GPU Memory Usage for Fine-tuning*
*1:18:49 Distributed Training*
*1:20:13 Fully Sharded Data Parallelism (FSDP)*
*1:21:50 Sharding Strategies*
*1:23:37 How to Split the Model*
*1:24:44 Offloading Parameters*
*1:27:43 What is Accelerate*
*1:29:25 Distributing Training with Accelerate*
*1:31:18 Using Accelerate in Code*
*1:33:05 Mixed Precision*
*1:35:40 FSDP vs. DeepSpeed*
*1:38:10 FSDP and DeepSpeed on Axolotl*
*1:42:07 Training on Modal*
*1:46:21 Using Modal to Fine-tune an LLM with Axolotl*
*1:51:55 Inspecting Data with Notebook*
*1:53:00 Q&A Session*
*1:53:33 Determining Adapter Rank and Alpha*
*1:56:25 Custom Evaluation Metrics*
*1:59:29 Features of Lower-Level Libraries*
*2:02:14 4-Bit vs. Higher Precision*
*2:07:54 Making Models Deterministic*