Fine-Tuning a Pre-Trained LLM for AI Agent Tool Selection (Hugging Face Transformers Tutorial)

This video shows a fully reproducible workflow for adapting a compact, open-weights language model so that it can decide which software tool to invoke in response to user requests. Aimed at ML engineers and applied researchers already familiar with the 🤗 Transformers ecosystem, the session delivers a concise, production-oriented example of single-task supervised fine-tuning (SFT).

Full repo here: https://github.com/samugit83/TheGradientPath/tree/master/LLMFineTuning/SFT_HF_TOOL_CHOICE

🗺️ Tutorial Roadmap

🔄 Synthetic data generation
• Build 10,000 (query, tool) pairs with a helper function; no manual labelling required.
• Mark the tool slot with the control token [my_tool_selection] (no angle brackets needed).

🧹 Dataset preparation with datasets.Dataset
• Assemble prompt/completion records and create deterministic train/validation splits.

📦 Loading the base model
• Pull SmolLM2-135M (135M parameters) straight from the Hugging Face Hub.

🔧 Tokenizer extension
• Add the new control token and resize the model’s embedding matrix so it becomes learnable.

⚙️ Configuring & running TRL’s SFTTrainer
• Review every key SFTConfig hyper-parameter: epochs, batch size, learning rate, warm-up, logging, evaluation cadence, checkpoints.
• Monitor training and perform on-the-fly validation with greedy decoding.

📤 Model export & quick functional test
• Save the fine-tuned weights and tokenizer.
• Demonstrate the model selecting weather, calculator, and reminder tools on unseen prompts.

(Illustrative code sketches for each roadmap step appear at the end of this description.)

🎯 Key Take-Aways
• Schema-aware prompting – how a dedicated token turns a general LLM into a reliable tool router.
• Parameter-efficient training – SFT on a 4-bit-quantised model delivers strong results without large GPUs.
• Continuous evaluation – in-pipeline testing helps you catch over- or under-fitting before deployment.

🛠️ Prerequisites
• Python ≥ 3.10
• transformers, datasets, trl, bitsandbytes, accelerate, torch (CUDA or Metal build)
• NVIDIA RTX 3060/3050, Apple M-series, or equivalent hardware

#llm #huggingface #finetuning #aiengineering #transformers #python #ai #machinelearning #aiagents
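
💻 Code sketches

The roadmap's first step builds the training pairs synthetically. Below is a minimal sketch of such a helper; the tool names, query templates, and record layout are assumptions for illustration and may differ from the repo's actual generator:

```python
import random

# Hypothetical tools and query templates -- the repo's actual lists may differ.
TOOLS = ["get_weather", "calculator", "set_reminder"]
TEMPLATES = {
    "get_weather": ["What's the weather in {x}?", "Will it rain in {x} tomorrow?"],
    "calculator": ["What is {x} times 7?", "Compute {x} plus 42."],
    "set_reminder": ["Remind me to {x} this evening.", "Set a reminder for {x}."],
}
FILLERS = {
    "get_weather": ["Paris", "Tokyo", "Rome"],
    "calculator": ["12", "305", "88"],
    "set_reminder": ["call mom", "submit the report"],
}

def make_pair():
    """Return one (query, tool) record; the control token marks the tool slot."""
    tool = random.choice(TOOLS)
    query = random.choice(TEMPLATES[tool]).format(x=random.choice(FILLERS[tool]))
    return {"prompt": query, "completion": f" [my_tool_selection] {tool}"}

pairs = [make_pair() for _ in range(10_000)]  # the 10,000 pairs from the roadmap
```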
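
Turning those records into deterministic train/validation splits uses the standard datasets API. The 90/10 ratio below is an assumption; the fixed seed is what makes the split reproducible:

```python
from datasets import Dataset

# Wrap the list of prompt/completion dicts in a Hugging Face Dataset.
ds = Dataset.from_list(pairs)
# A fixed seed keeps the split deterministic across runs; ratio is illustrative.
splits = ds.train_test_split(test_size=0.1, seed=42)
train_ds, eval_ds = splits["train"], splits["test"]
```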
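
Loading the base model from the Hub is a one-liner; the repo id below is my assumption for the SmolLM2-135M checkpoint:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/SmolLM2-135M"  # assumed Hub id for the 135M checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
# For the 4-bit variant mentioned in the take-aways, you would instead pass a
# BitsAndBytesConfig(load_in_4bit=True, ...) here and train through a PEFT
# adapter, since 4-bit base weights are not directly trainable.
```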
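
Registering [my_tool_selection] as a special token and making its embedding learnable follows the usual two-step pattern:

```python
# Register the control token so the tokenizer never splits it into sub-words...
tokenizer.add_special_tokens({"additional_special_tokens": ["[my_tool_selection]"]})
# ...then grow the embedding matrix so the new row becomes a trainable weight.
model.resize_token_embeddings(len(tokenizer))
```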
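
Wiring everything into TRL's SFTTrainer looks roughly like this. Every hyper-parameter value below is illustrative rather than taken from the video, and some argument names (e.g. processing_class vs. tokenizer, eval_strategy vs. evaluation_strategy) shift between TRL/Transformers releases:

```python
from trl import SFTConfig, SFTTrainer

config = SFTConfig(
    output_dir="smollm2-tool-router",
    num_train_epochs=3,              # illustrative values, not the video's
    per_device_train_batch_size=8,
    learning_rate=2e-5,
    warmup_ratio=0.05,
    logging_steps=50,
    eval_strategy="steps",           # evaluation cadence
    eval_steps=200,
    save_strategy="epoch",           # checkpointing
)
trainer = SFTTrainer(
    model=model,
    args=config,
    train_dataset=train_ds,          # prompt/completion records from above
    eval_dataset=eval_ds,
    processing_class=tokenizer,      # `tokenizer=` in older TRL releases
)
trainer.train()
```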
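
Export and the quick functional test reduce to saving the artefacts and greedy-decoding an unseen prompt; the example query and expected tool name are assumptions consistent with the roadmap:

```python
trainer.save_model("smollm2-tool-router")
tokenizer.save_pretrained("smollm2-tool-router")

prompt = "Will it snow in Oslo this weekend?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=10, do_sample=False)  # greedy decoding
# Print only the newly generated tokens; a successful run should resemble
# " [my_tool_selection] get_weather".
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=False))
```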
