Fine-Tuning Text Embeddings For Domain-specific Search (w/ Python)

Shaw Talebi 7,848 3 months ago

Get exclusive access to AI resources and project ideas: https://the-data-entrepreneurs.kit.com/shaw

In this video, I walk through how to fine-tune a text embedding model for domain adaptation using the Sentence Transformers Python library.

Resources:
📰 Blog: https://shawhin.medium.com/fine-tuning-text-embeddings-f913b882b11c?source=friends_link&sk=41468a7c4b3c40d7edb714489889e028
💻 GitHub Repo: https://github.com/ShawhinT/YouTube-Blog/tree/main/LLMs/fine-tuning-embeddings
🤗 Model: https://huggingface.co/shawhin/distilroberta-ai-job-embeddings
💿 Dataset: https://huggingface.co/datasets/shawhin/ai-job-embedding-finetuning

References:
[1] https://youtu.be/Ylz779Op9Pw
[2] https://youtu.be/sNa_uiqSlJo
[3] https://youtu.be/4QHg8Ix8WWQ
[4] https://sbert.net/docs/sentence_transformer/training_overview.html
[5] https://sbert.net/docs/sentence_transformer/training_overview.html#best-base-embedding-models
[6] https://sbert.net/docs/sentence_transformer/pretrained_models.html#semantic-search-models
[7] https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss

--
Homepage: https://www.shawhintalebi.com

Intro - 0:00
RAG - 0:48
Problem with Vector Search - 2:25
Fine-tuning - 3:49
Why fine-tune? - 4:43
5 Steps for Fine-tuning Embeddings - 6:23
Example: Fine-tuning Embeddings on AI Jobs - 6:55
Step 1: Gather Positive (and Negative) Pairs - 7:53
Step 2: Pick a Pre-trained Model - 12:50
Step 3: Pick a Loss Function - 14:18
Step 4: Fine-tune the Model - 15:57
Step 5: Evaluate the Model - 18:00
What's Next? - 19:13
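For readers who want to see what Step 3 (the loss function) and Step 5 (evaluation) compute, here is a minimal plain-Python sketch. It does not call the Sentence Transformers library used in the video; instead it hand-rolls the idea behind MultipleNegativesRankingLoss [7] on toy vectors. For each (anchor, positive) pair in a batch, every other positive in the batch (plus any hard negatives) is treated as a wrong answer, and the loss is cross-entropy over scaled cosine similarities. The scale of 20 mirrors that loss's default in the library. The `triplet_accuracy` helper is a hypothetical evaluation metric (the share of triplets where the anchor is closer to its positive than to its negative), not necessarily the metric used in the video, and the 2-D vectors are stand-ins for real embeddings.

```python
import math
from typing import List, Optional, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def multiple_negatives_ranking_loss(
    anchors: List[Vector],
    positives: List[Vector],
    negatives: Optional[List[Vector]] = None,
    scale: float = 20.0,
) -> float:
    """Sketch of an in-batch-negatives ranking loss.

    The correct candidate for anchors[i] is positives[i]; all other positives
    and all hard negatives (if given) act as negatives for that anchor.
    """
    candidates = list(positives) + (list(negatives) if negatives else [])
    total = 0.0
    for i, anchor in enumerate(anchors):
        # Scaled cosine similarity between anchor i and every candidate
        logits = [scale * cosine_similarity(anchor, c) for c in candidates]
        # Cross-entropy with the correct class at index i (stable log-sum-exp)
        max_logit = max(logits)
        log_sum_exp = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)


def triplet_accuracy(
    anchors: List[Vector], positives: List[Vector], negatives: List[Vector]
) -> float:
    """Hypothetical metric: fraction of triplets where sim(anchor, positive) > sim(anchor, negative)."""
    hits = sum(
        cosine_similarity(a, p) > cosine_similarity(a, n)
        for a, p, n in zip(anchors, positives, negatives)
    )
    return hits / len(anchors)


if __name__ == "__main__":
    # Toy 2-D "embeddings": each anchor points roughly the same way as its positive
    anchors = [[1.0, 0.0], [0.0, 1.0]]
    positives = [[0.9, 0.1], [0.1, 0.9]]
    negatives = [[0.0, 1.0], [1.0, 0.0]]

    print("loss, matched pairs:", multiple_negatives_ranking_loss(anchors, positives))  # near 0
    print("loss, mismatched pairs:", multiple_negatives_ranking_loss(anchors, positives[::-1]))  # roughly 17.7
    print("triplet accuracy:", triplet_accuracy(anchors, positives, negatives))  # 1.0
```

In real training the vectors come from the model being fine-tuned and the library computes the loss on batched tensors; the sketch only shows the arithmetic. Because every other pair in the batch supplies negatives for free, this kind of loss works with positive pairs alone, and larger batches give each anchor more negatives to rank against.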
