📜 Get repo access at Trelis.com/ADVANCED-fine-tuning
Tip: If you subscribe here on YouTube, click the bell to be notified of new videos
💡 Need Technical or Market Assistance?
Book a Consult Here: https://forms.gle/wJXVZXwioKMktjyVA
🤝 Are You a Top Developer?
Work for Trelis: https://trelis.com/jobs/
💸 Starting a New Project/Venture?
Apply for a Trelis Grant: https://trelis.com/trelis-ai-grants/
📧 Get Trelis AI Tutorials by Email
Subscribe on Substack: https://trelis.substack.com
Video Links:
- Slides: https://docs.google.com/presentation/d/1VOtBNgmz1gutHQbtyDHC8Tfxtj1Ychpn_-1pDLkGGuk/edit?usp=sharing
TIMESTAMPS:
0:00 Advanced Data Preparation Techniques
0:33 Video Overview
1:52 Synthetic Dataset Generation Goals
3:48 Synthetic Data Generation Pipeline
5:34 Document Ingestion Approaches (e.g. PDF to markdown) - comparing markitdown, marker, and Gemini
13:44 Chunking Approaches and Trade-offs
22:45 Question-Answer Pair Generation Approaches
31:56 Q-A pair visualization with embeddings or tags AND how to choose a model for synthetic data generation
44:29 How to Create an Evaluation Dataset: Best Practices
54:41 Preview of the upcoming fine-tuning video