Shreya Shankar and Hamel Husain discuss common mistakes people make when creating domain-specific evals.
LLM Evals Course for Engineers (35% Discount): http://bit.ly/eval-discount
00:51 Foundation model benchmarks are not the same as your application evals
03:00 Generic evals are useless
04:00 Do not outsource labeling & prompting to non-domain experts
09:28 You should make your own data annotation app
12:40 Your LLM prompts should be specific and grounded in error analysis
15:25 Use binary labels
18:57 Look at your data
23:41 Be careful of overfitting to test data
25:40 Run online tests