Test Time Compute, Part 2: Verifiers

Trelis Research 1,957 7 months ago

➡️ Lifetime access to ADVANCED-inference Repo (incl. future additions): https://trelis.com/ADVANCED-inference/
➡️ Runpod Affiliate Link: https://runpod.io?ref=jmfkcdio
➡️ One-click GPU templates: https://github.com/TrelisResearch/one-click-llms
➡️ Thumbnail made with this tutorial: https://youtu.be/ThKYjTdkyP8
➡️ Test Time Compute, Part 1: Sampling and Chain of Thought - https://youtu.be/qJoy8U27NPo

OTHER TRELIS LINKS:
➡️ Trelis Newsletter: https://blog.Trelis.com
➡️ Trelis Resources and Support: https://Trelis.com/About

VIDEO LINKS:
- Slides: https://docs.google.com/presentation/d/1k6Hqn1_HYZ0CVFj2S4B2kvggw6WZajP5eHaWhO77tBE/edit?usp=sharing
- GSM8K: https://huggingface.co/datasets/openai/gsm8k
- Chain of Thought Paper: https://arxiv.org/abs/2201.11903
- Let’s Verify Step by Step: https://arxiv.org/abs/2305.20050
- Large Language Monkeys: https://arxiv.org/abs/2407.21787
- Are more LLM calls all you need?: https://arxiv.org/abs/2403.02419

TIMESTAMPS:
0:00 Sampling and Verification
0:36 Training Compute vs Test Time Compute
1:50 Part 1 Recap: Sampling and Chain of Thought
3:30 Video Overview: Parallel Sampling and Filtering with Verifiers
4:33 How to sample multiple answers in parallel
7:08 Verifier Methods
11:05 Improving verifiers with fine-tuning or prompt optimisation
12:50 Output verifiers versus process verifiers
15:00 Majority Voting and Monte Carlo (MCTS)
16:40 Notebook Setup - Trelis.com/advanced-inference
18:55 Installation of vLLM (with guided decoding)
22:20 Loading Llama 3.2 1B (as opposed to 3B in Part 1)
23:58 Baseline Single-shot approach
28:48 Parallel sampling approach (Pass@n / perfect verifier)
32:00 Parallel sampling with a voting verifier (using vLLM guided decoding)
39:22 Prompt optimisation for verifiers
43:25 Parallel sampling with a scoring verifier (1-10)
46:45 Parallel sampling with a binary true/false scoring verifier
48:23 Llama 3.2 1B Results
51:29 Literature Review (Let’s Verify Step by Step, Large Language Monkeys, Are more LLM calls all you need?, Tree of Thought)
59:30 Resources
