➡️ Lifetime access to ADVANCED-inference Repo (incl. future additions): https://trelis.com/ADVANCED-inference/
➡️ Runpod Affiliate Link: https://runpod.io?ref=jmfkcdio
➡️ One-click GPU templates: https://github.com/TrelisResearch/one-click-llms
➡️ Thumbnail made with this tutorial: https://youtu.be/ThKYjTdkyP8
➡️ Test Time Compute, Part 1: Sampling and Chain of Thought - https://youtu.be/qJoy8U27NPo
OTHER TRELIS LINKS:
➡️ Trelis Newsletter: https://blog.Trelis.com
➡️ Trelis Resources and Support: https://Trelis.com/About
VIDEO LINKS:
- Slides: https://docs.google.com/presentation/d/1k6Hqn1_HYZ0CVFj2S4B2kvggw6WZajP5eHaWhO77tBE/edit?usp=sharing
- GSM8K: https://huggingface.co/datasets/openai/gsm8k
- Chain of Thought Paper: https://arxiv.org/abs/2201.11903
- Let’s Verify Step by Step: https://arxiv.org/abs/2305.20050
- Large Language Monkeys: https://arxiv.org/abs/2407.21787
- Are more LLM calls all you need?: https://arxiv.org/abs/2403.02419
TIMESTAMPS:
0:00 Sampling and Verification
0:36 Training Compute vs Test Time Compute
1:50 Part 1 Recap: Sampling and Chain of Thought
3:30 Video Overview: Parallel Sampling and Filtering with Verifiers
4:33 How to sample multiple answers in parallel
7:08 Verifier Methods
11:05 Improving verifiers with fine-tuning or prompt optimisation
12:50 Output verifiers versus process verifiers
15:00 Majority Voting and Monte Carlo Tree Search (MCTS)
16:40 Notebook Setup - Trelis.com/advanced-inference
18:55 Installation of vLLM (with guided decoding)
22:20 Loading Llama 3.2 1B (as opposed to 3B in Part 1)
23:58 Baseline Single-shot approach
28:48 Parallel sampling approach (Pass@n / perfect verifier)
32:00 Parallel sampling with a voting verifier (using vLLM guided decoding)
39:22 Prompt optimisation for verifiers
43:25 Parallel sampling with a scoring verifier (1-10)
46:45 Parallel sampling with a binary true/false scoring verifier
48:23 Llama 3.2 1B Results
51:29 Literature Review (Let’s Verify Step by Step; Large Language Monkeys; Are more LLM calls all you need?; Tree of Thoughts)
59:30 Resources