

Test Time Compute, Part 1: Sampling and Chain of Thought

Trelis Research · 4,864 views · 7 months ago

➡️ Lifetime access to ADVANCED-inference Repo (incl. future additions): https://trelis.com/ADVANCED-inference/
➡️ Runpod Affiliate Link: https://runpod.io?ref=jmfkcdio
➡️ One-click GPU templates: https://github.com/TrelisResearch/one-click-llms
➡️ Thumbnail made with this tutorial: https://youtu.be/ThKYjTdkyP8
➡️ Test Time Compute, Part 2: Verifiers - https://youtu.be/MvaUcc0mNOU

OTHER TRELIS LINKS:
➡️ Trelis Newsletter: https://blog.Trelis.com
➡️ Trelis Resources and Support: https://Trelis.com/About

VIDEO LINKS:
- Slides: https://docs.google.com/presentation/d/1xGvkZ_9n9d_XSaPWyTgJ7PtxbtA58uJQVJBsFUDsl-c/edit?usp=sharing
- Hotpot QA: https://huggingface.co/datasets/hotpotqa/hotpot_qa
- GSM8K: https://huggingface.co/datasets/openai/gsm8k
- Chain of Thought Paper: https://arxiv.org/abs/2201.11903
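GSM8K reference solutions end with a final line of the form `#### <answer>`, so checking a model's final number against the dataset usually starts by pulling out that value. The sketch below is a minimal illustration under that assumption. It is not the grading used in the video, which relies on gpt-4o-mini with regex enforcement. Both `extract_gsm8k_answer` and `is_exact_match` are hypothetical helpers.

```python
import re
from typing import Optional

# Assumption: GSM8K reference solutions put the final answer after "####",
# e.g. "...72 clips altogether in April and May.\n#### 72".
FINAL_ANSWER_RE = re.compile(r"####\s*(-?[\d,]*\.?\d+)")


def extract_gsm8k_answer(solution: str) -> Optional[str]:
    """Hypothetical helper: return the number after '####' with commas removed, or None."""
    match = FINAL_ANSWER_RE.search(solution)
    if match is None:
        return None
    return match.group(1).replace(",", "")


def is_exact_match(predicted: str, reference_solution: str) -> bool:
    """Hypothetical check: compare a predicted number with the GSM8K reference numerically."""
    reference = extract_gsm8k_answer(reference_solution)
    if reference is None:
        return False
    try:
        return float(predicted.replace(",", "")) == float(reference)
    except ValueError:
        # The prediction was not a plain number, so count it as wrong.
        return False
```

Comparing as floats means "72" and "72.0" both count as correct. Free-form model output would still need its final number isolated first, which is where an LLM grader or a stricter regex comes in.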

TIMESTAMPS:
0:00 OpenAI o1-type techniques for scaling test time compute
1:52 Video Overview (temperature, chain of thought)
2:17 Training compute versus test time compute
6:28 Why spend more compute on test time / inference?
10:50 Using verifiers to select the best answers
12:00 Exploring and critiquing/verifying answers during inference
15:02 Understanding temperature for sampling
19:41 Should you set temperature to zero?
22:08 Beam search
23:30 Problems with setting a non-zero temperature
24:31 Using top p, top k, min p, and best of
27:36 Recap on choosing temperature for sampling
28:20 How to implement chain of thought
29:40 Setup for notebook run-through on GSM8K and Hotpot QA
31:20 Using sampling and chain of thought on Hotpot QA and GSM8K
31:47 Running vLLM in a Jupyter notebook (allows for batching)
36:15 Scoring / grading with OpenAI gpt-4o-mini using regex enforcement
39:39 Multi-threading the scoring / grading for speed
40:30 Running the dataset multiple times to get the mean and mean absolute deviation of correct answers
41:29 Controlling sampling parameters (min p, top p, top k, beam search, temperature)
43:46 Running temperature / sampling ablations WITHOUT chain of thought
46:48 Chain of Thought Setup
49:02 Running ablations WITH chain of thought
50:44 GSM8K Results Charts
52:09 Hotpot QA Results Charts
53:09 Recommendations on sampling, temperature and chain of thought
55:17 Video resources
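The sampling controls covered in the video (temperature, top k, top p, min p) can be illustrated with a small pure-Python sketch over a single vector of logits. This is not vLLM's implementation. The names `filtered_probs` and `sample` are hypothetical. Temperature 0 is treated as greedy decoding. The filters are applied in one common order (temperature, then top k, then top p, then min p), and a given library may order or implement them differently.

```python
import math
import random
from typing import Dict, List


def filtered_probs(logits: List[float], temperature: float = 1.0, top_k: int = 0,
                   top_p: float = 1.0, min_p: float = 0.0) -> Dict[int, float]:
    """Hypothetical helper: {token_index: probability} after temperature/top-k/top-p/min-p."""
    if temperature <= 0:
        # Treat temperature 0 as greedy decoding: all probability on the argmax token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return {best: 1.0}

    # Temperature rescales logits: <1 sharpens the distribution, >1 flattens it.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # subtract max for numerical stability
    total = sum(exps)
    probs = sorted(((i, e / total) for i, e in enumerate(exps)),
                   key=lambda t: t[1], reverse=True)

    if top_k > 0:
        # Top k: keep only the k most likely tokens.
        probs = probs[:top_k]
    if top_p < 1.0:
        # Top p (nucleus): keep the smallest prefix whose cumulative mass reaches top_p.
        kept, cumulative = [], 0.0
        for i, p in probs:
            kept.append((i, p))
            cumulative += p
            if cumulative >= top_p:
                break
        probs = kept
    if min_p > 0.0:
        # Min p: drop tokens whose probability is below min_p times the top token's.
        threshold = min_p * probs[0][1]
        probs = [(i, p) for i, p in probs if p >= threshold]

    # Renormalize the surviving tokens so they sum to 1.
    z = sum(p for _, p in probs)
    return {i: p / z for i, p in probs}


def sample(logits: List[float], rng: random.Random, **kwargs) -> int:
    """Hypothetical helper: draw one token index from the filtered distribution."""
    dist = filtered_probs(logits, **kwargs)
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]
```

For example, with logits `[2.0, 1.0, 0.0, -1.0]` and `min_p=0.2`, only tokens 0 and 1 survive. In vLLM the corresponding knobs are set on `SamplingParams` (`temperature`, `top_k`, `top_p`, `min_p`).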
