Scaling Test Time Compute: How o3-Style Reasoning Works (+ Open Source Implementation)

Adam Lucek 6,495 4 months ago


Is scaling test time compute the path to AGI?

Resources:
HF Blog - https://huggingface.co/spaces/HuggingFaceH4/blogpost-scaling-test-time-compute
Search & Learn - https://github.com/huggingface/search-and-learn/tree/main/recipes
SCoRE - https://arxiv.org/pdf/2409.12917
Scaling Laws - https://arxiv.org/pdf/2001.08361
Scaling Test Time Compute Optimally - https://arxiv.org/pdf/2408.03314
o1 Blog - https://openai.com/index/learning-to-reason-with-llms/
o3 Blog - https://arcprize.org/blog/oai-o3-pub-breakthrough

Great resource that’s compiled a lot of o1 papers, documents, videos, and more: https://github.com/hijkzzz/Awesome-LLM-Strawberry

Chapters:
00:00 - Introduction
01:17 - Scaling Pre Training Background
06:05 - The Idea Behind Scaling Test Time Compute
08:26 - Training Reasoning Models
13:23 - Open Source: Search & Verification Background
16:30 - Open Source: Verification Reward Models
18:39 - Open Source: Best-of-N
20:25 - Open Source: Beam Search
22:48 - Open Source: Diverse Verifier Tree Search
23:50 - Optimally Scaling Test Time Compute
25:42 - Running Test Time Compute Experiments
26:47 - Results: Llama 3.2 1B Instruct
28:35 - Results: Llama 3.2 1B ORPO 40k
31:54 - Discussion

#ai #machinelearning #programming
