Access all top AIs for $10 on https://mammouth.ai/
Join My Newsletter for Regular AI Updates 👇
https://forwardfuture.ai
My Links 🔗
Subscribe: https://www.youtube.com/@matthew_berman
Twitter: https://twitter.com/matthewberman
Discord: https://discord.gg/xxysSXBxFW
Patreon: https://patreon.com/MatthewBerman
Instagram: https://www.instagram.com/matthewberman_ai
Threads: https://www.threads.net/@matthewberman_ai
LinkedIn: https://www.linkedin.com/company/forward-future-ai
Media/Sponsorship Inquiries ✅
https://bit.ly/44TC45V
0:00 Intro: AI That Thinks BEFORE You Ask?
0:13 Introducing Sleep-Time Compute
0:59 The Problem with Standard Test-Time Compute (Cost & Latency)
2:58 Stateful LLM Applications (Code, Docs, Chat)
3:33 Sleep-Time vs. Test-Time (Diagram Explained)
4:51 Why Sleep-Time is More Cost-Effective
6:00 Defining Sleep-Time Compute
6:26 Sponsor: Mammouth (Generative AI Platform)
7:18 Paper Details: How They Tested Non-Reasoning Models
9:24 Benchmarking Sleep-Time (The Juggling Example)
10:05 Models Used (GPT-4o, Claude, DeepSeek, etc.)
10:25 Results: Non-Reasoning Models (Graphs)
12:18 Results: Reasoning Models (Graphs)
13:39 Sleep-Time vs. Parallel Sampling (A Big Issue)
14:41 Scaling Sleep-Time Compute
15:45 Amortizing Cost Across Queries (Why It's Cheaper!)
16:48 Predictable Queries Benefit Most
18:04 Paper Summary & Future Directions
18:40 Outro & Newsletter