Join Prof. Subbarao Kambhampati and host Tim Scarfe for a deep dive into OpenAI's O1 model and the future of AI reasoning systems.
* How O1 likely uses reinforcement learning similar to AlphaGo, with hidden reasoning tokens that users pay for but never see
* The evolution from traditional Large Language Models to more sophisticated reasoning systems
* The concept of "fractal intelligence" in AI - where models work brilliantly sometimes but fail unpredictably
* Why O1's improved performance comes with substantial computational costs
* The ongoing debate between single-model approaches (OpenAI) vs hybrid systems (Google)
* The critical distinction between AI as an intelligence amplifier vs autonomous decision-maker
SPONSOR MESSAGES:
***
CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments.
https://centml.ai/pricing/
Tufa AI Labs is a brand-new research lab in Zurich started by Benjamin Crouzier, focused on o-series-style reasoning and AGI. Are you interested in working on reasoning, or getting involved in their events?
Go to https://tufalabs.ai/
***
TOC:
1. **O1 Architecture and Reasoning Foundations**
[00:00:00] 1.1 Fractal Intelligence and Reasoning Model Limitations
[00:04:28] 1.2 LLM Evolution: From Simple Prompting to Advanced Reasoning
[00:14:28] 1.3 O1's Architecture and AlphaGo-like Reasoning Approach
[00:23:18] 1.4 Empirical Evaluation of O1's Planning Capabilities
2. **Monte Carlo Methods and Model Deep-Dive**
[00:29:30] 2.1 Monte Carlo Methods and MARCO-O1 Implementation
[00:31:30] 2.2 Reasoning vs. Retrieval in LLM Systems
[00:40:40] 2.3 Fractal Intelligence Capabilities and Limitations
[00:45:59] 2.4 Mechanistic Interpretability of Model Behavior
[00:51:41] 2.5 O1 Response Patterns and Performance Analysis
3. **System Design and Real-World Applications**
[00:59:30] 3.1 Evolution from LLMs to Language Reasoning Models
[01:06:48] 3.2 Cost-Efficiency Analysis: LLMs vs O1
[01:11:28] 3.3 Autonomous vs Human-in-the-Loop Systems
[01:16:01] 3.4 Program Generation and Fine-Tuning Approaches
[01:26:08] 3.5 Hybrid Architecture Implementation Strategies
Transcript: https://www.dropbox.com/scl/fi/d0ef4ovnfxi0lknirkvft/Subbarao.pdf?rlkey=l3rp29gs4hkut7he8u04mm1df&dl=0
REFS:
[00:02:00] Monty Python and the Holy Grail (1975)
Witch trial scene: flawed logical reasoning.
https://www.youtube.com/watch?v=zrzMhU_4m-g
[00:04:00] Cade Metz (2024)
Microsoft–OpenAI partnership evolution and control dynamics.
https://www.nytimes.com/2024/10/17/technology/microsoft-openai-partnership-deal.html
[00:07:25] Kojima et al. (2022)
Zero-shot chain-of-thought prompting ('Let's think step by step').
https://arxiv.org/pdf/2205.11916
[00:08:20] Stechly, K., Valmeekam, K., & Kambhampati, S. (2024)
Chain of Thoughtlessness? An Analysis of CoT in Planning (examines CoT prompts in classical planning tasks).
https://arxiv.org/abs/2405.04776
[00:12:50] DeepMind Research Team (2023)
Multi-bot game solving with external and internal planning.
https://deepmind.google/research/publications/139455/
[00:15:10] Silver et al. (2016)
AlphaGo: Monte Carlo Tree Search with policy and value networks trained via reinforcement learning.
https://www.nature.com/articles/nature16961
[00:16:30] Valmeekam, K., Kambhampati, S. et al. (2024)
Evaluates O1's planning in "Strawberry Fields" benchmarks.
https://arxiv.org/pdf/2410.02162
[00:29:30] Alibaba AIDC-AI Team (2024)
MARCO-O1: Chain-of-Thought + MCTS for improved reasoning.
https://arxiv.org/html/2411.14405
[00:31:30] Kambhampati, S. (2024)
Explores LLM "reasoning vs retrieval" debate.
https://arxiv.org/html/2403.04121v2
[00:37:35] Wei, J. et al. (2022)
Chain-of-thought prompting (introduces last-letter concatenation).
https://arxiv.org/pdf/2201.11903
[00:42:35] Barbero, F. et al. (2024)
Transformer attention and "information over-squashing."
https://arxiv.org/html/2406.04267v2
[00:46:05] Ruis, L. et al. (2024)
Influence functions to understand procedural knowledge in LLMs.
https://arxiv.org/html/2411.12580v1
[00:50:00] McCoy, R. T. et al. (2024)
O1's reasoning capabilities vs persistent autoregressive tendencies.
https://arxiv.org/html/2410.01792v2
[00:56:35] The Surgeon Riddle (2014)
Gender bias puzzle testing model reasoning.
https://www.bu.edu/articles/2014/bu-research-riddle-reveals-the-depth-of-gender-bias/
[01:14:00] Chollet, F. (2019)
ARC challenge for general intelligence.
https://arcprize.org/arc
[01:16:15] Greenblatt, R. (2024)
50% SoTA on ARC-AGI using GPT-4o and Python sampling.
https://redwoodresearch.substack.com/p/getting-50-sota-on-arc-agi-with-gpt
[01:16:55] Li, W.-D., Ellis, K. et al. (2024)
Combining induction and transduction for ARC reasoning.
https://arxiv.org/abs/2411.02272
McCarthy, J. (1959)
Programs with Common Sense (the "advice taker" vision, a holy grail of AI).
https://www-formal.stanford.edu/jmc/mcc59.pdf
[01:22:45] Kierkegaard, S. (1843)
"Life can only be understood backwards; but it must be lived forwards."
https://plato.stanford.edu/entries/kierkegaard/