
What's Our Reward Function?


Dr. Jeff Beck discusses how our brains might process information and learn, contrasting this with how current AI, especially language models, operates. He suggests that while AI can seem intelligent by recognising patterns (like identifying a famous test without actually solving it), it may lack genuine understanding because it doesn't have real-world, physical experiences like humans do. A major topic is the challenge of figuring out someone's true goals (their "reward function") separately from their understanding of the world (their "beliefs"), just by observing what they do. Beck argues this is fundamentally difficult, possibly impossible, which creates big problems for ensuring AI systems behave in ways we want (AI alignment). He emphasises that simply predicting the next word or action isn't the same as true intelligence or creativity, and highlights the importance of building systems that can genuinely create new things, much like how humans use engineering principles.

SPONSOR MESSAGES:
***
Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich. Go to https://tufalabs.ai/
***

TRANSCRIPT:
https://www.dropbox.com/scl/fi/vq91fbspln8favi7iqjd0/JEFFBECK.pdf?rlkey=wdrhuhgxo6ljiwmw690lcjcx7&dl=0

Dr. Jeff Beck:
https://www.noumenal.ai/

TOC:
1. Neural Computation Fundamentals
[00:00:00] 1.1 Bayesian Brain and Neural Computation Foundations
[00:05:10] 1.2 Theory of Mind and LLM Capabilities
[00:09:08] 1.3 Systems Engineering and Object Decomposition
[00:15:19] 1.4 Language and Shared Mental Models

2. Cognitive Processing and Language Models
[00:18:50] 2.1 Information Processing Bottleneck in Human Cognition
[00:22:54] 2.2 Language Models and Understanding Limitations
[00:25:25] 2.3 Scientific Abstraction and Markov Blankets
[00:31:01] 2.4 Mathematical Reality and Scientific Realism

3. AI Systems and Value Alignment
[00:32:15] 3.1 Neuroscience Research Methodology and Statistical Regularities
[00:34:17] 3.2 Bayesian Modeling and Free Energy Principle
[00:40:08] 3.3 AI Reward Functions and Human Value Alignment
[00:45:00] 3.4 Belief Formation Systems and AI Implementation

Extended version on Patreon: https://www.patreon.com/posts/jeff-beck-125455115

REFS:
[00:00:15] Bayesian inference in neural computation, Wei Ji Ma, Jeffrey M. Beck, Peter E. Latham, Alexandre Pouget
https://www.nature.com/articles/nn1790

[00:01:50] Noumenal Labs AI research initiative, Jeff Beck
https://arxiv.org/html/2502.13161v1

[00:05:25] LLM performance on theory of mind tasks, Michal Kosinski
https://arxiv.org/abs/2302.02083

[00:16:25] Bayesian Mechanics: A Physics of and by Beliefs, Maxwell J. D. Ramstead, Dalton A. R. Sakthivadivel, Conor Heins, Magnus Koudahl, Beren Millidge, Lancelot Da Costa, Brennan Klein, Karl J. Friston
https://arxiv.org/abs/2205.11543

[00:17:10] Building Human-like Communicative Intelligence: A Grounded Perspective, Marina Dubova
https://arxiv.org/abs/2201.02734

[00:20:45] Why do we live at 10 bits/s?, Markus Meister, Jieyu Zheng
https://www.sciencedirect.com/science/article/abs/pii/S0896627324008080

[00:22:25] Steven Piantadosi, Tim Scarfe
https://colala.berkeley.edu/people/piantadosi/

[00:23:35] Mad Libs, Leonard Stern and Roger Price
https://en.wikipedia.org/wiki/Mad_Libs

[00:25:25] The Brain Abstracted, Mazviita Chirimuuta
https://mitpress.mit.edu/9780262548045/the-brain-abstracted/

[00:28:30] Markov blankets in biological systems, Karl Friston
https://royalsocietypublishing.org/doi/10.1098/rsif.2017.0792

[00:31:05] Mathematical Platonism, Øystein Linnebo
https://plato.stanford.edu/entries/platonism-mathematics/

[00:32:15] Critique of Gabor patches use, Thomas Tsao
https://pmc.ncbi.nlm.nih.gov/articles/PMC9564096/

[00:35:55] MaxEnt and MaxCal principles in statistical physics, Steve Pressé et al.
https://journals.aps.org/rmp/abstract/10.1103/RevModPhys.85.1115

[00:40:10] Reward is enough, David Silver, Satinder Singh, Doina Precup, Richard S. Sutton
https://www.sciencedirect.com/science/article/pii/S0004370221000862

[00:45:40] Modeling Human Beliefs about AI Behavior for Scalable Oversight, Leon Lang and Patrick Forré
https://www.arxiv.org/pdf/2502.21262
