How do we ensure that humans can continue to oversee increasingly powerful AI systems?
Research on Amplified Oversight explores how #ai can enhance humans' ability to train, evaluate, and monitor #aimodels, even as these systems become more capable and surpass human performance in certain domains.
In this talk, Sophie Bridgers and Rishub Jain from @googledeepmind argue that a key goal of Amplified Oversight is achieving human-AI complementarity: leveraging the strengths of both AI and humans to create a stronger oversight signal than either AI or human raters provide alone. This is fundamentally a human-computer interaction (HCI) problem, and they share insights from HCI research that can inform better oversight strategies.
Sophie and Rishub also discuss two promising approaches for achieving complementarity:
- AI rating assistance (giving humans an AI that can help to evaluate model outputs);
- hybridization (combining AI ratings and human ratings).
The researchers find that combining these previously isolated approaches helps keep humans in the loop, even as AI surpasses human performance.
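For readers who want a concrete picture of hybridization, here is a minimal Python sketch of one confidence-based rule: use the AI rating when the AI is confident, and defer to the human rater otherwise. This is an illustration under stated assumptions, not the method presented in the talk. The `Rating` type, the `hybrid_rating` function, and the 0.9 threshold are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Rating:
    """A single judgment about a model output (hypothetical structure)."""
    label: bool        # e.g. True means "the output is acceptable"
    confidence: float  # rater's confidence in the label, assumed to lie in [0, 1]


def hybrid_rating(ai: Rating, human: Rating, ai_threshold: float = 0.9) -> Rating:
    """Return the AI rating if the AI is confident enough, else the human rating.

    The threshold is an assumed tuning parameter, not a value from the talk.
    """
    if ai.confidence >= ai_threshold:
        return ai
    return human


# Confident AI: its rating is used.
confident_case = hybrid_rating(Rating(True, 0.95), Rating(False, 0.60))
# Uncertain AI: the decision is deferred to the human rater.
uncertain_case = hybrid_rating(Rating(True, 0.50), Rating(False, 0.60))
```

In a rule like this, the human is consulted on exactly the cases where the AI is least sure, which is also where rating assistance has the most room to help.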
0:00 Introduction
0:39 The Problem: Amplified (Scalable) Oversight
2:45 Solution: Human-AI Complementarity
4:41 What Are Rater Assistance and Hybridization?
6:02 Lessons from HCI
10:19 Hybridization May Enable Impactful Assistance
11:39 Ongoing Research: Experiments
13:02 Results: Confidence-Based Hybridization
15:10 Rater Assistance on "Human Set"
16:40 More Prescriptive Assistance Leads to Over-Reliance
19:02 The Future of Human Oversight
22:15 Maintaining Downstream User Engagement
23:17 Looking Ahead
#humanaiinteraction #hci #aioversight #amplifiedoversight #deepmind #googledeepmind #airesearch #aigovernance #aiethics #aialignment #riseofai #uncontrollableai #humanvsmachine #artificialintelligence #machineconsciousness #llms #aireasoning #aimodel #machinelearning #reinforcementlearning #techtalk #techtalks #aitalks #aitalk #science
Social Links:
Newsletter: https://buzzrobot.substack.com/
X: https://x.com/sopharicks
Slack: https://join.slack.com/t/buzzrobot/shared_invite/zt-2s067rv7n-guPIMGe62rbp9ncxdnOUfQ