Fellows' Spotlight

Safe Pareto Improvements: Cooperative Commitments without Compromise

Advanced AI systems may have many novel ways of credibly committing to and enforcing agreements in strategic interactions. But even when players can make arbitrary joint commitments, it may be unclear whether a particular agreement actually constitutes a Pareto improvement. Players might not know or might not agree on how the game would be played by default, and strategic posturing might prevent them from revealing their true beliefs (cf. equilibrium selection).
In this talk, Nathaniel will discuss his work on Safe Pareto Improvements (SPIs), an approach that circumvents these obstacles. Rather than agreeing on particular outcomes, the idea is to seek commitments which leave the game strategically equivalent ('isomorphic') to the original, but with payoffs that constitute a Pareto improvement. Nathaniel will give geometric characterisations of when and how such SPIs exist, describe ongoing work on affordances that enable them in larger classes of games, and conclude by discussing some open questions for future SPI research.
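To make the idea concrete, here is a minimal Python sketch (not from the talk) of checking whether one game is an SPI on another. It makes simplifying assumptions: the isomorphism is the identity mapping on strategy profiles, 'strategically equivalent' is read as identical ordinal preference structure for every player, and the Chicken-style payoffs are purely illustrative.

```python
def ordinal_profile(payoffs, player):
    """Rank outcomes by a player's payoff, so two games can be
    compared for identical ordinal (strategic) structure."""
    outcomes = sorted(payoffs.keys())
    values = [payoffs[o][player] for o in outcomes]
    # Map each payoff value to its rank among this player's payoffs.
    ranks = {v: r for r, v in enumerate(sorted(set(values)))}
    return [ranks[v] for v in values]

def is_spi(original, transformed, n_players=2):
    """Check whether `transformed` is a Safe Pareto Improvement on
    `original` under the identity isomorphism: same ordinal structure
    for every player, and outcome-wise weakly Pareto-better payoffs,
    strictly better somewhere."""
    same_structure = all(
        ordinal_profile(original, p) == ordinal_profile(transformed, p)
        for p in range(n_players)
    )
    weakly_better = all(
        transformed[o][p] >= original[o][p]
        for o in original for p in range(n_players)
    )
    strictly_somewhere = any(
        transformed[o][p] > original[o][p]
        for o in original for p in range(n_players)
    )
    return same_structure and weakly_better and strictly_somewhere

# Toy Chicken-style game: D = demand, C = concede (illustrative numbers).
G = {('D', 'D'): (0, 0), ('D', 'C'): (4, 1),
     ('C', 'D'): (1, 4), ('C', 'C'): (3, 3)}

# Joint commitment: if both demand, settle at (0.5, 0.5) rather than
# open conflict. Every player's preference ordering over outcomes is
# unchanged, so the game remains strategically equivalent.
G_spi = {('D', 'D'): (0.5, 0.5), ('D', 'C'): (4, 1),
         ('C', 'D'): (1, 4), ('C', 'C'): (3, 3)}

print(is_spi(G, G_spi))  # True
```

Because the preference orderings are untouched, however the original game would have been played, the same profile in the transformed game is at least as good for both players; no one needs to predict or agree on the default outcome.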
Speakers

Nathaniel Sauerberg (University of Texas at Austin)

Time

17:00–18:00 UTC 26 March 2026

Links
This seminar has now finished

Nathaniel is a computer science PhD student at the University of Texas at Austin, where he's advised by Sriram Vishwanath. He's interested in game-theoretic approaches to cooperative AI and AI safety. His research concerns the ways in which strategic interactions involving advanced AI systems could differ from those studied in traditional game theory (among humans and companies), and when and how these differences can be leveraged to ensure the interactions go well. He was previously a summer research fellow at the Center on Longterm Risk (CLR), a visiting scholar at the Foundations of Cooperative AI Lab (FOCAL) at Carnegie Mellon University, and a scholar in the ML Alignment and Theory Scholars (MATS) programme.
