
We aim to provide an up-to-date summary of the Cooperative AI Foundation's (CAIF) grants, though please note that some recently approved grants or recent outputs from projects may be missing. Grants are listed in chronological order, from earliest to latest. This page was last updated on 20 April 2026.
Make sure to also explore our recent partnerships with the PIBBSS Fellowship and MATS program.
USD 500,000
2021-2025
Carnegie Mellon University
This grant supports the establishment of the new research lab FOCAL, which aims to lay the foundations of decision theory and game theory relevant to increasing the ability of advanced machine agents to cooperate. Research at FOCAL builds a fundamental understanding of how catastrophic cooperation failures between AI systems can be avoided. Alongside its research activities, the lab also runs outreach activities such as workshops, an online seminar series, and visitor programs.
Selected outputs:
GBP 166,370
2021-2023
University of Oxford
This grant helped support the establishment of the Foerster Lab for AI Research (FLAIR) at the University of Oxford, which focuses broadly on machine learning in multi-agent settings. Specifically, the grant enabled the addition of an initial postdoctoral researcher to the group – Christian Schroeder de Witt – helping the lab to scale faster. FLAIR’s work concentrates on settings in which agents have to take into account, and possibly even influence, the learning of others in order to cooperate more effectively. These others include not only AI agents but also humans, whose diverse strategies and norms can be challenging for AI systems to conform to. Additional emphasis is placed on real-world applications, and on scaling these ideas by combining multi-agent learning with agent-based models.
Selected outputs:
USD 134,175
2023-2024
Massachusetts Institute of Technology
This grant supported a Cooperative AI contest that was run as part of NeurIPS 2023, with the aim of developing a benchmark to assess cooperative intelligence in multi-agent learning, and specifically how well agents can adapt their cooperative skills when interacting with novel partners in unforeseen situations. The contest was based on a pre-existing evaluation suite for multi-agent reinforcement learning called Melting Pot, but with new content created specifically for the contest. These mixed-motive scenarios tested capabilities such as coordination, bargaining, and enforcement/commitment, which are all important for successful cooperation. The contest received 672 submissions from 117 teams, competing over a $10,000 prize pool. The announcement of the winners, a summary of the contest and top submissions, and a panel of top cooperative AI researchers were hosted in person at NeurIPS 2023.
Selected outputs:
GBP 10,000
2023
University College London
Opponent shaping can be used to avoid collectively bad outcomes in mixed-motive games by making decisions that guide opponents’ learning towards better outcomes. This project evaluated the performance of existing methods against new learners and in new environments, and extended them to more complex games. In related but independent work, scenarios from the multi-agent reinforcement learning evaluation suite Melting Pot were reimplemented in a less computationally expensive version, making them more accessible as a benchmark for the wider research community.
Selected outputs:
USD 123,682
2023-2025
Cornell University
Understanding the intentions of other agents is important for successful cooperation. This project aims to develop a useful definition of intent, including collective intent, which would be a prerequisite for cooperation between agents to achieve a joint objective. Such a shared intent would have to build on beliefs about the other agent's intentions and future actions. The project will also explore how to design mechanisms for agents to signal intentions, and for agents to be able to reward or punish each other for the reliability of their signals.
Selected outputs:
EUR 172,000
2023-2025
University of Bonn
This project aims to identify when and how agents learn to cooperate spontaneously, without the algorithm designer’s explicit intent. To achieve this, a complex systems perspective on reinforcement learning will be applied to large-scale public goods games. Such games have been used to describe the dynamics of real-world cooperation challenges such as climate change mitigation and other social dilemmas in which individual incentives do not align with the collective interest. In particular, the project focuses on how the collective of agents affects the cooperativeness of the individual, and vice versa.
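To make the underlying dilemma concrete, here is a minimal sketch (illustrative only, not taken from the project's materials) of the standard linear public goods game, assuming an endowment of 10 and a multiplication factor r = 1.6:

```python
def public_goods_payoffs(contributions, endowment=10.0, r=1.6):
    """Linear public goods game: each player contributes part of an
    endowment, the pooled contributions are multiplied by r, and the
    resulting pot is shared equally among all n players."""
    n = len(contributions)
    share = r * sum(contributions) / n
    return [endowment - c + share for c in contributions]

# With 1 < r < n, full cooperation beats full defection collectively,
coop = public_goods_payoffs([10, 10, 10, 10])   # 16.0 each
defect = public_goods_payoffs([0, 0, 0, 0])     # 10.0 each
# but a lone free-rider among cooperators does best individually:
mixed = public_goods_payoffs([0, 10, 10, 10])   # free-rider gets 22.0
```

Individual incentives thus point away from the collective optimum, which is exactly the misalignment between individual and collective interest that such games capture.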
Selected outputs:
USD 140,000
2024
Harvard University
This project aims to develop methods to promote cooperation among agents. The focus lies on Stackelberg equilibria, in which one agent (a “leader”) commits to a strategy with the aim of promoting cooperation among the others. The leader could be the designer of the game, or else an agent who acts directly in the environment. A new methodology for solving the resulting learning problem will be developed and evaluated, including applications to fostering cooperation in economic environments. The aim is to advance the state of the art in theory and algorithms for learning Stackelberg equilibria in multi-agent reinforcement learning, and their application to solving mixed-motive cooperation problems.
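As a toy illustration of the concept (not the project's methodology), a pure-strategy Stackelberg equilibrium of a small two-player matrix game can be found by enumerating the leader's possible commitments; the payoff matrices below are hypothetical:

```python
def stackelberg_pure(leader_payoff, follower_payoff):
    """Pure-strategy Stackelberg equilibrium of a bimatrix game:
    the leader commits to a row, the follower best-responds with the
    column maximising its own payoff, and the leader chooses the
    commitment whose induced outcome is best for the leader."""
    best = None
    for i, row in enumerate(follower_payoff):
        j = max(range(len(row)), key=lambda c: row[c])  # follower's best response
        value = leader_payoff[i][j]
        if best is None or value > best[0]:
            best = (value, i, j)
    return best  # (leader value, leader action, follower response)

# Hypothetical payoffs: committing to the second row induces the
# follower into an outcome better for the leader than row one yields.
L = [[1, 3], [0, 2]]
F = [[2, 1], [0, 3]]
value, i, j = stackelberg_pure(L, F)
```

The power of commitment is visible here: by moving first and committing credibly, the leader can steer the interaction towards an outcome that simultaneous play might not reach.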
Selected outputs:
USD 233,264
2024-2025
Harvard University
This project explores value alignment of AI systems with a group of individuals rather than with a single individual. The aim is to design policy aggregation methods whose output policy is beneficial with respect to the entire group of stakeholders. The preferences of the stakeholders are learned by observing their behaviour (using a technique called inverse reinforcement learning). Two different approaches to aggregation are studied – voting and Nash welfare – both of which avoid key difficulties with the interpersonal comparison of preference strength. In the voting approach the aggregation arises from a ranking of alternative actions for each stakeholder, while the Nash welfare approach uses the product of stakeholder utilities. The aggregation algorithms will be evaluated both in terms of computational feasibility and through subjective assessments of the behaviour that the aggregated policy generates.
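As a toy illustration of the two aggregation ideas (the utilities are hypothetical, and Borda count is used here only as a stand-in for a ranking-based voting rule, not as the project's actual algorithm):

```python
import math

def nash_welfare_choice(utilities):
    """Pick the alternative maximising the product of stakeholder
    utilities (Nash welfare). The product is invariant to rescaling
    any one stakeholder's utilities, so no interpersonal comparison
    of preference strength is needed."""
    n_alts = len(utilities[0])
    return max(range(n_alts),
               key=lambda a: math.prod(u[a] for u in utilities))

def borda_choice(utilities):
    """A simple ranking-based voting rule (Borda count) that uses
    only each stakeholder's ordering of alternatives, not the
    magnitudes of their utilities."""
    n_alts = len(utilities[0])
    scores = [0] * n_alts
    for u in utilities:
        ranked = sorted(range(n_alts), key=lambda a: u[a])
        for points, alt in enumerate(ranked):
            scores[alt] += points
    return max(range(n_alts), key=lambda a: scores[a])

# Hypothetical utilities: 3 stakeholders x 3 alternative policies.
U = [[0.9, 0.8, 0.1],
     [0.9, 0.8, 0.1],
     [0.05, 0.5, 0.9]]
```

On these utilities the voting rule picks the majority favourite (alternative 0), while Nash welfare picks alternative 1, avoiding an outcome that leaves the third stakeholder with near-zero utility; the two approaches can thus genuinely disagree.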
Selected outputs:
USD 500,000
2024-2025
Stanford University
This project will develop human-interpretable computational models of cooperation and competition, exploring scenarios in which agents help or hinder each other and asking human participants to evaluate what happened. The researchers will study increasingly capable agents and explore their interactions in simulated environments. The key hypothesis is that human judgments of helping and hindering are sensitive to what causal role an agent played, and what its actions reveal about its intentions. This is an interdisciplinary project involving both psychology and computer science. It builds on previous work that has employed counterfactual simulation models for capturing causal judgments in the physical domain as well as on Bayesian models of theory of mind for social inference.
Selected outputs:
USD 450,000
2023-2025
University of Washington and University of California, Berkeley
Recent rapid progress in large language models (LLMs) has demonstrated that they can be incredibly powerful. This project aims to investigate the cooperative capabilities and tendencies of such models. A more thorough understanding of these capabilities could make it possible to defend against models that are capable of deception or coercion, and to develop better algorithms for achieving cooperation in conversational settings. A benchmark environment will be developed for studying the cooperative capabilities of LLMs in conversational settings with humans, in which core capabilities related to cooperation in language (negotiation, deception, modelling other agents, and moral reasoning) can be measured and evaluated.
Selected outputs:
USD 150,974
2024-2025
New York University
This project explores how to enhance an AI agent’s ability to learn the norms, conventions, and preferences of other agents in order to rapidly adapt and cooperate more effectively. It proposes creating a population of diverse and capable agent strategies that an agent can learn through a limited amount of interaction (known as a k-shot setting). To encourage rapid adaptation, the learning agent will be constrained to prioritise strategies that are easier to learn and coordinate with, guided by their description length. The approach will be evaluated in the game of Welfare Diplomacy, focusing on the agent's ability to form stable, high-welfare coalitions with unknown partners and its robustness to exploitative strategies.
USD 347,424
2024-2026
University of Michigan
This project aims to make rigorous, standardise, and expedite the task of evaluating the cooperativeness of new and emerging AI agents. The focus is on mixed-motive domains that may involve both humans and AI agents, and the work will cover both multi-agent reinforcement learning (MARL) and LLM-based agents. Metrics will focus on outcomes, valuing not only the extent to which agents regard the welfare of others, but also how creatively and competently they promote it.
Selected outputs:
USD 293,574
2024-2026
University of Washington
The aim of this project is to study which cooperative norms emerge from self-interested AI agents, and when and how they do so, with a focus on the role of communication in developing and sustaining them. This involves developing cooperative benchmark environments (e.g., the Governance of the Commons Simulation [GovSim]) for LLMs and other language-compatible reinforcement learning agents, inspired by game-theoretic work on public goods games and common-pool resource problems.
GBP 74,344
2025-2026
Center on Long-Term Risk
This project addresses the risk of AI agents acquiring unintended goals, with a focus on other-regarding goals (such as spite) which take into account the preferences of other agents. It aims to investigate whether greater representation of behaviours consistent with a particular goal in the training data make it more likely that a model acquires that goal during subsequent reinforcement learning. The purpose is to understand how to develop training schemes that select for cooperative dispositions.
639,830 SEK
2025-2026
Uppsala University
This project addresses the dual-use capabilities underlying coercion in AI systems. Strong coercive capabilities could lead to large-scale societal harms through misuse. Conversely, some of the capabilities enabling coercion are also essential for fostering cooperation, such as increasing the credibility of commitments. With these challenges in mind, this project aims to develop practical ways to measure these capabilities and model the risks associated with different levels of coercive capabilities.
This grant was awarded through our early-career track, which supports research projects primarily carried out by a single individual for up to 12 months. Compared to a regular application, our early-career track assessment also considers to what extent the grant would further the career of a promising researcher. Applicants are usually within 2–3 years of their PhD (or similar career stage).
USD 213,707
2024-2027
Harvard University
This project addresses the challenge of supporting human decision-makers in complex, multi-party negotiations for societal benefit, particularly in humanitarian crises. While these scenarios could be studied using traditional coalition building games (CBGs) focused on optimal coalition structures, this project recognises the limitations of such approaches, especially their lack of focus on iterative coalition formation and on the prioritisation of humanitarian goals across multiple negotiation rounds. To address this, the project will build upon a CBG framework, using MARL to develop coalition formation strategies for multiple goals and LLMs to synthesise and extract key information from unstructured negotiation case files. The project will then test this AI-assisted negotiation method both with lay users in synthetic scenarios and with teams of real frontline negotiators.
373,000 SEK
2025
The Stockholm International Peace Research Institute (SIPRI) is conducting a scoping study focused on the risks that interaction between AI agents may present to international peace and security. The aim of the study is to raise awareness of the topic in diplomatic circles dedicated to international security, and to inform the design of a potential follow-up project on how cooperation challenges related to agentic AI ought to be governed at the multilateral level.
Selected outputs:
USD 15,000
2025
University of California, Berkeley
This project addresses miscoordination in mixed-motive multi-agent systems by developing a tractable alternative to full policy modelling of co-players. While the project initially proposed a learning architecture for compressing co-player behaviour into compact representations, the research evolved toward a more foundational question: when is an agent justified in acting on a simplified model of others? The work formalises strategic abstraction as an epistemic problem, identifying which distinctions between agents matter for decision-making, then clarifying when those critical distinctions are preserved by an abstraction, and when acting on a compressed representation becomes unjustified. This produced a workshop paper accepted at NeurIPS 2025 ARLET, which formalises how small differences in value estimates can signal when an agent's understanding is fragile, introducing decision margins to flag situations where confident action is unwarranted and deferral is safer.
This grant was awarded through our early-career track, which supports research projects primarily carried out by a single individual for up to 12 months. Compared to a regular application, our early-career track assessment also considers to what extent the grant would further the career of a promising researcher. Applicants are usually within 2–3 years of their PhD (or similar career stage).
Selected outputs:
USD 64,763
2025-2026
University of Washington
This project investigates how AI systems can treat time as a strategic resource in multi-agent interaction, and how decision-making speed may signal cooperative intent. Standard multi-agent frameworks abstract away the temporal dimension: environments wait for agents to act, so thinking time carries no cost or social meaning. Yet behavioural research with humans shows that response speed correlates with cooperation rates and that people infer others' intentions from how long they take to decide. The project develops a model of how agents can infer cooperative intentions by conditioning on other agents’ decision-making time, drawing on resource rationality (a framework that balances utility maximisation against cognitive effort costs). Experiments in a modified version of Welfare Diplomacy compare non-reasoning LLMs, reasoning LLMs, and a proposed hybrid architecture across escalating levels of mutual awareness of decision speed, tested with both AI and human participants.
This grant was awarded through our early-career track, which supports research projects primarily carried out by a single individual for up to 12 months. Compared to a regular application, our early-career track assessment also considers to what extent the grant would further the career of a promising researcher. Applicants are usually within 2–3 years of their PhD (or similar career stage).
Selected outputs:
USD 121,506
2025-2026
MIT
This project investigates whether LLMs can accurately represent diverse human preferences in high-stakes collective decision-making, using shareholder democracy as a testbed. Asset managers typically vote on behalf of investors with minimal input; few investors vote directly; and other attempts to innovate have failed to capture nuanced preferences. This project develops an AI-mediated representation framework in which LLMs predict individual voting preferences on shareholder proposals based on elicited values, generate explanations for their recommendations, and incorporate feedback to improve over time. It also explores whether LLMs can model how shareholders would vote given more time and information, and whether multi-LLM deliberation can generate new proposals with broader support. The approach is evaluated through qualitative interviews, controlled studies measuring prediction accuracy and trust across demographic groups, and a real-world pilot with retail investors.
Selected outputs:
GBP 366,070
2025-2027
University of Oxford
This project aims to build a publicly available implementation of the Habermas Machine, an AI mediator that helps groups of people with diverse views reach agreement on contested issues. The system generates candidate 'group statements' that participants are collectively most likely to endorse. It uses a generative model, a reward model to predict each user's preferences, and democratic selection via ranked-choice voting, with refinement through user critique. Despite interest from governments and civil society groups, the original version remains in a proprietary codebase built on an earlier generation of LLMs. This project rebuilds the system on university infrastructure using modern frontier models, with rigorous safety and fairness testing covering bias, faithfulness, accuracy, and strategy-proofness. The original results have been fully replicated; a public-facing website is under development and a proof-of-concept application to climate policy design in California is planned.
USD 280,000
2025-2027
University of Oxford
This project evaluates how AI-assisted deliberative technology can promote effective cooperation in conflict-affected contexts, testing three models of consensus-building with young people in a country affected by conflict: human-facilitated dialogue, online deliberation using bridging-based ranking with LLM synthesis, and the Habermas Machine approach (using the new version developed by the team supported by our Habermas Machine grant). Around 300 participants drawn from a large online youth platform used in the country are assigned across the three models, with an independent evaluation group assessing the resulting consensus statements. The research compares both the quality of consensus outputs and participant experience across models, including perceived fairness, legitimacy, and whether participants felt heard across divides. The project also investigates best practices for building participants’ trust in deliberative processes enabled by AI, an essential condition for scaling these approaches in peacebuilding and policy contexts.

