Grants Awarded by the Cooperative AI Foundation

The Cooperative AI Foundation has provided a number of grants to support research on cooperative AI for the benefit of all. Summaries of these grants can be found below, and new grant applications are encouraged.

We aim to provide an up-to-date summary of the Cooperative AI Foundation's (CAIF) grants, though please note that some recently approved grants or recent outputs from projects may be missing. Grants are listed in chronological order, from earliest to latest. This page was last updated on 20 April 2026.

Make sure to also explore our recent partnerships with the PIBBSS Fellowship and the MATS program.

Foundations of Cooperative AI Lab (FOCAL)

Vincent Conitzer

USD 500,000

2021-2025

Carnegie Mellon University

This grant supports the establishment of the new research lab FOCAL, which aims to lay the foundations of decision and game theory relevant to increasing the ability of advanced machine agents to cooperate. Research at FOCAL builds a fundamental understanding of how catastrophic cooperation failures between AI systems can be avoided. Alongside its research, the lab also runs outreach activities such as workshops, an online seminar series, and visitor programs.

Selected outputs:

Machine Learning in Multi-Agent Settings

Jakob Foerster and Christian Schroeder de Witt

GBP 166,370

2021-2023

University of Oxford

This grant helps support the establishment of the Foerster Lab for AI Research (FLAIR) at the University of Oxford, which focuses broadly on machine learning in multi-agent settings. Specifically, the grant enabled the addition of an initial postdoctoral researcher to the group – Christian Schroeder de Witt – helping the lab to scale faster. FLAIR’s work concentrates on settings in which agents have to take into account, and possibly even influence, the learning of others in order to cooperate more effectively. These others include not only AI agents but also humans, whose diverse strategies and norms can be challenging for AI systems to conform to. Additional emphasis is placed on real-world applications, and on scaling these ideas by combining multi-agent learning with agent-based models.

Selected outputs:

A Cooperative AI Contest with Melting Pot

Dylan Hadfield-Menell

USD 134,175

2023-2024

Massachusetts Institute of Technology

This grant supported a Cooperative AI contest run as part of NeurIPS 2023, with the aim of developing a benchmark for cooperative intelligence in multi-agent learning – specifically, how well agents can adapt their cooperative skills to interact with novel partners in unforeseen situations. The contest was based on Melting Pot, a pre-existing evaluation suite for multi-agent reinforcement learning, with new content created specifically for the contest. These mixed-motive scenarios tested capabilities such as coordination, bargaining, and enforcement/commitment, all of which are important for successful cooperation. The contest received 672 submissions from 117 teams competing over a $10,000 prize pool. The announcement of the winners, a summary of the contest and top submissions, and a panel of leading cooperative AI researchers were hosted in person at NeurIPS 2023.

Selected outputs:

Scaling Opponent Shaping

Akbir Khan

GBP 10,000

2023

University College London

Opponent shaping can be used to avoid collectively bad outcomes in mixed-motive games by making decisions that guide opponents’ learning towards better outcomes. This project evaluated the performance of existing methods against new learners and in new environments, and extended them to more complex games. In related but independent work, scenarios from the multi-agent reinforcement learning evaluation suite Melting Pot were reimplemented in a less computationally expensive version, making them more accessible as a benchmark for the wider research community.

Selected outputs:

Integrating Intention Into Cooperative AI

Joseph Halpern

USD 123,682

2023-2025

Cornell University

Understanding the intentions of other agents is important for successful cooperation. This project aims to develop a useful definition of intent, including collective intent, which would be a prerequisite for cooperation between agents to achieve a joint objective. Such a shared intent would have to build on beliefs about the other agent's intentions and future actions. The project will also explore how to design mechanisms for agents to signal intentions, and for agents to be able to reward or punish each other for the reliability of their signals.

Selected outputs:

Conceptualising Collective Cooperative Intelligence

Wolfram Barfuss

EUR 172,000

2023-2025

University of Bonn

This project aims to identify when and how agents learn to cooperate spontaneously, without the algorithm designer’s explicit intent. To achieve this, a complex-systems perspective on reinforcement learning will be applied to large-scale public good games. Such games have been used to describe the dynamics of real-world cooperation challenges such as climate change mitigation, and of other social dilemmas in which individual incentives do not align with the collective interest. This project focuses in particular on how the collective of agents affects the cooperativeness of the individual, and vice versa.
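As a toy illustration of the kind of social dilemma studied here (not code from the project – the endowment and multiplier values are arbitrary), a linear public goods game can be sketched in a few lines of Python:

```python
def payoffs(contributions, multiplier=1.6):
    """Linear public goods game: each player has an endowment of 1 and
    contributes some share of it; the pot is multiplied and split equally."""
    n = len(contributions)
    share = multiplier * sum(contributions) / n
    return [(1 - c) + share for c in contributions]

# With 1 < multiplier < n, everyone contributing beats everyone defecting...
everyone_cooperates = payoffs([1, 1, 1, 1])   # about 1.6 each
everyone_defects = payoffs([0, 0, 0, 0])      # 1.0 each
# ...yet a lone defector free-rides and earns more than any cooperator,
# so individual incentives pull the group away from the collective optimum.
one_defector = payoffs([0, 1, 1, 1])
```

This misalignment between the individually dominant strategy (contribute nothing) and the collectively optimal one is exactly the tension the project examines at scale.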

Selected outputs:

Designing Robust Cooperative AI Systems

Matthias Gerstgrasser and David Parkes

USD 140,000

2024

Harvard University

This project aims to develop methods that promote cooperation among agents. The focus is on Stackelberg equilibria, in which one agent (a “leader”) commits to a strategy with the goal of promoting cooperation among the others. The leader could be the designer of the game, or an agent who acts directly in the environment. A new methodology for solving the resulting learning problem will be developed and evaluated, including applications to fostering cooperation in economic environments. The aim is to advance the state of the art in theory and algorithms for learning Stackelberg equilibria in multi-agent reinforcement learning, and their application to solving mixed-motive cooperation problems.
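To make the leader/follower idea concrete, here is a minimal sketch (with made-up payoffs, not the project’s methods or environments) of finding a leader’s best pure-strategy commitment by enumerating the follower’s best responses:

```python
# Hypothetical 2x2 mixed-motive game: rows are the leader's actions,
# columns the follower's. Payoff values are invented for illustration.
LEADER = [[3.0, 0.0],
          [5.0, 1.0]]
FOLLOWER = [[3.0, 1.0],
            [0.0, 1.0]]

def stackelberg_pure(leader, follower):
    """Leader commits to a pure action; the follower best-responds to it.
    Returns (leader_action, follower_action, leader_value)."""
    best = None
    for a, row in enumerate(follower):
        b = max(range(len(row)), key=row.__getitem__)  # follower's best response
        value = leader[a][b]
        if best is None or value > best[2]:
            best = (a, b, value)
    return best
```

In this particular game, committing to the first action induces the follower to cooperate and earns the leader 3.0, whereas without commitment the leader’s dominant action leads both players to a worse outcome worth 1.0 – a small instance of how commitment can promote cooperation.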

Selected outputs:

Policy Aggregation

Ariel Procaccia

USD 233,264

2024-2025

Harvard University

This project explores the value alignment of AI systems with a group of individuals rather than with a single individual. The aim is to design policy aggregation methods whose output policy is beneficial with respect to the entire group of stakeholders. The stakeholders’ preferences are learned by observing their behaviour (using a technique called inverse reinforcement learning). Two different approaches to aggregation are studied – voting and Nash welfare – both of which avoid key difficulties with the interpersonal comparison of preference strength. In the voting approach the aggregation arises from a ranking of alternative actions for each stakeholder, while the Nash welfare approach uses the product of stakeholder utilities. The aggregation algorithms will be evaluated both for computational feasibility and through subjective assessments of the behaviour that the aggregated policy generates.
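As a toy sketch of the Nash welfare idea (the stakeholder utilities below are invented, and this is not the project’s algorithm), the aggregated choice maximises the product of stakeholder utilities rather than their sum:

```python
import math

# Hypothetical utilities of two stakeholders for three candidate actions.
utilities = {
    "a": [0.9, 0.1],
    "b": [0.5, 0.4],
    "c": [0.6, 0.2],
}

def nash_welfare_choice(utilities):
    """Pick the action maximising the product of stakeholder utilities,
    which favours outcomes that are acceptable to everyone."""
    return max(utilities, key=lambda act: math.prod(utilities[act]))
```

Here the product favours the balanced action "b" even though a plain sum would favour "a", and unlike the sum, the product’s ranking is unchanged if any one stakeholder’s utilities are rescaled by a positive constant – which is one way such rules sidestep interpersonal comparisons of preference strength.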

Selected outputs:

ACES: Action Explanation Through Counterfactual Simulation

Tobias Gerstenberg and Dorsa Sadigh

USD 500,000

2024-2025

Stanford University

This project will develop human-interpretable computational models of cooperation and competition, exploring scenarios in which agents help or hinder each other and asking human participants to evaluate what happened. The researchers will study increasingly capable agents and explore their interactions in simulated environments. The key hypothesis is that human judgments of helping and hindering are sensitive to what causal role an agent played, and what its actions reveal about its intentions. This is an interdisciplinary project involving both psychology and computer science. It builds on previous work that has employed counterfactual simulation models for capturing causal judgments in the physical domain as well as on Bayesian models of theory of mind for social inference.

Selected outputs:

Cooperation and Negotiation in Large Language Models

Natasha Jaques and Sergey Levine

USD 450,000

2023-2025

University of Washington and UC Berkeley

Recent rapid progress in large language models (LLMs) has demonstrated that they can be incredibly powerful. This project aims to investigate the cooperative capabilities and tendencies of such models. A more thorough understanding of these capabilities could make it possible to defend against models capable of deception or coercion, and to develop better algorithms for achieving cooperation in conversational settings. A benchmark environment will be developed for studying the cooperative capabilities of LLMs in conversational settings with humans, where core capabilities related to cooperation in language (negotiation, deception, modelling other agents, and moral reasoning) can be measured and evaluated.

Selected outputs:

Quick and Safe Adaptation to New Teams

Eugene Vinitsky

USD 150,974

2024-2025

New York University

This project explores how to enhance an AI agent’s ability to learn the norms, conventions, and preferences of other agents in order to rapidly adapt and cooperate more effectively. It proposes creating a population of diverse and capable agent strategies that an agent can learn through a limited amount of interaction (known as a k-shot setting). To encourage rapid adaptation, the learning agent will be constrained to prioritise strategies that are easier to learn and coordinate with, guided by their description length. The approach will be evaluated in the game of Welfare Diplomacy, focusing on the agent's ability to form stable, high-welfare coalitions with unknown partners and its robustness to exploitative strategies.

Measuring Cooperation Among Competing AI Algorithms

Mithun Chakraborty and Michael P. Wellman

USD 347,424

2024-2026

University of Michigan

This project aims to make rigorous, standardise, and expedite the evaluation of cooperativeness in new and emerging AI agents. The focus is on mixed-motive domains that may involve both humans and AI agents, and the work will cover both MARL and LLM-based agents. Metrics will focus on outcomes, valuing not just the extent to which agents regard the welfare of others, but also how creative and competent they are in promoting it.

Selected outputs:

Emergent Norms for Sustainable Cooperation

Max Kleiman-Weiner

USD 293,574

2024-2026

University of Washington

The aim of this project is to study which cooperative norms emerge among self-interested AI agents, and when and how they do so, with a focus on the role of communication in developing and sustaining such norms. This involves developing cooperative benchmark environments (e.g., the Governance of the Commons Simulation [GovSim]) for LLMs and other language-compatible reinforcement learning agents, inspired by game-theoretic work on public good games and common-pool resource problems.

Other-Regarding Goals

Mia Taylor

GBP 74,344

2025-2026

Center on Long-Term Risk

This project addresses the risk of AI agents acquiring unintended goals, with a focus on other-regarding goals (such as spite) which take into account the preferences of other agents. It aims to investigate whether greater representation of behaviours consistent with a particular goal in the training data makes it more likely that a model acquires that goal during subsequent reinforcement learning. The purpose is to understand how to develop training schemes that select for cooperative dispositions.

AI Coercive Capabilities: Concepts and Measurements

Sophia Hatz

SEK 639,830

2025-2026

Uppsala University

This project addresses the dual-use capabilities underlying coercion in AI systems. Strong coercive capabilities could lead to large-scale societal harms through misuse. Conversely, some of the capabilities enabling coercion are also essential for fostering cooperation, such as increasing the credibility of commitments. With these challenges in mind, this project aims to develop practical ways to measure these capabilities and model the risks associated with different levels of coercive capabilities.

This grant was awarded through our early-career track, which supports research projects primarily carried out by a single individual for up to 12 months. Compared to a regular application, our early-career track assessment also considers to what extent the grant would further the career of a promising researcher. Applicants are usually within 2–3 years of their PhD (or similar career stage).

AI for Humanitarian Crisis Negotiation and Beyond

Finale Doshi-Velez

USD 213,707

2024-2027

Harvard University

This project addresses the challenge of supporting human decision-makers in complex, multi-party negotiations for societal benefit, particularly in humanitarian crises. While these scenarios could be studied using traditional coalition building games (CBGs) focused on optimal coalition structures, this project recognises the limitations of such approaches, especially the lack of focus on iterative formation and the prioritisation of humanitarian goals across multiple negotiation rounds. To address this, the project will build upon a CBG framework, using MARL to develop coalition formation strategies for multiple goals and LLMs to synthesise and extract key information from unstructured negotiation case files. The project will then test this AI-assisted negotiation method both with lay users in synthetic scenarios and with teams of real frontline negotiators.

Governing the Risks That the Interaction Between AI Agents May Present to International Peace and Security

SIPRI

SEK 373,000

2025

The Stockholm International Peace Research Institute (SIPRI) is conducting a scoping study focused on the risks that the interaction between AI agents may present to international peace and security. The aim of the study is to raise awareness of the topic in diplomatic circles dedicated to international security, and to inform the design of a potential follow-up project on how cooperation challenges related to agentic AI ought to be governed at the multilateral level.

Selected outputs:

Strategic Compression for Adaptive Coordination

Sandy Tanwisuth

USD 15,000

2025

University of California, Berkeley

This project addresses miscoordination in mixed-motive multi-agent systems by developing a tractable alternative to full policy modelling of co-players. While initially proposing a learning architecture for compressing co-player behaviour into compact representations, the research evolved toward a more foundational question: when is an agent justified in acting on a simplified model of others? The work formalises strategic abstraction as an epistemic problem, identifying which distinctions between agents matter for decision-making, then clarifying when those critical distinctions are preserved by an abstraction, and when acting on a compressed representation becomes unjustified. This produced a workshop paper accepted at NeurIPS 2025 ARLET, which formalises how small differences in value estimates can signal when an agent's understanding is fragile, introducing decision margins to flag situations where confident action is unwarranted and deferral is safer.

This grant was also awarded through our early-career track, described above.

Selected outputs:

Hierarchical Theories of Time and Cooperative AI

Kunal Jha

USD 64,763

2025-2026

University of Washington

This project investigates how AI systems can treat time as a strategic resource in multi-agent interaction, and how decision-making speed may signal cooperative intent. Standard multi-agent frameworks abstract away the temporal dimension: environments wait for agents to act, so thinking time carries no cost or social meaning. Yet behavioural research with humans shows that response speed correlates with cooperation rates and people infer others' intentions from how long they take to decide. The project develops a model of how agents can infer cooperative intentions by conditioning on other agents’ decision-making time, drawing on resource rationality (a framework that balances utility maximisation against cognitive effort costs). Experiments in a modified version of Welfare Diplomacy compare non-reasoning LLMs, reasoning LLMs, and a proposed hybrid architecture across escalating levels of mutual awareness of decision speed, tested with both AI and human participants.

This grant was also awarded through our early-career track, described above.

Selected outputs:

Enhancing Shareholder Democracy Using AI-Mediated Representation

Michiel Bakker

USD 121,506

2025-2026

MIT

This project investigates whether LLMs can accurately represent diverse human preferences in high-stakes collective decision-making, using shareholder democracy as a testbed. Asset managers typically vote on behalf of investors with minimal input; few investors vote directly; and other attempts to innovate have failed to capture nuanced preferences. This project develops an AI-mediated representation framework in which LLMs predict individual voting preferences on shareholder proposals based on elicited values, generate explanations for their recommendations, and incorporate feedback to improve over time. It also explores whether LLMs can model how shareholders would vote given more time and information, and whether multi-LLM deliberation can generate new proposals with broader support. The approach is evaluated through qualitative interviews, controlled studies measuring prediction accuracy and trust across demographic groups, and a real-world pilot with retail investors. 

Selected outputs:

The Habermas Machine: a public tool to help humans find common ground

Christopher Summerfield

GBP 366,070

2025-2027

University of Oxford

This project aims to build a publicly available implementation of the Habermas Machine, an AI mediator that helps groups of people with diverse views reach agreement on contested issues. The system generates candidate 'group statements' that participants are collectively most likely to endorse. It uses a generative model, a reward model to predict each user's preferences, and democratic selection via ranked-choice voting, with refinement through user critique. Despite interest from governments and civil society groups, the original version remains in a proprietary codebase built on an earlier generation of LLMs. This project rebuilds the system on university infrastructure using modern frontier models, with rigorous safety and fairness testing covering bias, faithfulness, accuracy, and strategy-proofness. The original results have been fully replicated; a public-facing website is under development and a proof-of-concept application to climate policy design in California is planned.
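The ranked-choice selection step can be illustrated with a minimal instant-runoff count over candidate group statements. The ballots below are invented, and in the real system the rankings come from each user's predicted preferences rather than explicit votes:

```python
from collections import Counter

def instant_runoff(ballots):
    """Ranked-choice vote: repeatedly eliminate the candidate with the
    fewest first-choice votes until one candidate holds a majority."""
    candidates = {c for ballot in ballots for c in ballot}
    while True:
        tally = Counter(
            next(c for c in ballot if c in candidates) for ballot in ballots
        )
        top, votes = tally.most_common(1)[0]
        if votes * 2 > len(ballots):       # strict majority of ballots
            return top
        candidates.discard(min(tally, key=tally.get))

# Each ballot ranks candidate statements from most to least preferred.
ballots = [["s1", "s2", "s3"], ["s2", "s1", "s3"], ["s3", "s2", "s1"],
           ["s2", "s3", "s1"], ["s1", "s3", "s2"]]
```

With these ballots, no statement wins a majority of first choices, so the weakest ("s3") is eliminated and its supporters' votes transfer, making "s2" the winner – the kind of consensus-seeking selection the document describes.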

Deliberative Technology in Conflict Contexts

Search for Common Ground

USD 280,000

2025-2027

University of Oxford

This project evaluates how AI-assisted deliberative technology can promote effective cooperation in conflict-affected contexts, testing three models of consensus-building with young people in a country affected by conflict: human-facilitated dialogue, online deliberation using bridging-based ranking with LLM synthesis, and the Habermas Machine approach (using the new version developed by the team supported by our Habermas Machine grant). Around 300 participants drawn from a large online youth platform used in the country are assigned across the three models, with an independent evaluation group assessing the resulting consensus statements. The research compares both the quality of consensus outputs and participant experience across models, including perceived fairness, legitimacy, and whether participants felt heard across divides. The project also investigates best practices for building participants’ trust in deliberative processes enabled by AI, an essential condition for scaling these approaches in peacebuilding and policy contexts.

Cecilia Elena Tilli
Associate Director (Research and Grants)
Rebecca Eddington
Grants and Events Officer