We aim to provide an up-to-date summary of the Cooperative AI Foundation's (CAIF) grants, though please note that some recently approved grants or recent outputs from projects may be missing. Grants are listed in chronological order, from earliest to latest. This page was last updated on 1 Sep 2025.
Make sure to also explore our recent partnerships with the PIBBS Fellowship and MATS program.
USD 500,000
2021-2025
Carnegie Mellon University
This grant supports the establishment of the new research lab FOCAL, which aims to lay the foundations of decision theory and game theory relevant to increasing the ability of advanced machine agents to cooperate. The research at FOCAL builds a fundamental understanding of how catastrophic cooperation failures between AI systems can be avoided. Alongside its research activities, the lab also runs outreach activities such as workshops, an online seminar series, and visitor programs.
Selected outputs:
GBP 166,370
2021-2023
University of Oxford
This grant helps support the establishment of the Foerster Lab for AI Research (FLAIR) at the University of Oxford, which focuses broadly on machine learning in multi-agent settings. Specifically, the grant enabled the addition of an initial postdoctoral researcher to the group – Christian Schroeder de Witt – helping the lab to scale faster. FLAIR's work concentrates on settings in which agents have to take into account, and possibly even influence, the learning of others so as to cooperate more effectively. These others include not only AI agents but also humans, whose diverse strategies and norms can be challenging for AI systems to conform to. Additional emphasis is placed on real-world applications, and on scaling these ideas by combining multi-agent learning with agent-based models.
Selected outputs:
USD 134,175
2023-2024
Massachusetts Institute of Technology
This grant supported a Cooperative AI contest run as part of NeurIPS 2023, with the aim of developing a benchmark to assess cooperative intelligence in multi-agent learning, and specifically how well agents can adapt their cooperative skills to interact with novel partners in unforeseen situations. The contest was based on a pre-existing evaluation suite for multi-agent reinforcement learning called Melting Pot, but with new content created specifically for the contest. These mixed-motive scenarios tested capabilities such as coordination, bargaining, and enforcement/commitment, all of which are important for successful cooperation. The contest received 672 submissions from 117 teams, competing over a $10,000 prize pool. The winners were announced at an in-person event at NeurIPS 2023, which also featured a summary of the contest and top submissions as well as a panel of leading cooperative AI researchers.
Selected outputs:
GBP 10,000
2023
University College London
Opponent shaping can be used to avoid collectively bad outcomes in mixed-motive games by making decisions that guide opponents' learning towards better outcomes. This project evaluated the performance of existing methods with new learners and in new environments, and extended them to more complex games. In related but independent work, scenarios from the multi-agent reinforcement learning evaluation suite Melting Pot were reimplemented in a less computationally expensive version, making them more accessible for use as a benchmark by the wider research community.
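As a minimal sketch of the general idea behind opponent shaping (this is one well-known formulation, LOLA, and not necessarily one of the methods evaluated in this project), a shaping agent updates its parameters \theta_1 by differentiating through an anticipated learning step of its opponent, rather than treating the opponent's parameters \theta_2 as fixed:

\[ \theta_1 \leftarrow \theta_1 + \alpha \, \nabla_{\theta_1} V^1\big(\theta_1,\ \theta_2 + \eta \, \nabla_{\theta_2} V^2(\theta_1, \theta_2)\big) \]

Here V^i denotes agent i's expected return and \alpha, \eta are learning rates; because the opponent's anticipated update depends on \theta_1, the resulting gradient contains a term that actively shapes the opponent's learning.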
Selected outputs:
USD 123,682
2023-2025
Cornell University
Understanding the intentions of other agents is important for successful cooperation. This project aims to develop a useful definition of intent, including collective intent, which would be a prerequisite for cooperation between agents to achieve a joint objective. Such a shared intent would have to build on beliefs about other agents' intentions and future actions. The project will also explore how to design mechanisms for agents to signal intentions, and for agents to be able to reward or punish each other for the reliability of their signals.
Selected outputs:
EUR 172,000
2023-2025
University of Bonn
This project aims to identify when and how agents learn to cooperate spontaneously, without the algorithm designer's explicit intent. To achieve this, a complex-systems perspective on reinforcement learning will be applied to large-scale public goods games. Such games have been used to describe the dynamics of real-world cooperation challenges such as climate change mitigation and other social dilemmas in which individual incentives do not align with the collective interest. This project, in particular, focuses on how the collective of agents affects the cooperativeness of the individual and vice versa.
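As a concrete textbook illustration of the tension such games capture (the notation is ours, and the project may study a different game class), consider a linear public goods game in which each of N players with endowment e contributes c_i, and the pooled contributions are multiplied by a factor r (with 1 < r < N) and shared equally:

\[ \pi_i = e - c_i + \frac{r}{N} \sum_{j=1}^{N} c_j, \qquad 1 < r < N \]

Since r/N < 1, each unit contributed costs the contributor more than it returns to them, so free-riding is individually rational, yet total welfare is maximised when everyone contributes fully because r > 1.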
Selected outputs:
USD 140,000
2024
Harvard University
This project aims to develop methods to promote cooperation among agents. The focus is on Stackelberg equilibria, in which one agent (a "leader") commits to a strategy with the aim of promoting cooperation amongst the others. The leader could be the designer of the game or else an agent who acts directly in the environment. A new methodology for solving the resulting learning problem will be developed and evaluated, including applications to fostering cooperation in economic environments. The aim is to advance the state of the art in theory and algorithms for learning Stackelberg equilibria in multi-agent reinforcement learning, and their application to solving mixed-motive cooperation problems.
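For readers unfamiliar with the setup, the standard bilevel formulation of a Stackelberg problem is sketched below (the notation is ours; the project's learning formulation may differ). The leader commits to a strategy \sigma_L, anticipating that the followers best-respond to it:

\[ \max_{\sigma_L} \; U_L\big(\sigma_L, \sigma_F^{*}(\sigma_L)\big) \quad \text{subject to} \quad \sigma_F^{*}(\sigma_L) \in \arg\max_{\sigma_F} U_F(\sigma_L, \sigma_F) \]

where U_L and U_F are the leader's and followers' utilities. Learning such equilibria is difficult precisely because the outer optimisation must account for how the followers' equilibrium behaviour changes with the leader's commitment.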
Selected outputs:
USD 233,264
2024-2025
Harvard University
This project explores value alignment of AI systems with a group of individuals rather than with a single individual. The aim is to design policy aggregation methods whose output policy is beneficial with respect to the entire group of stakeholders. The preferences of the stakeholders are learned from observed behaviour (using a technique called inverse reinforcement learning). Two different approaches to aggregation are studied – voting and Nash welfare – both of which avoid key difficulties with interpersonal comparisons of preference strength. In the voting approach, the aggregation arises from each stakeholder's ranking of alternative actions, while the Nash welfare approach maximises the product of stakeholder utilities. The aggregation algorithms will be evaluated both in terms of computational feasibility and through subjective assessments of the behaviour that the aggregated policy generates.
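As a brief illustration of the Nash welfare approach (the symbols are ours, not the project's), the aggregated policy maximises the product of stakeholder utilities, or equivalently the sum of their logarithms:

\[ \pi^{*} \in \arg\max_{\pi} \prod_{i=1}^{n} u_i(\pi) = \arg\max_{\pi} \sum_{i=1}^{n} \log u_i(\pi) \]

Rescaling any individual utility u_i by a positive constant leaves the maximiser unchanged (assuming positive utilities), which is why this objective sidesteps interpersonal comparisons of preference strength.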
Selected outputs:
USD 500,000
2024-2025
Stanford University
This project will develop human-interpretable computational models of cooperation and competition, exploring scenarios in which agents help or hinder each other and asking human participants to evaluate what happened. The researchers will study increasingly capable agents and explore their interactions in simulated environments. The key hypothesis is that human judgments of helping and hindering are sensitive to what causal role an agent played, and what its actions reveal about its intentions. This is an interdisciplinary project involving both psychology and computer science. It builds on previous work that has employed counterfactual simulation models for capturing causal judgments in the physical domain as well as on Bayesian models of theory of mind for social inference.
Selected outputs:
USD 450,000
2023-2025
University of Washington and Berkeley
The recent wave of rapid progress in large language models (LLMs) has demonstrated that they can be incredibly powerful. This project aims to investigate the cooperative capabilities and tendencies of such models. A more thorough understanding of these capabilities could make it possible to defend against models that are capable of deception or coercion, and to develop better algorithms for achieving cooperation in conversational settings. A benchmark environment will be developed for studying the cooperative capabilities of LLMs in conversational settings with humans, in which core capabilities related to cooperation in language (negotiation, deception, modelling other agents, and moral reasoning) can be measured and evaluated.
Selected outputs:
USD 150,974
2024-2025
New York University
This project explores how to enhance an AI agent's ability to learn the norms, conventions, and preferences of other agents in order to rapidly adapt and cooperate more effectively. It proposes creating a population of diverse and capable agent strategies that a learning agent can adapt to through a limited amount of interaction (known as a k-shot setting). To encourage rapid adaptation, the learning agent will be constrained to prioritise strategies that are easier to learn and coordinate with, guided by their description length. The approach will be evaluated in the game of Welfare Diplomacy, focusing on the agent's ability to form stable, high-welfare coalitions with unknown partners and its robustness to exploitative strategies.
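One common way to formalise such a description-length bias (offered only as an illustration; the project's exact formulation may differ) is a prior over partner strategies \sigma that decays with their description length \ell(\sigma) in bits:

\[ p(\sigma) \propto 2^{-\ell(\sigma)} \]

Under such a prior, a k-shot learner concentrates its limited interaction budget on simpler strategies, which are faster to identify and easier to coordinate with.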
USD 347,424
2024-2026
University of Michigan
This project aims to make the evaluation of the cooperativeness of new and emerging AI agents more rigorous, standardised, and rapid. The focus is on mixed-motive domains that may involve both humans and AI agents, and the work will cover both multi-agent reinforcement learning (MARL) and LLM-based agents. Metrics will focus on outcomes, so that what is valued is not just the extent to which agents regard the welfare of others, but also how creative and competent they are in promoting it.
USD 293,574
2024-2026
University of Washington
The aim of this project is to study which cooperative norms emerge among self-interested AI agents, and when and how they do so, with a focus on the role of communication in developing and sustaining cooperative norms. This involves developing cooperative benchmark environments (e.g., Governance of the Commons Simulation [GovSim]) for LLMs and other language-compatible reinforcement learning agents, inspired by game-theoretic work on public goods games and common pool resource problems.
GBP 74,344
2025-2026
Center on Long-Term Risk
This project addresses the risk of AI agents acquiring unintended goals, with a focus on other-regarding goals (such as spite) which take into account the preferences of other agents. It aims to investigate whether greater representation in the training data of behaviours consistent with a particular goal makes it more likely that a model acquires that goal during subsequent reinforcement learning. The purpose is to understand how to develop training schemes that select for cooperative dispositions.
SEK 639,830
2025-2026
Uppsala University
This project addresses the dual-use capabilities underlying coercion in AI systems. Strong coercive capabilities could lead to large-scale societal harms through misuse. Conversely, some of the capabilities enabling coercion are also essential for fostering cooperation, such as increasing the credibility of commitments. With these challenges in mind, this project aims to develop practical ways to measure these capabilities and model the risks associated with different levels of coercive capabilities.
This is the first early-career track grant awarded by the Cooperative AI Foundation. Sophia Hatz is an Associate Professor at the Department of Peace and Conflict Research (Uppsala University). She leads the Working Group on International AI Governance within the Alva Myrdal Center for Nuclear Disarmament.
USD 213,707
2024-2027
Harvard University
This project addresses the challenge of supporting human decision-makers in complex, multi-party negotiations for societal benefit, particularly in humanitarian crises. While these scenarios could be studied using traditional coalition building games (CBGs) focused on optimal coalition structures, this project recognises the limitations of such approaches, especially their lack of focus on iterative formation and on the prioritisation of humanitarian goals across multiple negotiation rounds. To address this, the project will build upon a CBG framework, using MARL to develop coalition formation strategies for multiple goals and LLMs to synthesise and extract key information from unstructured negotiation case files. The project will then test this AI-assisted negotiation method both with lay users in synthetic scenarios and with teams of real frontline negotiators.
SEK 373,000
2025
Stockholm International Peace Research Institute (SIPRI)
SIPRI is conducting a scoping study focused on the risks that interactions between AI agents may present to international peace and security. The aim of the study is to raise awareness of the topic in diplomatic circles dedicated to international security, and to inform the design of a potential follow-up project on how cooperation challenges related to agentic AI ought to be governed at the multilateral level.