
We aim to provide an up-to-date summary of the Cooperative AI Foundation's (CAIF) grants, though please note that some recently approved grants or recent outputs from projects may be missing. Grants are listed in chronological order, from earliest to latest. This page was last updated on 20 April 2026.
Make sure to also explore our recent partnerships with the PIBBSS Fellowship and MATS program.
USD 500,000
2021-2025
Carnegie Mellon University
This grant supports the establishment of the new research lab FOCAL, which aims to lay the foundations of decision theory and game theory relevant to increasing the ability of advanced machine agents to cooperate. Research at FOCAL builds a fundamental understanding of how catastrophic cooperation failures between AI systems can be avoided. Alongside its research activities, the lab also runs outreach activities such as workshops, an online seminar series, and visitor programs.
Selected outputs:
GBP 166,370
2021-2023
University of Oxford
This grant helped support the establishment of the Foerster Lab for AI Research (FLAIR) at the University of Oxford, which focuses broadly on machine learning in multi-agent settings. Specifically, the grant enabled the addition of an initial postdoctoral researcher to the group – Christian Schroeder de Witt – helping the lab to scale faster. FLAIR’s work concentrates on settings in which agents have to take into account, and possibly even influence, the learning of others in order to cooperate more effectively. These others include not only AI agents but also humans, whose diverse strategies and norms can be challenging for AI systems to conform to. Additional emphasis is placed on real-world applications, and on scaling these ideas by combining multi-agent learning with agent-based models.
Selected outputs:
USD 134,175
2023-2024
Massachusetts Institute of Technology
This grant supported a Cooperative AI contest that was run as part of NeurIPS 2023, with the aim of developing a benchmark to assess cooperative intelligence in multi-agent learning, and specifically how well agents can adapt their cooperative skills when interacting with novel partners in unforeseen situations. The contest was based on a pre-existing evaluation suite for multi-agent reinforcement learning called Melting Pot, but with new content created specifically for the contest. These mixed-motive scenarios tested capabilities such as coordination, bargaining, and enforcement/commitment, which are all important for successful cooperation. The contest received 672 submissions from 117 teams, competing over a $10,000 prize pool. The announcement of the winners, a summary of the contest and top submissions, and a panel of top cooperative AI researchers were hosted in person at NeurIPS 2023.
Selected outputs:
GBP 10,000
2023
University College London
Opponent shaping can be used to avoid collectively bad outcomes in mixed-motive games by making decisions that guide opponents’ learning towards better outcomes. This project evaluated the performance of existing methods against new learners and in new environments, and extended them to more complex games. In related but independent work, scenarios from the multi-agent reinforcement learning evaluation suite Melting Pot were reimplemented in a less computationally expensive version, making them more accessible as a benchmark for the wider research community.
Selected outputs:
USD 123,682
2023-2025
Cornell University
Understanding the intentions of other agents is important for successful cooperation. This project aims to develop a useful definition of intent, including collective intent, which would be a prerequisite for cooperation between agents to achieve a joint objective. Such a shared intent would have to build on beliefs about the other agent's intentions and future actions. The project will also explore how to design mechanisms for agents to signal intentions, and for agents to be able to reward or punish each other for the reliability of their signals.
Selected outputs:
EUR 172,000
2023-2025
University of Bonn
This project aims to identify when and how agents learn to cooperate spontaneously, without the algorithm designer’s explicit intent. To achieve this, a complex systems perspective on reinforcement learning will be applied to large-scale public goods games. Such games have been used to describe the dynamics of real-world cooperation challenges such as climate change mitigation and other social dilemmas in which individual incentives do not align with the collective interest. In particular, the project focuses on how the collective of agents affects the cooperativeness of the individual, and vice versa.
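To make the underlying dilemma concrete, here is a minimal sketch (illustrative only, not taken from the project's materials) of the standard linear public goods game, assuming an endowment of 10 and a multiplication factor r = 1.6:

```python
def public_goods_payoffs(contributions, endowment=10.0, r=1.6):
    """Linear public goods game: each player contributes part of an
    endowment, the pooled contributions are multiplied by r, and the
    resulting pot is shared equally among all n players."""
    n = len(contributions)
    share = r * sum(contributions) / n
    return [endowment - c + share for c in contributions]

# With 1 < r < n, full cooperation beats full defection collectively,
coop = public_goods_payoffs([10, 10, 10, 10])   # 16.0 each
defect = public_goods_payoffs([0, 0, 0, 0])     # 10.0 each
# but a lone free-rider among cooperators does best individually:
mixed = public_goods_payoffs([0, 10, 10, 10])   # free-rider gets 22.0
```

Individual incentives thus point away from the collective optimum, which is exactly the misalignment between individual and collective interest that such games capture.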
Selected outputs:
USD 140,000
2024
Harvard University
This project aims to develop methods to promote cooperation among agents. The focus lies on Stackelberg equilibria, in which one agent (a “leader”) commits to a strategy with the aim of promoting cooperation among the others. The leader could be the designer of the game, or else an agent who acts directly in the environment. A new methodology for solving the resulting learning problem will be developed and evaluated, including applications to fostering cooperation in economic environments. The aim is to advance the state of the art in theory and algorithms for learning Stackelberg equilibria in multi-agent reinforcement learning, and their application to solving mixed-motive cooperation problems.
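As a toy illustration of the concept (not the project's methodology), a pure-strategy Stackelberg equilibrium of a small two-player matrix game can be found by enumerating the leader's possible commitments; the payoff matrices below are hypothetical:

```python
def stackelberg_pure(leader_payoff, follower_payoff):
    """Pure-strategy Stackelberg equilibrium of a bimatrix game:
    the leader commits to a row, the follower best-responds with the
    column maximising its own payoff, and the leader chooses the
    commitment whose induced outcome is best for the leader."""
    best = None
    for i, row in enumerate(follower_payoff):
        j = max(range(len(row)), key=lambda c: row[c])  # follower's best response
        value = leader_payoff[i][j]
        if best is None or value > best[0]:
            best = (value, i, j)
    return best  # (leader value, leader action, follower response)

# Hypothetical payoffs: committing to the second row induces the
# follower into an outcome better for the leader than row one yields.
L = [[1, 3], [0, 2]]
F = [[2, 1], [0, 3]]
value, i, j = stackelberg_pure(L, F)
```

The power of commitment is visible here: by moving first and committing credibly, the leader can steer the interaction towards an outcome that simultaneous play might not reach.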
Selected outputs:
USD 233,264
2024-2025
Harvard University
This project explores value alignment of AI systems with a group of individuals rather than with a single individual. The aim is to design policy aggregation methods whose output policy is beneficial with respect to the entire group of stakeholders. The preferences of the stakeholders are learned by observing their behaviour (using a technique called inverse reinforcement learning). Two different approaches to aggregation are studied – voting and Nash welfare – both of which avoid key difficulties with the interpersonal comparison of preference strength. In the voting approach the aggregation arises from a ranking of alternative actions for each stakeholder, while the Nash welfare approach uses the product of stakeholder utilities. The aggregation algorithms will be evaluated both in terms of computational feasibility and through subjective assessments of the behaviour that the aggregated policy generates.
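As a toy illustration of the two aggregation ideas (the utilities are hypothetical, and Borda count is used here only as a stand-in for a ranking-based voting rule, not as the project's actual algorithm):

```python
import math

def nash_welfare_choice(utilities):
    """Pick the alternative maximising the product of stakeholder
    utilities (Nash welfare). The product is invariant to rescaling
    any one stakeholder's utilities, so no interpersonal comparison
    of preference strength is needed."""
    n_alts = len(utilities[0])
    return max(range(n_alts),
               key=lambda a: math.prod(u[a] for u in utilities))

def borda_choice(utilities):
    """A simple ranking-based voting rule (Borda count) that uses
    only each stakeholder's ordering of alternatives, not the
    magnitudes of their utilities."""
    n_alts = len(utilities[0])
    scores = [0] * n_alts
    for u in utilities:
        ranked = sorted(range(n_alts), key=lambda a: u[a])
        for points, alt in enumerate(ranked):
            scores[alt] += points
    return max(range(n_alts), key=lambda a: scores[a])

# Hypothetical utilities: 3 stakeholders x 3 alternative policies.
U = [[0.9, 0.8, 0.1],
     [0.9, 0.8, 0.1],
     [0.05, 0.5, 0.9]]
```

On these utilities the voting rule picks the majority favourite (alternative 0), while Nash welfare picks alternative 1, avoiding an outcome that leaves the third stakeholder with near-zero utility; the two approaches can thus genuinely disagree.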
Selected outputs:
USD 500,000
2024-2025
Stanford University
This project will develop human-interpretable computational models of cooperation and competition, exploring scenarios in which agents help or hinder each other and asking human participants to evaluate what happened. The researchers will study increasingly capable agents and explore their interactions in simulated environments. The key hypothesis is that human judgments of helping and hindering are sensitive to what causal role an agent played, and what its actions reveal about its intentions. This is an interdisciplinary project involving both psychology and computer science. It builds on previous work that has employed counterfactual simulation models for capturing causal judgments in the physical domain as well as on Bayesian models of theory of mind for social inference.
Selected outputs:
USD 450,000
2023-2025
University of Washington and University of California, Berkeley
Recent rapid progress in large language models (LLMs) has demonstrated that they can be incredibly powerful. This project aims to investigate the cooperative capabilities and tendencies of such models. A more thorough understanding of these capabilities could make it possible to defend against models that are capable of deception or coercion, and to develop better algorithms for achieving cooperation in conversational settings. A benchmark environment will be developed for studying the cooperative capabilities of LLMs in conversational settings with humans, in which core capabilities related to cooperation in language (negotiation, deception, modelling other agents, and moral reasoning) can be measured and evaluated.
Selected outputs:
USD 150,974
2024-2025
New York University
This project explores how to enhance an AI agent’s ability to learn the norms, conventions, and preferences of other agents in order to rapidly adapt and cooperate more effectively. It proposes creating a population of diverse and capable agent strategies that an agent can learn through a limited amount of interaction (known as a k-shot setting). To encourage rapid adaptation, the learning agent will be constrained to prioritise strategies that are easier to learn and coordinate with, guided by their description length. The approach will be evaluated in the game of Welfare Diplomacy, focusing on the agent's ability to form stable, high-welfare coalitions with unknown partners and its robustness to exploitative strategies.
USD 347,424
2024-2026
University of Michigan
This project aims to make rigorous, standardise, and expedite the task of evaluating the cooperativeness of new and emerging AI agents. The focus is on mixed-motive domains that may involve both humans and AI agents, and the work will cover both multi-agent reinforcement learning (MARL) and LLM-based agents. Metrics will focus on outcomes, valuing not only the extent to which agents regard the welfare of others, but also how creatively and competently they promote it.
Selected outputs:
USD 293,574
2024-2026
University of Washington
The aim of this project is to study which cooperative norms emerge from self-interested AI agents, and when and how they do so, with a focus on the role of communication in developing and sustaining them. This involves developing cooperative benchmark environments (e.g., the Governance of the Commons Simulation [GovSim]) for LLMs and other language-compatible reinforcement learning agents, inspired by game-theoretic work on public goods games and common-pool resource problems.
GBP 74,344
2025-2026
Center on Long-Term Risk
This project addresses the risk of AI agents acquiring unintended goals, with a focus on other-regarding goals (such as spite) which take into account the preferences of other agents. It aims to investigate whether greater representation of behaviours consistent with a particular goal in the training data make it more likely that a model acquires that goal during subsequent reinforcement learning. The purpose is to understand how to develop training schemes that select for cooperative dispositions.
639,830 SEK
2025-2026
Uppsala University
This project addresses the dual-use capabilities underlying coercion in AI systems. Strong coercive capabilities could lead to large-scale societal harms through misuse. Conversely, some of the capabilities enabling coercion are also essential for fostering cooperation, such as increasing the credibility of commitments. With these challenges in mind, this project aims to develop practical ways to measure these capabilities and model the risks associated with different levels of coercive capabilities.
This grant was awarded through our early-career track, which supports research projects primarily carried out by a single individual for up to 12 months. Compared to a regular application, our early-career track assessment also considers to what extent the grant would further the career of a promising researcher. Applicants are usually within 2–3 years of their PhD (or similar career stage).
USD 213,707
2024-2027
Harvard University
This project addresses the challenge of supporting human decision-makers in complex, multi-party negotiations for societal benefit, particularly in humanitarian crises. While these scenarios could be studied using traditional coalition building games (CBGs) focused on optimal coalition structures, this project recognises the limitations of such approaches, especially their lack of focus on iterative coalition formation and on the prioritisation of humanitarian goals across multiple negotiation rounds. To address this, the project will build upon a CBG framework, using MARL to develop coalition formation strategies for multiple goals and LLMs to synthesise and extract key information from unstructured negotiation case files. The project will then test this AI-assisted negotiation method both with lay users in synthetic scenarios and with teams of real frontline negotiators.
373,000 SEK
2025
The Stockholm International Peace Research Institute (SIPRI) is conducting a scoping study focused on the risks that interaction between AI agents may present to international peace and security. The aim of the study is to raise awareness of the topic in diplomatic circles dedicated to international security, and to inform the design of a potential follow-up project on how cooperation challenges related to agentic AI ought to be governed at the multilateral level.
Selected outputs:
USD 15,000
2025
University of California, Berkeley
This project addresses miscoordination in mixed-motive multi-agent systems by developing a tractable alternative to full policy modelling of co-players. While the project initially proposed a learning architecture for compressing co-player behaviour into compact representations, the research evolved toward a more foundational question: when is an agent justified in acting on a simplified model of others? The work formalises strategic abstraction as an epistemic problem, identifying which distinctions between agents matter for decision-making, then clarifying when those critical distinctions are preserved by an abstraction, and when acting on a compressed representation becomes unjustified. This produced a workshop paper accepted at NeurIPS 2025 ARLET, which formalises how small differences in value estimates can signal when an agent's understanding is fragile, introducing decision margins to flag situations where confident action is unwarranted and deferral is safer.
This grant was awarded through our early-career track, which supports research projects primarily carried out by a single individual for up to 12 months. Compared to a regular application, our early-career track assessment also considers to what extent the grant would further the career of a promising researcher. Applicants are usually within 2–3 years of their PhD (or similar career stage).
Selected outputs:
USD 64,763
2025-2026
University of Washington
This project investigates how AI systems can treat time as a strategic resource in multi-agent interaction, and how decision-making speed may signal cooperative intent. Standard multi-agent frameworks abstract away the temporal dimension: environments wait for agents to act, so thinking time carries no cost or social meaning. Yet behavioural research with humans shows that response speed correlates with cooperation rates and that people infer others' intentions from how long they take to decide. The project develops a model of how agents can infer cooperative intentions by conditioning on other agents’ decision-making time, drawing on resource rationality (a framework that balances utility maximisation against cognitive effort costs). Experiments in a modified version of Welfare Diplomacy compare non-reasoning LLMs, reasoning LLMs, and a proposed hybrid architecture across escalating levels of mutual awareness of decision speed, tested with both AI and human participants.
This grant was awarded through our early-career track, which supports research projects primarily carried out by a single individual for up to 12 months. Compared to a regular application, our early-career track assessment also considers to what extent the grant would further the career of a promising researcher. Applicants are usually within 2–3 years of their PhD (or similar career stage).
Selected outputs:
USD 121,506
2025-2026
MIT
This project investigates whether LLMs can accurately represent diverse human preferences in high-stakes collective decision-making, using shareholder democracy as a testbed. Asset managers typically vote on behalf of investors with minimal input; few investors vote directly; and other attempts to innovate have failed to capture nuanced preferences. This project develops an AI-mediated representation framework in which LLMs predict individual voting preferences on shareholder proposals based on elicited values, generate explanations for their recommendations, and incorporate feedback to improve over time. It also explores whether LLMs can model how shareholders would vote given more time and information, and whether multi-LLM deliberation can generate new proposals with broader support. The approach is evaluated through qualitative interviews, controlled studies measuring prediction accuracy and trust across demographic groups, and a real-world pilot with retail investors.
Selected outputs:
GBP 366,070
2025-2027
University of Oxford
This project aims to build a publicly available implementation of the Habermas Machine, an AI mediator that helps groups of people with diverse views reach agreement on contested issues. The system generates candidate 'group statements' that participants are collectively most likely to endorse. It uses a generative model, a reward model to predict each user's preferences, and democratic selection via ranked-choice voting, with refinement through user critique. Despite interest from governments and civil society groups, the original version remains in a proprietary codebase built on an earlier generation of LLMs. This project rebuilds the system on university infrastructure using modern frontier models, with rigorous safety and fairness testing covering bias, faithfulness, accuracy, and strategy-proofness. The original results have been fully replicated; a public-facing website is under development and a proof-of-concept application to climate policy design in California is planned.
USD 280,000
2025-2027
University of Oxford
This project evaluates how AI-assisted deliberative technology can promote effective cooperation in conflict-affected contexts, testing three models of consensus-building with young people in a country affected by conflict: human-facilitated dialogue, online deliberation using bridging-based ranking with LLM synthesis, and the Habermas Machine approach (using the new version developed by the team supported by our Habermas Machine grant). Around 300 participants drawn from a large online youth platform used in the country are assigned across the three models, with an independent evaluation group assessing the resulting consensus statements. The research compares both the quality of consensus outputs and participant experience across models, including perceived fairness, legitimacy, and whether participants felt heard across divides. The project also investigates best practices for building participants’ trust in deliberative processes enabled by AI, an essential condition for scaling these approaches in peacebuilding and policy contexts.

