CAIF’s mission is to support research that will improve the cooperative intelligence of advanced AI for the benefit of all, although even defining cooperative intelligence is not straightforward, as we discuss further below in the first supporting document associated with this call. Importantly, our focus is on AI systems deployed by self-interested human principals. Thus, the goal of Cooperative AI is not to support the development of selflessly altruistic AI systems, except insofar as self-interested principals regard using these systems as in their interest. Rather, our mission is based on the hypothesis that many improvements in the ability of self-interested actors to cooperate will make humanity as a whole better off.
We are ultimately interested in the development of AI systems that can assist humans in their interactions with one another, and autonomously cooperate with a variety of agents with a range of different preferences. These variations give rise to several different contexts in which we might wish to evaluate how cooperative an AI system is, such as:
Finally, we note that there are many other dimensions along which the training and evaluation of agents may vary, such as centralized vs. decentralized training, online vs. offline learning, and whether agents have an explicit utility or reward function.
Here we briefly describe the two research directions on which we invite proposals. More in-depth discussion of each direction is provided in the supporting documents further below. These documents highlight aspects of these research directions that we currently believe to be most important, but do not cover all problems that successful proposals might address.
Proposed projects should fall under one (or both) of the research directions listed above. More specific guidance can be found in the supporting documents for each research direction. Applicants should submit a 2-5 page proposal including a budget (references do not count towards this page limit) and their CV(s). Please note that our default policy is to limit indirect costs (overheads) to 10% of the total grant value.
Applications will be accepted on a rolling basis, but we encourage timely submission in order to maximize the chance of funding. Proposals will be evaluated in accordance with our grantmaking principles. Anyone is eligible to apply, and we welcome applications from disciplines outside of computer science.
We are grateful to Asya Bergal, Jakob Foerster, Gillian Hadfield, Joe Halpern, Natasha Jaques, Joel Leibo, and Caspar Oesterheld for providing feedback on previous versions of this call for proposals and the supporting documents below. We also wish to thank Noam Brown, Vince Conitzer, Allan Dafoe, José Hernández-Orallo, Max Kleiman-Weiner, Kamal Ndousse, and Rohin Shah for previous helpful discussions on these topics.
We first review some challenges for rigorously defining and measuring cooperative intelligence. Research proposals might address these questions, or raise new conceptual or methodological challenges for the assessment of cooperative intelligence. We then list several other directions in measurement and evaluation for Cooperative AI: measuring individual cooperative capabilities, diagnosing cooperation failures, and evaluation without explicit utility functions. This list does not necessarily exhaust the research directions that CAIF would be interested in funding.
We are ultimately interested in ensuring cooperation for the benefit of all, and successful cooperation involves joint action resulting in greater social welfare. Thus, we wish to measure properties of agents that allow us to predict the extent to which those agents will be able to work together to improve their social welfare in a wide variety of circumstances. Note that by “social welfare” we mean some formal measure of how well an outcome satisfies all agents’ preferences – for instance, the sum or product of all agents’ payoffs.
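For concreteness, two standard welfare functions of this kind can be written as follows; the notation is introduced here only for illustration, and proposals need not adopt either.

```latex
% Two common social welfare functions over agents' payoffs u_1, ..., u_n
% (illustrative notation only):
\[
  W_{\mathrm{util}}(u_1,\dots,u_n) = \sum_{i=1}^{n} u_i
  \qquad\text{(utilitarian: total payoff)}
\]
\[
  W_{\mathrm{Nash}}(u_1,\dots,u_n) = \prod_{i=1}^{n} u_i
  \qquad\text{(Nash product: favors more equal payoffs)}
\]
```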
To illustrate the challenges involved in defining and measuring cooperative intelligence, we introduce the following example of a working definition, based on the Legg and Hutter (2007) definition of intelligence as an agent’s ability to achieve its goals in a wide variety of environments:
Cooperative intelligence is an agent’s ability to achieve its goals in ways that also promote social welfare, in a wide range of environments and with a wide range of other agents.
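By analogy with Legg and Hutter’s measure of intelligence, one purely illustrative way to formalize this working definition is as expected performance over a weighted class of environments and co-player populations; the weighting function, the environment and population classes, and the mixing parameter below are assumptions introduced here, not part of this call.

```latex
% One possible (purely illustrative) formalization: the cooperative
% intelligence of a policy \pi combines its own expected return V and the
% social welfare W it helps achieve, averaged over environments \mu and
% co-player populations \sigma with weights w. The weights w and the mixing
% parameter \alpha are illustrative assumptions.
\[
  \Upsilon_{\mathrm{coop}}(\pi)
    \;=\; \sum_{\mu \in \mathcal{E}} \sum_{\sigma \in \mathcal{P}}
          w(\mu, \sigma)\,
          \Bigl[\, \alpha\, V^{\pi}_{\mu,\sigma}
                 \;+\; (1-\alpha)\, W_{\mu}(\pi, \sigma) \Bigr]
\]
```

Each of the challenges discussed below can be read as a question about the ingredients of such a measure: the weighting over environments and co-players, the choice of welfare function, and the weight placed on the agent’s own payoff.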
The notion of cooperative intelligence gestured at in this possible definition has some features that make it challenging to more formally define and measure, as we highlight below. Conceptual work is needed to come up with definitions of cooperative intelligence and related concepts that address these issues. Technical work is needed to measure them, as this may require innovative experimental design.
The role of intentions. In many circumstances, achieving high welfare requires joint action, as opposed to creating spillover benefits for other agents via independent action. For example, consider beavers building a dam that incidentally creates a pond for fish to live in: the beavers did not intend to create a pond for the fish, and even if there were no fish, their actions would have been the same (we thank Allan Dafoe for this example). That the animals attain high welfare in this setting is therefore not good evidence of their ability to attain high welfare in settings that do require joint effort (as when, for example, solving a social dilemma).
Byproduct cases are one reason that some writers have emphasized the causal or intentional aspects of cooperation. For example, Paternotte (2014) writes that “[a] first obvious fact is that the cooperative nature of a set of individual actions is underdetermined by observable behavior”. West et al. (2007) distinguish between several concepts in evolutionary biology – including mutualism, mutual benefit, cooperation, altruism, and fitness – and define cooperation as “a behavior which [sic] provides a benefit to another individual (recipient), and is selected for because of its beneficial effect on the recipient”. One example of early-stage work aimed at addressing this issue is given in Jesse Clifton and Sammy Martin's New Directions in Cooperative AI seminar, but other approaches, such as those based on a formal definition of intention (Halpern and Kleiman-Weiner, 2018), may also prove fruitful.
Dependence on the distribution of other agents. In the working definition above, cooperatively intelligent agents achieve high social welfare via joint action with a wide range of other agents. How should we choose the distribution of other agents on which cooperative intelligence is measured? One answer to this question can be found in the work of Hernández-Orallo et al. (2011), who propose the “Darwin-Wallace distribution” – a distribution over agents constructed by running a particular evolutionary process – for the measurement of intelligence in multi-agent settings. In the benchmark suite Melting Pot (Leibo et al. 2021; Joel Leibo’s New Directions in Cooperative AI seminar), agents are evaluated against specially-constructed “background populations”.
An important consideration in the choice of the distribution of agents against which cooperative intelligence should be measured is the existence of multiple equilibria. Complex multi-agent environments typically exhibit multiple equilibria, including multiple Pareto-optimal equilibria (as demonstrated, for example, by the folk theorems for repeated games). Thus, if we only evaluate an agent against agents playing according to the same equilibrium, we may severely overestimate its ability to coordinate with a wider range of agents.
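As a minimal illustration of this equilibrium-dependence, the following sketch compares self-play and cross-play returns in a pure coordination game with two equally good conventions; the payoff values and policy names are illustrative assumptions, not taken from any particular benchmark.

```python
# Minimal sketch: self-play vs. cross-play evaluation in a pure coordination
# game with two equally good conventions. Payoff values and policy names are
# illustrative assumptions.

PAYOFF = {("A", "A"): 1.0, ("B", "B"): 1.0, ("A", "B"): 0.0, ("B", "A"): 0.0}

def convention_policy(convention):
    """A policy that deterministically plays one of the two conventions."""
    return lambda: convention

def average_return(policy_1, policy_2, episodes=100):
    """Average joint payoff when the two policies are paired together."""
    return sum(PAYOFF[(policy_1(), policy_2())] for _ in range(episodes)) / episodes

if __name__ == "__main__":
    run_a = convention_policy("A")  # e.g. the equilibrium reached by one training run
    run_b = convention_policy("B")  # a different, equally good equilibrium

    print("self-play  A vs A:", average_return(run_a, run_a))  # 1.0
    print("self-play  B vs B:", average_return(run_b, run_b))  # 1.0
    print("cross-play A vs B:", average_return(run_a, run_b))  # 0.0 (coordination failure)
```

Evaluating each agent only against copies of itself would report perfect coordination here, even though the two populations fail completely when paired with one another.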
Underdetermination of “high social welfare”. The preceding definition relies on a measure of social welfare, but there are many competing ways of measuring this concept. Different welfare functions make different tradeoffs between equality and total payoffs, and different equilibria may maximize different social welfare functions. This means that the measurement of cooperative intelligence may require normative judgements. Work on this problem could draw from the rich literatures on social choice theory, welfare economics, political philosophy, and cooperative bargaining, among many other fields (Thomson, 1994).
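As a simple illustration with made-up payoff profiles (10, 0) and (4, 4): a utilitarian (sum) criterion prefers the first outcome, while the Nash product (and most egalitarian criteria) prefers the second, so an agent’s measured cooperative intelligence can depend on which welfare function the evaluator adopts.

```latex
% With the illustrative welfare functions introduced earlier, the two
% outcomes are ranked in opposite orders:
\[
  W_{\mathrm{util}}(10, 0) = 10 \;>\; 8 = W_{\mathrm{util}}(4, 4),
  \qquad
  W_{\mathrm{Nash}}(10, 0) = 0 \;<\; 16 = W_{\mathrm{Nash}}(4, 4).
\]
```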
Tradeoffs between exploitability and welfare. Agents can cooperate (in the sense of attaining high-social-welfare outcomes) with a wider variety of agents if they are willing to make themselves more exploitable. For instance, consider a negotiator who always accepts the first offer: they will reach agreement with anyone, but can be exploited by counterparts who make self-serving offers. Deterring exploitation is part of achieving one’s goals, but may come at the cost of lower social welfare. How should exploitability be traded off against an agent’s ability to attain high-social-welfare outcomes with a wider variety of counterparts? This problem is discussed by Stastny et al. (2021) under the name “(cooperative) robustness-exploitability tradeoff”.
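The sketch below makes this tradeoff concrete in an iterated Prisoner’s Dilemma; the payoff values, horizon, and strategy names are illustrative assumptions, and “exploitability” is measured here simply as the payoff a pure defector extracts, not as a formal best-response computation.

```python
# Minimal sketch: welfare vs. exploitability of two strategies in an iterated
# Prisoner's Dilemma. Payoffs, horizon, and strategy names are illustrative
# assumptions.

PAYOFFS = {("C", "C"): (3, 3), ("C", "D"): (0, 4), ("D", "C"): (4, 0), ("D", "D"): (1, 1)}

def always_cooperate(history):
    return "C"

def tit_for_tat(history):
    return history[-1][1] if history else "C"   # copy the opponent's last move

def always_defect(history):
    return "D"

def play(strategy_1, strategy_2, rounds=10):
    """Return total payoffs for both players over the iterated game."""
    history_1, history_2, total_1, total_2 = [], [], 0, 0
    for _ in range(rounds):
        a1, a2 = strategy_1(history_1), strategy_2(history_2)
        p1, p2 = PAYOFFS[(a1, a2)]
        total_1, total_2 = total_1 + p1, total_2 + p2
        history_1.append((a1, a2))
        history_2.append((a2, a1))
    return total_1, total_2

if __name__ == "__main__":
    # Welfare (sum of payoffs) when paired with a cooperative partner:
    print("AC  vs AC:", sum(play(always_cooperate, always_cooperate)))  # 60
    print("TFT vs AC:", sum(play(tit_for_tat, always_cooperate)))       # 60
    # Payoff a defector extracts (a rough proxy for exploitability):
    print("AD vs AC :", play(always_defect, always_cooperate)[0])       # 40
    print("AD vs TFT:", play(always_defect, tit_for_tat)[0])            # 13
```

In this toy setting the unconditional cooperator achieves the same welfare with cooperative partners as the conditional cooperator, but is far more exploitable by a defector.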
Cooperation vs. coercion. Distinguishing between coercion and cooperation can be challenging (Schelling 1980; Nozick 1969), as coercion may lead to improvements in social welfare, such as when potential defectors from a socially optimal outcome are threatened with punishment. Should a definition of cooperative intelligence discount cases where social welfare is improved due to coercion? If so, how precisely ought we modify the definition above? Moreover, how might we even detect coercion in AI systems, given its apparent similarity to cooperation in certain contexts?
In the previous section, we focused on the definition and measurement of a one-dimensional notion of cooperative intelligence, but there are many other directions that may prove fruitful in evaluating the cooperation-relevant features of AI systems.
Measuring specific cooperative capabilities. For instance, we may want to know how skilled an agent is at communication, coordination, modeling other agents, or overcoming informational problems, bargaining problems, and social dilemmas. We are also interested in metrics such as the rate at which these competencies are acquired.
Diagnosing failures of cooperation. In addition to measures of cooperative capabilities, diagnostic tools could prove invaluable for understanding the circumstances under which a particular group of agents exhibits significant failures to cooperate. This includes tools for interpreting agents’ cooperation-relevant representations, such as their beliefs about other agents.
Evaluation based on relative performance. In a multi-agent setting, the evaluation of agents based on their relative performance can make a game more competitive than intended, and incentivize harming other agents. For instance, evolutionary models show that competition between distantly related individuals or groups can select for spitefulness and aggression (Hamilton 1970; Gardner and West 2004; Choi and Bowles 2007). Similarly, a benchmark environment in which teams of agents directly compete against one another may incentivize agents to harm one another. One important research direction is therefore to develop methods for evaluation that avoid these incentives.
Evaluation without explicit utility functions. Because social welfare is a function of agents’ individual utilities, an operationalization of cooperative intelligence in terms of social welfare is only straightforwardly applicable to real-world AI systems when those systems have explicitly-specified utility functions. This could include, for example, reinforcement learning agents trained on a hard-coded reward signal, but may not be as easily extended to systems trained via unsupervised learning or human feedback. One possible direction for measuring the cooperative capabilities of agents without an explicit utility function would be to elicit human judgements. Work here could draw on methods from the well-established field of preference elicitation, and from more recent work on learning from human preferences (Christiano et al. 2017; Stiennon et al. 2020; Askell et al. 2021; Ouyang et al. 2022).
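For reference, a common building block in the learning-from-human-preferences work cited above (e.g., Christiano et al. 2017) is a Bradley-Terry-style model of the probability that a human judge prefers one trajectory segment over another, expressed in terms of a learned reward model; the notation below follows that standard formulation and is included only as a sketch of how human judgements could stand in for an explicit utility function.

```latex
% Probability that a human judge prefers trajectory segment \sigma^1 over
% \sigma^2, under a learned reward model \hat{r} (Bradley-Terry form):
\[
  \hat{P}\bigl[\sigma^1 \succ \sigma^2\bigr]
    = \frac{\exp \sum_{t} \hat{r}\bigl(o^1_t, a^1_t\bigr)}
           {\exp \sum_{t} \hat{r}\bigl(o^1_t, a^1_t\bigr)
            + \exp \sum_{t} \hat{r}\bigl(o^2_t, a^2_t\bigr)}
\]
```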
Using human feedback in this way raises new methodological and normative issues. For example, while humans are excellent cooperators in many respects, they are not ideally cooperatively intelligent. This means that we may not always want to treat human judgements about cooperative capability as “ground truth”. How, then, should human judges be instructed on how to evaluate AI systems’ cooperativeness? One direction could be to study systematic mistakes that impede human cooperation (see, e.g., Caputo 2013 for a review in the context of negotiation) and propose methods for correcting these biases in AI training regimes that require human feedback. Another direction might seek to address the fact that there is variability in human judgements of fairness (Henrich 2000; Oosterbeek, Sloof, and van de Kuilen 2004).
Cooperation-specific solution concepts. Developing new solution concepts in which cooperation plays a central role (cf. Rabin’s (1993) “fairness equilibrium” and Rong and Halpern’s (2010) “cooperative equilibrium”) could enrich our understanding of what constitutes behavior that is both rational and cooperative.
Group-level capabilities. So far we have discussed the cooperative capabilities of individual agents. However, successful cooperation depends on the behavior of many agents, and individuals may be limited in their ability to increase the degree of cooperation. Thus, it may often be more helpful to think about the cooperative intelligence of a group. For one such example, see Woolley et al.’s (2010) work on the measurement of collective intelligence. This idea is also related to the aforementioned distinction between recommendations to a single AI developer vs. a collection of AI developers.
In addition to the general evaluation criteria for this call for proposals described further above, we provide the following guidance.
Tasks for training and evaluating a variety of cooperative competencies, in a variety of AI paradigms, will be crucial to the success of Cooperative AI. We first describe the features of environments and datasets that we believe to be particularly important. We then give two examples of the kinds of projects CAIF is interested in funding: multi-agent learning environments, and datasets and evaluation protocols for large language models (LLMs).
We are particularly interested in supporting the development of tasks (environments and datasets) with the following features.
Diversity of challenges for cooperation. Tasks will ideally require a host of cooperative competencies, including communication, coordination, modeling other agents, and overcoming informational problems, bargaining problems, and social dilemmas; they should also support testing against a variety of different kinds of agents.
Testing generalization. We want AI systems to acquire cooperative competencies that generalize to a wide set of environments and counterpart agents. This is one reason to prefer benchmarks that consist of a suite of tasks, such as Melting Pot, over benchmarks consisting of a single challenge task (such as Hanabi or Diplomacy; see below for background on these environments).
Encouragement of creative solutions. Improvements in cooperation can often be obtained by changing the “rules of the game”. For instance, agents with a high degree of mutual transparency can take advantage of program equilibrium-type approaches to resolving social dilemmas (Tennenholtz 2004; Vincent Conitzer’s New Directions in Cooperative AI seminar). Ideally, new environments will allow researchers to exercise significant creativity in designing agents to be more cooperatively intelligent.
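As a minimal sketch of the program-equilibrium idea in a one-shot Prisoner’s Dilemma, suppose each player submits a program that can read the other program’s source before acting; the payoff values and the “cooperate only with an identical program” rule below are illustrative assumptions, not the only (or best) way to exploit mutual transparency.

```python
# Minimal sketch of a program-equilibrium-style mechanism (cf. Tennenholtz
# 2004) in a one-shot Prisoner's Dilemma. Payoff values and the "cooperate
# only with an identical program" rule are illustrative assumptions.

PAYOFFS = {  # (action_1, action_2) -> (payoff_1, payoff_2)
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 4),
    ("D", "C"): (4, 0),
    ("D", "D"): (1, 1),
}

# A "program" is a pair (source, behavior), where behavior maps the opponent's
# source string to an action. clique_bot cooperates only with exact copies.
CLIQUE_SOURCE = "cooperate iff the opponent's source equals CLIQUE_SOURCE"

def clique_bot(opponent_source):
    return "C" if opponent_source == CLIQUE_SOURCE else "D"

def defect_bot(opponent_source):
    return "D"

def play(program_1, source_1, program_2, source_2):
    """Each program observes the other's source, then both act simultaneously."""
    action_1 = program_1(source_2)
    action_2 = program_2(source_1)
    return PAYOFFS[(action_1, action_2)]

if __name__ == "__main__":
    print(play(clique_bot, CLIQUE_SOURCE, clique_bot, CLIQUE_SOURCE))    # (3, 3)
    print(play(clique_bot, CLIQUE_SOURCE, defect_bot, "always defect"))  # (1, 1)
```

Mutual submission of clique_bot is self-enforcing in this sketch: a player who deviates to defect_bot changes the opponent’s behavior and receives the mutual-defection payoff of 1 rather than 3.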
Evaluation and learning from human feedback. Due to the difficulty of specifying desirable behavior via an explicit reward function in many domains, learning from human feedback will plausibly be an increasingly important approach to the design of AI systems (Christiano et al. 2017; Stiennon et al. 2020; Leike et al. 2018). It is therefore valuable to have tasks that can support the collection of human feedback (e.g., by including human-understandable tasks and interfaces). See also the discussion of “[e]valuation without explicit utility functions” in the preceding document.
Forward-compatibility with more advanced AI techniques. Given our interest in advanced AI systems, we would like infrastructure that supports the introduction of new environments that are compatible with new AI techniques. For instance, LLMs could play a key role in the development of more generally intelligent systems. Ideal multi-agent learning environments should therefore not preclude the use of new techniques, such as language model-based communication.
Clear and precise evaluation protocols. It should be clear to researchers how the methods they develop will be evaluated on the task. On the other hand, having clear and precise evaluation protocols may be in tension with allowing creative approaches to improving cooperation. In very open-ended settings, “fuzzier” evaluation protocols, such as the judgements of a human jury, might be preferable.
Accessibility and user-friendliness. Ideal benchmarks will be easy for researchers to use; where possible, they should not require large amounts of compute or engineering talent in order to make research contributions.
Comparability with human players. A helpful feature of some existing benchmarks (such as Hanabi and Diplomacy) is that human experts provide a clear point of comparison. Moreover, in many domains we may be concerned with the ability of AI systems to cooperate with humans, rather than other AI systems.
Incentivizing differential progress. Given our focus on differential progress in cooperative intelligence (Jesse Clifton and Sammy Martin's New Directions in Cooperative AI seminar), we will prioritize supporting environments and datasets that facilitate the training and evaluation of differentially cooperative capabilities. To this end, it is important to consider the following questions.
On the other hand, environments will only spur differential progress in cooperative intelligence if it is non-trivial for self-interested agents to cooperate in them. For example, in sequential social dilemmas (Leibo et al. 2017) self-interested agents can maximize their score by (conditionally) cooperating, but training agents to engage in conditional cooperation rather than defection is not easy.
There are several existing environments relevant to evaluating the cooperative capabilities of contemporary multi-agent learning algorithms.
CAIF is interested in supporting work to develop new environments that will improve our ability to evaluate the cooperative capabilities of multi-agent learning algorithms. These may be modifications to existing environments (e.g., modifications to Diplomacy to make it a mixed-motive rather than zero-sum game) or wholly new environments (e.g., environments appropriate for inclusion in the Melting Pot suite, or a new “grand challenge” problem for Cooperative AI).
Large language models (LLMs) could play a significant role on the frontier of AI capabilities in the coming years (Brown et al. 2020; Bommasani et al. 2021). The kinds of datasets and evaluation protocols that might facilitate Cooperative AI research using LLMs include the following.
Negotiations. In the short term, negotiation provides an interesting test case for certain cooperative capabilities (understanding other agents’ interests, and appropriately trading off one’s own interests against theirs). In the long run, AI systems may assist with high-stakes negotiations on behalf of humans (e.g., between large firms, or perhaps even states), and avoiding failure in those kinds of interactions will generally be in humanity’s interest. There are a number of existing datasets of simple natural language negotiations – including negotiations over how to divide a set of items, or over the price of an item for sale on Craigslist (Yarats and Lewis 2018; Lewis et al. 2017; He et al. 2018) – but there has been little work on negotiation with LLMs.
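As one concrete illustration of the kind of evaluation such datasets support, the sketch below scores an agreed division of items against each party’s private valuations, in the style of divide-the-items negotiation tasks such as Lewis et al. (2017); the item names, valuations, and helper functions are hypothetical and not drawn from any specific dataset format.

```python
# Minimal sketch: scoring the outcome of a divide-the-items negotiation, in the
# style of Lewis et al. (2017). Item names, valuations, and the agreed split
# are illustrative assumptions; real datasets specify their own formats.

ITEMS = {"book": 3, "hat": 2, "ball": 1}        # item -> available count
VALUES_A = {"book": 1, "hat": 3, "ball": 1}     # private values, agent A
VALUES_B = {"book": 2, "hat": 1, "ball": 2}     # private values, agent B

def payoff(values, allocation):
    """Total value an agent assigns to the items it receives."""
    return sum(values[item] * count for item, count in allocation.items())

def score(allocation_a, allocation_b):
    """Return (payoff_A, payoff_B, utilitarian welfare) for an agreed split."""
    assert all(allocation_a[i] + allocation_b[i] == ITEMS[i] for i in ITEMS)
    u_a = payoff(VALUES_A, allocation_a)
    u_b = payoff(VALUES_B, allocation_b)
    return u_a, u_b, u_a + u_b

if __name__ == "__main__":
    # A hypothetical agreement: A takes the hats, B takes the books and balls.
    print(score({"book": 0, "hat": 2, "ball": 0}, {"book": 3, "hat": 0, "ball": 3}))
    # -> (6, 12, 18): each party's payoff plus a simple welfare measure, which
    #    could then be compared across negotiating agents or baselines.
```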
Human judgements about cooperatively intelligent behavior. Evaluating the ability of LLMs (and other powerful models) to understand and reproduce human judgements about cooperatively intelligent behavior could be important to safely integrating these systems into society and into settings involving other AI systems. Related efforts include Askell et al.’s (2021) study of the behavior of language models on the ETHICS dataset of Hendrycks et al. (2020), which collects natural language descriptions of different situations and human judgements about, for example, whether some behavior was morally appropriate. In combination with the preceding topic, one potentially useful dataset could contain human judgements of different aspects of negotiators’ behavior. For example, Chawla et al. (2021) collect human annotations of simple natural language negotiations, covering aspects such as “coordination”, “empathy”, “self-need”, and “other-need”.
Finally, two features that might make datasets for Cooperative AI research on LLMs particularly useful are the following.