5 Cooperative Infrastructure
Required Content 3hrs 45mins • All Content 6hrs 55mins
Humans excel at cooperation in many ways. In addition to our biological “built-in” prosociality, we have evolved complex and sophisticated mechanisms that underpin our cooperative abilities, such as:
- Reward systems (recognising and incentivising good cooperators)
- Social norms and rules (both formal laws and informal expectations)
- Reputation systems (we remember and share information about who can be trusted)
- Punishment mechanisms (from social shaming to legal consequences)
These mechanisms are not properties of individuals, but can instead be thought of as different kinds of cooperative infrastructure or institutions.
By the end of this section, you should be able to:
- Explain what is meant by centralised and decentralised cooperative infrastructure.
- Identify examples of cooperative infrastructure in human societies and discuss how similar infrastructure could be applied to multi-agent AI systems.
- Explain and provide examples of mechanisms which help enforce norms in a multi-agent system.
- Describe how systems for agent identification and reputation could support cooperation among AI-Human multi-agent systems.
The next resource, section 4.4 of Open Problems in Cooperative AI, introduces this important sub-area of cooperative AI research which deals with cooperative infrastructure for AI agents. Note that the words infrastructure and institutions are used somewhat interchangeably in this context.
This kind of work is often inspired by human cooperation, so if you want to gain a bit more background on cooperative infrastructure in human societies, you might also want to read the optional resource which compares human cooperation to that of other natural species.
The distinction between centralised and decentralised infrastructure is important. Centralised infrastructure can often be more tangible and easier to grasp; in human societies this could be exemplified by a system of laws and law enforcement or a religious organisation with clear leadership and organisational hierarchies. These are typically things that people associate with the word institutions.
This takes us to a concept from economics research: mechanism design. The next resource explains how mechanism design theory deals with the engineering of mechanisms - or infrastructure - that make (strategic, economic) agents behave in a way that generates desirable outcomes.
While the video focuses on mechanism design in the context of economic theory for human societies, mechanism design can also be applied to a system of AI agents to make them behave in a desired way - for example, to achieve sustainable cooperative behaviour. An important feature here is that mechanism design is top-down and assumes that the mechanism designer has sufficient control over the agents’ environment to “set the rules of the game”.
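To make the idea concrete, here is a minimal Python sketch of one classic mechanism, the second-price (Vickrey) auction, in which bidding one's true valuation is each agent's best strategy. The auction is offered here only as an illustration of "setting the rules of the game"; the agent names and bids are made up and not taken from the video.

```python
# Minimal sketch of a second-price (Vickrey) auction: a classic mechanism
# in which bidding one's true valuation is a dominant strategy.
# The agents and bid values below are purely illustrative.

def second_price_auction(bids):
    """bids: dict mapping agent name -> bid. Returns (winner, price paid)."""
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner, _ = ranked[0]
    price = ranked[1][1]  # the winner pays the second-highest bid
    return winner, price

# If every agent bids truthfully, the item goes to the agent who values it most,
# and no agent can gain by misreporting their valuation.
bids = {"agent_a": 10.0, "agent_b": 7.5, "agent_c": 4.0}
print(second_price_auction(bids))  # ('agent_a', 7.5)
```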
There are different ways to measure aggregate welfare in a system of agents, which you might want to use to compare different centralised infrastructures or mechanisms. For example, utilitarian welfare equals the sum of all the individuals’ welfare, while egalitarian welfare equals the minimum individual welfare in the system. Assuming we can approximate the welfare of each individual, what are some other measures of welfare you can think of? Do you have a preferred measure, and why?
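As a reference point for the questions above, here is a small sketch of the two measures just mentioned; the individual welfare values are made up for illustration.

```python
# Two standard ways of aggregating individual welfare into a single score.
# The welfare values below are made up for illustration.

welfare = [3.0, 5.0, 1.0, 4.0]  # approximate welfare of each individual agent

utilitarian = sum(welfare)   # total welfare across the system
egalitarian = min(welfare)   # welfare of the worst-off individual

print(utilitarian, egalitarian)  # 13.0 1.0
```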
Imagine there are 3 candidates, A, B and C, and 3 voters, 1, 2 and 3. Voters cast their votes by ranking the candidates, e.g. A > B > C, and the winner is decided as follows: for each candidate, count up the number of candidates they are preferred to across all the ballots (e.g. given the votes A > B > C and C > A > B, A is preferred to 3 candidates in total); the winning candidate is the one preferred to the highest number of candidates; in the event of a tie, the tied candidate that comes first alphabetically is selected (e.g. if A and C tie, then A wins). A voting rule is strategyproof if no voter is ever strictly incentivised to cast a vote that misrepresents their true preferences, given knowledge of the votes the others will cast; e.g. if voter 3 knows which orderings voters 1 and 2 will cast, then casting their true preference order is among voter 3's best options. By way of a counterexample, show that the voting rule presented here is not strategyproof.
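If you want to experiment, here is a small Python sketch of the voting rule described above; the example ballots are arbitrary. Try holding two ballots fixed and varying the third.

```python
# Sketch of the voting rule described above: each candidate scores one point for
# every candidate ranked below them on every ballot; the highest score wins,
# with ties broken alphabetically. Ballots list candidates from most to least preferred.

def winner(ballots):
    scores = {}
    for ballot in ballots:
        for position, candidate in enumerate(ballot):
            # This candidate is preferred to everyone ranked after them on this ballot.
            scores[candidate] = scores.get(candidate, 0) + (len(ballot) - 1 - position)
    best = max(scores.values())
    return min(c for c, s in scores.items() if s == best)  # alphabetical tie-break

# Example: vary the third ballot to explore whether truthful voting is always best.
print(winner([["A", "B", "C"], ["C", "A", "B"], ["B", "C", "A"]]))  # 'A'
```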
Notice the parallel between mechanism design and opponent shaping, which was introduced in the previous section. Both are approaches for shaping the policies of agents, but they rely on different levels of control - in opponent shaping, influence is exerted only through the action space of the shaping agent, while mechanism design relies on a significant level of control over the environment.
The next resource introduces adaptive mechanism design as an approach for promoting cooperation among AI agents.
If adaptive mechanism design could be scaled up to important real-world settings - large numbers of complex agents in complex environments - it could be a way to avoid cooperation failures. A problem with this kind of approach, however, is that it requires some entity with centralised control over the environment. Furthermore, even when such an entity exists, it might not always be motivated to implement cooperation-promoting mechanisms.
The next couple of readings demonstrate examples of centralised cooperative infrastructure. The first one proposes a mediator that can optimise social welfare and the second builds on the idea of formal contracting from economics to overcome diverging incentives.
If you are interested in seeing how mediators are formalised, we recommend the ‘Problem setup’ section of the above resource.
The following optional piece experimentally tests ideas from ‘Formal Contracts Mitigate Social Dilemmas in Multi-Agent RL’ on LLM-agents.
As noted, the ability to actually implement centralised cooperative infrastructure is a key crux for the impact of such solutions, and this is an important reason to also explore decentralised approaches. A key concept here - again inspired by studies of human cooperation - is social norms and normative systems. The next piece discusses definitions of normative multi-agent systems and introduces relevant research questions.
It is well established that the rational choice in a single round of the prisoner’s dilemma is to defect and confess the crime, but imagine now a version where the two prisoners are members of a criminal community with a strong social norm against such behaviour. This changes things - the prisoners are much more likely to cooperate (stay silent) in this scenario. How can this be understood as rational behaviour if both prisoners are purely self-interested agents optimising only for their own benefit? What do you think would happen if one of them chose to defect? Write out a more representative payoff matrix for this situation given the context.
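As a starting point for the exercise, here is the standard prisoner’s dilemma payoff matrix in code; the specific numbers are a common textbook choice rather than anything from the readings. Your task is to modify the entries to reflect the norm and its sanctions.

```python
# Standard prisoner's dilemma payoffs (years in prison as negative numbers, so
# higher is better; these particular numbers are a common textbook choice).
# Entries are (row player's payoff, column player's payoff).

pd_payoffs = {
    ("silent", "silent"):   (-1, -1),
    ("silent", "confess"):  (-10, 0),
    ("confess", "silent"):  (0, -10),
    ("confess", "confess"): (-5, -5),
}

# Exercise: adjust these entries to include the expected cost of community
# sanctions for confessing, then check which strategy becomes rational.
```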
As the previous resource noted, “Norms are not necessarily created by a single legislator, they can also emerge spontaneously, or be negotiated among the agents” - this is another way of saying that norms can be established in a decentralised way. While this makes such systems an interesting alternative to centralised approaches, one might wonder how such norms could actually emerge or be established among AI agents. In the remainder of this section we’ll look at the enforcement of decentralised norms and the infrastructure that could support it, and see an example of LLM agents learning to use indirect reciprocity to enforce cooperative norms.
Norms always rely on enforcement, typically through different kinds of decentralised sanctioning of norm-breaking behaviour. Remember how cooperation can become sustainable in a prisoner’s dilemma if the game is repeated, as it is possible to punish the opponent for defections. Sanctioning can be thought of as a crowd-sourcing of such punishment, where the interaction might not be repeated with the same specific individual but rather with different individuals of the same group.
In the following paper, agents can see a history of all sanctioning events, and each learns its own classifier to judge behaviours as “approved” or “disapproved”.
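As a rough illustration of the general idea (not the paper's actual algorithm), an agent could classify behaviours from the public sanction history, for example by disapproving of any behaviour that has been sanctioned in most observed cases. The behaviour names below are invented.

```python
# Rough illustration only, not the method from the paper: classify behaviours
# using the publicly visible history of sanctioning events.
from collections import defaultdict

def learn_classifier(sanction_history):
    """sanction_history: list of (behaviour, was_sanctioned) pairs observed by the agent."""
    counts = defaultdict(lambda: [0, 0])  # behaviour -> [times sanctioned, times observed]
    for behaviour, was_sanctioned in sanction_history:
        counts[behaviour][0] += int(was_sanctioned)
        counts[behaviour][1] += 1
    # Disapprove of behaviours that were sanctioned in more than half of observed cases.
    return {b: ("disapproved" if s / n > 0.5 else "approved") for b, (s, n) in counts.items()}

history = [("take_extra_resources", True), ("take_extra_resources", True), ("share_resources", False)]
print(learn_classifier(history))  # {'take_extra_resources': 'disapproved', 'share_resources': 'approved'}
```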
We have previously noted the parallels between opponent shaping, where one agent shapes the learning of another agent, and mechanism design, where the mechanism designer shapes the learning and behaviour of one or many agents in an environment. Decentralised norm enforcement can be thought of as a third variation; here, the group collectively shapes the learning of the members.
The next resource is a paper on the emergence of cooperative infrastructure among LLM agents, connecting normative systems to evolutionary game theory. In this work, the authors show how Claude-based agents successfully use information given to them about the past behaviour of others and eventually start to exhibit indirect reciprocity.
Focus on the 'Introduction' and 'Discussion' sections.
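One simple formalisation of indirect reciprocity is “image scoring”, sketched below. This is a hedged illustration of the general idea rather than the setup used in the paper, and the agent names and scores are invented.

```python
# Minimal sketch of indirect reciprocity via image scoring (illustrative only,
# not the exact setup from the paper): agents cooperate with partners whose
# reputation, earned through past behaviour towards others, is high enough.

reputation = {"agent_a": 2, "agent_b": -1}  # hypothetical public reputation scores

def choose_action(partner, threshold=0):
    return "cooperate" if reputation.get(partner, 0) >= threshold else "defect"

def update_reputation(agent, action):
    # Cooperating raises an agent's standing; defecting lowers it.
    reputation[agent] = reputation.get(agent, 0) + (1 if action == "cooperate" else -1)

action = choose_action("agent_b")   # agent_b's low reputation invites defection
update_reputation("agent_a", action)
print(action, reputation)
```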
Think of an example of a norm among humans and answer the following questions:
- Where did the norm come from?
- How is it represented (formally or otherwise) in the society in which it is active?
- How is people's behaviour evaluated for adherence to the norm?
- What are the consequences of not adhering to the norm?
Most studies on norms, especially in AI, are focussed on norms that are very clearly beneficial for cooperation or coordination. The following optional resource makes a case for the functionality of norms that are ostensibly redundant to the overall fitness of a group.
Effective norm enforcement can be supported by complementary, centralised infrastructure. Reliable systems for agent IDs have, for example, been proposed as a solution that could make multi-agent interactions significantly safer, as they would make reputation systems and norm enforcement more tractable. A system for agent IDs could be facilitated centrally, e.g. by governments, and then used for decentralised enforcement of norms, e.g. by agents avoiding interactions with those that don’t have a verifiable ID, or with those who have a history of bad behaviour.
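To make this concrete, here is a hypothetical sketch of how a centrally maintained ID registry could be used by individual agents for decentralised norm enforcement. All names, fields, and thresholds are invented for illustration and are not from the talk or the paper.

```python
# Hypothetical sketch: a central registry issues verifiable agent IDs, and
# individual agents use it in a decentralised way to decide whom to interact with.
# All names and fields here are invented for illustration.
from dataclasses import dataclass

@dataclass
class AgentRecord:
    agent_id: str
    verified: bool
    sanction_count: int = 0  # running record of norm violations reported against this agent

registry = {}  # maintained centrally, e.g. by a government or standards body

def register(agent_id):
    registry[agent_id] = AgentRecord(agent_id, verified=True)

def willing_to_interact(agent_id, max_sanctions=2):
    record = registry.get(agent_id)
    # Refuse interaction with unverified agents or known repeat norm-breakers.
    return record is not None and record.verified and record.sanction_count <= max_sanctions

register("agent_123")
print(willing_to_interact("agent_123"), willing_to_interact("unknown_agent"))  # True False
```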
The final resource of this section is a short talk on why agent infrastructure in general, and agent IDs in particular, could be important for AI safety. This talk takes a broader safety perspective, not only focusing on multi-agent safety. If you want to go more in depth, we also include a paper by the same author as an optional resource.
There are three main types of approaches to agent training and deployment, the terminology for which stems from multi-agent reinforcement learning: centralised training and execution, centralised training for decentralised execution, and decentralised training and execution. (For an explanation of these categories, we would recommend page 2 of An Introduction to Centralized Training for Decentralized Execution in Cooperative Multi-Agent Reinforcement Learning.) You were introduced to various training schemes in the last chapter, which fall predominantly into the latter two categories. What are the key technical challenges in setting up truly centralised training programs among AI agents produced by multiple different companies?
