Cooperative AI

Strengthening Agent Infrastructure

What It Is and Why It Matters

Protocols and infrastructure for agent interaction, such as A2A, are emerging rapidly but tend to prioritise utility over security. The network effects driving their adoption, together with the lock-in typical of digital infrastructure, imply that we cannot afford to treat safety as an afterthought, but also that building new infrastructure to replace increasingly entrenched incumbents is a less tractable direction. In this call, therefore, our focus is on stress-testing and strengthening existing agent infrastructure. This includes understanding and improving the safety-relevant properties of agent infrastructure and protocols (either theoretically or by empirical stress-testing in realistic scenarios), as well as providing additional support for features such as identity, reputation, accountability, provenance, commitment, or verifiable attributes, which have safety and governance implications (Chan et al., 2025). The distinctive properties of AI agents – including that they can be copied, modified, simulated, deleted, inspected, or deployed at scale – complicate familiar approaches to these problems while also enabling new solutions (Conitzer & Oesterheld, 2023).

Specific Work We Would Like to Fund

Agent identity, authentication, and admission control. Understanding the requirements and challenges for agent IDs vis-à-vis current deployments (Chan et al., 2024), including update and revocation procedures compatible with agents being copied, modified, or merged, and the uses of proof-of-agent or proof-of-human credentials. Investigating what changes to standard cryptographic trust protocols are necessary for platform-side vetting and access control across heterogeneous agents.
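As an illustration of the revocation requirement only, and not a proposed design, the sketch below shows a registry in which revoking one agent instance also blocks every copy or modified version derived from it. The names `Registry`, `AgentCredential`, `issue`, `revoke`, and `admit` are hypothetical, and the shared-key HMAC is an assumption standing in for the public-key signatures a real deployment would more plausibly use.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass(frozen=True)
class AgentCredential:
    instance_id: str           # unique per running copy of an agent
    parent_id: Optional[str]   # instance this one was copied or modified from
    tag: str                   # issuer's MAC over the two fields above


@dataclass
class Registry:
    """Hypothetical issuer that tracks lineage so revocation reaches copies."""
    key: bytes
    parents: Dict[str, Optional[str]] = field(default_factory=dict)
    revoked: Set[str] = field(default_factory=set)

    def _mac(self, instance_id: str, parent_id: Optional[str]) -> str:
        # JSON encoding keeps the two fields unambiguously separated.
        msg = json.dumps([instance_id, parent_id]).encode()
        return hmac.new(self.key, msg, hashlib.sha256).hexdigest()

    def issue(self, instance_id: str, parent_id: Optional[str] = None) -> AgentCredential:
        if instance_id in self.parents:
            raise ValueError("instance_id already issued")
        if parent_id is not None and parent_id not in self.parents:
            raise ValueError("unknown parent instance")
        self.parents[instance_id] = parent_id
        return AgentCredential(instance_id, parent_id, self._mac(instance_id, parent_id))

    def revoke(self, instance_id: str) -> None:
        self.revoked.add(instance_id)

    def admit(self, cred: AgentCredential) -> bool:
        # 1. The credential must carry a valid issuer tag.
        expected = self._mac(cred.instance_id, cred.parent_id)
        if not hmac.compare_digest(expected, cred.tag):
            return False
        # 2. Neither the instance nor any ancestor may be revoked.
        node: Optional[str] = cred.instance_id
        while node is not None:
            if node in self.revoked:
                return False
            node = self.parents.get(node)
        return True
```

In this sketch, a credential issued with `parent_id="a1"` stops being admitted as soon as `revoke("a1")` is called. Merged agents, which have more than one parent, would need a lineage graph rather than the single-parent chain assumed here.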
Verifiable attributes, actions, and provenance. Methods for agents to reveal and verify their properties, resources, authorisation scope, and outputs, including zero-knowledge techniques where such information is strategically sensitive. Watermarking and scalable proofs of inference for attributing outputs to specific agents (e.g., Kirchenbauer et al., 2023; Sun et al., 2024).
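One simple building block for selective disclosure is a salted hash commitment, sketched below under stated assumptions. This is not a zero-knowledge proof: it only lets an agent publish commitments to all of its attributes and later open some of them while keeping the rest hidden. The function names `commit` and `verify` and the example attribute values are hypothetical.

```python
import hashlib
import hmac
import secrets
from typing import Tuple


def commit(value: str) -> Tuple[str, str]:
    """Return (digest, salt); the digest can be published, the salt is kept private."""
    salt = secrets.token_hex(16)  # fixed-length hex, so "salt:value" parses unambiguously
    digest = hashlib.sha256(f"{salt}:{value}".encode()).hexdigest()
    return digest, salt


def verify(digest: str, salt: str, value: str) -> bool:
    """Check that (salt, value) opens the previously published digest."""
    recomputed = hashlib.sha256(f"{salt}:{value}".encode()).hexdigest()
    return hmac.compare_digest(recomputed, digest)


# Hypothetical attribute sheet: the agent publishes only the digests.
attributes = {"operator": "example-lab", "spend_limit": "100"}
openings = {name: commit(value) for name, value in attributes.items()}
published = {name: digest for name, (digest, _salt) in openings.items()}

# Later, the agent reveals just one attribute by sending its salt and value.
_digest, salt = openings["spend_limit"]
revealed_ok = verify(published["spend_limit"], salt, "100")
```

A verifier who receives the salt and value for `spend_limit` can check them against the published digest, while `operator` stays hidden. Proving a property of a hidden value (for example, that a limit is below some threshold) is where the zero-knowledge techniques mentioned above would be needed.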
Reputation, accountability, and dispute resolution. Reputation system design for AI agents: how reputations are represented, what behavioural inputs feed them, how those inputs are aggregated, and how the resulting signals are made robust to gaming and manipulation. Infrastructure for tracking relationships and incidents across agent populations. Protocols for dispute resolution, renegotiation, and graceful termination when agreements break down.
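To illustrate the aggregation question, here is a minimal sketch of one possible scoring rule, assuming binary good/bad reports. Each rater's reports are averaged before being counted, so a single rater cannot gain influence by flooding the system, and raters with no established weight are ignored, which blunts naive Sybil attacks. The function `reputation_score`, its parameters, and the weighting scheme are assumptions made for illustration, not a recommended design.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def reputation_score(
    reports: Iterable[Tuple[str, int]],
    rater_weight: Dict[str, float],
    prior_good: float = 1.0,
    prior_bad: float = 1.0,
) -> float:
    """Estimate the probability that the subject behaves well.

    reports: (rater_id, outcome) pairs, with outcome 1 for good and 0 for bad.
    rater_weight: trust placed in each rater; unknown raters get weight 0.
    prior_good / prior_bad: pseudo-counts of a Beta prior.
    """
    per_rater = defaultdict(list)
    for rater, outcome in reports:
        per_rater[rater].append(outcome)

    good, bad = prior_good, prior_bad
    for rater, outcomes in per_rater.items():
        weight = rater_weight.get(rater, 0.0)
        fraction_good = sum(outcomes) / len(outcomes)
        # Each rater contributes at most `weight` in total, however many reports it files.
        good += weight * fraction_good
        bad += weight * (1.0 - fraction_good)
    return good / (good + bad)
```

Under this rule, one trusted rater filing a hundred negative reports moves the score no more than a single negative report would. How rater weights are themselves earned, and how they resist collusion, is left open here.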
Commitments and delegation. Methods for credible commitment without third-party enforcement, not just via cryptographic tools but also by delegating to sub-agents whose code or other properties can be checked (Tennenholtz, 2004). We are especially interested in provisions for mutually conditional commitments, scope attenuation, multi-principal support, verifiable revocation, and other approaches that can reduce downside risks. Contract compliance monitoring and defences against delegation risks such as Sybil attacks, threats, and malicious delegates.
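A toy sketch in the spirit of program equilibrium (Tennenholtz, 2004) can make the idea of a mutually conditional commitment concrete. Each principal delegates to a program that can read the other program's source, and one program cooperates in a Prisoner's Dilemma only if the opponent's source is identical to its own. The names `CLIQUE_BOT`, `DEFECT_BOT`, `run`, and `play` are hypothetical, and `exec` is used purely for illustration; it must not be run on untrusted code.

```python
from typing import Tuple

# Delegate that cooperates ("C") only with an exact copy of itself.
CLIQUE_BOT = '''
def act(my_source, their_source):
    return "C" if their_source == my_source else "D"
'''

# Delegate that always defects ("D").
DEFECT_BOT = '''
def act(my_source, their_source):
    return "D"
'''


def run(source: str, opponent_source: str) -> str:
    """Execute a delegate's source and ask it for an action."""
    namespace: dict = {}
    exec(source, namespace)  # illustration only: unsafe for untrusted code
    return namespace["act"](source, opponent_source)


def play(source_a: str, source_b: str) -> Tuple[str, str]:
    """Each delegate sees its own source and the other's before acting."""
    return run(source_a, source_b), run(source_b, source_a)


print(play(CLIQUE_BOT, CLIQUE_BOT))  # ('C', 'C')
print(play(CLIQUE_BOT, DEFECT_BOT))  # ('D', 'D')
```

Because switching to any other program triggers defection by the opponent, neither principal gains by deviating from `CLIQUE_BOT`, which is what makes the commitment credible without a third-party enforcer. Exact source matching is brittle, so more robust conditions on the opponent's code or properties are one of the open questions in this area.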

Key Considerations

Please see the guidelines on research areas and out-of-scope topics.

References

Chan, Alan, Kevin Wei, Sihao Huang, Nitarshan Rajkumar, Elija Perrier, Seth Lazar, Gillian K. Hadfield, and Markus Anderljung (2025). "Infrastructure for AI Agents". arXiv:2501.10114.

Chan, Alan, Noam Kolt, Peter Wills, Usman Anwar, Christian Schroeder de Witt, Nitarshan Rajkumar, Lewis Hammond, David Krueger, Lennart Heim, and Markus Anderljung (2024). "IDs for AI Systems". arXiv:2406.12137.

Conitzer, Vincent, and Caspar Oesterheld (2023). "Foundations of Cooperative AI". In Proceedings of the 37th AAAI Conference on Artificial Intelligence, pp. 15359–15367.

Kirchenbauer, John, Jonas Geiping, Yuxin Wen, Jonathan Katz, Ian Miers, and Tom Goldstein (2023). "A Watermark for Large Language Models". In Proceedings of the 40th International Conference on Machine Learning (PMLR 202), pp. 17061–17084.

Sun, Haochen, Jason Li, and Hongyang Zhang (2024). "zkLLM: Zero Knowledge Proofs for Large Language Models". In Proceedings of the 2024 ACM SIGSAC Conference on Computer and Communications Security (CCS '24), pp. 4405–4419.

Tennenholtz, Moshe (2004). "Program Equilibrium". Games and Economic Behavior (49:2), pp. 363–373.
