Adversarial Policies Beat Superhuman Go AIs

Abstract

We attack the state-of-the-art Go-playing AI system, KataGo, by training adversarial policies that play against frozen KataGo victims. Our attack achieves a >99% win rate when KataGo uses no tree search, and a >77% win rate when KataGo uses enough search to be superhuman. Notably, our adversaries do not win by learning to play Go better than KataGo; in fact, our adversaries are easily beaten by human amateurs. Instead, our adversaries win by tricking KataGo into making serious blunders. Our results demonstrate that even superhuman AI systems may harbor surprising failure modes. Example games are available at goattack.far.ai.

Publication
International Conference on Machine Learning (ICML)
Tony Wang
PhD Student

Tony Wang is a PhD student in the Computer Science and Artificial Intelligence Laboratory at the Massachusetts Institute of Technology (MIT), where he is advised by Nir Shavit. Tony’s research focuses on adversarial robustness, mechanistic interpretability and scaling laws. Tony collaborates with Adam Gleave and others at FAR. For more information, see his website.

Adam Gleave
CEO and President of the Board

Adam Gleave is the CEO of FAR. He completed his PhD in artificial intelligence (AI) at UC Berkeley, advised by Stuart Russell. His goal is to develop techniques necessary for advanced automated systems to verifiably act according to human preferences, even in situations unanticipated by their designer. He is particularly interested in improving methods for value learning and the robustness of deep RL. For more information, visit his website.

Tom Tseng
Research Engineer

Tom Tseng is a research engineer at FAR. Tom previously worked as a software engineer at Gather and Cruise. He has a master’s degree from MIT and a bachelor’s degree from Carnegie Mellon University.

Kellin Pelrine
PhD Candidate

Kellin Pelrine is a PhD candidate at McGill University advised by Reihaneh Rabbany. He is also a member of the Mila AI Institute and the Centre for the Study of Democratic Citizenship. His main interests are in developing machine learning methods to leverage all available data and exploring how we can ensure methods will work as well in practice as on paper, with a particular focus on social good applications. Kellin collaborates with Adam Gleave at FAR, and previously worked with us as a Research Scientist Intern.

Nora Belrose

Nora Belrose was a Research Engineer at FAR. Prior to joining FAR, Nora worked on applying deep learning to the task of detecting calcified arteries in mammograms at the startup CureMetrix. Nora has also made numerous open-source contributions, including developing a library, Classroom, implementing deep RL from human preferences.

Joseph Miller
Research Engineer

Joseph has a bachelor’s degree in Mathematics and Computer Science from the University of Warwick. Before joining FAR, he worked as a software engineer for various startups, most recently at the data privacy company Privitar. Last year he created the text-to-image website hypnogram.xyz.