Abstract
We attack the state-of-the-art Go-playing AI system, KataGo, by training adversarial policies that play against frozen KataGo victims. Our attack achieves a >99% win rate when KataGo uses no tree search, and a >77% win rate when KataGo uses enough search to be superhuman. Notably, our adversaries do not win by learning to play Go better than KataGo; in fact, they are easily beaten by human amateurs. Instead, our adversaries win by tricking KataGo into making serious blunders. Our results demonstrate that even superhuman AI systems may harbor surprising failure modes. Example games are available at goattack.far.ai.
Publication
International Conference on Machine Learning (ICML)
PhD Student
Tony Wang is a PhD student in the Computer Science and Artificial Intelligence Laboratory at the Massachusetts Institute of Technology (MIT), where he is advised by Nir Shavit. Tony’s research focuses on adversarial robustness. Tony collaborates with Adam Gleave and others at FAR.AI. For more information, see his website.
CEO and President of the Board
Adam Gleave is the CEO of FAR.AI. He completed his PhD in artificial intelligence (AI) at UC Berkeley, advised by Stuart Russell. His goal is to develop techniques necessary for advanced automated systems to verifiably act according to human preferences, even in situations unanticipated by their designer. He is particularly interested in improving methods for value learning and the robustness of deep RL. For more information, visit his website.
Research Engineer
Tom Tseng is a research engineer at FAR.AI. Tom previously worked as a software engineer at Gather and Cruise. He has a master’s degree from MIT and a bachelor’s degree from Carnegie Mellon University.
PhD Candidate
Kellin Pelrine is a PhD candidate at McGill University advised by Reihaneh Rabbany. He is also a member of the Mila AI Institute and the Centre for the Study of Democratic Citizenship. His main interests are in developing machine learning methods to leverage all available data and exploring how we can ensure methods will work as well in practice as on paper, with a particular focus on social good applications. Kellin collaborates with Adam Gleave at FAR.AI, and previously worked with us as a Research Scientist Intern.
Nora Belrose was a Research Engineer at FAR.AI. Prior to joining FAR.AI, Nora worked at the startup CureMetrix, applying deep learning to the detection of calcified arteries in mammograms. Nora has also made numerous open-source contributions, including developing Classroom, a library that implements deep RL from human preferences.
Research Engineer
Joseph has a bachelor’s degree in Mathematics and Computer Science from the University of Warwick. Before joining FAR.AI, he worked as a software engineer for various startups, most recently at the data privacy company Privitar. Last year he created the text-to-image website hypnogram.xyz.