Alex Spies

Member of Technical Staff

FAR.AI

Alex Spies is a Research Scientist at FAR.AI. He works on mitigating deceptive behavior in frontier models, with a particular interest in applied interpretability and scalable oversight. He received his PhD on Interpretable Representations in Neural Networks from Imperial College London, advised by Alessandra Russo and Murray Shanahan, and has published on mechanistic interpretability, causal world models in transformers, and object-centric relational reasoning. During his PhD, he also held a JSPS Doctoral Fellowship under Katsumi Inoue at Japan's National Institute of Informatics. Beyond his own work, Alex led research on two AI Safety Camp projects and served as a Technical Advisor for the Pivotal Fellowship. He welcomes inquiries from researchers seeking to contribute to technical alignment.

