Alex Spies

Member of Technical Staff

FAR.AI

Alex Spies is a Research Scientist at FAR.AI. He works on mitigating deceptive behavior in frontier models, with a particular interest in applied interpretability and scalable oversight. He received his PhD on Interpretable Representations in Neural Networks from Imperial College London, advised by Alessandra Russo and Murray Shanahan, and has published on mechanistic interpretability, causal world models in transformers, and object-centric relational reasoning. During his PhD, he also held a JSPS Doctoral Fellowship under Katsumi Inoue at Japan's National Institute of Informatics. Beyond his own work, Alex led research on two AI Safety Camp projects and served as a Technical Advisor for the Pivotal Fellowship. He welcomes inquiries from researchers seeking to contribute to technical alignment.

News & publications

No items found.
