Stefan Heimersheim

Member of Technical Staff

Previously at Apollo Research, he conducted foundational work in mechanistic interpretability—including parameter decomposition methods and studies of activation plateaus—as well as applied projects using interpretability to detect deception in LLMs.

He holds a PhD in Astronomy from the University of Cambridge, where he focused on 21 cm cosmology and Bayesian inference. He is a co-organiser of the NeurIPS 2025 Mechanistic Interpretability workshop.

News & publications

Training Reliable Activation Probes With a Handful of Positive Examples
September 30, 2025
Transformers Don’t Need LayerNorm at Inference Time: Scaling LayerNorm Removal to GPT-2 XL and Implications for Mechanistic Interpretability
September 30, 2025
Compressed Computation is (probably) not Computation in Superposition
December 6, 2025
Towards Automated Circuit Discovery for Mechanistic Interpretability
July 4, 2023