Chris Cundy

Research Scientist

Chris Cundy is a Research Scientist at FAR.AI. He is interested in how to detect and avoid misaligned behavior induced during training.

He received his PhD in Computer Science from Stanford University, where he was advised by Stefano Ermon. During his PhD he published on topics including causal inference, reinforcement learning, and large language models.

He previously worked at CHAI and FHI, and was a winner of the OpenAI Preparedness Challenge.

News & Publications

Mind the Mitigation Gap: Why AI Companies Must Report Both Pre- and Post-Mitigation Safety Evaluations
May 23, 2025
A Toolkit for Estimating the Safety-Gap between Safety Trained and Helpful Only LLMs
May 22, 2025
Illusory Safety: Redteaming DeepSeek R1 and the Strongest Fine-Tunable Models of OpenAI, Anthropic, and Google
February 4, 2025
Pacing Outside the Box: RNNs Learn to Plan in Sokoban (paper: Planning behavior in a recurrent neural network that plays Sokoban)
July 24, 2024
AI Companies Should Report Pre- and Post-Mitigation Safety Evaluations
March 17, 2025