Chris Cundy

Research Scientist

Chris Cundy is a Research Scientist at FAR.AI. He is interested in how to detect and avoid misaligned behavior induced during training.

He received his PhD in Computer Science from Stanford University, where he was advised by Stefano Ermon. During his PhD he published on topics including causal inference, reinforcement learning, and large language models.

He previously worked at CHAI and FHI, and was a winner of the OpenAI Preparedness Challenge.

News & Publications

Mind the Mitigation Gap: Why AI Companies Must Report Both Pre- and Post-Mitigation Safety Evaluations
May 23, 2025
A Toolkit for Estimating the Safety-Gap between Safety Trained and Helpful Only LLMs
May 22, 2025
Illusory Safety: Redteaming DeepSeek R1 and the Strongest Fine-Tunable Models of OpenAI, Anthropic, and Google
February 4, 2025
Pacing Outside the Box: RNNs Learn to Plan in Sokoban (paper: Planning behavior in a recurrent neural network that plays Sokoban)
July 24, 2024
AI Companies Should Report Pre- and Post-Mitigation Safety Evaluations
March 17, 2025