Dillon Bowen

Research Scientist

Dillon Bowen is a Research Scientist at FAR.AI focused on understanding catastrophic risks from frontier AI models.

He completed his PhD in Decision Processes at the Wharton School of the University of Pennsylvania, advised by Philip Tetlock, focusing on statistics, experiment design, and forecasting. Dillon was previously the principal data scientist at a London-based startup and has collaborated on AI safety research through ML Alignment and Theory Scholars (MATS) and UC Berkeley's Center for Human-Compatible AI (CHAI).

News & Publications

Mind the Mitigation Gap: Why AI Companies Must Report Both Pre- and Post-Mitigation Safety Evaluations
May 23, 2025

A Toolkit for Estimating the Safety-Gap between Safety Trained and Helpful Only LLMs
May 22, 2025

AI Companies Should Report Pre- and Post-Mitigation Safety Evaluations
March 17, 2025

Illusory Safety: Redteaming DeepSeek R1 and the Strongest Fine-Tunable Models of OpenAI, Anthropic, and Google
February 4, 2025

GPT-4o Guardrails Gone: Data Poisoning & Jailbreak-Tuning
October 31, 2024

Data Poisoning in LLMs: Jailbreak-Tuning and Scaling Laws
August 6, 2024