Dillon Bowen

Research Scientist

Dillon Bowen is a Research Scientist at FAR.AI focused on understanding catastrophic risks from frontier AI models.

He completed his PhD in Decision Processes at the Wharton School of the University of Pennsylvania, advised by Philip Tetlock, focusing on statistics, experiment design, and forecasting. Dillon was previously the principal data scientist at a London-based startup and has collaborated on AI safety research through ML Alignment and Theory Scholars (MATS) and UC Berkeley's Center for Human-Compatible AI (CHAI).

News & Publications

Mind the Mitigation Gap: Why AI Companies Must Report Both Pre- and Post-Mitigation Safety Evaluations
May 23, 2025

A Toolkit for Estimating the Safety-Gap between Safety Trained and Helpful Only LLMs
May 22, 2025

AI Companies Should Report Pre- and Post-Mitigation Safety Evaluations
March 17, 2025

Illusory Safety: Redteaming DeepSeek R1 and the Strongest Fine-Tunable Models of OpenAI, Anthropic, and Google
February 4, 2025

GPT-4o Guardrails Gone: Data Poisoning & Jailbreak-Tuning
October 31, 2024

Data Poisoning in LLMs: Jailbreak-Tuning and Scaling Laws
August 6, 2024