TamperBench: Systematically Stress-Testing LLM Safety Under Fine-Tuning and Tampering
February 6, 2026
tamperbench-systematically-stress-testing-llm-safety-under-fine-tuning-and-tampering
Prefill-level Jailbreak: A Black-Box Risk Analysis of Large Language Models
February 19, 2026
prefill-level-jailbreak-a-black-box-risk-analysis-of-large-language-models
Concept Influence: Leveraging Interpretability to Improve Performance and Efficiency in Training Data Attribution
February 19, 2026
concept-influence-leveraging-interpretability-to-improve-performance-and-efficiency-in-training-data-attribution
Concept Influence: Leveraging Interpretability to Improve Performance and Efficiency in Training Data Attribution
concept-data-attribution-02-2026
Revisiting Frontier LLMs’ Attempts to Persuade on Extreme Topics: GPT and Claude Improved, Gemini Worsened
February 11, 2026
revisiting-attempts-to-persuade
Large language models can effectively convince people to believe conspiracies
January 9, 2026
large-language-models-can-effectively-convince-people-to-believe-conspiracies
Emergent Persuasion: Will LLMs Persuade Without Being Prompted?
October 21, 2025
emergent-persuasion-will-llms-persuade-without-being-prompted
Open Technical Problems in Open-Weight AI Model Risk Management
October 1, 2025
open-technical-problems-in-open-weight-ai-model-risk-management
Jailbreak-Tuning: Models Efficiently Learn Jailbreak Susceptibility
July 15, 2025
jailbreak-tuning-models-efficiently-learn-jailbreak-susceptibility
Accidental Misalignment: Fine-Tuning Language Models Induces Unexpected Vulnerability
May 22, 2025
accidental-misalignment-fine-tuning-language-models-induces-unexpected-vulnerability
It's the Thought that Counts: Evaluating the Attempts of Frontier LLMs to Persuade on Harmful Topics
July 20, 2025
its-the-thought-that-counts-evaluating-the-attempts-of-frontier-llms-to-persuade-on-harmful-topics
Frontier LLMs Attempt to Persuade into Harmful Topics
attempt-to-persuade-eval
Illusory Safety: Redteaming DeepSeek R1 and the Strongest Fine-Tunable Models of OpenAI, Anthropic, and Google
February 4, 2025
illusory-safety-redteaming-deepseek-r1-and-the-strongest-fine-tunable-models-of-openai-anthropic-and-google
Data Poisoning in LLMs: Jailbreak-Tuning and Scaling Laws
August 6, 2024
scaling-laws-for-data-poisoning-in-llms
GPT-4o Guardrails Gone: Data Poisoning & Jailbreak-Tuning
gpt-4o-guardrails-gone-data-poisoning-jailbreak-tuning
Can Go AIs be adversarially robust?
June 18, 2024
can-go-ais-be-adversarially-robust
Even Superhuman Go AIs Have Surprising Failure Modes
even-superhuman-go-ais-have-surprising-failure-modes
Exploiting Novel GPT-4 APIs
December 21, 2023
exploiting-novel-gpt-4-apis
We Found Exploits in GPT-4’s Fine-tuning & Assistants APIs
we-found-exploits-in-gpt-4s-fine-tuning-assistants-apis
Adversarial Policies Beat Superhuman Go AIs
January 9, 2023
adversarial-policies-beat-superhuman-go-ais
Beyond the Board: Exploring AI Robustness Through Go
beyond-the-board-exploring-ai-robustness-through-go