Kellin Pelrine

Research Scientist

Kellin Pelrine is a Research Scientist at FAR.AI, where he leads cross-functional teams focused on building trustworthy and secure AI. His projects include measuring and mitigating the risks of AI-driven persuasion and manipulation, uncovering and addressing vulnerabilities, such as jailbreak-tuning, in frontier model safeguards, and creating interactive demos that let non-technical audiences engage directly with AI security challenges.

Previously, Kellin co-founded multiple startups and completed master’s degrees in Economics (Yale University) and Applied Mathematics (University of Colorado Boulder). He will complete his PhD at McGill University and the Mila AI Institute in summer 2025.

News & Publications

Illusory Safety: Redteaming DeepSeek R1 and the Strongest Fine-Tunable Models of OpenAI, Anthropic, and Google (February 4, 2025)
GPT-4o Guardrails Gone: Data Poisoning & Jailbreak-Tuning (October 31, 2024)
Data Poisoning in LLMs: Jailbreak-Tuning and Scaling Laws (August 6, 2024)
Can Go AIs be adversarially robust? (June 18, 2024)
Beyond the Board: Exploring AI Robustness Through Go (December 21, 2023)
We Found Exploits in GPT-4’s Fine-tuning & Assistants APIs (December 21, 2023)
Exploiting Novel GPT-4 APIs (December 21, 2023)
Even Superhuman Go AIs Have Surprising Failure Modes (July 15, 2023)
Adversarial Policies Beat Superhuman Go AIs (January 9, 2023)
