Kellin Pelrine

PhD Candidate | McGill University

Kellin Pelrine is a research scientist at FAR.AI and a PhD candidate at McGill University and the Mila institute. Kellin leads work towards making AI a transformatively positive force, rather than a potentially catastrophic negative one, for our ability to find reliable information and build knowledge. He also leads projects on exposing, understanding, and solving vulnerabilities of frontier models. Kellin collaborates with Adam Gleave at FAR.AI, and previously worked with us as a Research Scientist Intern.

News & Publications

Illusory Safety: Redteaming DeepSeek R1 and the Strongest Fine-Tunable Models of OpenAI, Anthropic, and Google (February 4, 2025)
GPT-4o Guardrails Gone: Data Poisoning & Jailbreak-Tuning (October 31, 2024)
Data Poisoning in LLMs: Jailbreak-Tuning and Scaling Laws (August 6, 2024)
Can Go AIs be adversarially robust? (June 18, 2024)
Beyond the Board: Exploring AI Robustness Through Go (December 21, 2023)
We Found Exploits in GPT-4’s Fine-tuning & Assistants APIs (December 21, 2023)
Exploiting Novel GPT-4 APIs (December 21, 2023)
Even Superhuman Go AIs Have Surprising Failure Modes (July 15, 2023)
Adversarial Policies Beat Superhuman Go AIs (January 9, 2023)