Yawen Duan

News & publications

Illusory Safety: Redteaming DeepSeek R1 and the Strongest Fine-Tunable Models of OpenAI, Anthropic, and Google

February 4, 2025

Jailbreak-Tuning: Models Efficiently Learn Jailbreak Susceptibility

Adversarial Policies Beat Superhuman Go AIs

January 9, 2023

Beyond the Board: Exploring AI Robustness Through Go

Our research explores a portfolio of high-potential agendas.

Our events bring together global leaders in AI.

Our programs build the field of trustworthy and secure AI.