News & Publications

Illusory Safety: Redteaming DeepSeek R1 and the Strongest Fine-Tunable Models of OpenAI, Anthropic, and Google
February 4, 2025
illusory-safety-redteaming-deepseek-r1-and-the-strongest-fine-tunable-models-of-openai-anthropic-and-google
GPT-4o Guardrails Gone: Data Poisoning & Jailbreak-Tuning
October 31, 2024
gpt-4o-guardrails-gone-data-poisoning-jailbreak-tuning
Data Poisoning in LLMs: Jailbreak-Tuning and Scaling Laws
August 6, 2024
scaling-laws-for-data-poisoning-in-llms

Publications:

No studies available yet.