News & Publications
Jailbreak-Tuning: Models Efficiently Learn Jailbreak Susceptibility
July 15, 2025

Illusory Safety: Redteaming DeepSeek R1 and the Strongest Fine-Tunable Models of OpenAI, Anthropic, and Google
February 4, 2025

GPT-4o Guardrails Gone: Data Poisoning & Jailbreak-Tuning
October 31, 2024

Data Poisoning in LLMs: Jailbreak-Tuning and Scaling Laws
August 6, 2024