News & publications
Layered AI Defenses Have Holes: Vulnerabilities and Key Recommendations
July 2, 2025
defense-in-depth
STACK: Adversarial Attacks on LLM Safeguard Pipelines
stack-adversarial-attacks-on-llm-safeguard-pipelines
ClearHarm: A more challenging jailbreak dataset
June 23, 2025
clearharm-a-more-challenging-jailbreak-dataset
Does Robustness Improve with Scale?
July 23, 2024
does-robustness-improve-with-scale
Exploring Scaling Trends in LLM Robustness
exploring-scaling-trends-in-llm-robustness
Beyond the Board: Exploring AI Robustness Through Go
December 21, 2023
beyond-the-board-exploring-ai-robustness-through-go
Can Go AIs be adversarially robust?
can-go-ais-be-adversarially-robust
Even Superhuman Go AIs Have Surprising Failure Modes
July 15, 2023
even-superhuman-go-ais-have-surprising-failure-modes
Adversarial Policies Beat Superhuman Go AIs
adversarial-policies-beat-superhuman-go-ais
STACK: Adversarial Attacks on LLM Safeguard Pipelines
July 2, 2025
stack-adversarial-attacks-on-llm-safeguard-pipelines
Layered AI Defenses Have Holes: Vulnerabilities and Key Recommendations
defense-in-depth
ClearHarm: A more challenging jailbreak dataset
June 23, 2025
clearharm-a-more-challenging-jailbreak-dataset
Exploring Scaling Trends in LLM Robustness
July 26, 2024
exploring-scaling-trends-in-llm-robustness
Does Robustness Improve with Scale?
does-robustness-improve-with-scale
Can Go AIs be adversarially robust?
June 18, 2024
can-go-ais-be-adversarially-robust
Even Superhuman Go AIs Have Surprising Failure Modes
even-superhuman-go-ais-have-surprising-failure-modes
Inverse Scaling: When Bigger Isn't Better
June 15, 2023
inverse-scaling-when-bigger-isnt-better
Adversarial Policies Beat Superhuman Go AIs
January 9, 2023
adversarial-policies-beat-superhuman-go-ais
Beyond the Board: Exploring AI Robustness Through Go
beyond-the-board-exploring-ai-robustness-through-go