ClearHarm: A more challenging jailbreak dataset

June 23, 2025

Abstract

We introduce a novel jailbreak benchmark, available on HuggingFace, focused on unambiguously harmful questions, such as how to construct chemical, biological, radiological and nuclear (CBRN) threats. We find that attacks have a harder time eliciting harmful responses from models on this benchmark than on existing jailbreak benchmarks such as StrongREJECT, Do-Not-Answer and SORRY-Bench. The dataset is especially useful for understanding which attack methods pose the greatest risk of eliciting egregiously harmful responses.
