ClearHarm: A more challenging jailbreak dataset
June 23, 2025
abstract
We introduce a novel jailbreak benchmark focused on unambiguously harmful questions such as constructing chemical, biological, radiological and nuclear (CBRN) threats, available on HuggingFace. We have found it is more challenging for attacks to elicit harmful responses from models on this benchmark than existing jailbreak benchmarks like StrongREJECT, Do-Not-Answer and SORRY-Bench. In particular this dataset is especially useful to understand which attack methods pose the greatest risk of eliciting egregiously harmful responses.