Sleeper Agents

Summary

Ethan Perez shows how large language models can hide harmful behaviors that persist even through safety training.

Session Transcript