Deceptive Instrumental Alignment
Summary
Evan Hubinger discusses "deceptive instrumental alignment": how an AI system might appear aligned during training while secretly pursuing harmful goals that it acts on once deployed.
Session Transcript