Deceptive Instrumental Alignment
Summary
Evan Hubinger dives into “Deceptive Instrumental Alignment,” revealing how AI might appear aligned during training while secretly harboring harmful intentions for deployment.
SESSION Transcript
Summary
SESSION Transcript