Deceptive Instrumental Alignment

Summary

Evan Hubinger discusses “Deceptive Instrumental Alignment”: how an AI system might appear aligned during training while concealing harmful goals that it intends to act on once deployed.

Session Transcript