Drake Thomas
Latest
Catastrophic Goodhart: regularizing RLHF with KL divergence does not mitigate heavy-tailed reward misspecification