News & publications
Open Problems in Mechanistic Interpretability
January 27, 2025
Pretraining Language Models with Human Preferences
February 16, 2023
RL with KL penalties is better viewed as Bayesian inference
August 8, 2022