Subversion Strategy Eval: Can language models statelessly strategize to subvert control protocols?

Summary

Charlie Griffin, Alex Mallen

Session Transcript