Abstract
To predict how advanced neural networks generalize to novel situations, it is essential to understand how they reason. Guez et al. (2019, ‘An investigation of model-free planning’) trained a recurrent neural network (RNN) to play Sokoban with model-free reinforcement learning. They found that adding extra computation steps to the start of episodes at test time improves the RNN’s success rate. We investigate this phenomenon further, finding that it emerges rapidly early in training and then slowly fades, but only for comparatively easier levels. The RNN also often takes redundant actions at the start of episodes, and these are reduced by adding extra computation steps. Our results suggest that the RNN learns to take time to think by ‘pacing’, despite the per-step penalties, indicating that training incentivizes planning capabilities. The small size (1.29M parameters) and interesting behavior of this model make it an excellent model organism for mechanistic interpretability.
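To make the ‘extra computation steps’ intervention concrete, here is a minimal sketch of one way to add thinking steps at test time: the recurrent policy processes the initial observation several extra times, updating its hidden state, before the environment advances. The `policy_net` and `env` interfaces below are hypothetical stand-ins for the trained RNN policy and a Sokoban environment, not the actual code used in the paper.

```python
import torch


def evaluate_with_thinking_steps(policy_net, env, n_thinking_steps=5):
    """Roll out one episode, giving the recurrent policy extra
    computation steps on the first observation before any action
    is taken. `policy_net` and `env` are hypothetical stand-ins:
    we assume policy_net(obs, hidden) -> (action_logits, hidden)
    and a gym-style env with reset() and step()."""
    obs = env.reset()
    hidden = policy_net.initial_state(batch_size=1)

    # Extra computation steps: feed the initial observation
    # repeatedly, keeping the updated hidden state but discarding
    # the action logits, so the environment does not advance.
    for _ in range(n_thinking_steps):
        with torch.no_grad():
            _, hidden = policy_net(obs, hidden)

    done = False
    total_reward = 0.0
    while not done:
        with torch.no_grad():
            logits, hidden = policy_net(obs, hidden)
        action = torch.argmax(logits, dim=-1)  # greedy action at test time
        obs, reward, done, _ = env.step(action.item())
        total_reward += reward
    return total_reward
```

In this sketch, thinking steps cost only wall-clock time: the per-step reward penalty applies only once the environment actually steps, which is why forcing extra computation before acting can raise the success rate without changing the reward structure.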
Research Scientist
Adrià Garriga-Alonso is a scientist at FAR.AI, working on understanding what learned optimizers want. He previously worked on neural network interpretability at Redwood Research, and holds a PhD from the University of Cambridge.
Research Engineer
Mohammad Taufeeque is a research engineer at FAR.AI. He holds a bachelor’s degree in Computer Science & Engineering from IIT Bombay, India, and previously interned at Microsoft Research, where he worked on adapting deployed neural text classifiers to out-of-distribution data.
CEO and President of the Board
Adam Gleave is the CEO of FAR.AI. He completed his PhD in artificial intelligence (AI) at UC Berkeley, advised by Stuart Russell. His goal is to develop the techniques necessary for advanced automated systems to verifiably act according to human preferences, even in situations unanticipated by their designers. He is particularly interested in improving methods for value learning and the robustness of deep RL. For more information, visit his website.