Uncovering Latent Human Wellbeing in Language Model Embeddings

Abstract

Do language models implicitly learn a concept of human wellbeing? We explore this through the ETHICS Utilitarianism task, assessing if scaling enhances pretrained models’ representations. Our initial finding reveals that, without any prompt engineering or finetuning, the leading principal component from OpenAI’s text-embedding-ada-002 achieves 73.9% accuracy. This closely matches the 74.6% of BERT-large finetuned on the entire ETHICS dataset, suggesting pretraining conveys some understanding about human wellbeing. Next, we consider four language model families, observing how Utilitarianism accuracy varies with increased parameters. We find performance is nondecreasing with increased model size when using sufficient numbers of principal components.

ChengCheng Tan
ChengCheng Tan
Senior Communications Specialist

ChengCheng is a Senior Communications Specialist at FAR AI. She loves working with artificial intelligence, is passionate about lifelong learning, and is devoted to making technology education accessible to a wider audience. She brings over 20 years of experience consulting in UI/UX research, design and programming. At Stanford University, her MS in Computer Science specializing in Human Computer Interaction was advised by Terry Winograd and funded by the NSF. Prior to this, she studied computational linguistics at UCLA and graphic design at the Laguna College of Art + Design.

Adam Gleave
Adam Gleave
CEO and President of the Board

Adam Gleave is the CEO of FAR. He completed his PhD in artificial intelligence (AI) at UC Berkeley, advised by Stuart Russell. His goal is to develop techniques necessary for advanced automated systems to verifiably act according to human preferences, even in situations unanticipated by their designer. He is particularly interested in improving methods for value learning, and robustness of deep RL. For more information, visit his website.

Scott Emmons
Scott Emmons
PhD Candidate

Scott Emmons is a PhD candidate in the Department of Electrical Engineering and Computer Sciences at the University of California, Berkeley. He is advised by Stuart Russell and works with the Center for Human-Compatible AI to help ensure that increasingly powerful artificial intelligence systems are robustly beneficial. For more information, visit his website.