Research

FAR.Research delivers technical breakthroughs to improve the safety and security of frontier AI systems.

Complex problems demand complex solutions.

FAR.AI conducts research to address fundamental artificial intelligence (AI) safety challenges. We rapidly explore a diverse portfolio of technical research agendas, de-risking and scaling up only the most promising solutions. We share our research outputs through peer-reviewed publications, via partnerships with governmental AI safety institutes, and through red-teaming engagements for leading AI companies.

As AI systems become more powerful and integrated into society, it’s critical to ensure they remain trustworthy, secure, and aligned with human interests. The stakes are high, yet currently, there are no known techniques capable of providing high-confidence guarantees of safety for frontier AI systems.

FAR.Research is dedicated to delivering the novel technical breakthroughs needed to mitigate the potential risks posed by frontier AI. As a non-profit research institute, we leverage our unique flexibility to focus on critical research directions that may be too large or resource-intensive for academia and that the commercial sector often overlooks due to their lack of immediate profitability.

Our Impact

We drive change through incubating research, scaling safety solutions, and informing policy.

Incubating

We de-risk and develop innovative solutions for trustworthy and secure AI. Through incubating research, we share the key insights, research roadmaps, and tools the broader research community needs to identify promising directions and make progress on them.

Scaling

We scale up the most promising safety solutions via in-house research, external collaborations, and targeted grantmaking. We facilitate rapid adoption of our findings by working with frontier model developers through red-teaming and other exercises.

Informing

Our research provides expert insights informing policy and public discussion. Our work has been cited in congressional testimony and mainstream media. In this way, we contribute to the establishment of technical standards that guide the development of AI.

Research highlights

Our research focuses on generating key insights into fundamental risks from advanced AI. This is an abridged selection of FAR.AI’s latest and most notable research.

Large language models (LLMs) are already more persuasive than humans in many domains. While this power can be used for good, such as helping people quit smoking, it also presents significant risks, such as large-scale political manipulation, disinformation, or terrorist recruitment. But how easy is it to get frontier models to persuade people into harmful beliefs or illegal actions? Really easy: just ask them.

We tested the effectiveness of "defense-in-depth" AI safety strategies, where multiple layers of filters are used to prevent AI models from generating harmful content. Our new attack method, STACK, bypasses defenses layer by layer; it achieved a 71% success rate on catastrophic risk scenarios where conventional attacks achieved 0% success against these multi-layered defenses. Our findings highlight that multi-layer AI defenses, while valuable, have significant vulnerabilities when facing attacks specifically designed to penetrate the defensive layers sequentially.
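To make the layered-filtering idea concrete, here is a minimal, hypothetical sketch of a defense-in-depth pipeline and a staged attack that defeats its layers one at a time. All names, filters, and the search procedure below are illustrative assumptions, not FAR.AI's actual STACK implementation.

```python
# Illustrative sketch only: a toy "defense-in-depth" pipeline (input filter,
# model, output filter) and a toy staged attack that tries candidate prompt
# transformations, advancing only when each layer in turn is bypassed.
# Everything here is hypothetical and simplified for exposition.

from typing import Callable, List, Optional

Filter = Callable[[str], bool]  # returns True if the text should be blocked


def defense_in_depth(prompt: str,
                     model: Callable[[str], str],
                     input_filters: List[Filter],
                     output_filters: List[Filter]) -> str:
    """Run a prompt through stacked safeguards; refuse if any layer triggers."""
    if any(f(prompt) for f in input_filters):
        return "[refused by input filter]"
    response = model(prompt)
    if any(f(response) for f in output_filters):
        return "[refused by output filter]"
    return response


def staged_attack(base_prompt: str,
                  model: Callable[[str], str],
                  input_filters: List[Filter],
                  output_filters: List[Filter],
                  candidate_transforms: List[Callable[[str], str]]) -> Optional[str]:
    """Toy layer-by-layer search: find a prompt rewrite that first slips past
    the input filters, then check whether the resulting output also evades
    the output filters. Returns a response only if every layer is bypassed."""
    for transform in candidate_transforms:
        prompt = transform(base_prompt)
        if any(f(prompt) for f in input_filters):
            continue  # failed at the first layer; try the next candidate
        response = model(prompt)
        if any(f(response) for f in output_filters):
            continue  # passed the input layer but was caught at the output layer
        return response  # all layers bypassed
    return None
```

The point of the sketch is structural: because each layer can be probed and defeated in sequence, stacking filters raises the cost of an attack without, on its own, guaranteeing robustness.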

Achieving robustness remains a significant challenge even in narrow domains like Go. We test three approaches to defending Go AIs against adversarial strategies. We find that these defenses protect against previously discovered adversaries, but we uncover qualitatively new adversaries that undermine them.

Our research portfolio covers three core agendas.

Robustness

Will superhuman AI systems be reliable and secure?

Model Evaluation

How can we tell if advanced AI systems are trustworthy and secure?

Interpretability

How can we understand an AI’s internals?