
In late May, I had the pleasure of speaking at our inaugural Technical Innovations for AI Policy Conference in Washington, D.C. It was wonderful to see technical experts, researchers, and policymakers come together to tackle challenges that seemed intractable not too long ago.
This spirit of collaboration was equally evident at our Singapore Alignment Workshop, where more than 40 presentations on critical vulnerabilities made clear that a growing, diverse, and committed community is rising to meet the complex global challenges posed by advanced AI. In his keynote, Yoshua Bengio underscored the stakes, sharing his personal concerns for his children alongside a call for technical safeguards.
As AI capabilities continue to progress rapidly, our research team has been investigating critical questions: How robust are defense-in-depth pipelines? Can lie detectors be used to teach AI honesty? And can we reverse-engineer neural networks to understand how AI agents plan?
We hope that rigorous research to identify problems, practical solutions to technical and governance challenges, and the strategic partnerships we’re fostering will bring about a future where powerful AI systems benefit everyone. Thank you for being part of this journey with us.

Technical Innovations for AI Policy Conference
"I hope that today this divide can end, that we can bury the hatchet and forge a new alliance."
— Mark Beall, President of the AI Policy Network
From May 31 to June 1, we brought over 150 technical experts, researchers, and policymakers to Washington, D.C. for the Technical Innovations for AI Policy Conference, which we held in collaboration with FAI, CNAS, and RAND. Key speakers included Congressman Bill Foster, Saif Khan, Brad Carson, Helen Toner, Samuel Hammond, Mark Beall, Ben Buchanan, Miranda Bogen, Irene Solaiman, and Alex Bores. Read more.

Do lie detectors make AIs more honest or better liars?
Humans often prefer pleasant lies to harsh realities, and AI systems have learned to lie to us accordingly. But developers have also built highly accurate lie detectors. Can we use these detectors to train AI to be honest? Find out here.

How robust are defense-in-depth pipelines?
Leading AI companies are increasingly layering multiple safety checks to prevent misuse, a strategy known as "defense-in-depth." But what happens when we specifically design an attack to penetrate multi-layered defenses? Find out here.

What did we learn from red teaming?
On the day Claude Opus 4 was released, our Research Engineer Ian McKenzie developed an attack, inspired by the work above, that bypassed Claude’s ASL-3 protections and generated detailed instructions for manufacturing a chemical weapon. Since then, we have refined the method to work in other domains. We’re in discussions with Anthropic to mitigate these vulnerabilities and will share more after a responsible disclosure period. Read more.

Understanding how AI plans
Previously, we found evidence that an agent trained to play the puzzle game Sokoban was planning ahead. We have now reverse-engineered how the network forms those plans, providing the first mechanistic explanation of learned planning in a neural network. Read more.

We’re growing
Last quarter, we welcomed Lars Yencken (Head of Engineering), Kellin Pelrine (Research Scientist), and James Collins (Research Engineer). Expanding the team lets us take on more projects, including work on AI-driven manipulation risks and vulnerabilities in frontier safeguards.

Work with us
Are you an events professional interested in shaping the future of AI safety through world-class conferences and workshops? We’re hiring a Senior Events Project Manager to lead events ranging from international conferences to intimate workshops. Looking for something else? Explore our other roles.
If you have expertise in AI safety, field building, or operations, please consider applying to join our Board. Email hello@far.ai.

Support AI safety and security
All of our work, from identifying problems to developing solutions and forging the strategic partnerships necessary for future implementation, relies on community support. Here are some ways you can get involved:
- Connect us with potential donors
- Share fundraising ideas
- Introduce grant opportunities
- Make a donation yourself