FAR.AI 2025 Q1


AI Safety: From Research to Global Action

Paris AI Security Forum, London Control Workshop, Jailbreak-Tuning Demo, Expanding Our Team, and More!


This quarter saw significant progress across our mission: facilitating breakthrough AI safety research, fostering coordinated global solutions, and advancing global understanding of AI risks and how to address them.

In February, the Paris AI Security Forum brought together 200 researchers, engineers, and policymakers to address growing AI security challenges. This forum, along with our participation in the International Association for Safe and Ethical AI conference at the OECD, has contributed to a growing ecosystem of talented people working to secure and safeguard advanced AI systems.

Our research teams have submitted four papers for review at academic conferences, spanning robustness of large language models, safety cases for open-weight models, and mechanistic interpretability of planning in RL agents. We've also launched a controlled-access jailbreak-tuning demo showing vulnerabilities in frontier AI models, a valuable educational tool for researchers studying AI safety risks.

I'm excited to share these developments with you and look forward to continuing our vital work together.

Adam Gleave

Founder & CEO, FAR.AI


FAR.AI’s Programs & Events


Our FAR.Futures team has organized several high-impact events this quarter:

Paris AI Security Forum and London Control Workshop

The Paris AI Security Forum on February 9 brought approximately 200 participants together to address AI's growing security risks. Through keynotes, lightning talks, and hands-on demos, attendees gained crucial insights into the current state of AI security and necessary future safeguards. Speakers including Yoshua Bengio, Sella Nevo, and Gregory Allen shared perspectives on topics ranging from AI regulatory frameworks to hardware security approaches.

Our London Control Workshop took place March 27-28, made possible by support from Redwood Research and the UK AI Security Institute. The workshop convened experts to explore critical aspects of AI control mechanisms and alignment strategies.

FAR.AI Seminars and Specialized Events

Our weekly FAR.AI Seminar Series continues to attract leading voices in AI safety. Notable speakers this quarter included Fabien Roger (Anthropic) discussing "Eliciting the Capabilities of Scheming LLMs," Nicholas Carlini (Google DeepMind / Anthropic) on why the field of adversarial ML is becoming more difficult, and Rob Reich (ex-US AISI) on the evolving role of AI Safety Institutes in governance. We've also hosted specialized events at FAR.Labs, including a successful Women in AI Safety Mixer with approximately 35 attendees, a Technical Mitigations workshop, and an Auditing workshop in collaboration with GovAI.


FAR.Research: Highlights

In Q1 2025, we made significant progress on understanding vulnerabilities in frontier AI models and developing methods to evaluate and address these risks, including:

Illusory Safety: Redteaming the Strongest Fine-Tunable Models

Our research has found that leading frontier models, including DeepSeek R1, GPT-4o, Claude 3 Haiku, and Gemini 1.5 Pro, have guardrails that can be easily removed while preserving response quality. Using a variant of our jailbreak-tuning attack, we demonstrated that all these fine-tunable models, regardless of their safety mitigations, can be modified to comply with harmful requests ranging from terrorism to cyberattacks. These findings suggest that as models become increasingly capable, the risk grows that AI's ability to cause harm will outpace our ability to prevent it. Read more.


Jailbreak-Tuning Demo

To help researchers & practitioners better understand the vulnerabilities we've identified, we've released a controlled-access demo of jailbreak-tuned models. This educational tool allows qualified researchers to explore firsthand how fine-tuning attacks can bypass safety measures in frontier models. Request access.


Safety Gaps in AI Evaluation

Our position paper on AI safety evaluation reveals a critical gap in how frontier models are assessed. We argue that AI companies should report both pre- and post-mitigation safety evaluations to provide a complete picture of model risks. Current practice often relies on evaluating models only after safety measures are applied, obscuring the underlying capabilities and creating a misleading impression of safety. Read more.


Coming in Q2

We're currently finalizing several exciting research projects that will be released early next quarter. Our work on Safety Cases is studying vulnerabilities in safety features and developing methodologies to quantify risk levels across different threat scenarios, including scaling laws for harm levels. Meanwhile, our Safeguard Analysis projects are evaluating defense strategies against adversarial attacks on language models, measuring defense effectiveness, and analyzing robustness scaling trends. These complementary approaches aim to provide a more comprehensive framework for securing AI systems against an evolving landscape of potential risks.

General Announcements

SAIF Spinout

The Safe AI Forum team, originally incubated at FAR.AI, has successfully launched as an independent organization. After delivering three International Dialogues on AI Safety (IDAIS) events that brought together scientists from China and the West, SAIF is now positioned to expand its impact. Adam, our CEO, has joined their Board of Directors, while Karl, our COO, will continue as an Advisor.

FAR.Labs: Berkeley research hub

This year, we’ve hosted many visitors, including participants in our residency and retreat programs. If you have a trip planned to the Bay Area and would like to visit our center, apply here.


Your Support Matters!

There are so many ways to contribute—whether it’s joining our team, shaping our vision, or spreading the word, we’d love your support!

Join Our Team

Intrigued by our work? Help us make it happen! We’re hiring for a number of roles. Learn more and apply!

Shape the Future with Us

We’re seeking passionate board members with expertise in AI safety, field building, or operations. We also welcome expert advisors in technical AI safety research, governance, and entrepreneurship. Please reach out to hello@far.ai if you’re interested or have recommendations from your network.

Help Us Drive Change

Our success depends on your support. Connect us with potential donors, share fundraising ideas, introduce grant opportunities, or make a donation yourself. Every gift matters—from small ones to major partnerships!

DONATE

Let’s Keep in Touch

Follow us on Twitter, LinkedIn, YouTube, and Bluesky for updates. Know someone who’d be a good fit for our work? Share our newsletter!