FAR.AI 2024 Year in Review

FAR.AI Logo

AI Safety as a Global Public Good

International Dialogues, Alignment Workshops, Jailbreak-Tuning, AI Agents Planning, and More!

Group photo of event attendees

Welcome to the first edition of our newsletter! Last month I had the privilege of attending the Inaugural Convening of the International Network of AI Safety Institutes in San Francisco. It is exciting to see over ten countries establish AI safety institutes in the span of a year. I am pleased that FAR.AI has been at the forefront of encouraging global technical collaboration on AI safety. In particular, the Safe AI Forum team (fiscally sponsored by FAR.AI) organized two International Dialogues on AI Safety (IDAIS) this year, bringing together scientists from China and the West to address shared AI risks and explore solutions. At the most recent event in Venice, participants emphasized AI safety as a global public good, calling for emergency preparedness agreements, safety assurance frameworks, and independent global research.

These events capped off a year of significant milestones for FAR.AI. Our researchers have been investigating compute requirements for attacking LLMs, bypassing guardrails in frontier OpenAI models, uncovering how AI agents form plans, and more. We also hosted our largest Alignment Workshops yet, in Vienna and the Bay Area, bringing together researchers and leaders to chart strategies for safe AI.

There’s much more to come, and I’m excited to share the journey with you. As we wrap up 2024, I want to wish you and your loved ones a happy and peaceful holiday season! Your support this year has helped us achieve incredible milestones in advancing AI safety. As we look to 2025, your contributions can continue to make a lasting impact. Consider making a year-end donation to help us scale our research and programs.

Signature

Founder & CEO, FAR.AI

FAR.AI’s Events

Scenes from the Bay Area Alignment Workshop

Alignment Workshops: Vienna + Bay Area

This year, we welcomed two amazing new team members: Lindsay Murachver and Vael Gates. Together, they spearheaded two impactful Alignment Workshops—one in Vienna and the other in the Bay Area—bringing together global researchers and leaders to tackle the potential risks posed by advanced AI. Curious about the conversations that unfolded? Check out highlights below or explore the full playlist. Interested in attending a future workshop? Let us know!

FAR.Research: Highlights

In 2024, we released 12 papers, with highlights including:

Animation showing GPT-4o jailbreak

GPT-4o Guardrails Gone!

Our new jailbreak-tuning data poisoning attack was conceived in a single morning and implemented that afternoon. By evening, GPT-4o was giving us detailed instructions for virtually any request we made, such as how to procure ingredients and manufacture meth. Read more.


Diagram of neural network planning

Planning in Neural Networks

To better prepare for potential misalignment, we studied how neural networks learn to plan in Sokoban, trying to interpret and steer those plans. Read more.


Graph showing model scaling vs robustness

Will Scaling Solve Robustness?

In recent years, increasing the size of language models and their datasets has unlocked a dazzling array of capabilities. Yet these models remain vulnerable to adversarial inputs that induce them to perform undesired behaviors. Will scaling compute improve robustness as it has improved capabilities? Read more.


Go board showing an adversarial attack

Defending Go AIs from Attacks

Last year, we discovered that “superhuman” Go AIs can be beaten by human amateurs exploiting a blind spot that humans do not share. This year, we tried to patch the problem using adversarial training and alternative architectures. Read more.

FAR.Labs: Berkeley research hub

FAR.AI Seminars

Every Wednesday, we host a seminar at FAR.Labs, our collaborative AI safety research hub, where leading AI safety researchers and founders present their latest work for feedback and discussion. This year, we've featured fireside chats with Neel Nanda and Buck Shlegeris, presentations from Dan Hendrycks, and a series from frontier lab leaders including Anca Dragan, Rohin Shah, and Sam Bowman. Check out some of this year’s talks below or view the full playlist.

Growing Together

Over the past year, in conjunction with our dedicated members, we've put down roots and are excited to grow our programs and collaborations. We invite select Bay Area researchers and technologists to work part-time from FAR.Labs and attend events and programs at our offices. To apply, please complete our Community Collaborators form. This year, we’ve hosted many visitors—over 1,000 visitor-days—including participants in our residency and retreat programs. If you have a trip planned to the Bay Area and would like to visit our center, apply here.

People collaborating at FAR.AI Labs

Your Support Matters!

There are so many ways to contribute—whether it’s joining our team, shaping our vision, or spreading the word, we’d love your support!

Join Our Team

Intrigued by our research? Help us make it happen! We’re hiring Research Scientists, a People Ops Specialist/Manager, and a Business Analyst. Learn more and apply!

Shape the Future with Us

We’re seeking passionate board members with expertise in AI safety, field building, or operations. We also welcome expert advisors. Please reach out to hello@far.ai if you’re interested.

Help Us Drive Change

Our success depends on your support. Connect us with potential donors, share fundraising ideas, introduce grant opportunities, or make a donation yourself. Every gift matters!

DONATE

Let’s Keep in Touch

Check out our newly rebranded website. Follow us on X (Twitter), LinkedIn, YouTube, and Bluesky. Know someone who’d be a good fit for our work? Share our newsletter!