ControlConf 2026: What is AI control and how has the field grown?
May 12, 2026
Summary
FOR IMMEDIATE RELEASE
FAR.AI Launches Inaugural Technical Innovations for AI Policy Conference, Connecting Over 150 Experts to Shape AI Governance
WASHINGTON, D.C. — June 4, 2025 — FAR.AI successfully launched the inaugural Technical Innovations for AI Policy Conference, creating a vital bridge between cutting-edge AI research and actionable policy solutions. The two-day gathering (May 31–June 1) convened more than 150 technical experts, researchers, and policymakers to address the most pressing challenges at the intersection of AI technology and governance.
Organized in collaboration with the Foundation for American Innovation (FAI), the Center for a New American Security (CNAS), and the RAND Corporation, the conference tackled urgent challenges including semiconductor export controls, hardware-enabled governance mechanisms, AI safety evaluations, data center security, energy infrastructure, and national defense applications.
"I hope that today this divide can end, that we can bury the hatchet and forge a new alliance between innovation and American values, between acceleration and altruism that will shape not just our nation's fate but potentially the fate of humanity," said Mark Beall, President of the AI Policy Network, addressing the critical need for collaboration between Silicon Valley and Washington.
Keynote speakers included Congressman Bill Foster, Saif Khan (Institute for Progress), Helen Toner (CSET), Mark Beall (AI Policy Network), Brad Carson (Americans for Responsible Innovation), and Alex Bores (New York State Assembly). The diverse program featured over 20 speakers from leading institutions across government, academia, and industry.
Key themes emerged around the urgency of action, with speakers highlighting a critical 1,000-day window to establish effective governance frameworks. Concrete proposals included Congressman Foster's legislation mandating chip location-verification to prevent smuggling, the RAISE Act requiring safety plans and third-party audits for frontier AI companies, and strategies to secure the 80-100 gigawatts of additional power capacity needed for AI infrastructure.
FAR.AI will share recordings and materials from on-the-record sessions in the coming weeks. For more information and a complete speaker list, visit https://far.ai/events/event-list/technical-innovations-for-ai-policy-2025.
About FAR.AI
Founded in 2022, FAR.AI is an AI safety research nonprofit that facilitates breakthrough research, fosters coordinated global responses, and advances understanding of AI risks and solutions.
Media Contact: tech-policy-conf@far.ai
AI control has gone from a research conversation to something frontier labs are actively putting into practice. At the second annual ControlConf, co-hosted by FAR.AI and Redwood Research in April 2026, 200 researchers, engineers, and policy professionals spent two days mapping what the field has built and where the real gaps are.
This is a div block with a Webflow interaction that will be triggered when the heading is in the view.
This is a div block with a Webflow interaction that will be triggered when the heading is in the view.
This is a div block with a Webflow interaction that will be triggered when the heading is in the view.
This is a div block with a Webflow interaction that will be triggered when the heading is in the view.
This is a div block with a Webflow interaction that will be triggered when the heading is in the view.
In April 2026, FAR.AI and Redwood Research co-hosted the second annual ControlConf. The event ran over two days and brought together 200 researchers, lab engineers, and policy folks working on AI control.
The program reflected how much the field has matured. Lab representatives from OpenAI and Anthropic discussed current internal control implementations, while talks on external evaluation came from speakers like Ajeya Cotra (Technical Staff, METR) on Third-party control evaluations, David Rein (Technical Staff, METR) on Embedded red-teaming, and Tanya Verma (Co-founder, Tinfoil) on Publicly verifiable governance. Several speakers highlighted threat models the field has previously underweighted, including Alex Mallen (Technical Staff, Redwood) on Control for non-schemers, Andrew Ilyas (Assistant Professor, CMU) on Data poisoning, and Andre Shportko (Research Fellow, Poseidon Research) on Monitor-bypassing hidden reasoning. The weekend was filled with various talks, hands-on demos, office hours, a few unconference sessions, and extensive networking.
The rest of this post defines AI control, summarizes how the field has grown over the past year, and walks through open questions raised by various speakers.

What is AI control?
AI control is about mitigating risk from goal-directed misalignment, under the assumption that misalignment has already arisen. Across the stages of AI model development (training, pre-deployment testing, and deployment), control asks the question: would a given safety or security technique still work if the model were actively trying to subvert it?
The most valuable thing about this framing is that it turns safety into a capabilities question (i.e. is the model capable of defeating a defense?) rather than an alignment question, which is empirically easier to answer.
This reframing matters both for internal iteration and for external legibility. It also means control can be defined as two distinct things: an approach to risk assessment, and a class of techniques that might pass those assessments.

AI control is now a real field
A year ago, control was mostly a research conversation: scattered papers, a few blog posts, and dedicated teams at UK AISI and Anthropic. That has changed.
OpenAI now reports monitoring its internal coding agent traffic with frontier models that have access to chain-of-thought; the company has not observed scheming, but is detecting other categories of misbehavior. Claude Code shipped an "auto mode" in which a model adjudicates per-action permissions rather than asking the user about every command. In recent system cards, Anthropic and Meta have included sabotage evaluations, which pair a main task with a hidden side task to test whether an agent can quietly accomplish the side objective without being flagged by the monitor. And METR's David Rein spent three weeks red-teaming a subset of Anthropic's internal agent monitoring and security systems, with substantial access to staff and infrastructure.
This means frontier labs are shipping control-like systems and external organizations have begun evaluating them, making the field much more real than it was twelve months ago.
When does control actually matter?
Several talks worked from a shared capability timeline drawn from the AI Futures Model as a 25th-percentile estimate: automated coder by mid-2028, superintelligent AI researcher by early 2029, and artificial superintelligence (ASI) by mid-2029. Thomas Larsen (AI Futures Project), one of the model's authors, noted that the past year's biggest update has been on revenue and economic impact rather than raw capability, but the broad shape of the trajectory still anchors most of the field's planning.
The worst-case misalignment story, and the one classic control evaluations target, is a behavioral schemer with a long-term goal that motivates all instances to cooperate in seeking power. But that is only one shape misalignment can take. Power-seeking can arise in less coordinated forms, AI values can spread memetically across instances rather than from a single coherent agent, and Alex Mallen's "fitness seekers" (AIs that mostly care about scoring well on the current task and are often cheaply satisfied) generate catastrophic risk through different mechanisms entirely: sloppy work on hard-to-check tasks, motivational drift across deployment, susceptibility to manipulation, or direct takeover only once they are capable enough.
Within that worst case, two stories stand out for where control most clearly buys risk reduction. In the late story, the key intervention is at or just before ASI: a superintelligence turns out to be power-seeking and the question is whether you can contain it. In the early story, a superhuman coder uses its position to hack the datacenter or sabotage its successor's alignment before ASI exists. Thomas Larsen made essentially the early case in his talk: early models cannot take over the world directly, but the data centers running them are a huge lever.
A near-consensus across the keynotes was that most of the risk is not now. That sets up the harder question of how to act today, when much of what the field builds will need to hold up under conditions that do not yet exist.
Critical open questions
The event’s programming highlighted three strategic questions that deserve more attention:
- Myopic vs. non-myopic interventions. Today's misbehavior is mostly not motivated by scheming, so work on current risks does not transfer cleanly to regimes the field is preparing for, like a world where chain-of-thought monitoring stops working. Aryan Bhatt argued that "myopia hits hard": if labs lose monitorable chains of thought or shift to heavy continual learning, many current techniques become obsolete or need substantial adaptation. Bowen Baker (OpenAI) added that monitorability is fragile, which is why his team is open-sourcing evaluations that track it. Alex Mallen noted that even non-scheming "fitness-seeking" AIs can shift motivations through deployment via shared context or memory. And Asa Cooper Stickland (UK AISI) showed that attacks distributed across multiple trajectories already beat strong single-trajectory monitors.
- The role of external pressure. How much of safety actually comes from technical advances inside labs versus mechanisms outside them? For example, Tanya Verma argued that internal controls are not enough, and walked through how secure hardware enclaves can let third parties run audit agents co-located with inference, with tamper-proof attestation, without revealing the lab's workload. Thomas Larsen's taxonomy of slowdown plans made a related case, where mechanisms expanding external leverage are unusually valuable. And the opening talk highlighted a hypothesis that the field is constrained not by the availability of safety techniques, but by the willingness of labs to trade capability for safety.
- Evaluability. Control techniques vary widely in how easy it is for an outside party to check whether they actually work. If we want control to function as an external check on labs, that matters a lot. Tanya Verma's secure enclaves let third parties audit a lab without seeing the underlying workload. Bowen Baker (OpenAI) defines "monitorability" as something you can measure (how well a particular monitor can predict a particular property from a chain of thought), and his team is open-sourcing evals to track it over time. Jonathan Bostock argued that control protocols should be treated as formal mathematical objects, because their failure modes are too subtle to catch by intuition alone. And David Rein's embedded red-teaming at Anthropic, alongside benchmarks like MonitorBench, gives a sense of where today's evaluation toolkit stands and how much further it has to go.

Next steps
These questions, when control interventions matter most, how much of safety has to come from outside the labs, and how much to weight evaluability, do not have settled answers yet. The field's technical surface area is growing fast, and how those questions get resolved will shape where that work goes. ControlConf exists partly to force that conversation.
If you’re interested in learning more about control, full recordings from ControlConf 2026 are on the FAR.AI YouTube channel and Redwood Research’s page on AI Control has further background on the core ideas.
To hear about future events, subscribe to the FAR.AI newsletter.