STPA & AI Control

Summary

Simon Mylius introduces Systems Theoretic Process Analysis (STPA), a methodology developed at MIT for aerospace applications that can be used to build comprehensive safety assurance cases for AI control.

Session Transcript

Simon Mylius: Hi. I'm going to be talking about Systems Theoretic Process Analysis and how it can be applied to AI control. This is really to address the question of how you can be very confident, when you're building a safety assurance case, that everything important has been covered. The safety frameworks that AI companies have put out don't describe a structured process for that in much detail, and that's what STPA provides. I'm going to talk you through how STPA works and then apply it to an AI control deployment.
STPA was developed by a team at MIT, initially for aerospace applications. It's now used in a whole range of other high-hazard, high-risk industries. Google ran a pilot study and published some results from it: they put just two engineers on STPA for service reliability, part time for five months, and they came up with results that they think could have prevented four serious incidents.
STPA is a systemic approach, so it goes beyond other kinds of risk and hazard analysis methodologies to look at organizational, operational, and human factors.
I'm aware that we've overloaded the definition of the word control here. STPA is all about control structure models. In a control structure model, you have a set of controllers, which in this case could be people, teams, computer programs, or agents. Each of those controllers applies control actions to controlled processes or to other controllers, and receives feedback from them. The analysis is all about looking at how those control actions could become unsafe. That can happen under specific sets of conditions, or loss scenarios. It's a four-stage process, and I'm going to walk through the four stages applied to AI control.
The first stage is all about defining losses and hazards. We're going to look at the example from the sketch of an AI control safety case where the threat we're concerned about is exfiltration of sensitive code from an AI company by an internally deployed agent. The loss that we're most concerned about is unauthorized loss of intellectual property, and that could be caused by the hazard where the LLM agent transmits sensitive data to external entities.
The next stage, stage two, is to model the control structure. You start off by defining the system boundary, which in this case is the AI company. Then you look at all of the control entities within that boundary and the interactions between them: the control actions applied from a controller to a controlled process. In this diagram, the control actions are shown by red arrows and the feedback is shown by blue arrows.
Once you've built the model, the next stage is to go through each of those control actions and try to find ways that they could become unsafe. That's stage three.
STPA provides four guides, or ways, that a control action could be unsafe. The first is if the control action is not provided. The second is if it is provided but still causes a hazard, for example because it is incorrectly specified. The third is if it's provided in the wrong order or at the wrong time, and the fourth is if it's applied for the wrong duration. The example that I've picked out here is where the untrusted agent uses local file storage to store state information that persists across memory resets.
Stage four is to identify the causal factors that could contribute to this loss scenario. In this case, we've identified inadequate monitoring, missing constraints, and a flawed process model.
As takeaways, STPA can be complementary to safety cases and to other AI governance methodologies. The design pattern on the right shows a control structure and how all of the different STPA elements can feed into different levels of the control structure.
It's a structured process, which means that it lends itself well to automation, so that's moving towards AI for AI safety. Being a systemic tool, it also provides a useful way to communicate some of the whole-system risks to people who might have a less clear understanding of AI systems.
Thank you very much. I'd love to talk about this more. Come find me afterwards.