Disaster Preparedness for AI Safety

Summary

Tegan Maharaj outlines practical safety measures for AI disasters, including safe zones, diverse decision-making, human back channels, and quarantine protocols.

Session Transcript

Many of you know me, and I know many of you; I'm happy to see you here. And you know that I've been working on safety since about 2016, which is kind of a long time ago now; I'm feeling kind of old in the field.
And during that time, I've also been doing a lot of climate and ecology work; that's my background, ecology. These days a lot of my work is focused on risk: risk assessment and risk modeling. And in 2016, in climate and ecology risk, we were already very much mid-disaster. We were assuming disasters were happening, they were happening, and we were dealing with them.
But for AI, back in 2016, if you'd asked me whether we could guarantee avoiding any and all AI-related disasters, I might've said yes; I might've said, yeah, I think there's a chance that could happen. Today I no longer believe that. In fact, it sounds charmingly naive that I once thought it possible. And in the last few years especially, I've met a lot of people who are almost hoping for a disaster, because they see it as our only chance to get people to really understand the risks and harms that we in the safety community have been thinking about for many years.
Personally, I'm kind of jaded about that possibility, seeing how the world has responded to climate disasters. But even making that comparison drew my attention to the question of how prepared we are for disasters. Compared to climate disasters, I would say we are very unprepared, and it's not as if the climate disasters are going well.
So, given that we basically expect these disasters to happen, I think we have a responsibility to prepare for them. And I think we have a lot of specialized technical and other knowledge in this room, in this field, that could achieve that Goldilocks disaster: one we learn from, make the most of, and that minimizes human and global suffering, instead of a disaster that itself sparks more disasters.
So, how do we do that? This is a list of things I've come up with that we can do to prepare for a disaster. I won't go too much into the framework behind them, but it's worth saying briefly that it comes from my background in ecological modeling and ecological risk. So, it's a systems perspective on risk, a complex-systems perspective on risk.
So, these are interventions that work at multiple scales, and they have self-reinforcing properties: having them in place at one scale introduces a kind of fractal stability and feedback pattern that supports them working at other scales.
So, when I say multiple scales, I mean you can think about deploying warning systems or building human backups at an individual level, in your community, at an org level, at a regional level, at a country level. So, as Mark was saying, we don't need to worry about this fragmented-ecosystem thing. We almost want a fragmented ecosystem; we want many interventions that are multiply redundant at multiple scales.
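To make that multi-scale redundancy concrete, here is a minimal, hypothetical sketch (the scales, channel names, and delivery functions are my own illustrative assumptions, not something specified in the talk) of a warning that fans out to independent channels at several scales, so that a failure at any one scale does not silence the alert.

```python
# Hypothetical sketch: a warning delivered redundantly at multiple scales.
# Scales and channels are illustrative assumptions, not a real deployment.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class WarningChannel:
    scale: str                   # e.g. "individual", "org", "regional", "country"
    send: Callable[[str], bool]  # returns True if the alert was delivered


def raise_alert(message: str, channels: List[WarningChannel]) -> List[str]:
    """Fan the warning out to every channel: redundancy means we never stop
    at the first success, and one failed scale does not block the others."""
    delivered = []
    for ch in channels:
        try:
            if ch.send(message):
                delivered.append(ch.scale)
        except Exception:
            continue  # a broken channel at one scale must not take down the rest
    return delivered


if __name__ == "__main__":
    channels = [
        WarningChannel("individual", lambda m: print(f"[text message] {m}") or True),
        WarningChannel("org", lambda m: print(f"[org pager] {m}") or True),
        WarningChannel("regional", lambda m: print(f"[regional bulletin] {m}") or True),
    ]
    print("delivered at scales:", raise_alert("anomalous AI behavior detected", channels))
```

The point of the sketch is only the structure: many independent, simple channels at different scales, none of which is a single point of failure.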
And before I say a little bit about each of these, I also want to say that I've thought pretty hard about making these interventions ones that themselves have very low risk. I think that's really worth thinking about, and something the safety community maybe hasn't thought enough about for some of the interventions that we do.
So, things like alignment, robustness, and scalable oversight have been priorities for the safety community for a long time, and they're great, but they also have the potential to simply be capabilities increases and to produce further risks themselves, and I think we've seen a lot of that happen. Even things like evals, demonstrations of failures, or interpretability methods: the results of these interventions or technologies could be used by bad actors to improve their systems.
So, thinking about the risk of the interventions we do for safety is, I think, a really important part of doing them.
AI-free safe zones, both physical and online. Increasing the diversity of who has decision-making power: basically, if we don't have that diversity, we're very vulnerable to things like groupthink or to missing big, important things. Building human back channels: we should always have a working human backup, for all of that.
We've talked about off switches before, but I don't think off switches work in practice unless we have those first three. Otherwise, the economic cost of flipping the off switch is such a big deal that we probably won't do it unless we have backups.
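As a toy illustration of that dependency (the class, names, and check here are hypothetical framing on my part, not a mechanism from the talk), here is a minimal sketch of a shutdown gate that only lets the off switch be thrown once a working human backup for the affected system has been verified, so the backup is a precondition rather than an afterthought.

```python
# Hypothetical sketch: an off switch that only becomes usable once a
# working human backup exists. All names and checks are illustrative.
from typing import Callable


class ShutdownGate:
    def __init__(self, has_human_backup: Callable[[], bool]):
        # has_human_backup should verify that a manual process can actually
        # take over the workload, e.g. confirmed by a recent drill.
        self._has_human_backup = has_human_backup

    def try_shutdown(self, reason: str) -> bool:
        if not self._has_human_backup():
            print(f"Refusing shutdown ({reason}): no verified human backup.")
            return False
        print(f"Off switch thrown ({reason}); human backup takes over.")
        return True


if __name__ == "__main__":
    gate = ShutdownGate(has_human_backup=lambda: True)  # assume a recent drill passed
    gate.try_shutdown("warning system flagged anomalous behavior")
```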
Decentralized resource management, multiple redundancy, warning systems: I'm putting all of that together into drills and quarantine protocols. Time is up, so if you would like to work on the report, there's a QR code; please just get in touch with me. I'm around all day, and thank you very much.