flexHEG

Summary

Davidad explores the feasibility of exploit-free software, referencing DARPA's HACMS project, and introduces "flexHEG," a flexible hardware-enabled governance framework that protects the owner's confidentiality and integrity without enabling covert surveillance, while allowing a governance authority to constrain availability, highlighting its adaptability and robustness against physical attack.

SESSION Transcript

I do think a lot about formal verification, and that wasn't part of what I was going to talk about, but I want to say a few words since Yoshua brought it up, and since it didn't appear on that list of top priority things. Basically in the 2010s, from 2013 to 2017, DARPA showed through the HACMS project that it is possible to create software systems that are completely free of exploitable bugs in the software domain. Of course, there are still ways in, like supply chain attacks, electromagnetic attacks, or getting physical access to hardware systems. But in the software domain, in the limit of maximum defender capability and maximum offense capability, the defender just wins.
With AI automation of formal verification, that should become much cheaper and more scalable, and I think it's important that we get there as soon as possible. Among people who've looked at this, it's not actually very controversial. It's not like new research needs to be done into the mathematical possibility of unhackable software. Clearly it’s possible, but a lot of development needs to be done, like adding Nvidia drivers to seL4 and formally verifying them, which will take a lot of cognitive labor, some of which maybe can be automated already. I think that's very important. What I think is less clearly possible is securing the hardware against electromagnetic and physical attacks. That's actually what I'm mainly going to talk about.
Background, because it came up: probably everyone knows this, but just to establish common knowledge or definitions, confidentiality means no one can read your data. Integrity means no one can mess with your data. Availability means no one can erase your data or stop you from being able to run a computation. These are the core properties of information security, which I'll then refer to in the definition or requirements of what is a flexHEG.
FlexHEG is a concept, which is really a set of requirements or specifications that must be met in order for something to be considered a flexHEG, according to me. Then there are also use cases, which will be the next slide, and then after that, a feasibility analysis that I think provides some evidence that it is feasible, by breaking the problem down into six smaller problems, which are still not all completely solved.
But starting with the specifications, which is really what defines flexHEG. First of all, flexHEG, as compared to other hardware-enabled security mechanisms like the Intel Management Engine, should be trustworthy to the owner that it will not facilitate violations of the owner's confidentiality or integrity. It will not be used as a tool for some authority to covertly surveil what's happening on the system, regardless of any future updates to its firmware. That requires some particular design choices to be made.
The second criterion is that any functions that are core to it being a flexHEG, including functions that you put in optionally, like location verification, should be able to work in an air-gapped configuration or a half air-gapped configuration, sometimes called a data diode configuration, where data can flow in but no data can flow out physically. There are ways of doing location verification where you have orbiting satellites that constantly transmit cryptographically authenticated streams, and you could receive those with an antenna, pipe them in through a data diode, and do location verification without actually phoning home, that is to say, without providing a new channel for covert surveillance. I think it's also important that we be able to do this so that when we have very high-risk AI systems, where we want to avoid giving any potential channel capacity to the outside world, the security measures we're using to secure them don't require creating any kind of potential loophole for data to escape to the outside world. I think that's a very important constraint for the long-term AI safety-security intersection.
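As a concrete illustration of this receive-only pattern, here is a minimal sketch assuming an Ed25519-signed, timestamped beacon; the message layout, freshness window, and function names are illustrative assumptions, not part of the flexHEG specification:

```python
# Minimal sketch: verify a cryptographically authenticated location beacon that
# arrives through a data diode (receive-only), so nothing is ever sent back out.
import time

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey


def verify_beacon(satellite_pubkey: Ed25519PublicKey,
                  payload: bytes,
                  signature: bytes,
                  max_age_s: float = 30.0) -> bool:
    """Accept a beacon only if the satellite signature is valid and the beacon is fresh."""
    try:
        satellite_pubkey.verify(signature, payload)   # raises InvalidSignature on a bad signature
    except InvalidSignature:
        return False
    timestamp = int.from_bytes(payload[:8], "big")    # assumed: first 8 bytes are a unix timestamp
    return abs(time.time() - timestamp) <= max_age_s  # reject stale or replayed beacons
```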
The third constraint is the core function from a governance point of view, which is that governments or the governance authority should be able to impose constraints on availability, so policies that prevent certain computations from being runnable on this hardware. Unlike with confidentiality and integrity, where the owner should be able to trust that the system is on their side, with availability it's the governance mechanism that should be able to trust that the systems controlling availability are on its side, regardless of what the owner tries to do, even if the owner has covert help from foreign governments. Now, of course, in the limit, you can always crush a machine or slice it open. But if you do a physical attack like that, it should fail bricked. It should fail in such a way that it's not useful anymore and doesn't have any data you can access anymore. I think that's possible.
The fourth criterion is that the owner should be able to trust the hardware will not impair availability except according to transparently disclosed policies. There's some cryptographic mechanism for updating policies, and policies need to be authorized. They need to be public to a certain extent, although they can include confidential data that's used for evaluations of netsec things, for example. But there shouldn't be any secret lever that the governance mechanism has to just shut down your inference operation without that being disclosed.
The fifth criterion is that policies should be updatable by a smart-contract-like cryptographic mechanism. This is what makes it a flexible hardware-enabled governance mechanism: it's not that you bake a policy into the firmware; the policy should be updatable really quite fast, maybe every 10 minutes, so that this can also be used as a mechanism for emergency response to some kind of AI catastrophe. The policy-updating rules should be updatable as well, so that this can begin as perhaps a unilateral or minilateral mechanism and evolve towards a more multilateral governance mechanism without needing to retool and produce a new batch of hardware. The cryptographic mechanism should allow a quorum to change the quorum, and updates should also be able to irreversibly restrict the set of potential future policies, so that credible commitments can be made that this mechanism won't be used to restrict certain types of things. All this should scale to arbitrary cluster size with minimal operational cost. The final point here is: for your cryptographic mechanisms, if you have a hardware security module running things at line speed so that it's scalable, put in your best hardware accelerator for a post-quantum system, because you might need it before the hardware lifetime is up.
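To make the shape of such an update rule concrete, here is a minimal sketch under assumed data structures (all names are illustrative): a k-of-n quorum check in which the quorum itself can be changed via the update body, and the set of allowed policy classes can only ever shrink.

```python
# Sketch of a quorum-gated, one-way-restrictable policy update rule.
from dataclasses import dataclass

from cryptography.exceptions import InvalidSignature


@dataclass
class GovernanceState:
    signer_keys: list        # current governance Ed25519PublicKey objects
    threshold: int           # k: how many distinct signatures an update needs
    allowed_classes: set     # policy classes that future updates may still use


@dataclass
class PolicyUpdate:
    body: bytes              # serialized new policy (may also carry a new signer set / threshold)
    policy_class: str        # which class of policy this update belongs to
    new_allowed_classes: set # must be a subset of the current allowed set
    signatures: list         # (signer_index, signature_bytes) pairs over `body`


def apply_update(state: GovernanceState, update: PolicyUpdate) -> GovernanceState:
    # 1. Count valid signatures from distinct current signers (k-of-n quorum).
    valid_signers = set()
    for signer_index, sig in update.signatures:
        try:
            state.signer_keys[signer_index].verify(sig, update.body)
            valid_signers.add(signer_index)
        except (InvalidSignature, IndexError):
            continue
    if len(valid_signers) < state.threshold:
        raise PermissionError("quorum not met")
    # 2. The update must belong to a policy class that has not been ruled out.
    if update.policy_class not in state.allowed_classes:
        raise PermissionError("policy class has been irreversibly excluded")
    # 3. Restrictions are one-way: the allowed set can only shrink, never re-expand.
    if not update.new_allowed_classes <= state.allowed_classes:
        raise PermissionError("cannot re-expand the set of allowed policy classes")
    return GovernanceState(state.signer_keys, state.threshold, update.new_allowed_classes)
```

The one-way subset check is what supports the credible-commitment property: once a policy class is removed, no later quorum can bring it back.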
What this enables you to do is things like: for computations that exceed certain limits or do certain things, like RL or using training data that involves biological sequences, you can require some evals. You don't want people to be able to run big computations with certain properties unless they conform to a certain shape that includes certain evals. You might want to encrypt the model weights during training above a certain threshold and only allow their extraction after your evaluation protocols come up green. You might want to run automated red teaming protocols where you invoke the accelerator to run another AI system as an adversary that tries to discover new attacks, and require that to come up green before you can extract the weights.
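For illustration only, the gating logic might look something like the following sketch; the compute threshold, eval names, and interface are assumptions, not a specification:

```python
# Sketch: release the weight-decryption key only when required evals are green.
REQUIRED_EVALS = {"bio_misuse_eval", "cyber_offense_eval", "automated_red_team"}
TRAINING_FLOP_THRESHOLD = 1e26  # assumed policy threshold for "big" training runs


def may_release_weight_key(training_flop: float, eval_results: dict) -> bool:
    """Allow weight extraction if the run is below threshold, or every required eval is green."""
    if training_flop < TRAINING_FLOP_THRESHOLD:
        return True
    return all(eval_results.get(name) == "green" for name in REQUIRED_EVALS)
```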
You could even do benefit sharing, so you could have weights with certain capabilities be unusable for inference by anybody until they are cryptographically shared with all parties to an agreement that whoever gets there first will share the capabilities with everyone. But you can share the capabilities using capability-based access control, an unfortunate name collision that actually means something completely different. Capability-based access control means that you're giving fine-grained access to do certain things with the data and not others. You could give access to run inference, but not fine-tuning. You can give that access with sovereign control. It's not like API access, where you give access to inference but the person granting the access can later revoke it and has some leverage. You could be running it on your own hardware in such a way that the granter cannot revoke access, but you also can't do fine-tuning; that, I think, is possible. You could even have inference compute limits and speed limits, and you could even have those be global. You could have a crypto token that is like a taxi medallion for running advanced RL agents, and you could have only 50,000 of them in the world. People could bid on and trade them, but you won't have a nation of geniuses in a data center; you'd only have a city's worth in the world. That's a thing you could do with this.
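As a toy illustration of what capability-based access control means here (the names and structure are hypothetical), a capability names exactly the operations it grants, and the check runs locally on the holder's hardware, so the granter has no revocation lever:

```python
# Sketch: a capability that grants inference on a specific model but not fine-tuning.
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ModelCapability:
    model_id: str
    operations: frozenset = field(default_factory=frozenset)  # e.g. {"inference"}


def authorize(cap: ModelCapability, model_id: str, operation: str) -> bool:
    """Allow an operation only if this capability grants it for this model."""
    return cap.model_id == model_id and operation in cap.operations


cap = ModelCapability("shared-frontier-model", frozenset({"inference"}))
assert authorize(cap, "shared-frontier-model", "inference")
assert not authorize(cap, "shared-frontier-model", "fine_tuning")
```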
You could verify other claims about the training process, and the same flexHEG hardware could also enable less ambitious approaches to hardware-enabled mechanisms use cases like offline licensing and geolocation.
Here's why I think this is possible. This is like a high-level design for flexHEG. There are basically six components.
The first component is that you need some kind of physical security perimeter, which has sensors for all the possible ways that someone could try to break through it, including thermal, mechanical, kinetic, and electrical disruption. You need noise generation on that security perimeter in the ultrasonic and electromagnetic domains to mask any potential leakage of confidential information through those channels. But we live in a universe where things have locality, volumes are bounded by surfaces, and we have inverse square laws. It just seems like it should be physically possible to enumerate all the ways you could extract information from something and block them by generating noise. Then you need a self-disable mechanism so that if tampering is detected, the processor is no longer usable, either for compute or for extracting the data. This can be done by burying lots of eFuses between layers of physically unclonable functions, which generate the keys necessary to decrypt the memory.
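Purely as an illustration of the intended behavior (in real hardware this would be eFuses and key-zeroization circuitry, not software; the sensor names and interface are assumptions), a sketch of the self-disable flow might look like:

```python
# Sketch: if any perimeter sensor trips, the PUF-derived key material is destroyed,
# leaving the device unusable for compute and its encrypted memory unrecoverable.
SENSOR_CHANNELS = ("thermal", "mechanical", "kinetic", "electrical")


class TamperResponder:
    def __init__(self, puf_derived_key: bytearray):
        self._key = puf_derived_key          # key needed to decrypt memory
        self.bricked = False

    def check_sensors(self, readings: dict) -> None:
        """Zeroize keys the moment any perimeter sensor reports tampering."""
        if any(readings.get(channel, False) for channel in SENSOR_CHANNELS):
            for i in range(len(self._key)):  # overwrite key material in place
                self._key[i] = 0
            self.bricked = True              # device fails bricked: no compute, no data
```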
You need to have a secure coprocessor, which actually runs the compliance mechanism. It doesn't need to be on a separate die; that's in the original design because we were thinking we'd demonstrate this with a current Nvidia processor, but in a future processor you'd just have the secure coprocessor on the same die, like the Intel Management Engine is on the same die, but as a separate, isolated domain that runs the policies and blocks the execution of code that hasn't been checked and signed off on under those compliance policies. That secure processor needs a way to receive cryptographically authenticated rule sets from the outside world every 10 minutes, which, again, could be broadcast and data-dioded.
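A minimal sketch of that gating behavior, with an assumed interface (in the real design the rule set would arrive as a signed broadcast and be signature-verified before installation):

```python
# Sketch of the isolated compliance domain: it holds the latest authenticated
# rule set and refuses to release unapproved code to the main processor.
import hashlib


class ComplianceGate:
    def __init__(self):
        self._approved_hashes = set()

    def install_rule_set(self, approved_hashes) -> None:
        # Assumed to be called only after the ~10-minute broadcast rule set
        # has been cryptographically authenticated (e.g., via a data diode feed).
        self._approved_hashes = set(approved_hashes)

    def may_execute(self, code_blob: bytes) -> bool:
        """Allow execution only if the code's hash is signed off under the current policy."""
        return hashlib.sha256(code_blob).hexdigest() in self._approved_hashes
```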
Finally, flexHEGs, when you connect them into a cluster, need to have encrypted interconnect so that the computation can run across many processors, and the secure processors need to connect to each other and verify cluster-scale properties. There are a couple of ways to do that: either by dynamically attaching metadata about the computations that have gone into each piece of data, or by statically verifying the entire cluster-scale computation and then locally verifying that the local piece is a partition of the global graph.
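For the second approach, here is an illustrative sketch (the digest scheme and data shapes are assumptions) of the local check that a node's assigned subgraph really is the partition the verified global plan says it should be:

```python
# Sketch: each secure processor checks its local subgraph against the digest
# recorded for its rank in the already-verified global computation plan.
import hashlib


def graph_digest(subgraph_ops) -> str:
    """Canonical digest of a list of (op_name, inputs, outputs) tuples."""
    return hashlib.sha256(repr(sorted(subgraph_ops)).encode()).hexdigest()


def verify_local_partition(global_partition_digests: dict, rank: int, local_ops) -> bool:
    """True only if this node's subgraph matches the partition assigned to it globally."""
    return global_partition_digests.get(rank) == graph_digest(local_ops)
```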
I will stop there. I don't know if I have any time for questions, but no. Thank you.
[Audience laughter].