The AI Security Landscape
Summary
Dan Lahav and Sella Nevo emphasize the urgency of securing advancing AI technology against sophisticated attackers, including nation-states, to protect national security and prevent the bypassing of safety measures, referencing the RAND report's tools and recommendations for bolstering AI defense.
SESSION Transcript
Hi, I am Dan, as Caleb mentioned. I'm from Pattern Labs, and this is Sella from RAND. This is going to be a talk about why securing Frontier AI matters, and also kind of an introduction to strategic work that should be done in the field.
Following on the last talk, we don’t need to say much about it: AI capabilities have been progressing at breakneck speed over the last couple of years, and there is a strong case that they will continue to grow. As Professor Bengio said, that creates a very strong case for why securing Frontier AI matters a lot.
First, this rapid development means that the IP is worth much more, and as an outcome there is a strong commercial case and an urgency to protect assets. I'm not sure if the new OpenAI valuation is going to close at $300 billion or $200 billion or whatever, but obviously this is worth quite a lot. But the case goes much deeper than that: AI is becoming a national security issue.
I don't think we need to expand much on this given DeepSeek in recent weeks and the potential AGI Manhattan Project that may be created. It's clear that AI is gradually becoming a national security issue as well. And there is huge potential for catastrophic harms, as mentioned by Professor Bengio. Things such as the potential creation of bioweapons or cyber weapons make a strong case for the urgency of securing such advanced systems.
Even more so, we currently know that it's actually fairly straightforward to remove many safety mitigations if you're able to get access to the systems. That essentially means that even if you solve safety problems or solve alignment, someone who can effectively hack the system can remove many of the mitigations that were put in place. So, in essence, you can't be safe if you're not secure.
The issue is that it's a really hard problem to secure such systems. Defense is hard. Just as one example, the fact that we're getting into a situation where AI becomes a national security issue means that the level of sophistication of actors that are going to attack the system is rapidly increasing as well.
Just to give a few numbers on what that means: we’re potentially going to see nation-states, including the most cyber-capable ones, enter the field. One huge number: according to people from the intelligence community that we've interviewed, China has over 100,000 attackers employed directly by the state, not even counting the private market for hackers. Secondly, we know that such actors are decades ahead of the market.
There are many examples, like differential cryptanalysis or Tempest, of attack techniques that the intelligence community had decades before the industry discovered them. There are also many empirical examples of even very sophisticated organizations being hacked. Stuxnet managed to compromise centrifuges buried in the belly of a mountain. NSA tools got leaked multiple times. Microsoft seems to be getting hacked every couple of weeks. Google had issues with things like Aurora. These are all very sophisticated organizations, and they still get hacked.
Bear in mind that some of the top contenders for advanced AI in the world are not organizations like these. They are essentially startups, and they are not building a centrifuge for a nuclear weapons program, developed in an isolated location; they are building a product that is connected to the internet and that they want the entire world to have access to. That means the problem of defending such a system is incredibly hard, and it goes beyond the sophistication of the actors.
Advanced attack vectors are also inherent because these are complicated systems. And to explain more about the work that we've done, though, I'll hand over to Sella. Oh, you have a microphone.
[Sella Nevo]
Thanks, Dan. A lot of what I'll talk about is based on the Securing AI Model Weights RAND report done in collaboration with Pattern Labs. And really, we wanted to provide some tools for companies to be able to know how to deal with some of these threats.
The first thing we did was a literature review, and we talked to about 30 experts about the different tools attackers have to attack these AI systems. We compiled a list of 38 attack vectors, and we then asked: which actors would be capable of executing these kinds of attacks?
We won't go through all of them, but just to give a few examples—some of these attack vectors are easy, and really almost any actor can do them. For example, one thing that hackers like to do is go to the parking lot of the organization they want to attack, and then leave a bunch of USB sticks in the parking lot. Inevitably, someone will find one of those USB sticks and say, "Oh no, who dropped this? Let me plug it into my computer to find out."
That is a very kind thing to do, but also a very stupid thing to do, because, of course, then you allow malicious actors to have access to your system. Now, I assume many people here would not find a random USB stick in the parking lot and plug it into their computer, but unfortunately, attackers are also evolving pretty rapidly.
Right now, you can buy online for $180 a USB cable—note, not a USB stick, a USB cable—that will take over your computer if you plug it in. It also has wireless communication capabilities so that, even if your system is air-gapped, it can still be controlled remotely. And I assume not many people think twice about where a USB cable came from before they plug it in.
Of course, some attack vectors require you to be slightly more capable to execute. For example, if you leave your laptop in a hotel room, it’s been shown time and time again that many hackers can undermine the access control system on your hotel room door (not to mention just bribing the staff) and reach your computer. If you have disk encryption, it's harder, but if you've left your computer on and merely locked, there are fancy things that people can do.
In even more extreme cases, even if you never leave your computer in your hotel room, you still come back with your computer to the hotel room. So, if someone was in there and installed cameras, for example, maybe they can see what you're doing on your computer even if you're there with it. That’s why, if any of you have seen interviews with Snowden, sometimes when he's in a hotel room, he’s all under the blanket in the interview, which sounds very weird, but he doesn’t want people to be able to see.
Now, some attack vectors are actually only available to nation-states. Maybe the most obvious example of that is military takeover—you can’t do that if you don’t have a military—but there are more nuanced examples. For example, China has a law that requires anyone with a footprint in China to report any zero-days that they find; zero-days being vulnerabilities that are not yet known and therefore very hard to defend against. They must report them to the Chinese government quickly. And we know that the government then hands those vulnerabilities to its offensive cyber operations. This means that China has access to more vulnerabilities than any single organization could find on its own.
But really, the most interesting thing is, as Dan said, we’ve often found that nation-states are just decades ahead of anyone else. It’s therefore hard to know what they can do right now, but we can look back in history and ask: what do we now, in hindsight, know they could do, say 30 or 40 years ago, that we didn’t know at the time, and try to extrapolate what that means for today.
Dan briefly mentioned a Tempest attack, which broadly means the following: It turns out that digital systems emanate electromagnetic radiation, and in 1985 a researcher named Van Eck found that you can analyze those emanations and recover information about what is happening inside the digital system. It’s therefore sometimes called Van Eck phreaking. But even though he published it in 1985, and that’s when most of us found out about it, many nation-states had been using this technique since World War II. This means that for four decades, governments had access to a whole dimension of information all around us that none of us were aware of and, of course, could not protect ourselves from. That is one example out of many. One has to infer that there likely are other ways in which nation-states can attack us today that we are unaware of.
So, what do we do about that? Well, one thing we’ve tried to do is provide these five security levels that Yoshua has mentioned before, which are five levels of security that try to calibrate what amount of effort you put into your security, what security measures you put in place, and what are the security outcomes you should expect. This allows labs to both know where they stand—they can compare it to their own systems—but also, if they want to improve their security, what are good next steps for things they might want to do?
No less importantly, it allows policymakers who actually don’t have the technical expertise to really engage at the object level of what security needs to be done to engage at the high-level questions of this. For example, now policymakers can talk to the labs and say, "You are now at, for example, security level 3, which broadly means you’re secure against non-state actors, but not state actors, and maybe we want you to go to what’s called security level 5, which is what one would need to be secure against China." This allows for more engagement and responsible governance of these systems.
All in all, the security levels contain 167 recommendations of specific security measures. And maybe this is a good opportunity to say that everything we’re talking about here is covered in much more detail in the report. So, if you’re interested, I’d recommend looking at it.
Sadly, though, in the year since this report came out, a lot of the things we warned about have not been sufficiently addressed, and the same vulnerabilities have now been identified in production systems. For example, one of the things we warned about is distillation attacks, which use the public API of a company to create a copycat model that achieves the same capabilities. Just last week, OpenAI claimed that DeepSeek used a distillation attack to create its now very famous model.
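To make the idea concrete, here is a minimal toy sketch of the principle behind a distillation attack: the attacker never sees the proprietary "teacher" model's internals, only its public query interface, yet fits a "student" that reproduces its behavior. The teacher here is just a hidden linear function standing in for a model behind an API; all names (`teacher_api`, `train_student`) are illustrative, not from the talk or the report.

```python
import random

random.seed(0)

# Hidden proprietary model: the attacker cannot read these weights,
# only query the API below.
_SECRET_W, _SECRET_B = 3.0, -1.0

def teacher_api(x: float) -> float:
    """Public API: returns the teacher's prediction for input x."""
    return _SECRET_W * x + _SECRET_B

def train_student(queries: int = 200, lr: float = 0.01, epochs: int = 500):
    """Fit a copycat model purely from API query/response pairs."""
    xs = [random.uniform(-5, 5) for _ in range(queries)]
    ys = [teacher_api(x) for x in xs]   # the only access the attacker uses
    w, b = 0.0, 0.0                     # student starts from scratch
    n = len(xs)
    for _ in range(epochs):
        # Gradient descent on mean squared error between student and teacher.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

w, b = train_student()
print(f"recovered: w={w:.2f}, b={b:.2f}")  # converges near the secret 3.0, -1.0
```

Real distillation attacks work the same way at vastly larger scale, training a neural network on millions of API responses, which is why rate limits and query monitoring are among the mitigations discussed in this space.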
Also, what we said a year ago was that machine learning infrastructure currently has very low security standards, even worse than regular software, which, to be honest, is not great to begin with. Just a week or two ago, Meta announced that their Llama stack has a critical vulnerability that allows people to run malicious code on the server where the weights are stored. And the list goes on and on.
There’s still a lot more to do. But, of course, there’s a lot more to do not only in securing AI systems, which is what we’ve talked about so far, but AI security actually goes a lot broader than just that.
With that, Dan.
[Dan Lahav]: We need you to join the effort. AI security is a very nascent field, and there are many ways to illustrate that, but I think the easiest one is that there is no Wikipedia entry on AI security. If you search for AI security on Wikipedia, you’re going to get an autocomplete to airport security. [Audience laughter]
For the purpose of the community, this talk, and the discussion, we think it is valuable to think about AI security as the intersection between AI and security, creating a broad tent of efforts. That would roughly include four main pillars. We don't have much time to go over them, but I'll give a quick glimpse.
First is security for AI; that includes things like securing the weights or other critical components, or InfoSec for containment more generally. Second is AI for security. It’s very important to remember, and this is actually the cool and hard thing about it, that this is a bi-directional field: AI is something that needs to be defended, but it is also potentially part of the solution or part of the problem. AI can be used for offense and for defense as well.
Foundational work is another core pillar of AI security, and somewhere we should concentrate some of our efforts. That includes strategy for field building, things like legibility and advocacy, as well as general field-building efforts: funding and getting more talent in.
There are many, many different efforts that should happen in the next few years. We've both jump-started a few projects as a follow-up to the report that Sella mentioned in the previous few slides. But we actually think there is a place to create a strategy for the whole field. There is a project called North Star that we’re working on together, with the aim of identifying the top-priority projects for the next five years, in order for us to successfully defend AI, or utilize AI as part of the defense.
We actually ran a workshop at a previous AI security forum and collected many ideas from you, and we’re also doing many interviews with experts: people from the intelligence community, higher-up execs at the labs themselves. Just to give you a sense of a few recommendations (we won't have time to properly go over this slide, but you can take a picture if you want), here are a few things that came up repeatedly as high-priority issues in AI security.
For example, things such as costing AI security measures. Many labs are concerned about how much this is going to cost: what’s going to be the effort of getting to SL3 or SL4 or SL5? We've done some preliminary analysis, but there needs to be robust work for us to understand how the trade-offs with productivity play out.
Another key example is setting red lines for offensive cyber capabilities and work around evaluations. Currently, a lot is being done, but not enough. There is still a huge debate about what the red lines should be and how the field should move forward to understand when we should actually activate some of the mitigations, which is what red lines are attempting to do.
This is just a glimpse. Our goal is to release the full results over the next few months: this taste of specific projects, plus many, many more. Hopefully, by the next AI security forum, we will be able to show you a much more complete body of work.
You can also help. There is a QR code here: if you want to give us feedback on high-priority projects, suggest more high-priority projects, or just be notified when this follow-up research is released, you are more than welcome to scan it and stay in touch with us.
I think we're out of time. We have much more to say, but unfortunately, we need to wrap up. Thank you, everyone, and thank you to the organizers of this event.