From the course: Cybersecurity Foundations

Understand the AI threats

From the course: Cybersecurity Foundations

Understand the AI threats

- [Instructor] The threats to an AI system can be described as a three tier model. At the bottom, we have the traditional cybersecurity threats such as ransomware, unauthorized modification of data for which AI data sets is known as poisoning, theft of model files and so on. It's at this tier that we have to protect our applications from cyber attacks. The next tier up is specific to AI models focusing on the threats from the AI models themselves. It's all about responsible use, which includes making sure the AI models we are using in our applications aren't hallucinating or producing toxic, harmful, or irrelevant information. Finally, there are some very specific attacks directly on the AI models and typically through their prompt interface. These include prompt and thought injections where an adversary will get the model to do things it really shouldn't like disclose sensitive information or cause downstream compromises to internal systems. It's also where attacks on the fabric of the AI models themselves can take place, such as inserting malicious code into the model. There are many threats relating to AI, both to the models themselves as well as threats to our business reputation. As a result of misbehavior from our own and our service providers AI models. OWASP has developed a top 10 list of threats as we see here. Prompt injection is a high profile attack and is where an adversary uses the standard prompt input to manipulate the way in which the AI model responds to a prompt. In particular, these injections are used to try and get responses which are normally denied, such as extraction of sensitive material used in the training of the model. Sensitive information loaded into an AI model can be extracted. And so we might want to make sure that sensitive data isn't used to train the model, or if it is that it's blocked if the model includes it in its response. Without such guardrails, it's possible that an adversary can manipulate their prompts to include sensitive data in the response. Supply chain has become a significant cybersecurity issue and this flows onto AI models. If we construct our models using components and data from a commercial or open source supplier, we need to make sure these have not been compromised as this would compromise our model. The way an AI model operates is determined by the data on which it's trained. This means that if an adversary can manipulate the training dataset, poisoning it in AI terms, then the AI model responses can be influenced in a malicious way. Improper output handling is a general threat related to the responses generated by the AI model. This covers how we confirm that it hasn't produced toxic or harmful output, which may upset a user. Where we have a chain of models, it also includes checking for malicious output designed to compromise downstream systems such as opening backdoors. Excessive agency is a threat that's particularly relevant in agentic systems, where an AI model can take action and with too much power comes more dangerous impacts. For example, an AI model which can issue a system command with super user privileges could be manipulated into deleting or ransoming our critical files. System prompt leakage is a new threat in 2025 and refers to the threat where adversaries manipulate the model to extract the instructions used to constrain the behavior of the model. By understanding these, adversaries can manipulate their prompts to bypass them. In addition, they may contain secrets or other information which when discovered can be used to facilitate other attacks. Vector and embedding weaknesses are another new threat to enter the top 10 in 2025 and refers to weaknesses in the protection afforded to the generation, storage and retrieval of vectors and embeddings, which will be used in the model. This is particularly relevant where retrieval augmented generation is done and these vectors and embeddings are stored in an external vector database. Adversaries may attempt to inject harmful content or access them to extract sensitive information. Misinformation isn't an external threat, but rather misbehavior of an AI model. It occurs when a model responds with false or misleading information that appears credible. An example of this is AI model hallucination, which occurs when a model generates content to fill gaps in their training data using statistical patterns without truly understanding the content. What comes out may be completely irrelevant or false. Unbounded consumption, which can lead to excessive cost or denial of service is as much a problem for AI models as it is for any IT system, but AI models are particularly vulnerable as uncontrolled models can be made to consume a lot of resources responding to prompts. Another framework for AI threats is the Mitre Atlas Matrix. This has a more granular set of threats and is more focused on external adversary attacks. It follows the standard attack process starting with reconnaissance and progressing through initial access, privilege escalation, and through to persistence and finally exfiltration. We won't go through all of the entries, but let's take a look at some of them. We'll start with reconnaissance, active scanning. This is a simple description indicating that adversaries may probe our systems. We can look at a case study called ShadowRay and this describes an attack via the Ray job API, which can be detected during reconnaissance. Under resource development, we can select published poisoned datasets and again, we have a case study shown and at the bottom we can find two approaches to mitigate this threat. We can see LLM prompt injection under a number of headings including defense evasion. Also in this category we can see LLM jailbreak. Let's have a look at it. This describes the process of running a prompt injection to override any preset restrictions. And below we can see that generative AI guardrails are one of our mitigations. In addition to the various phases of threats, the last column indicates the various impacts that can occur as a result of AI model compromise.

Contents