You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Testing an AI system’s security relies on three strategies:
12
-
1. **Conventional security testing** (i.e. _pentesting_). See [secure software development](/goto/secdevprogram/).
13
-
2. **Model performance validation** (see [continuous validation](/goto/continuousvalidation/)): testing if the model behaves according to its specified acceptance criteria using a testing set with inputs and outputs that represent the intended behaviour of the model. For security,this is to detect if the model behaviour has been altered permanently through data poisoning or model poisoning. For non-security, it is for testing functional correctness, model drift etc.
12
+
1. **Conventional security testing** (i.e. _pentesting_). See [secure software development](/go/secdevprogram/).
13
+
2. **Model performance validation** (see [continuous validation](/go/continuousvalidation/)): testing if the model behaves according to its specified acceptance criteria using a testing set with inputs and outputs that represent the intended behaviour of the model. For security,this is to detect if the model behaviour has been altered permanently through data poisoning or model poisoning. For non-security, it is for testing functional correctness, model drift etc.
14
14
3. **AI security testing** (this section), the part of _AI red teaming_ that tests if the AI model can withstand certain attacks, by simulating these attacks.
15
15
16
16
**Scope of AI security testing**
@@ -30,25 +30,25 @@ This section discusses:
30
30
31
31
32
32
## Threats to test for
33
-
A comprehensive list of threats and controls coverage based on assets, impact, and attack surfaces is available as a [Periodic Table of AI Security](/goto/periodictable/). In this section, we provide a list of tools for AI Red Teaming Predictive and Generative AI systems, aiding steps such as Attack Scenarios, Test Execution through automated red teaming, and, oftentimes, Risk Assessment through risk scoring.
33
+
A comprehensive list of threats and controls coverage based on assets, impact, and attack surfaces is available as a [Periodic Table of AI Security](/go/periodictable/). In this section, we provide a list of tools for AI Red Teaming Predictive and Generative AI systems, aiding steps such as Attack Scenarios, Test Execution through automated red teaming, and, oftentimes, Risk Assessment through risk scoring.
34
34
35
35
Each listed tool addresses a subset of the threat landscape of AI systems. Below, we list some key threats to consider:
36
36
37
37
**Predictive AI:** Predictive AI systems are designed to make predictions or classifications based on input data. Examples include fraud detection, image recognition, and recommendation systems.
38
38
39
39
**Key Predictive AI threats to test for, beyond conventional security testing:**
40
40
41
-
-[Evasion Attacks:](https://owaspai.org/goto/evasion/) These attacks occur when an attacker crafts inputs with data to mislead the model, causing it to perform its task incorrectly.
42
-
-[Model Theft](https://owaspai.org/goto/modeltheftuse/): In this attack, the model’s parameters or functionality are stolen. This enables the attacker to create a replica model, which can then be used as an oracle for crafting adversarial attacks and other compounded threats.
43
-
-[Model Poisoning](https://owaspai.org/goto/modelpoison/): This involves the manipulation of data, the data pipeline, the model, or the model training supply chain during the training phase (development phase). The attacker’s goal is to alter the model’s behavior which could result in undesired model operation.
41
+
-[Evasion Attacks:](https://owaspai.org/go/evasion/) These attacks occur when an attacker crafts inputs with data to mislead the model, causing it to perform its task incorrectly.
42
+
-[Model Theft](https://owaspai.org/go/modeltheftuse/): In this attack, the model’s parameters or functionality are stolen. This enables the attacker to create a replica model, which can then be used as an oracle for crafting adversarial attacks and other compounded threats.
43
+
-[Model Poisoning](https://owaspai.org/go/modelpoison/): This involves the manipulation of data, the data pipeline, the model, or the model training supply chain during the training phase (development phase). The attacker’s goal is to alter the model’s behavior which could result in undesired model operation.
44
44
45
45
**Generative AI:** Generative AI systems produce outputs such as text, images, or audio. Examples include large language models (LLMs) like ChatGPT and large vision models (LVMs) like DALL-E and MidJourney.
46
46
47
47
**Key Generative AI threats to test for, beyond conventional security testing**:
48
48
49
-
-[Prompt Injection](https://owaspai.org/goto/promptinjection/): In this type of attack, the attacker provides the model with manipulative instructions aimed at achieving malicious outcomes or objectives
50
-
-[Sensitive data output from model ](/goto/disclosureinoutput/): A form of prompt injection, aiming to let the model disclose sensitive data
51
-
-[Insecure Output Handling](https://owaspai.org/goto/outputconatinsconventionalinjection/): Generative AI systems can be vulnerable to traditional injection attacks, leading to risks if the outputs are improperly handled or processed.
49
+
-[Prompt Injection](https://owaspai.org/go/promptinjection/): In this type of attack, the attacker provides the model with manipulative instructions aimed at achieving malicious outcomes or objectives
50
+
-[Sensitive data output from model ](/go/disclosureinoutput/): A form of prompt injection, aiming to let the model disclose sensitive data
51
+
-[Insecure Output Handling](https://owaspai.org/go/outputconatinsconventionalinjection/): Generative AI systems can be vulnerable to traditional injection attacks, leading to risks if the outputs are improperly handled or processed.
52
52
53
53
While we have mentioned the key threats for each of the AI Paradigm, we strongly encourage the reader to refer to all threats at the AI Exchange, based on the outcome of the Objective and scope definition phase in AI Red Teaming.
54
54
@@ -70,22 +70,22 @@ A systematic approach to AI security testing involves a few key steps:
Testing for resistance against Prompt injection is done by presenting a carefully crafted set of inputs with instructions to achieve unwanted model behaviour (e.g., triggering unwanted actions, offensive outputs, sensitive data disclosure) and evaluating the corresponding risks.
-[Sensitive data output from model ](/go/disclosureuseoutput/)
81
81
82
82
83
83
**Test procedure**
84
84
See the section above for the general steps in AI security testing.
85
85
The steps specific for testing against this threat are:
86
86
87
87
**(1) Establish set of relevant input attacks**
88
-
Collect a base set of crafted instructions that represent the state of the art for the attack (e.g., jailbreak attempts, invisible text, malicious URLs, data extraction attempts, attempts to get harmful content), either from an attack repository (see references) or from the resources of an an attack tool. If an attack tool has been selected to implement the test, then it will typically come with such a set. Various third party and open-source repositories and tools are available for this purpose - see further in our [Tool overview](/goto/testingtoolsgenai/).
88
+
Collect a base set of crafted instructions that represent the state of the art for the attack (e.g., jailbreak attempts, invisible text, malicious URLs, data extraction attempts, attempts to get harmful content), either from an attack repository (see references) or from the resources of an an attack tool. If an attack tool has been selected to implement the test, then it will typically come with such a set. Various third party and open-source repositories and tools are available for this purpose - see further in our [Tool overview](/go/testingtoolsgenai/).
89
89
Verify if the input attack set sufficiently covers the attack strategies described in the threat sections linked above (e.g., instruction override, role confusion, encoding tricks).
90
90
Remove the input attacks for which the risk would be accepted (see Evaluation step), but keep these aside for when context and risk appetite evolve.
91
91
@@ -144,7 +144,7 @@ Example 2:
144
144
It is of course important to also test the AI system for correct behaviour in benign situations. Depending on context, such testing may be integrated in the implementation of the security test by using the same mechanisms. Such testing ideally includes the testing of detection mechanisms, to ensure that not too many false positives are triggered by benign inputs. Positive testing is essential to ensure that security mechanisms do not degrade intended functionality or user experience beyond acceptable levels.
145
145
146
146
**References**
147
-
- See below for the [testing tools section](/goto/testingtoolsgenai/)
147
+
- See below for the [testing tools section](/go/testingtoolsgenai/)
This sub section covers the following tools for security testing Predictive AI: Adversarial Robustness Toolbox (ART), Armory, Foolbox, DeepSec, and TextAttack.
@@ -257,11 +257,11 @@ This sub section covers the following tools for security testing Predictive AI:
257
257
258
258
Notes:
259
259
260
-
- Development-time Model poisoning: Simulates attacks during development to evaluate vulnerabilities[*https://owaspai.org/goto/modelpoison/*](https://owaspai.org/goto/modelpoison/)
261
-
- Evasion:Tests model performance against adversarial inputs [*https://owaspai.org/goto/evasion/*](https://owaspai.org/goto/evasion/)
262
-
- Model exfiltration: Evaluates risks of model exploitation during usage [*https://owaspai.org/goto/modeltheftuse*](https://owaspai.org/goto/modeltheftuse/)
260
+
- Development-time Model poisoning: Simulates attacks during development to evaluate vulnerabilities[*https://owaspai.org/go/modelpoison/*](https://owaspai.org/go/modelpoison/)
261
+
- Evasion:Tests model performance against adversarial inputs [*https://owaspai.org/go/evasion/*](https://owaspai.org/go/evasion/)
262
+
- Model exfiltration: Evaluates risks of model exploitation during usage [*https://owaspai.org/go/modeltheftuse*](https://owaspai.org/go/modeltheftuse/)
263
263
- Model inference: *Assesses exposure to membership and inversion attacks*
- Development-time Model poisoning: Simulates attacks during development to evaluate vulnerabilities[*https://owaspai.org/goto/modelpoison/*](https://owaspai.org/goto/modelpoison/)
356
-
- Evasion:Tests model performance against adversarial inputs [*https://owaspai.org/goto/evasion/*](https://owaspai.org/goto/evasion/)
355
+
- Development-time Model poisoning: Simulates attacks during development to evaluate vulnerabilities[*https://owaspai.org/go/modelpoison/*](https://owaspai.org/go/modelpoison/)
356
+
- Evasion:Tests model performance against adversarial inputs [*https://owaspai.org/go/evasion/*](https://owaspai.org/go/evasion/)
357
357
- Prompt Injection: Evaluates the robustness of generative AI models by exploiting weaknesses in prompt design, leading to undesired outputs or bypassing model safeguards.
358
-
*https://owaspai.org/goto/promptinjection/*
358
+
*https://owaspai.org/go/promptinjection/*
359
359
360
360
### **Tool Name: Foolbox**
361
361
@@ -447,7 +447,7 @@ Notes:
447
447
448
448
Evasion:Tests model performance against adversarial inputs
@@ -629,12 +629,12 @@ Evasion:Tests model performance against adversarial inputs
629
629
630
630
Notes:
631
631
632
-
- Development-time Model poisoning: Simulates attacks during development to evaluate vulnerabilities[*https://owaspai.org/goto/modelpoison/*](https://owaspai.org/goto/modelpoison/)
633
-
- Evasion:Tests model performance against adversarial inputs[*https://owaspai.org/goto/evasion/*](https://owaspai.org/goto/evasion/)
632
+
- Development-time Model poisoning: Simulates attacks during development to evaluate vulnerabilities[*https://owaspai.org/go/modelpoison/*](https://owaspai.org/go/modelpoison/)
633
+
- Evasion:Tests model performance against adversarial inputs[*https://owaspai.org/go/evasion/*](https://owaspai.org/go/evasion/)
634
634
635
635
## Open source Tools for Generative AI Red Teaming
This sub section covers the following tools for security testing Generative AI: PyRIT, Garak, Prompt Fuzzer, Guardrail, and Promptfoo.
@@ -730,8 +730,8 @@ A list of GenAI test tools can also be found at the [OWASP GenAI security projec
730
730
731
731
Notes:
732
732
733
-
- Evasion:Tests model performance against adversarial inputs [*https://owaspai.org/goto/evasion/*](https://owaspai.org/goto/evasion/)
734
-
- Prompt Injection: Evaluates the robustness of generative AI models by exploiting weaknesses in prompt design, leading to undesired outputs or bypassing model safeguards.*https://owaspai.org/goto/promptinjection/*
733
+
- Evasion:Tests model performance against adversarial inputs [*https://owaspai.org/go/evasion/*](https://owaspai.org/go/evasion/)
734
+
- Prompt Injection: Evaluates the robustness of generative AI models by exploiting weaknesses in prompt design, leading to undesired outputs or bypassing model safeguards.*https://owaspai.org/go/promptinjection/*
- Evasion:Tests model performance against adversarial inputs [*https://owaspai.org/goto/evasion/*](https://owaspai.org/goto/evasion/)
824
+
- Evasion:Tests model performance against adversarial inputs [*https://owaspai.org/go/evasion/*](https://owaspai.org/go/evasion/)
825
825
- Prompt Injection: Evaluates the robustness of generative AI models by exploiting weaknesses in prompt design, leading to undesired outputs or bypassing model safeguards.
- Evasion:Tests model performance against adversarial inputs [*https://owaspai.org/goto/evasion/*](https://owaspai.org/goto/evasion/)
920
-
- Prompt Injection: Evaluates the robustness of generative AI models by exploiting weaknesses in prompt design, leading to undesired outputs or bypassing model safeguards. *https://owaspai.org/goto/promptinjection/*
919
+
- Evasion:Tests model performance against adversarial inputs [*https://owaspai.org/go/evasion/*](https://owaspai.org/go/evasion/)
920
+
- Prompt Injection: Evaluates the robustness of generative AI models by exploiting weaknesses in prompt design, leading to undesired outputs or bypassing model safeguards. *https://owaspai.org/go/promptinjection/*
921
921
922
922
### Tool Name: Guardrail
923
923
@@ -1008,8 +1008,8 @@ Notes:
1008
1008
1009
1009
Notes:
1010
1010
1011
-
- Evasion:Tests model performance against adversarial inputs [*https://owaspai.org/goto/evasion/*](https://owaspai.org/goto/evasion/)
1012
-
- Prompt Injection: Evaluates the robustness of generative AI models by exploiting weaknesses in prompt design, leading to undesired outputs or bypassing model safeguards. *https://owaspai.org/goto/promptinjection/*
1011
+
- Evasion:Tests model performance against adversarial inputs [*https://owaspai.org/go/evasion/*](https://owaspai.org/go/evasion/)
1012
+
- Prompt Injection: Evaluates the robustness of generative AI models by exploiting weaknesses in prompt design, leading to undesired outputs or bypassing model safeguards. *https://owaspai.org/go/promptinjection/*
1013
1013
1014
1014
### Tool Name: Promptfoo
1015
1015
@@ -1105,9 +1105,9 @@ Notes:
1105
1105
1106
1106
Notes:
1107
1107
1108
-
- Model exfiltration:Evaluates risks of model exploitation during usage [*https://owaspai.org/goto/modeltheftuse/*](https://owaspai.org/goto/modeltheftuse/)
1108
+
- Model exfiltration:Evaluates risks of model exploitation during usage [*https://owaspai.org/go/modeltheftuse/*](https://owaspai.org/go/modeltheftuse/)
1109
1109
- Prompt Injection: Evaluates the robustness of generative AI models by exploiting weaknesses in prompt design, leading to undesired outputs or bypassing model safeguards.
0 commit comments