Skip to content

Commit 677098f

Browse files
goto to go
1 parent f75d082 commit 677098f

File tree

1 file changed

+39
-39
lines changed

1 file changed

+39
-39
lines changed

content/ai_exchange/content/docs/5_testing.md

Lines changed: 39 additions & 39 deletions
Original file line numberDiff line numberDiff line change
@@ -5,12 +5,12 @@ heroText: "AI security tests simulate adversarial behaviours to uncover vulnerab
55
weight: 6
66
---
77
> Category: discussion
8-
> Permalink: https://owaspai.org/goto/testing/
8+
> Permalink: https://owaspai.org/go/testing/
99
1010
## Introduction
1111
Testing an AI system’s security relies on three strategies:
12-
1. **Conventional security testing** (i.e. _pentesting_). See [secure software development](/goto/secdevprogram/).
13-
2. **Model performance validation** (see [continuous validation](/goto/continuousvalidation/)): testing if the model behaves according to its specified acceptance criteria using a testing set with inputs and outputs that represent the intended behaviour of the model. For security,this is to detect if the model behaviour has been altered permanently through data poisoning or model poisoning. For non-security, it is for testing functional correctness, model drift etc.
12+
1. **Conventional security testing** (i.e. _pentesting_). See [secure software development](/go/secdevprogram/).
13+
2. **Model performance validation** (see [continuous validation](/go/continuousvalidation/)): testing if the model behaves according to its specified acceptance criteria using a testing set with inputs and outputs that represent the intended behaviour of the model. For security,this is to detect if the model behaviour has been altered permanently through data poisoning or model poisoning. For non-security, it is for testing functional correctness, model drift etc.
1414
3. **AI security testing** (this section), the part of _AI red teaming_ that tests if the AI model can withstand certain attacks, by simulating these attacks.
1515

1616
**Scope of AI security testing**
@@ -30,25 +30,25 @@ This section discusses:
3030

3131

3232
## Threats to test for
33-
A comprehensive list of threats and controls coverage based on assets, impact, and attack surfaces is available as a [Periodic Table of AI Security](/goto/periodictable/). In this section, we provide a list of tools for AI Red Teaming Predictive and Generative AI systems, aiding steps such as Attack Scenarios, Test Execution through automated red teaming, and, oftentimes, Risk Assessment through risk scoring.
33+
A comprehensive list of threats and controls coverage based on assets, impact, and attack surfaces is available as a [Periodic Table of AI Security](/go/periodictable/). In this section, we provide a list of tools for AI Red Teaming Predictive and Generative AI systems, aiding steps such as Attack Scenarios, Test Execution through automated red teaming, and, oftentimes, Risk Assessment through risk scoring.
3434

3535
Each listed tool addresses a subset of the threat landscape of AI systems. Below, we list some key threats to consider:
3636

3737
**Predictive AI:** Predictive AI systems are designed to make predictions or classifications based on input data. Examples include fraud detection, image recognition, and recommendation systems.
3838

3939
**Key Predictive AI threats to test for, beyond conventional security testing:**
4040

41-
- [Evasion Attacks:](https://owaspai.org/goto/evasion/) These attacks occur when an attacker crafts inputs with data to mislead the model, causing it to perform its task incorrectly.
42-
- [Model Theft](https://owaspai.org/goto/modeltheftuse/): In this attack, the model’s parameters or functionality are stolen. This enables the attacker to create a replica model, which can then be used as an oracle for crafting adversarial attacks and other compounded threats.
43-
- [Model Poisoning](https://owaspai.org/goto/modelpoison/): This involves the manipulation of data, the data pipeline, the model, or the model training supply chain during the training phase (development phase). The attacker’s goal is to alter the model’s behavior which could result in undesired model operation.
41+
- [Evasion Attacks:](https://owaspai.org/go/evasion/) These attacks occur when an attacker crafts inputs with data to mislead the model, causing it to perform its task incorrectly.
42+
- [Model Theft](https://owaspai.org/go/modeltheftuse/): In this attack, the model’s parameters or functionality are stolen. This enables the attacker to create a replica model, which can then be used as an oracle for crafting adversarial attacks and other compounded threats.
43+
- [Model Poisoning](https://owaspai.org/go/modelpoison/): This involves the manipulation of data, the data pipeline, the model, or the model training supply chain during the training phase (development phase). The attacker’s goal is to alter the model’s behavior which could result in undesired model operation.
4444

4545
**Generative AI:** Generative AI systems produce outputs such as text, images, or audio. Examples include large language models (LLMs) like ChatGPT and large vision models (LVMs) like DALL-E and MidJourney.
4646

4747
**Key Generative AI threats to test for, beyond conventional security testing**:
4848

49-
- [Prompt Injection](https://owaspai.org/goto/promptinjection/): In this type of attack, the attacker provides the model with manipulative instructions aimed at achieving malicious outcomes or objectives
50-
- [Sensitive data output from model ](/goto/disclosureinoutput/): A form of prompt injection, aiming to let the model disclose sensitive data
51-
- [Insecure Output Handling](https://owaspai.org/goto/outputconatinsconventionalinjection/): Generative AI systems can be vulnerable to traditional injection attacks, leading to risks if the outputs are improperly handled or processed.
49+
- [Prompt Injection](https://owaspai.org/go/promptinjection/): In this type of attack, the attacker provides the model with manipulative instructions aimed at achieving malicious outcomes or objectives
50+
- [Sensitive data output from model ](/go/disclosureinoutput/): A form of prompt injection, aiming to let the model disclose sensitive data
51+
- [Insecure Output Handling](https://owaspai.org/go/outputconatinsconventionalinjection/): Generative AI systems can be vulnerable to traditional injection attacks, leading to risks if the outputs are improperly handled or processed.
5252

5353
While we have mentioned the key threats for each of the AI Paradigm, we strongly encourage the reader to refer to all threats at the AI Exchange, based on the outcome of the Objective and scope definition phase in AI Red Teaming.
5454

@@ -70,22 +70,22 @@ A systematic approach to AI security testing involves a few key steps:
7070

7171
### Testing against Prompt injection
7272
> Category: AI security test
73-
> Permalink: https://owaspai.org/goto/testingpromptinjection/
73+
> Permalink: https://owaspai.org/go/testingpromptinjection/
7474
7575
**Test description**
7676
Testing for resistance against Prompt injection is done by presenting a carefully crafted set of inputs with instructions to achieve unwanted model behaviour (e.g., triggering unwanted actions, offensive outputs, sensitive data disclosure) and evaluating the corresponding risks.
7777
This covers the following threats:
78-
- [Direct prompt injection](/goto/directpromptinjection/)
79-
- [Indirect prompt injection](/goto/indirectpromptinjection/)
80-
- [Sensitive data output from model ](/goto/disclosureuseoutput/)
78+
- [Direct prompt injection](/go/directpromptinjection/)
79+
- [Indirect prompt injection](/go/indirectpromptinjection/)
80+
- [Sensitive data output from model ](/go/disclosureuseoutput/)
8181

8282

8383
**Test procedure**
8484
See the section above for the general steps in AI security testing.
8585
The steps specific for testing against this threat are:
8686

8787
**(1) Establish set of relevant input attacks**
88-
Collect a base set of crafted instructions that represent the state of the art for the attack (e.g., jailbreak attempts, invisible text, malicious URLs, data extraction attempts, attempts to get harmful content), either from an attack repository (see references) or from the resources of an an attack tool. If an attack tool has been selected to implement the test, then it will typically come with such a set. Various third party and open-source repositories and tools are available for this purpose - see further in our [Tool overview](/goto/testingtoolsgenai/).
88+
Collect a base set of crafted instructions that represent the state of the art for the attack (e.g., jailbreak attempts, invisible text, malicious URLs, data extraction attempts, attempts to get harmful content), either from an attack repository (see references) or from the resources of an an attack tool. If an attack tool has been selected to implement the test, then it will typically come with such a set. Various third party and open-source repositories and tools are available for this purpose - see further in our [Tool overview](/go/testingtoolsgenai/).
8989
Verify if the input attack set sufficiently covers the attack strategies described in the threat sections linked above (e.g., instruction override, role confusion, encoding tricks).
9090
Remove the input attacks for which the risk would be accepted (see Evaluation step), but keep these aside for when context and risk appetite evolve.
9191

@@ -144,7 +144,7 @@ Example 2:
144144
It is of course important to also test the AI system for correct behaviour in benign situations. Depending on context, such testing may be integrated in the implementation of the security test by using the same mechanisms. Such testing ideally includes the testing of detection mechanisms, to ensure that not too many false positives are triggered by benign inputs. Positive testing is essential to ensure that security mechanisms do not degrade intended functionality or user experience beyond acceptable levels.
145145

146146
**References**
147-
- See below for the [testing tools section](/goto/testingtoolsgenai/)
147+
- See below for the [testing tools section](/go/testingtoolsgenai/)
148148
- [Microsoft's promptbench](https://github.com/microsoft/promptbench/blob/main/promptbench/prompt_attack/README.md)
149149
- [Overview of benchmarks](https://www.promptfoo.dev/blog/top-llm-safety-bias-benchmarks/)
150150
- [AdvBench](https://huggingface.co/datasets/walledai/AdvBench)
@@ -164,7 +164,7 @@ The below section will cover the tools for predictive AI, followed by the sectio
164164

165165
## **Open source Tools for Predictive AI Red Teaming**
166166
> Category: tool review
167-
> Permalink: https://owaspai.org/goto/testingtoolspredictiveai/
167+
> Permalink: https://owaspai.org/go/testingtoolspredictiveai/
168168
169169

170170
This sub section covers the following tools for security testing Predictive AI: Adversarial Robustness Toolbox (ART), Armory, Foolbox, DeepSec, and TextAttack.
@@ -257,11 +257,11 @@ This sub section covers the following tools for security testing Predictive AI:
257257

258258
Notes:
259259

260-
- Development-time Model poisoning: Simulates attacks during development to evaluate vulnerabilities[*https://owaspai.org/goto/modelpoison/*](https://owaspai.org/goto/modelpoison/)
261-
- Evasion:Tests model performance against adversarial inputs  [*https://owaspai.org/goto/evasion/*](https://owaspai.org/goto/evasion/)
262-
- Model exfiltration: Evaluates risks of model exploitation during usage  [*https://owaspai.org/goto/modeltheftuse*](https://owaspai.org/goto/modeltheftuse/)
260+
- Development-time Model poisoning: Simulates attacks during development to evaluate vulnerabilities[*https://owaspai.org/go/modelpoison/*](https://owaspai.org/go/modelpoison/)
261+
- Evasion:Tests model performance against adversarial inputs  [*https://owaspai.org/go/evasion/*](https://owaspai.org/go/evasion/)
262+
- Model exfiltration: Evaluates risks of model exploitation during usage  [*https://owaspai.org/go/modeltheftuse*](https://owaspai.org/go/modeltheftuse/)
263263
- Model inference: *Assesses exposure to membership and inversion attacks*
264-
*[https://owaspai.org/goto/modelinversionandmembership/](https://owaspai.org/goto/modelinversionandmembership/)*
264+
*[https://owaspai.org/go/modelinversionandmembership/](https://owaspai.org/go/modelinversionandmembership/)*
265265

266266
### **Tool Name: Armory**
267267

@@ -352,10 +352,10 @@ Notes:
352352

353353
Notes:
354354

355-
- Development-time Model poisoning: Simulates attacks during development to evaluate vulnerabilities[*https://owaspai.org/goto/modelpoison/*](https://owaspai.org/goto/modelpoison/)
356-
- Evasion:Tests model performance against adversarial inputs  [*https://owaspai.org/goto/evasion/*](https://owaspai.org/goto/evasion/)
355+
- Development-time Model poisoning: Simulates attacks during development to evaluate vulnerabilities[*https://owaspai.org/go/modelpoison/*](https://owaspai.org/go/modelpoison/)
356+
- Evasion:Tests model performance against adversarial inputs  [*https://owaspai.org/go/evasion/*](https://owaspai.org/go/evasion/)
357357
- Prompt Injection: Evaluates the robustness of generative AI models by exploiting weaknesses in prompt design, leading to undesired outputs or bypassing model safeguards.
358-
*https://owaspai.org/goto/promptinjection/*
358+
*https://owaspai.org/go/promptinjection/*
359359

360360
### **Tool Name: Foolbox**
361361

@@ -447,7 +447,7 @@ Notes:
447447

448448
Evasion:Tests model performance against adversarial inputs
449449

450-
[*https://owaspai.org/goto/evasion/*](https://owaspai.org/goto/evasion/)
450+
[*https://owaspai.org/go/evasion/*](https://owaspai.org/go/evasion/)
451451

452452
**Tool Name: DeepSec**
453453

@@ -539,7 +539,7 @@ Notes:
539539

540540
Evasion:Tests model performance against adversarial inputs
541541

542-
[*https://owaspai.org/goto/evasion/*](https://owaspai.org/goto/evasion/)
542+
[*https://owaspai.org/go/evasion/*](https://owaspai.org/go/evasion/)
543543

544544
### Tool Name: TextAttack
545545

@@ -629,12 +629,12 @@ Evasion:Tests model performance against adversarial inputs
629629

630630
Notes:
631631

632-
- Development-time Model poisoning: Simulates attacks during development to evaluate vulnerabilities[*https://owaspai.org/goto/modelpoison/*](https://owaspai.org/goto/modelpoison/)
633-
- Evasion:Tests model performance against adversarial inputs[*https://owaspai.org/goto/evasion/*](https://owaspai.org/goto/evasion/)
632+
- Development-time Model poisoning: Simulates attacks during development to evaluate vulnerabilities[*https://owaspai.org/go/modelpoison/*](https://owaspai.org/go/modelpoison/)
633+
- Evasion:Tests model performance against adversarial inputs[*https://owaspai.org/go/evasion/*](https://owaspai.org/go/evasion/)
634634

635635
## Open source Tools for Generative AI Red Teaming
636636
> Category: tool review
637-
> Permalink: https://owaspai.org/goto/testingtoolsgenai/
637+
> Permalink: https://owaspai.org/go/testingtoolsgenai/
638638
639639

640640
This sub section covers the following tools for security testing Generative AI: PyRIT, Garak, Prompt Fuzzer, Guardrail, and Promptfoo.
@@ -730,8 +730,8 @@ A list of GenAI test tools can also be found at the [OWASP GenAI security projec
730730

731731
Notes:
732732

733-
- Evasion:Tests model performance against adversarial inputs  [*https://owaspai.org/goto/evasion/*](https://owaspai.org/goto/evasion/)
734-
- Prompt Injection: Evaluates the robustness of generative AI models by exploiting weaknesses in prompt design, leading to undesired outputs or bypassing model safeguards.*https://owaspai.org/goto/promptinjection/*
733+
- Evasion:Tests model performance against adversarial inputs  [*https://owaspai.org/go/evasion/*](https://owaspai.org/go/evasion/)
734+
- Prompt Injection: Evaluates the robustness of generative AI models by exploiting weaknesses in prompt design, leading to undesired outputs or bypassing model safeguards.*https://owaspai.org/go/promptinjection/*
735735

736736
### Tool Name: Garak
737737

@@ -821,9 +821,9 @@ https://github.com/NVIDIA/garak |
821821
| Indirect prompt injection | |
822822
| Development time model theft | |
823823
| Output contains injection | |
824-
- Evasion:Tests model performance against adversarial inputs  [*https://owaspai.org/goto/evasion/*](https://owaspai.org/goto/evasion/)
824+
- Evasion:Tests model performance against adversarial inputs  [*https://owaspai.org/go/evasion/*](https://owaspai.org/go/evasion/)
825825
- Prompt Injection: Evaluates the robustness of generative AI models by exploiting weaknesses in prompt design, leading to undesired outputs or bypassing model safeguards.
826-
*https://owaspai.org/goto/promptinjection/*
826+
*https://owaspai.org/go/promptinjection/*
827827

828828
### Tool Name: Prompt Fuzzer
829829

@@ -916,8 +916,8 @@ https://github.com/NVIDIA/garak |
916916

917917
Notes:
918918

919-
- Evasion:Tests model performance against adversarial inputs  [*https://owaspai.org/goto/evasion/*](https://owaspai.org/goto/evasion/)
920-
- Prompt Injection: Evaluates the robustness of generative AI models by exploiting weaknesses in prompt design, leading to undesired outputs or bypassing model safeguards. *https://owaspai.org/goto/promptinjection/*
919+
- Evasion:Tests model performance against adversarial inputs  [*https://owaspai.org/go/evasion/*](https://owaspai.org/go/evasion/)
920+
- Prompt Injection: Evaluates the robustness of generative AI models by exploiting weaknesses in prompt design, leading to undesired outputs or bypassing model safeguards. *https://owaspai.org/go/promptinjection/*
921921

922922
### Tool Name: Guardrail
923923

@@ -1008,8 +1008,8 @@ Notes:
10081008

10091009
Notes:
10101010

1011-
- Evasion:Tests model performance against adversarial inputs  [*https://owaspai.org/goto/evasion/*](https://owaspai.org/goto/evasion/)
1012-
- Prompt Injection: Evaluates the robustness of generative AI models by exploiting weaknesses in prompt design, leading to undesired outputs or bypassing model safeguards. *https://owaspai.org/goto/promptinjection/*
1011+
- Evasion:Tests model performance against adversarial inputs  [*https://owaspai.org/go/evasion/*](https://owaspai.org/go/evasion/)
1012+
- Prompt Injection: Evaluates the robustness of generative AI models by exploiting weaknesses in prompt design, leading to undesired outputs or bypassing model safeguards. *https://owaspai.org/go/promptinjection/*
10131013

10141014
### Tool Name: Promptfoo
10151015

@@ -1105,9 +1105,9 @@ Notes:
11051105

11061106
Notes:
11071107

1108-
- Model exfiltration:Evaluates risks of model exploitation during usage  [*https://owaspai.org/goto/modeltheftuse/*](https://owaspai.org/goto/modeltheftuse/)
1108+
- Model exfiltration:Evaluates risks of model exploitation during usage  [*https://owaspai.org/go/modeltheftuse/*](https://owaspai.org/go/modeltheftuse/)
11091109
- Prompt Injection: Evaluates the robustness of generative AI models by exploiting weaknesses in prompt design, leading to undesired outputs or bypassing model safeguards.
1110-
*[https://owaspai.org/goto/promptinjection/](https://owaspai.org/goto/promptinjection/)*
1110+
*[https://owaspai.org/go/promptinjection/](https://owaspai.org/go/promptinjection/)*
11111111

11121112
## Tool Ratings
11131113
This section rates the discussed tools by Popularity, Community Support, Scalability and Integration.

0 commit comments

Comments
 (0)