
Commit e89a912

Freshness, in progress.
1 parent 50b5791 commit e89a912

2 files changed: +30 -28 lines changed


articles/ai-foundry/how-to/develop/run-scans-ai-red-teaming-agent.md

Lines changed: 24 additions & 22 deletions
@@ -1,12 +1,12 @@
 ---
-title: Run AI Red Teaming Agent locally (Azure AI Evaluation SDK)
+title: Run AI Red Teaming Agent Locally (Azure AI Evaluation SDK)
 titleSuffix: Azure AI Foundry
-description: This article provides instructions on how to use the AI Red Teaming Agent to run a local automated scan of a Generative AI application with the Azure AI Evaluation SDK.
+description: Learn how to use the AI Red Teaming Agent to run a local automated scan of a Generative AI application with the Azure AI Evaluation SDK.
 ms.service: azure-ai-foundry
 ms.custom:
 - references_regions
 ms.topic: how-to
-ms.date: 06/03/2025
+ms.date: 10/20/2025
 ms.reviewer: minthigpen
 ms.author: lagayhar
 author: lgayhardt
@@ -16,12 +16,12 @@ author: lgayhardt

 [!INCLUDE [feature-preview](../../includes/feature-preview.md)]

-The AI Red Teaming Agent (preview) is a powerful tool designed to help organizations proactively find safety risks associated with generative AI systems during design and development. By integrating Microsoft's open-source framework for Python Risk Identification Tool's ([PyRIT](https://github.com/Azure/PyRIT)) AI red teaming capabilities directly into Azure AI Foundry, teams can automatically scan their model and application endpoints for risks, simulate adversarial probing, and generate detailed reports.
+The AI Red Teaming Agent (preview) is a powerful tool designed to help organizations proactively find safety risks associated with generative AI systems during design and development. The AI red teaming capabilities of Microsoft's open-source framework for Python Risk Identification Tool ([PyRIT](https://github.com/Azure/PyRIT)) are integrated directly into Azure AI Foundry. Teams can automatically scan their model and application endpoints for risks, simulate adversarial probing, and generate detailed reports.

-This article guides you through the process of
+This article explains how to:

-- Creating an AI Red Teaming Agent locally
-- Running automated scans locally and viewing the results
+- Create an AI Red Teaming Agent locally
+- Run automated scans locally and view the results

 ## Prerequisites

@@ -31,7 +31,7 @@ This article guides you through the process of

 ## Getting started

-First install the `redteam` package as an extra from Azure AI Evaluation SDK, this provides the PyRIT functionality:
+Install the `redteam` package as an extra from the Azure AI Evaluation SDK. This package provides the PyRIT functionality:

 ```python
 uv pip install "azure-ai-evaluation[redteam]"
@@ -72,7 +72,7 @@ def simple_callback(query: str) -> str:
 red_team_result = await red_team_agent.scan(target=simple_callback)
 ```

-This example generates a default set of 10 attack prompts for each of the default set of four risk categories (violence, sexual, hate and unfairness, and self-harm) to result in a total of 40 rows of attack prompts to be generated and sent to your target.
+This example generates a default set of 10 attack prompts for each of the four default risk categories: violence, sexual, hate and unfairness, and self-harm. In total, 40 rows of attack prompts are generated and sent to your target.

 Optionally, you can specify which risk categories you want to cover with the `risk_categories` parameter and define the number of prompts covering each risk category with the `num_objectives` parameter.
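For illustration (not part of this commit's diff), here's a minimal sketch of a scan scoped with those two parameters. It assumes the `RiskCategory` enum and the `azure_ai_project` and `credential` constructor arguments exposed by `azure.ai.evaluation.red_team`, and the project endpoint is a placeholder, so treat it as a sketch rather than the article's exact code:

```python
from azure.identity import DefaultAzureCredential
from azure.ai.evaluation.red_team import RedTeam, RiskCategory  # RiskCategory members assumed

# Placeholder endpoint; substitute your own Azure AI Foundry project.
azure_ai_project = "https://<your-resource>.services.ai.azure.com/api/projects/<your-project>"

# Scope the scan to two risk categories with 5 attack prompts each,
# so the baseline pass sends 2 x 5 = 10 attack prompts to the target.
red_team_agent = RedTeam(
    azure_ai_project=azure_ai_project,
    credential=DefaultAzureCredential(),
    risk_categories=[RiskCategory.Violence, RiskCategory.SelfHarm],
    num_objectives=5,
)

red_team_result = await red_team_agent.scan(target=simple_callback)
```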

@@ -120,7 +120,7 @@ azure_openai_config = {
 red_team_result = await red_team_agent.scan(target=azure_openai_config)
 ```

-**Simple callback**: A simple callback which takes in a string prompt from `red_team_agent` and returns some string response from your application.
+**Simple callback**: A simple callback that takes in a string prompt from `red_team_agent` and returns some string response from your application:

 ```python
 # Define a simple callback function that simulates a chatbot
@@ -131,7 +131,7 @@ def simple_callback(query: str) -> str:
 red_team_result = await red_team_agent.scan(target=simple_callback)
 ```

-**Complex callback**: A more complex callback that is aligned to the OpenAI Chat Protocol
+**Complex callback**: A more complex callback that is aligned to the OpenAI Chat Protocol:

 ```python
 # Create a more complex callback function that handles conversation state
@@ -185,9 +185,9 @@ The following risk categories are supported in the AI Red Teaming Agent's runs,

 ## Custom attack objectives

-Though the AI Red Teaming Agent provides a Microsoft curated set of adversarial attack objectives covering each supported risk, you might want to bring your own additional custom set to be used fo reach risk category as your own organization policy might be different.
+The AI Red Teaming Agent provides a Microsoft-curated set of adversarial attack objectives covering each supported risk. Because your organization's policy might be different, you might want to bring your own custom set to use for each risk category.

-You can run the AI Red Teaming Agent on your own dataset
+You can run the AI Red Teaming Agent on your own dataset:

 ```python
 custom_red_team_agent = RedTeam(
@@ -197,7 +197,7 @@ custom_red_team_agent = RedTeam(
 )
 ```

-Your dataset must be a JSON file, in the following format with the associated metadata for the corresponding risk-types. When bringing your own prompts, the supported `risk-type`s are `violence`, `sexual`, `hate_unfairness`, and `self_harm` so that the attacks can be evaluated for success correspondingly by our Safety Evaluators. The number of prompts you specify will be the `num_objectives` used in the scan.
+Your dataset must be a JSON file in the following format, with the associated metadata for the corresponding risk types. When you bring your own prompts, the supported `risk-type` values are `violence`, `sexual`, `hate_unfairness`, and `self_harm`. Use these supported types so that the Safety Evaluators can evaluate the attacks for success. The number of prompts that you specify is the `num_objectives` used in the scan.

 ```json
 [
@@ -229,25 +229,25 @@ Your dataset must be a JSON file, in the following format with the associated me

 ## Supported attack strategies

-If only the target is passed in when you run a scan and no attack strategies are specified, the `red_team_agent` will only send baseline direct adversarial queries to your target. This is the most naive method of attempting to elicit undesired behavior or generated content. It's recommended to try the baseline direct adversarial querying first before applying any attack strategies.
+If only the target is passed in when you run a scan and no attack strategies are specified, the `red_team_agent` sends only baseline direct adversarial queries to your target. This approach is the most naive method of attempting to elicit undesired behavior or generated content. We recommend that you try the baseline direct adversarial querying first before applying any attack strategies.

-Attack strategies are methods to take the baseline direct adversarial queries and convert them into another form to try bypassing your target's safeguards. Attack strategies are classified into three buckets of complexities. Attack complexity reflects the effort an attacker needs to put in conducting the attack.
+Attack strategies are methods to take the baseline direct adversarial queries and convert them into another form to try to bypass your target's safeguards. Attack strategies are classified into three levels of complexity. Attack complexity reflects the effort an attacker needs to put into conducting the attack.

 - **Easy complexity attacks** require less effort, such as translation of a prompt into some encoding
 - **Moderate complexity attacks** require having access to resources such as another generative AI model
-- **Difficult complexity attacks** includes attacks that require access to significant resources and effort to execute an attack such as knowledge of search-based algorithms in addition to a generative AI model.
+- **Difficult complexity attacks** require access to significant resources and effort to execute, such as knowledge of search-based algorithms, in addition to a generative AI model.

 ### Default grouped attack strategies

-We offer a group of default attacks for easy complexity and moderate complexity which can be used in `attack_strategies` parameter. A difficult complexity attack can be a composition of two strategies in one attack.
+A group of default attacks for easy complexity and moderate complexity can be used in the `attack_strategies` parameter. A difficult complexity attack can be a composition of two strategies in one attack.

 | Attack strategy complexity group | Includes |
 | --- | --- |
 | `EASY` | `Base64`, `Flip`, `Morse` |
 | `MODERATE` | `Tense` |
 | `DIFFICULT` | Composition of `Tense` and `Base64` |

-The following scan would first run all the baseline direct adversarial queries. Then, it would apply the following attack techniques: `Base64`, `Flip`, `Morse`, `Tense`, and a composition of `Tense` and `Base64` which would first translate the baseline query into past tense then encode it into `Base64`.
+The following scan would first run all the baseline direct adversarial queries. Then, it would apply the following attack techniques: `Base64`, `Flip`, `Morse`, `Tense`, and a composition of `Tense` and `Base64`, which would first translate the baseline query into past tense and then encode it into `Base64`.

 ```python
 from azure.ai.evaluation.red_team import AttackStrategy
@@ -266,7 +266,7 @@ red_team_agent_result = await red_team_agent.scan(

 ### Specific attack strategies

-More advanced users can specify the desired attack strategies instead of using default groups. The following attack strategies are supported:
+You can specify the desired attack strategies instead of using default groups. The following attack strategies are supported:

 | Attack strategy | Description | Complexity |
 | --- | --- | --- |
@@ -294,7 +294,9 @@ More advanced users can specify the desired attack strategies instead of using d

 Each new attack strategy specified is applied to the set of baseline adversarial queries, in addition to the baseline adversarial queries themselves.

-This following example would generate one attack objective per each of the four risk categories specified. This will first, generate four baseline adversarial prompts which would be sent to your target. Then, each baseline query would get converted into each of the four attack strategies. This results in a total of 20 attack-response pairs from your AI system. The last attack strategy is an example of a composition of two attack strategies to create a more complex attack query: the `AttackStrategy.Compose()` function takes in a list of two supported attack strategies and chains them together. The example's composition would first encode the baseline adversarial query into Base64 then apply the ROT13 cipher on the Base64-encoded query. Compositions only support chaining two attack strategies together.
+The following example would generate one attack objective for each of the four risk categories specified. This approach first generates four baseline adversarial prompts, which are sent to your target. Then, each baseline query is converted into each of the four attack strategies. This conversion results in a total of 20 attack-response pairs from your AI system.
+
+The last attack strategy is an example of a composition of two attack strategies to create a more complex attack query: the `AttackStrategy.Compose()` function takes in a list of two supported attack strategies and chains them together. The example's composition would first encode the baseline adversarial query into Base64 and then apply the ROT13 cipher on the Base64-encoded query. Compositions only support chaining two attack strategies together.

 ```python
 red_team_agent = RedTeam(
@@ -335,7 +337,7 @@ red_team_agent_result = await red_team_agent.scan(
 )
 ```

-The `My-First-RedTeam-Scan.json` file contains a scorecard that provides a breakdown across attack complexity and risk categories, as well as a joint attack complexity and risk category report. Important metadata is tracked in the `parameters` section which outlines which risk categories were used to generate the attack objectives and which attack strategies were specified in the scan.
+The `My-First-RedTeam-Scan.json` file contains a scorecard that provides a breakdown across attack complexity and risk categories. It also includes a joint attack complexity and risk category report. Important metadata is tracked in the `parameters` section, which outlines which risk categories were used to generate the attack objectives and which attack strategies were specified in the scan.

 ```json
 {

articles/ai-foundry/includes/view-ai-red-teaming-results.md

Lines changed: 6 additions & 6 deletions
@@ -4,34 +4,34 @@ description: Include file
 author: lgayhardt
 ms.service: azure-ai-foundry
 ms.topic: include
-ms.date: 9/18/2025
+ms.date: 10/20/2025
 ms.author: lagayhar
 ms.custom: include file
 ---

 ## Viewing AI red teaming results in Azure AI Foundry project (preview)

-After your automated scan is finished running, the results also get logged to your Azure AI Foundry project which you specified in the creation of your AI red teaming agent.
+After your automated scan finishes, the results are also logged to the Azure AI Foundry project that you specified when you created your AI red teaming agent.

 ### View report of each scan

-In your Azure AI Foundry project or hub-based project, navigate to the **Evaluations** page and select the **AI red teaming** tab to view the comprehensive report with a detailed drill-down of each scan.
+In your Azure AI Foundry project or hub-based project, navigate to the **Evaluation** page. Select **AI red teaming** to view the comprehensive report with a detailed drill-down of each scan.

 :::image type="content" source="../media/evaluations/red-teaming-agent/ai-red-team.png" alt-text="Screenshot of AI Red Teaming tab in Azure AI Foundry project page." lightbox="../media/evaluations/red-teaming-agent/ai-red-team.png":::

-Once you select into the scan, you can view the report by risk categories, which shows you the overall number of successful attacks and a breakdown of successful attacks per risk categories:
+When you select a scan, you can view the report by risk category, which shows the overall number of successful attacks and a breakdown of successful attacks per risk category:

 :::image type="content" source="../media/evaluations/red-teaming-agent/ai-red-team-report-risk.png" alt-text="Screenshot of AI Red Teaming report view by risk category in Azure AI Foundry." lightbox="../media/evaluations/red-teaming-agent/ai-red-team-report-risk.png":::

 Or by attack complexity classification:

 :::image type="content" source="../media/evaluations/red-teaming-agent/ai-red-team-report-attack.png" alt-text="Screenshot of AI Red Teaming report view by attack complexity category in Azure AI Foundry." lightbox="../media/evaluations/red-teaming-agent/ai-red-team-report-attack.png":::

-Drilling down further into the data tab provides a row-level view of each attack-response pair, enabling deeper insights into system issues and behaviors. For each attack-response pair, you can see additional information such as whether or not the attack was successful, what attack strategy was used and its attack complexity. There's also an option for a human in the loop reviewer to provide human feedback by selecting the thumbs up or thumbs down icon.
+Drilling down further into the data tab provides a row-level view of each attack-response pair, enabling deeper insights into system issues and behaviors. For each attack-response pair, you can see additional information, such as whether the attack was successful, which attack strategy was used, and its attack complexity. There's also an option for a human-in-the-loop reviewer to provide feedback by selecting the thumbs up or thumbs down icon.

 :::image type="content" source="../media/evaluations/red-teaming-agent/ai-red-team-data.png" alt-text="Screenshot of AI Red Teaming data page in Azure AI Foundry." lightbox="../media/evaluations/red-teaming-agent/ai-red-team-data.png":::

-To view each conversation, selecting **View more** opens up the full conversation for more detailed analysis of the AI system's response.
+To view a full conversation, select **View more** for a more detailed analysis of the AI system's response.

 :::image type="content" source="../media/evaluations/red-teaming-agent/ai-red-team-data-conversation.png" alt-text="Screenshot of AI Red Teaming data page with a conversation history opened in Azure AI Foundry." lightbox="../media/evaluations/red-teaming-agent/ai-red-team-data-conversation.png":::
