Commit 896dc37

Merge pull request #3931 from MicrosoftDocs/main
Publish to live, Friday 4 AM PST, 4/4
2 parents 4d7dd5f + 1092379

File tree

64 files changed (+1829, -218 lines)
Lines changed: 112 additions & 0 deletions
@@ -0,0 +1,112 @@
---
title: AI Red Teaming Agent
titleSuffix: Azure AI Foundry
description: This article provides a conceptual overview of the AI Red Teaming Agent.
manager: scottpolly
ms.service: azure-ai-foundry
ms.topic: how-to
ms.date: 04/04/2025
ms.reviewer: minthigpen
ms.author: lagayhar
author: lgayhardt
---
# AI Red Teaming Agent (preview)

[!INCLUDE [feature-preview](../includes/feature-preview.md)]
The AI Red Teaming Agent (preview) is a powerful tool designed to help organizations proactively find safety risks in generative AI models and applications during design and development.

Traditional red teaming exploits the cyber kill chain and describes the process by which a system is tested for security vulnerabilities. With the rise of generative AI, the term AI red teaming has been coined to describe probing for the novel risks (both content safety and security related) that these systems present. AI red teaming refers to simulating the behavior of an adversarial user who is trying to cause your AI system to misbehave in a particular way.
The AI Red Teaming Agent leverages the AI red teaming capabilities of Microsoft's open-source Python Risk Identification Tool ([PyRIT](https://github.com/Azure/PyRIT)) along with Azure AI Foundry's [Risk and Safety Evaluations](./evaluation-metrics-built-in.md#risk-and-safety-evaluators) to help you automatically assess safety issues in three ways:

- **Automated scans for content safety risks:** First, you can automatically scan your model and application endpoints for safety risks by simulating adversarial probing.
- **Evaluate probing success:** Next, you can evaluate and score each attack-response pair to generate insightful metrics such as Attack Success Rate (ASR).
- **Reporting and logging:** Finally, you can generate a scorecard of the attack probing techniques and risk categories to help you decide if the system is ready for deployment. Findings can be logged, monitored, and tracked over time directly in Azure AI Foundry, ensuring compliance and continuous risk mitigation.

Together, these components (scanning, evaluating, and reporting) help teams understand how AI systems respond to common attacks, ultimately guiding a comprehensive risk management strategy. A minimal sketch of this workflow follows.
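The sketch below shows roughly what such an automated scan could look like from Python code. The module path, class, enum, and parameter names (`RedTeam`, `RiskCategory`, `AttackStrategy`, `scan`) are assumptions modeled on the preview Azure AI Evaluation SDK and might differ from the shipped API; see the how-to article linked later in this article for the supported interface.

```python
# Hypothetical sketch of an automated scan; the names below are assumptions modeled on the
# preview azure-ai-evaluation SDK and might differ from the shipped API.
import asyncio
from azure.identity import DefaultAzureCredential
from azure.ai.evaluation.red_team import RedTeam, RiskCategory, AttackStrategy  # assumed module path


# A stand-in target: in practice this would call your model or application endpoint.
def target(query: str) -> str:
    return "I'm sorry, I can't help with that."


async def main():
    red_team = RedTeam(
        azure_ai_project="<your-azure-ai-foundry-project>",  # placeholder
        credential=DefaultAzureCredential(),
        risk_categories=[RiskCategory.Violence, RiskCategory.HateUnfairness],
        num_objectives=5,  # seed prompts (attack objectives) per risk category
    )
    # Scan: simulate adversarial probing with selected attack strategies, evaluate each
    # attack-response pair, and produce a scorecard with Attack Success Rate (ASR) results.
    result = await red_team.scan(
        target=target,
        scan_name="pre-deployment-safety-scan",
        attack_strategies=[AttackStrategy.Base64, AttackStrategy.Flip],
    )
    print(result)


asyncio.run(main())
```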
## When to use the AI Red Teaming Agent's scans

When thinking about AI-related safety risks in developing trustworthy AI systems, Microsoft uses NIST's framework to mitigate risk effectively: Govern, Map, Measure, Manage. We'll focus on the last three parts in relation to the generative AI development lifecycle:

- Map: Identify relevant risks and define your use case.
- Measure: Evaluate risks at scale.
- Manage: Mitigate risks in production and monitor with a plan for incident response.

:::image type="content" source="../media/evaluations/red-teaming-agent/map-measure-mitigate-ai-red-teaming.png" alt-text="Diagram of how to use AI Red Teaming Agent showing proactive to reactive and less costly to more costly." lightbox="../media/evaluations/red-teaming-agent/map-measure-mitigate-ai-red-teaming.png":::

The AI Red Teaming Agent can be used to run automated scans and simulate adversarial probing to help accelerate the identification and evaluation of known risks at scale. This helps teams "shift left" from costly reactive incidents to more proactive testing frameworks that can catch issues before deployment. Manual AI red teaming is time- and resource-intensive: it relies on the creativity of safety and security experts to simulate adversarial probing, and for many organizations it can become a bottleneck that slows AI adoption. With the AI Red Teaming Agent, organizations can now leverage Microsoft's deep expertise to scale and accelerate their AI development with Trustworthy AI at the forefront.
We encourage teams to use the AI Red Teaming Agent to run automated scans throughout the design, development, and pre-deployment stages:

- Design: Picking the safest foundation model for your use case.
- Development: Upgrading models within your application or creating fine-tuned models for your specific application.
- Pre-deployment: Before deploying generative AI applications to production.

In production, we recommend implementing **safety mitigations** such as [Azure AI Content Safety filters](../../ai-services/content-safety/overview.md) or implementing safety system messages using our [templates](../../ai-services/openai/concepts/safety-system-message-templates.md).
## How AI Red Teaming works

The AI Red Teaming Agent helps automate simulation of adversarial probing of your target AI system. It provides a curated dataset of seed prompts, or attack objectives, per supported risk category. These can be used to automate direct adversarial probing. However, direct adversarial probing might be easily caught by the existing safety alignments of your model deployment. Applying attack strategies from PyRIT provides an extra conversion that can help bypass or subvert the AI system into producing undesirable content.

In the diagram, a direct ask to your AI system on how to loot a bank triggers a refusal response. However, applying an attack strategy such as flipping all the characters can help trick the model into answering the question.

:::image type="content" source="../media/evaluations/red-teaming-agent/how-ai-red-teaming-works.png" alt-text="Diagram of how AI Red Teaming Agent works." lightbox="../media/evaluations/red-teaming-agent/how-ai-red-teaming-works.png":::

Additionally, the AI Red Teaming Agent provides users with a fine-tuned adversarial large language model dedicated to the task of simulating adversarial attacks, and it evaluates responses for potentially harmful content with the Risk and Safety Evaluators. The key metric for assessing the risk posture of your AI system is Attack Success Rate (ASR), which is the percentage of successful attacks over the total number of attacks.
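As a minimal illustration of the metric itself (not of the agent's internal implementation), ASR can be computed from a list of attack outcomes like this:

```python
def attack_success_rate(attack_succeeded: list[bool]) -> float:
    """Percentage of successful attacks over the total number of attacks."""
    if not attack_succeeded:
        return 0.0
    return 100 * sum(attack_succeeded) / len(attack_succeeded)


# Example: 3 of 20 simulated attacks elicited undesirable content -> ASR = 15%.
outcomes = [True] * 3 + [False] * 17
print(f"ASR: {attack_success_rate(outcomes):.1f}%")
```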
## Supported risk categories

The following risk categories are supported in the AI Red Teaming Agent from [Risk and Safety Evaluations](./evaluation-metrics-built-in.md#risk-and-safety-evaluators). Only text-based scenarios are supported.

| **Risk category** | **Description** |
|------------------|-----------------|
| **Hateful and Unfair Content** | Hateful and unfair content refers to any language or imagery pertaining to hate toward or unfair representations of individuals and social groups along factors including but not limited to race, ethnicity, nationality, gender, sexual orientation, religion, immigration status, ability, personal appearance, and body size. Unfairness occurs when AI systems treat or represent social groups inequitably, creating or contributing to societal inequities. |
| **Sexual Content** | Sexual content includes language or imagery pertaining to anatomical organs and genitals, romantic relationships, acts portrayed in erotic terms, pregnancy, physical sexual acts (including assault or sexual violence), prostitution, pornography, and sexual abuse. |
| **Violent Content** | Violent content includes language or imagery pertaining to physical actions intended to hurt, injure, damage, or kill someone or something. It also includes descriptions of weapons and guns (and related entities such as manufacturers and associations). |
| **Self-Harm-Related Content** | Self-harm-related content includes language or imagery pertaining to actions intended to hurt, injure, or damage one's body or kill oneself. |
## Supported attack strategies

The following attack strategies are supported in the AI Red Teaming Agent from [PyRIT](https://azure.github.io/PyRIT/index.html); a few are illustrated in the sketch after the table:

| **Attack Strategy** | **Description** |
|---------------------|-----------------|
| AnsiAttack | Utilizes ANSI escape sequences to manipulate text appearance and behavior. |
| AsciiArt | Generates visual art using ASCII characters, often used for creative or obfuscation purposes. |
| AsciiSmuggler | Conceals data within ASCII characters, making it harder to detect. |
| Atbash | Implements the Atbash cipher, a simple substitution cipher where each letter is mapped to its reverse. |
| Base64 | Encodes binary data into a text format using Base64, commonly used for data transmission. |
| Binary | Converts text into binary code, representing data in a series of 0s and 1s. |
| Caesar | Applies the Caesar cipher, a substitution cipher that shifts characters by a fixed number of positions. |
| CharacterSpace | Alters text by adding spaces between characters, often used for obfuscation. |
| CharSwap | Swaps characters within text to create variations or obfuscate the original content. |
| Diacritic | Adds diacritical marks to characters, changing their appearance and sometimes their meaning. |
| Flip | Flips characters from front to back, creating a mirrored effect. |
| Leetspeak | Transforms text into Leetspeak, a form of encoding that replaces letters with similar-looking numbers or symbols. |
| Morse | Encodes text into Morse code, using dots and dashes to represent characters. |
| ROT13 | Applies the ROT13 cipher, a simple substitution cipher that shifts characters by 13 positions. |
| SuffixAppend | Appends an adversarial suffix to the prompt. |
| StringJoin | Joins multiple strings together, often used for concatenation or obfuscation. |
| UnicodeConfusable | Uses Unicode characters that look similar to standard characters, creating visual confusion. |
| UnicodeSubstitution | Substitutes standard characters with Unicode equivalents, often for obfuscation. |
| Url | Encodes text into URL format. |
| Jailbreak | Injects specially crafted prompts to bypass AI safeguards, known as User Injected Prompt Attacks (UPIA). |
| Tense | Changes the tense of text, specifically converting it into past tense. |
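To make the idea of an attack strategy concrete, here's a minimal, self-contained sketch (using only the Python standard library, not PyRIT's own converters) of how a few of these conversions transform a seed prompt before it's sent to the target system:

```python
import base64
import codecs

seed_prompt = "example attack objective taken from the curated dataset"

# Base64: re-encode the prompt so it no longer matches plain-text safety filters.
b64 = base64.b64encode(seed_prompt.encode("utf-8")).decode("ascii")


# Caesar: shift each letter by a fixed number of positions (3 here).
def caesar(text: str, shift: int = 3) -> str:
    out = []
    for ch in text:
        if ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            out.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            out.append(ch)
    return "".join(out)


# ROT13: a Caesar cipher with a shift of 13, available directly in the standard library.
rot13 = codecs.encode(seed_prompt, "rot_13")

# Flip: reverse the characters front to back.
flipped = seed_prompt[::-1]

# CharacterSpace: add spaces between characters to break up recognizable tokens.
spaced = " ".join(seed_prompt)

for name, converted in [("Base64", b64), ("Caesar", caesar(seed_prompt)),
                        ("ROT13", rot13), ("Flip", flipped), ("CharacterSpace", spaced)]:
    print(f"{name}: {converted}")
```

PyRIT ships its own implementations of these converters; the point of the sketch is only that each strategy is a deterministic transformation applied to the seed prompt to help it slip past a model's safety alignment.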
## Learn more

Get started with our [documentation on how to run an automated scan for safety risks with the AI Red Teaming Agent](../how-to/develop/run-scans-ai-red-teaming-agent.md).

Learn more about the tools leveraged by the AI Red Teaming Agent:

- [Azure AI Risk and Safety Evaluations](./safety-evaluations-transparency-note.md)
- [PyRIT: Python Risk Identification Tool](https://github.com/Azure/PyRIT)

The most effective risk assessment strategies we've seen use automated tools to surface potential risks, which are then analyzed by expert human teams for deeper insights. If your organization is just starting with AI red teaming, we encourage you to explore the resources created by our own AI red team at Microsoft to help you get started.

- [Planning red teaming for large language models (LLMs) and their applications](../../ai-services/openai/concepts/red-teaming.md)
- [Three takeaways from red teaming 100 generative AI products](https://www.microsoft.com/security/blog/2025/01/13/3-takeaways-from-red-teaming-100-generative-ai-products/)
- [Microsoft AI Red Team building future of safer AI](https://www.microsoft.com/security/blog/2023/08/07/microsoft-ai-red-team-building-future-of-safer-ai/)

articles/ai-foundry/concepts/evaluation-approach-gen-ai.md

Lines changed: 9 additions & 7 deletions
@@ -9,7 +9,7 @@ ms.custom:
 - build-2024
 - ignite-2024
 ms.topic: conceptual
-ms.date: 12/23/2024
+ms.date: 04/04/2025
 ms.reviewer: mithigpe
 ms.author: lagayhar
 author: lgayhardt
@@ -56,17 +56,18 @@ Pre-production evaluation involves:
 The pre-production stage acts as a final quality check, reducing the risk of deploying an AI application that doesn't meet the desired performance or safety standards.

 - Bring your own data: You can evaluate your AI applications in pre-production using your own evaluation data with Azure AI Foundry or [Azure AI Evaluation SDK’s](../how-to/develop/evaluate-sdk.md) supported evaluators, including [generation quality, safety,](./evaluation-metrics-built-in.md) or [custom evaluators](../how-to/develop/evaluate-sdk.md#custom-evaluators), and [view results via the Azure AI Foundry portal](../how-to/evaluate-results.md).
-- Simulators: If you don’t have evaluation data (test data), Azure AI [Evaluation SDK’s simulators](..//how-to/develop/simulator-interaction-data.md) can help by generating topic-related or adversarial queries. These simulators test the model’s response to situation-appropriate or attack-like queries (edge cases).
-  - The [adversarial simulator](../how-to/develop/simulator-interaction-data.md#generate-adversarial-simulations-for-safety-evaluation) injects queries that mimic potential security threats or attempt jailbreaks, helping identify limitations and preparing the model for unexpected conditions.
-  - [Context-appropriate simulators](../how-to/develop/simulator-interaction-data.md#generate-synthetic-data-and-simulate-non-adversarial-tasks) generate typical, relevant conversations you’d expect from users to test quality of responses.
+- Simulators and AI red teaming agent (preview): If you don’t have evaluation data (test data), Azure AI [Evaluation SDK’s simulators](..//how-to/develop/simulator-interaction-data.md) can help by generating topic-related or adversarial queries. These simulators test the model’s response to situation-appropriate or attack-like queries (edge cases).
+  - [Adversarial simulators](../how-to/develop/simulator-interaction-data.md#generate-adversarial-simulations-for-safety-evaluation) inject static queries that mimic potential safety risks or security attacks, such as attempted jailbreaks, helping identify limitations and preparing the model for unexpected conditions.
+  - [Context-appropriate simulators](../how-to/develop/simulator-interaction-data.md#generate-synthetic-data-and-simulate-non-adversarial-tasks) generate typical, relevant conversations you’d expect from users to test quality of responses. With context-appropriate simulators, you can assess metrics such as groundedness, relevance, coherence, and fluency of generated responses.
+  - The [AI red teaming agent](../how-to/develop/run-scans-ai-red-teaming-agent.md) (preview) simulates complex adversarial attacks against your AI system using a broad range of safety and security attack strategies from Microsoft’s open framework for Python Risk Identification Tool, or [PyRIT](https://github.com/Azure/PyRIT). Automated scans using the AI red teaming agent enhance pre-production risk assessment by systematically testing AI applications for risks. This process involves simulated attack scenarios to identify weaknesses in model responses before real-world deployment. By running AI red teaming scans, you can detect and mitigate potential safety issues before deployment. We recommend using this tool in conjunction with human-in-the-loop processes, such as conventional AI red teaming probing, to help accelerate risk identification and aid assessment by a human expert.

-Alternatively, you can also use [Azure AI Foundrys evaluation widget](../how-to/evaluate-generative-ai-app.md) for testing your generative AI applications.
+Alternatively, you can also use [Azure AI Foundry portal's evaluation widget](../how-to/evaluate-generative-ai-app.md) for testing your generative AI applications.

 Once satisfactory results are achieved, the AI application can be deployed to production.

 ## Post-production monitoring

-After deployment, the AI application enters the post-production evaluation phase, also known as online evaluation or monitoring. At this stage, the model is embedded within a real-world product and responds to actual user queries. Monitoring ensures that the model continues to behave as expected and adapts to any changes in user behavior or content.
+After deployment, the AI application enters the post-production evaluation phase, also known as online evaluation or monitoring. At this stage, the model is embedded within a real-world product and responds to actual user queries in production. Monitoring ensures that the model continues to behave as expected and adapts to any changes in user behavior or content.

 - **Ongoing performance tracking**: Regularly measuring AI application’s response using key metrics to ensure consistent output quality.
 - **Incident response**: Quickly responding to any harmful, unfair, or inappropriate outputs that might arise during real-world use.
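For readers unfamiliar with the adversarial simulator referenced in the hunk above, here's a minimal sketch of how a simulation could be wired up; the module path, class names, callback message format, and parameters (`AdversarialSimulator`, `AdversarialScenario`, `max_simulation_results`) are assumptions based on the preview azure-ai-evaluation SDK and may differ from the shipped API, so treat the simulator docs linked in the diff as authoritative.

```python
# Hypothetical sketch; names and parameters are assumptions based on the preview
# azure-ai-evaluation SDK and may differ from the shipped API.
import asyncio
from azure.identity import DefaultAzureCredential
from azure.ai.evaluation.simulator import AdversarialSimulator, AdversarialScenario  # assumed path


async def callback(messages, stream=False, session_state=None, context=None):
    # The simulated adversarial user's last message (message format is assumed);
    # in practice, forward it to your model or application and return the reply.
    last_user_message = messages["messages"][-1]["content"]
    reply = "I'm sorry, I can't help with that."  # replace with a call to your app
    messages["messages"].append({"role": "assistant", "content": reply})
    return {"messages": messages["messages"], "stream": stream,
            "session_state": session_state, "context": context}


async def main():
    azure_ai_project = {
        "subscription_id": "<subscription-id>",
        "resource_group_name": "<resource-group>",
        "project_name": "<project-name>",
    }
    simulator = AdversarialSimulator(azure_ai_project=azure_ai_project,
                                     credential=DefaultAzureCredential())
    outputs = await simulator(
        scenario=AdversarialScenario.ADVERSARIAL_QA,  # attack-like single-turn questions
        target=callback,
        max_conversation_turns=1,
        max_simulation_results=3,
    )
    print(outputs)


asyncio.run(main())
```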
@@ -82,14 +83,15 @@ Cheat sheet:
 | Purpose | Process | Parameters |
 | -----| -----| ----|
 | What are you evaluating for? | Identify or build relevant evaluators | - [Quality and performance](./evaluation-metrics-built-in.md?tabs=warning#generation-quality-metrics) ( [Quality and performance sample notebook](https://github.com/Azure-Samples/rag-data-openai-python-promptflow/blob/main/src/evaluation/evaluate.py))<br> </br> - [Safety and Security](./evaluation-metrics-built-in.md?#risk-and-safety-evaluators) ([Safety and Security sample notebook](https://github.com/Azure-Samples/rag-data-openai-python-promptflow/blob/main/src/evaluation/evaluatesafetyrisks.py)) <br> </br> - [Custom](../how-to/develop/evaluate-sdk.md#custom-evaluators) ([Custom sample notebook](https://github.com/Azure-Samples/rag-data-openai-python-promptflow/blob/main/src/evaluation/evaluate.py)) |
-| What data should you use? | Upload or generate relevant dataset | [Generic simulator for measuring Quality and Performance](./concept-synthetic-data.md) ([Generic simulator sample notebook](https://github.com/Azure/azureml-examples/blob/main/sdk/python/foundation-models/system/finetune/Llama-notebooks/datagen/synthetic-data-generation.ipynb)) <br></br> - [Adversarial simulator for measuring Safety and Security](../how-to/develop/simulator-interaction-data.md) ([Adversarial simulator sample notebook](https://github.com/Azure-Samples/rag-data-openai-python-promptflow/blob/main/src/evaluation/simulate_and_evaluate_online_endpoint.ipynb))|
+| What data should you use? | Upload or generate relevant dataset | [Generic simulator for measuring Quality and Performance](./concept-synthetic-data.md) ([Generic simulator sample notebook](https://github.com/Azure/azureml-examples/blob/main/sdk/python/foundation-models/system/finetune/Llama-notebooks/datagen/synthetic-data-generation.ipynb)) <br></br> - [Adversarial simulator for measuring Safety and Security](../how-to/develop/simulator-interaction-data.md) ([Adversarial simulator sample notebook](https://github.com/Azure-Samples/rag-data-openai-python-promptflow/blob/main/src/evaluation/simulate_and_evaluate_online_endpoint.ipynb)) <br></br> AI red teaming agent for running automated scans to assess safety and security vulnerabilities ([AI red teaming agent sample notebook](https://aka.ms/airedteamingagent-sample))|
 | What resources should conduct the evaluation? | Run evaluation | - Local run <br> </br> - Remote cloud run |
 | How did my model/app perform? | Analyze results | [View aggregate scores, view details, score details, compare evaluation runs](..//how-to/evaluate-results.md) |
 | How can I improve? | Make changes to model, app, or evaluators | - If evaluation results didn't align to human feedback, adjust your evaluator. <br></br> - If evaluation results aligned to human feedback but didn't meet quality/safety thresholds, apply targeted mitigations. |

 ## Related content

 - [Evaluate your generative AI apps via the playground](../how-to/evaluate-prompts-playground.md)
+- [Run automated scans with the AI red teaming agent to assess safety and security risks](../how-to/develop/run-scans-ai-red-teaming-agent.md)
 - [Evaluate your generative AI apps with the Azure AI Foundry SDK or portal](../how-to/evaluate-generative-ai-app.md)
 - [Evaluation and monitoring metrics for generative AI](evaluation-metrics-built-in.md)
 - [Transparency Note for Azure AI Foundry safety evaluations](safety-evaluations-transparency-note.md)
