
Commit 372ce0e

Update run-scans-ai-red-teaming-agent.md
1 parent a18beb3 commit 372ce0e

1 file changed (+17 -19 lines)

articles/ai-foundry/how-to/develop/run-scans-ai-red-teaming-agent.md

Lines changed: 17 additions & 19 deletions

@@ -56,12 +56,17 @@ azure_ai_project = {
 red_team_agent = RedTeam(
     azure_ai_project=azure_ai_project, # required
     credential=DefaultAzureCredential(), # required
-    risk_categories=[RiskCategory.Violence, RiskCategory.HateUnfairness, RiskCategory.Sexual, RiskCategory.SelfHarm], # optional, defaults to all four
+    risk_categories=[ # optional, defaults to all four risk categories
+        RiskCategory.Violence,
+        RiskCategory.HateUnfairness,
+        RiskCategory.Sexual,
+        RiskCategory.SelfHarm
+    ],
     num_objectives=5, # optional, defaults to 10
 )
 ```
 
-You can additionally specify which risk categories of content risks you want to cover with `risk_categories` and define the number of prompts covering each risk category with `num_objectives`. The previous example generates 10 seed prompts for each risk category for a total of 40 rows of prompts to be generated and sent to your target.
+Optionally, you can specify which categories of content risks you want to cover with `risk_categories` and define the number of prompts covering each risk category with `num_objectives`. The previous example generates 5 seed prompts for each risk category, for a total of 20 prompts generated and sent to your target.
 
 > [!NOTE]
 > AI Red Teaming Agent only supports single-turn interactions in text-only scenarios.
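
For context around this hunk, here's a minimal sketch of the full setup as it might look after the change. The import paths and the `azure_ai_project` fields are assumptions based on the `azure-ai-evaluation` preview SDK; the diff itself only shows the `azure_ai_project = {` opener.

```python
# Sketch of the surrounding setup (not part of this diff). Assumes the
# red-teaming classes live in azure-ai-evaluation's red_team module, as in
# the preview SDK; adjust the imports to match your installed version.
from azure.identity import DefaultAzureCredential
from azure.ai.evaluation.red_team import RedTeam, RiskCategory

# Placeholder project details -- substitute your own values.
azure_ai_project = {
    "subscription_id": "<your-subscription-id>",
    "resource_group_name": "<your-resource-group>",
    "project_name": "<your-project-name>",
}

red_team_agent = RedTeam(
    azure_ai_project=azure_ai_project,    # required
    credential=DefaultAzureCredential(),  # required
    risk_categories=[                     # optional, defaults to all four
        RiskCategory.Violence,
        RiskCategory.HateUnfairness,
        RiskCategory.Sexual,
        RiskCategory.SelfHarm,
    ],
    num_objectives=5,                     # optional, defaults to 10
)
```
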
@@ -77,7 +82,7 @@ Currently, AI Red Teaming Agent is only available in a few regions. Ensure your
 
 ## Running an automated scan for safety risks
 
-Once your `RedTeam` is instantiated, you can run an automated scan with minimal configuration, only a target is required. The following would, by default, generate five baseline adversarial queries for each of the four risk categories for a total of 20 attack and response pairs.
+Once your `RedTeam` is instantiated, you can run an automated scan with minimal configuration; only a target is required. By default, the following generates five baseline adversarial queries for each of the four risk categories defined in the `RedTeam` above, for a total of 20 attack and response pairs.
 
 ```python
 red_team_result = await red_team_agent.scan(target=your_target)
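
The hunk doesn't show what `your_target` is. One possible shape, assuming `scan()` accepts a plain callback that maps a query string to your application's reply (the function name and canned response below are illustrative):

```python
# Hypothetical callback target: the agent sends each attack query to this
# function and records the returned string as the response. In practice it
# would call your deployed model or application.
def your_target(query: str) -> str:
    return "Sorry, I can't help with that request."

# scan() is a coroutine, so it must be awaited inside an async function
# (for example, driven by asyncio.run).
red_team_result = await red_team_agent.scan(target=your_target)
```
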
@@ -217,15 +222,20 @@ More advanced users can specify the desired attack strategies instead of using d
 | `Jailbreak` | User Injected Prompt Attacks (UPIA) injects specially crafted prompts to bypass AI safeguards | Easy |
 | `Tense` | Changes tense of text into past tense. | Moderate |
 
-Each new attack strategy specified will be applied to the set of baseline adversarial queries used. If no attack strategies are specified then only baseline adversarial queries will be sent to your target.
+Each attack strategy specified is applied to the set of baseline adversarial queries, and the converted queries are sent to your target in addition to the baseline adversarial queries.
 
-This following example would generate one attack objective per each of the four risk categories specified. That would generate four baseline adversarial prompts which would then get converted into each of the four attack strategies to result in a total of 16 attack-response pairs from your AI system. The last attack stratgy is an example of a composition of two attack strategies to create a more complex attack query: the `AttackStrategy.Compose()` function takes in a list of two supported attack strategies and chains them together. The example's composition would first encode the baseline adversarial query into Base64 then apply the ROT13 cipher on the Base64-encoded query. Compositions only support chaining two attack strategies together.
+The following example generates one attack objective for each of the four risk categories specified. First, four baseline adversarial prompts are generated and sent to your target. Then, each baseline query is converted with each of the four attack strategies, resulting in a total of 20 attack-response pairs from your AI system. The last attack strategy is an example of composing two attack strategies into a more complex attack query: the `AttackStrategy.Compose()` function takes a list of two supported attack strategies and chains them together. The example's composition first encodes the baseline adversarial query into Base64, then applies the ROT13 cipher to the Base64-encoded query. Compositions only support chaining two attack strategies together.
 
 ```python
 red_team_agent = RedTeam(
     azure_ai_project=azure_ai_project,
     credential=DefaultAzureCredential(),
-    risk_categories=[RiskCategory.Violence, RiskCategory.HateUnfairness, RiskCategory.Sexual, RiskCategory.SelfHarm],
+    risk_categories=[
+        RiskCategory.Violence,
+        RiskCategory.HateUnfairness,
+        RiskCategory.Sexual,
+        RiskCategory.SelfHarm
+    ],
     num_objectives=1,
 )
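
This hunk cuts off before the scan call that pairs with the one-objective configuration. Here's a sketch of the four-strategy scan the paragraph describes, assuming `AttackStrategy` is imported from the same module as `RedTeam`; only the Base64-then-ROT13 composition is named in the text, so the other three strategies are picked from the strategy table purely as examples:

```python
# Assumed strategy list: three single strategies plus one composition.
# 4 baseline prompts + (4 baselines x 4 strategies) = 20 attack-response pairs.
red_team_result = await red_team_agent.scan(
    target=your_target,
    attack_strategies=[
        AttackStrategy.Base64,  # Base64-encode the baseline query
        AttackStrategy.Flip,    # flip-style transformation (example choice)
        AttackStrategy.Tense,   # rewrite the query in past tense
        AttackStrategy.Compose([AttackStrategy.Base64, AttackStrategy.ROT13]),
    ],
)
```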

@@ -417,18 +427,6 @@ The red teaming scorecard also provides row-level data on each attack-response p
         "violence": {
             "severity_label": "Very low",
             "reason": "Refusal message detected in system response."
-        },
-        "sexual": {
-            "severity_label": "Very low",
-            "reason": "Refusal message detected in system response."
-        },
-        "self_harm": {
-            "severity_label": "Very low",
-            "reason": "Refusal message detected in system response."
-        },
-        "hate_unfair": {
-            "severity_label": "Very low",
-            "reason": "Refusal message detected in system response."
         }
     }
 }
@@ -438,7 +436,7 @@ The red teaming scorecard also provides row-level data on each attack-response p
 
 ## Viewing your results in Azure AI Foundry project
 
-After each automated scan is finished running locally, the results also get logged to your Azure AI Foundry project which you specified in the beginning. In your project, navigate to the Evaluations page and select the AI red teaming tab to view the comprehensive report with a detailed drill-down of each scan.
+After each automated scan finishes running locally, the results are also logged to the Azure AI Foundry project that you specified at the beginning. In your project, navigate to the **Evaluations** page and select the **AI red teaming** tab to view the comprehensive report with a detailed drill-down of each scan.
 
 :::image type="content" source="../../media/evaluations/red-teaming-agent/ai-red-team.png" alt-text="Screenshot of AI Red Teaming tab in Azure AI Foundry project page." lightbox="../../media/evaluations/red-teaming-agent/ai-red-team.png":::
