
Commit a153ddf

Additional edits.
1 parent e89a912 commit a153ddf


2 files changed: +14 -14 lines changed


articles/ai-foundry/how-to/develop/run-scans-ai-red-teaming-agent.md

Lines changed: 12 additions & 12 deletions
@@ -185,7 +185,7 @@ The following risk categories are supported in the AI Red Teaming Agent's runs,

## Custom attack objectives

-The AI Red Teaming Agent provides a Microsoft curated set of adversarial attack objectives covering each supported risk. Because your organization's policy might be different, you might want to bring your own custom set to use for each risk category.
+The AI Red Teaming Agent provides a Microsoft curated set of adversarial attack objectives that cover each supported risk. Because your organization's policy might be different, you might want to bring your own custom set to use for each risk category.

You can run the AI Red Teaming Agent on your own dataset.

@@ -197,7 +197,7 @@ custom_red_team_agent = RedTeam(
)
```
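
The `RedTeam` call in the snippet above is truncated in this diff view. For reference, here is a minimal sketch of how a custom objective dataset might be wired in, assuming the `custom_attack_seed_prompts` parameter and the project setup (`azure_ai_project`, credential) from earlier in the article:

```python
from azure.identity import DefaultAzureCredential
from azure.ai.evaluation.red_team import RedTeam

# Hedged sketch: parameter names are assumptions, not confirmed by this diff.
custom_red_team_agent = RedTeam(
    azure_ai_project=azure_ai_project,  # your Azure AI Foundry project details
    credential=DefaultAzureCredential(),
    custom_attack_seed_prompts="custom_attack_prompts.json",  # path to your own attack objectives
)
```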

-Your dataset must be a JSON file, in the following format with the associated metadata for the corresponding risk types. When you bring your own prompts, the supported `risk-type`s are `violence`, `sexual`, `hate_unfairness`, and `self_harm`. Use these supported types so that the Safety Evaluators can evaluate the attacks for success correspondingly. The number of prompts that you specify is the `num_objectives` used in the scan.
+Your dataset must be a JSON file in the following format, with the associated metadata for the corresponding risk types. When you bring your own prompts, the supported risk types are `violence`, `sexual`, `hate_unfairness`, and `self_harm`. Use these supported types so that the Safety Evaluators can evaluate the attacks for success. The number of prompts that you specify is the `num_objectives` used in the scan.

```json
[
@@ -229,25 +229,25 @@ Your dataset must be a JSON file, in the following format with the associated me

## Supported attack strategies

-If only the target is passed in when you run a scan and no attack strategies are specified, the `red_team_agent` sends only baseline direct adversarial queries to your target. This approach is the most naive method of attempting to elicit undesired behavior or generated content. We recommend that you try the baseline direct adversarial querying first before applying any attack strategies.
+If only the target is passed in when you run a scan and no attack strategies are specified, the `red_team_agent` sends only baseline direct adversarial queries to your target. This approach is the most naive method of attempting to elicit undesired behavior or generated content. We recommend that you try the baseline direct adversarial querying first before you apply any attack strategies.

Attack strategies are methods to take the baseline direct adversarial queries and convert them into another form to try bypassing your target's safeguards. Attack strategies are classified into three levels of complexity. Attack complexity reflects the effort an attacker needs to put in conducting the attack.

-- **Easy complexity attacks** require less effort, such as translation of a prompt into some encoding
-- **Moderate complexity attacks** requires having access to resources such as another generative AI model
-- **Difficult complexity attacks** includes attacks that require access to significant resources and effort to execute an attack, such as knowledge of search-based algorithms, in addition to a generative AI model.
+- **Easy complexity attacks** require less effort, such as translation of a prompt into some encoding.
+- **Moderate complexity attacks** require access to resources such as another generative AI model.
+- **Difficult complexity attacks** include attacks that require access to significant resources and effort to run, such as knowledge of search-based algorithms, in addition to a generative AI model.

### Default grouped attack strategies

-This approach offers a group of default attacks for easy complexity and moderate complexity that can be used in the `attack_strategies` parameter. A difficult complexity attack can be a composition of two strategies in one attack.
+This approach offers a group of default attacks for easy complexity and moderate complexity that you can use in the `attack_strategies` parameter. A difficult complexity attack can be a composition of two strategies in one attack.

| Attack strategy complexity group | Includes |
| --- | --- |
| `EASY` | `Base64`, `Flip`, `Morse` |
| `MODERATE` | `Tense` |
| `DIFFICULT` | Composition of `Tense` and `Base64` |

-The following scan would first run all the baseline direct adversarial queries. Then, it would apply the following attack techniques: `Base64`, `Flip`, `Morse`, `Tense`, and a composition of `Tense` and `Base64`, which would first translate the baseline query into past tense then encode it into `Base64`.
+The following scan first runs all the baseline direct adversarial queries. Then, it applies the following attack techniques: `Base64`, `Flip`, `Morse`, `Tense`, and a composition of `Tense` and `Base64`, which first translates the baseline query into past tense and then encodes it into `Base64`.

```python
from azure.ai.evaluation.red_team import AttackStrategy
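# The rest of this snippet is truncated in the diff view. The lines below are a
# hedged sketch of how the grouped strategies might be passed; names such as
# `your_target_callback` and the exact `scan` signature are assumptions, not from this diff.
red_team_result = await red_team_agent.scan(  # run inside an async function or notebook cell
    target=your_target_callback,   # the target configured earlier in the article
    scan_name="Grouped-Attack-Strategies-Scan",
    attack_strategies=[
        AttackStrategy.EASY,       # Base64, Flip, Morse
        AttackStrategy.MODERATE,   # Tense
        AttackStrategy.DIFFICULT,  # composition of Tense and Base64
    ],
)
```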
@@ -292,11 +292,11 @@ You can specify the desired attack strategies instead of using default groups. T
| `Jailbreak` | User Injected Prompt Attacks (UPIA) injects specially crafted prompts to bypass AI safeguards | Easy |
| `Tense` | Changes tense of text into past tense. | Moderate |

-Each new attack strategy specified is applied to the set of baseline adversarial queries used in addition to the baseline adversarial queries.
+Each new attack strategy that you specify is applied to the set of baseline adversarial queries, and the converted queries are sent in addition to the baseline adversarial queries themselves.

-This following example would generate one attack objective per each of the four risk categories specified. This approach first generates four baseline adversarial prompts, which would be sent to your target. Then, each baseline query would get converted into each of the four attack strategies. This conversion results in a total of 20 attack-response pairs from your AI system.
+The following example generates one attack objective for each of the four specified risk categories. This approach first generates four baseline adversarial prompts to send to your target. Then, each baseline query is converted by using each of the four attack strategies. This conversion results in a total of 20 attack-response pairs from your AI system (4 baseline queries plus 4 x 4 converted queries).

-The last attack strategy is an example of a composition of two attack strategies to create a more complex attack query: the `AttackStrategy.Compose()` function takes in a list of two supported attack strategies and chains them together. The example's composition would first encode the baseline adversarial query into Base64 then apply the ROT13 cipher on the Base64-encoded query. Compositions only support chaining two attack strategies together.
+The last attack strategy is a composition of two attack strategies to create a more complex attack query: the `AttackStrategy.Compose()` function takes in a list of two supported attack strategies and chains them together. The example's composition first encodes the baseline adversarial query into Base64 and then applies the ROT13 cipher to the Base64-encoded query. Compositions support chaining only two attack strategies together.

```python
red_team_agent = RedTeam(
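    # The rest of this call is truncated in the diff view. The following lines are a
    # hedged sketch consistent with the paragraph above; parameter and enum names
    # (RiskCategory.*, the scan signature, the first three strategies) are assumptions.
    # RiskCategory and AttackStrategy are assumed to be imported from azure.ai.evaluation.red_team.
    azure_ai_project=azure_ai_project,
    credential=credential,  # for example, DefaultAzureCredential()
    risk_categories=[
        RiskCategory.Violence,
        RiskCategory.HateUnfairness,
        RiskCategory.Sexual,
        RiskCategory.SelfHarm,
    ],
    num_objectives=1,  # one attack objective per risk category
)

# Four strategies; the last composes Base64 encoding with the ROT13 cipher.
red_team_result = await red_team_agent.scan(  # run inside an async function or notebook cell
    target=your_target_callback,  # assumed name for the target configured earlier
    attack_strategies=[
        AttackStrategy.Flip,
        AttackStrategy.Tense,
        AttackStrategy.Base64,
        AttackStrategy.Compose([AttackStrategy.Base64, AttackStrategy.ROT13]),
    ],
)
```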
@@ -510,4 +510,4 @@ The red teaming scorecard also provides row-level data on each attack-response p

## Related content

-Try out an [example workflow](https://aka.ms/airedteamingagent-sample) in our GitHub samples.
+Try an [example workflow](https://aka.ms/airedteamingagent-sample) in the GitHub samples.

articles/ai-foundry/includes/view-ai-red-teaming-results.md

Lines changed: 2 additions & 2 deletions
@@ -15,7 +15,7 @@ After your automated scan finishes, the results also get logged to your Azure AI

### View report of each scan

-In your Azure AI Foundry project or hub-based project, navigate to the **Evaluation** page. Select **AI red teaming** to view the comprehensive report with a detailed drill-down of each scan.
+In your Azure AI Foundry project or hub-based project, navigate to the **Evaluation** page. Select **AI red teaming** to view the report with detailed drill-down results of each scan.

:::image type="content" source="../media/evaluations/red-teaming-agent/ai-red-team.png" alt-text="Screenshot of AI Red Teaming tab in Azure AI Foundry project page." lightbox="../media/evaluations/red-teaming-agent/ai-red-team.png":::

@@ -27,7 +27,7 @@ Or by attack complexity classification:

:::image type="content" source="../media/evaluations/red-teaming-agent/ai-red-team-report-attack.png" alt-text="Screenshot of AI Red Teaming report view by attack complexity category in Azure AI Foundry." lightbox="../media/evaluations/red-teaming-agent/ai-red-team-report-attack.png":::

-Drilling down further into the data tab provides a row-level view of each attack-response pair, enabling deeper insights into system issues and behaviors. For each attack-response pair, you can see additional information, such as whether or not the attack was successful, what attack strategy was used and its attack complexity. There's also an option for a human in the loop reviewer to provide human feedback by selecting the thumbs up or thumbs down icon.
+Drilling down further into the data tab provides a row-level view of each attack-response pair. This information offers deeper insights into system issues and behaviors. For each attack-response pair, you can see more information, such as whether the attack was successful, which attack strategy was used, and its attack complexity. A human-in-the-loop reviewer can provide feedback by selecting the thumbs up or thumbs down icon.

:::image type="content" source="../media/evaluations/red-teaming-agent/ai-red-team-data.png" alt-text="Screenshot of AI Red Teaming data page in Azure AI Foundry." lightbox="../media/evaluations/red-teaming-agent/ai-red-team-data.png":::
