Commit a18beb3

Merge pull request #3912 from minthigpen/patch-5
Update run-scans-ai-red-teaming-agent.md
2 parents d531342 + 7a57768 commit a18beb3

1 file changed: +12 -11 lines changed

articles/ai-foundry/how-to/develop/run-scans-ai-red-teaming-agent.md

Lines changed: 12 additions & 11 deletions
@@ -43,7 +43,7 @@ You can instantiate the AI Red Teaming agent with your Azure AI Project and Azur
 ```python
 # Azure imports
 from azure.identity import DefaultAzureCredential
-from azure.ai.evaluation import RedTeam, RiskCategory
+from azure.ai.evaluation.red_team import RedTeam, RiskCategory

 # Azure AI Project Information
 azure_ai_project = {
@@ -77,7 +77,7 @@ Currently, AI Red Teaming Agent is only available in a few regions. Ensure your

 ## Running an automated scan for safety risks

-Once your `RedTeam` is instantiated, you can run an automated scan with minimal configuration, only a target is required. The following would generate five direct adversarial queries for each of the four risk categories for a total of 20 attack and response pairs.
+Once your `RedTeam` is instantiated, you can run an automated scan with minimal configuration, only a target is required. The following would, by default, generate five baseline adversarial queries for each of the four risk categories for a total of 20 attack and response pairs.

 ```python
 red_team_result = await red_team_agent.scan(target=your_target)
@@ -93,7 +93,7 @@ The `RedTeam` can run automated scans on various targets.
 # Configuration for Azure OpenAI model
 azure_openai_config = {
     "azure_endpoint": os.environ.get("AZURE_OPENAI_ENDPOINT"),
-    "api_key": os.environ.get("AZURE_OPENAI_KEY"),
+    "api_key": os.environ.get("AZURE_OPENAI_KEY"),  # not needed for Entra ID-based auth; run `az login` before running
     "azure_deployment": os.environ.get("AZURE_OPENAI_DEPLOYMENT"),
 }

@@ -154,7 +154,7 @@ red_team_result = await red_team_agent.scan(target=chat_target)

 ### Supported attack strategies

-If only the target is passed in when you run a scan and no attack strategies are specified, the `red_team_agent` will only send direct adversarial queries to your target. This is the most naive method of attempting to elicit undesired behavior or generated content. It's recommended to try the baseline direct querying first before applying any attack strategies.
+If only the target is passed in when you run a scan and no attack strategies are specified, the `red_team_agent` will only send baseline direct adversarial queries to your target. This is the most naive method of attempting to elicit undesired behavior or generated content. It's recommended to try the baseline direct adversarial querying first before applying any attack strategies.

 Attack strategies are methods to take the baseline direct adversarial queries and convert them into another form to try bypassing your target's safeguards. Attack strategies are classified into three buckets of complexities. Attack complexity reflects the effort an attacker needs to put in conducting the attack.

@@ -175,7 +175,7 @@ We offer a group of default attacks for easy complexity and moderate complexity
 The following scan would first run all the baseline direct adversarial queries. Then, it would apply the following attack techniques: `Base64`, `Flip`, `Morse`, `Tense`, and a composition of `Tense` and `Base64` which would first translate the baseline query into past tense then encode it into `Base64`.

 ```python
-from azure.ai.evaluation import AttackStrategy
+from azure.ai.evaluation.red_team import AttackStrategy

 # Run the red team scan with multiple attack strategies
 red_team_agent_result = await red_team_agent.scan(
@@ -217,16 +217,16 @@ More advanced users can specify the desired attack strategies instead of using d
 | `Jailbreak` | User Injected Prompt Attacks (UPIA) injects specially crafted prompts to bypass AI safeguards | Easy |
 | `Tense` | Changes tense of text into past tense. | Moderate |

-Each new attack strategy specified will be applied to the set of baseline adversarial queries used.
+Each new attack strategy specified will be applied to the set of baseline adversarial queries used. If no attack strategies are specified then only baseline adversarial queries will be sent to your target.

-This following example would generate one attack objective per each of the four risk categories specified. That would generate four baseline adversarial prompts which would then get converted into each of the three attack strategies to result in a total of 12 attack-response pairs from your AI system.
+This following example would generate one attack objective per each of the four risk categories specified. That would generate four baseline adversarial prompts which would then get converted into each of the four attack strategies to result in a total of 16 attack-response pairs from your AI system. The last attack strategy is an example of a composition of two attack strategies to create a more complex attack query: the `AttackStrategy.Compose()` function takes in a list of two supported attack strategies and chains them together. The example's composition would first encode the baseline adversarial query into Base64 then apply the ROT13 cipher on the Base64-encoded query. Compositions only support chaining two attack strategies together.

 ```python
 red_team_agent = RedTeam(
-    azure_ai_project=azure_ai_project, # required
-    credential=DefaultAzureCredential(), # required
-    risk_categories=[RiskCategory.Violence, RiskCategory.HateUnfairness, RiskCategory.Sexual, RiskCategory.SelfHarm], # optional, defaults to all four
-    num_objectives=1, # optional, defaults to 10
+    azure_ai_project=azure_ai_project,
+    credential=DefaultAzureCredential(),
+    risk_categories=[RiskCategory.Violence, RiskCategory.HateUnfairness, RiskCategory.Sexual, RiskCategory.SelfHarm],
+    num_objectives=1,
 )

 # Run the red team scan with multiple attack strategies
@@ -237,6 +237,7 @@ red_team_agent_result = await red_team_agent.scan(
         AttackStrategy.CharacterSpace, # Add character spaces
         AttackStrategy.ROT13, # Use ROT13 encoding
         AttackStrategy.UnicodeConfusable, # Use confusable Unicode characters
+        AttackStrategy.Compose([AttackStrategy.Base64, AttackStrategy.ROT13]), # composition of strategies
     ],
 )
 ```
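
Note on the added `Compose` line: the new paragraph says the composition first Base64-encodes the baseline query and then applies ROT13 to the encoded text. The sketch below illustrates that ordering using only the Python standard library. It is not the SDK's implementation, and the helper names `base64_then_rot13` and `undo_base64_then_rot13` are hypothetical.

```python
import base64
import codecs


def base64_then_rot13(prompt: str) -> str:
    """Illustrative only: Base64-encode the prompt, then ROT13 the result."""
    # Step 1: Base64-encode the UTF-8 bytes of the prompt.
    encoded = base64.b64encode(prompt.encode("utf-8")).decode("ascii")
    # Step 2: ROT13 rotates letters only; digits and '=' padding are unchanged.
    return codecs.encode(encoded, "rot_13")


def undo_base64_then_rot13(text: str) -> str:
    """Reverse the chain: ROT13 first (it is its own inverse), then Base64-decode."""
    decoded = codecs.decode(text, "rot_13")
    return base64.b64decode(decoded).decode("utf-8")


if __name__ == "__main__":
    transformed = base64_then_rot13("hello")
    print(transformed)
    print(undo_base64_then_rot13(transformed))
```

Order matters here: applying ROT13 first and Base64 second would produce a different string, which is why the paragraph spells out the sequence.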
