
Commit 79e0b88

replace term
1 parent dacaf4f commit 79e0b88

34 files changed, with 78 additions and 79 deletions.

articles/ai-foundry/concepts/ai-red-teaming-agent.md

Lines changed: 2 additions & 2 deletions
@@ -17,11 +17,11 @@ author: lgayhardt
 The AI Red Teaming Agent (preview) is a powerful tool designed to help organizations proactively find safety risks associated with generative AI systems during design and development of generative AI models and applications.

-Traditional red teaming involves exploiting the cyber kill chain and describes the process by which a system is tested for security vulnerabilities. However, with the rise of generative AI, the term AI red teaming has been coined to describe probing for novel risks (both content safety and security related) that these systems present and refers to simulating the behavior of an adversarial user who is trying to cause your AI system to misbehave in a particular way.
+Traditional red teaming involves exploiting the cyber kill chain and describes the process by which a system is tested for security vulnerabilities. However, with the rise of generative AI, the term AI red teaming has been coined to describe probing for novel risks (both content and security related) that these systems present and refers to simulating the behavior of an adversarial user who is trying to cause your AI system to misbehave in a particular way.

 The AI Red Teaming Agent leverages Microsoft's open-source framework for Python Risk Identification Tool's ([PyRIT](https://github.com/Azure/PyRIT)) AI red teaming capabilities along with Azure AI Foundry's [Risk and Safety Evaluations](./evaluation-metrics-built-in.md#risk-and-safety-evaluators) to help you automatically assess safety issues in three ways:

-- **Automated scans for content safety risks:** Firstly, you can automatically scan your model and application endpoints for safety risks by simulating adversarial probing.
+- **Automated scans for content risks:** Firstly, you can automatically scan your model and application endpoints for safety risks by simulating adversarial probing.
 - **Evaluate probing success:** Next, you can evaluate and score each attack-response pair to generate insightful metrics such as Attack Success Rate (ASR).
 - **Reporting and logging** Finally, you can generate a score card of the attack probing techniques and risk categories to help you decide if the system is ready for deployment. Findings can be logged, monitored, and tracked over time directly in Azure AI Foundry, ensuring compliance and continuous risk mitigation.
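
The Attack Success Rate (ASR) named in the second bullet is simply the share of attack-response pairs judged successful out of all attacks attempted. A minimal sketch of that arithmetic, with a hypothetical result shape that is not part of the AI Red Teaming Agent API:

```python
# Hypothetical illustration of Attack Success Rate (ASR): the fraction of
# attack-response pairs judged successful out of all attacks attempted.
def attack_success_rate(results: list[dict]) -> float:
    """results: e.g. [{"attack": "...", "response": "...", "attack_success": True}, ...]"""
    if not results:
        return 0.0
    successes = sum(1 for r in results if r.get("attack_success"))
    return successes / len(results)

scan = [{"attack_success": True}, {"attack_success": False},
        {"attack_success": False}, {"attack_success": True}]
print(f"ASR: {attack_success_rate(scan):.0%}")  # ASR: 50%
```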

articles/ai-foundry/concepts/content-filtering.md

Lines changed: 1 addition & 1 deletion
@@ -26,7 +26,7 @@ author: PatrickFarley
 The content filtering system is powered by [Azure AI Content Safety](../../ai-services/content-safety/overview.md), and it works by running both the prompt input and completion output through a set of classification models designed to detect and prevent the output of harmful content. Variations in API configurations and application design might affect completions and thus filtering behavior.

-With Azure OpenAI model deployments, you can use the default content filter or create your own content filter (described later on). Models available through **serverless APIs** have content filtering enabled by default. To learn more about the default content filter enabled for serverless APIs, see [Content safety for models curated by Azure AI in the model catalog](model-catalog-content-safety.md).
+With Azure OpenAI model deployments, you can use the default content filter or create your own content filter (described later on). Models available through **serverless APIs** have content filtering enabled by default. To learn more about the default content filter enabled for serverless APIs, see [Guidelines & controls for models curated by Azure AI in the model catalog](model-catalog-content-safety.md).

 ## Language support

articles/ai-foundry/concepts/evaluation-metrics-built-in.md

Lines changed: 2 additions & 2 deletions
@@ -197,12 +197,12 @@ We support evaluating vulnerability towards the following types of jailbreak att
 - **Direct attack jailbreak** (also known as UPIA or User Prompt Injected Attack) injects prompts in the user role turn of conversations or queries to generative AI applications. Jailbreaks are when a model response bypasses the restrictions placed on it. Jailbreak also happens where an LLM deviates from the intended task or topic.
 - **Indirect attack jailbreak** (also known as XPIA or cross domain prompt injected attack) injects prompts in the returned documents or context of the user's query to generative AI applications.

-*Evaluating direct attack* is a comparative measurement using the content safety evaluators as a control. It isn't its own AI-assisted evaluator. Run `ContentSafetyEvaluator` on two different, red-teamed datasets:
+*Evaluating direct attack* is a comparative measurement using the Azure AI Content Safety evaluators as a control. It isn't its own AI-assisted evaluator. Run `ContentSafetyEvaluator` on two different, red-teamed datasets:

 - Baseline adversarial test dataset.
 - Adversarial test dataset with direct attack jailbreak injections in the first turn.

-You can do this with functionality and attack datasets generated with the [direct attack simulator](../how-to/develop/simulator-interaction-data.md#simulating-jailbreak-attacks) with the same randomization seed. Then you can evaluate jailbreak vulnerability by comparing results from content safety evaluators between the two test dataset's aggregate scores for each safety evaluator. A direct attack jailbreak defect is detected when there's presence of content harm response detected in the second direct attack injected dataset when there was none or lower severity detected in the first control dataset.
+You can do this with functionality and attack datasets generated with the [direct attack simulator](../how-to/develop/simulator-interaction-data.md#simulating-jailbreak-attacks) with the same randomization seed. Then you can evaluate jailbreak vulnerability by comparing results from Azure AI Content Safety evaluators between the two test dataset's aggregate scores for each safety evaluator. A direct attack jailbreak defect is detected when there's presence of content harm response detected in the second direct attack injected dataset when there was none or lower severity detected in the first control dataset.

 ### Indirect attack definition and label
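
In code, the comparative measurement described here reduces to checking, row by row and per safety evaluator, whether the jailbreak-injected run shows harm that the seed-matched baseline run did not. A minimal sketch of that arithmetic; the 0-7 severity inputs and the helper are illustrative, not SDK output:

```python
# Illustrative only: flag a direct attack jailbreak defect when the injected
# dataset shows a harmful (or more severe) response that the baseline did not.
def jailbreak_defect_rate(baseline_scores: list[int], injected_scores: list[int]) -> float:
    """Each list holds 0-7 severity scores for the same rows (same randomization seed)."""
    if not baseline_scores:
        return 0.0
    defects = sum(
        1 for base, attacked in zip(baseline_scores, injected_scores)
        if attacked > base  # harm appears, or worsens, only under the injected attack
    )
    return defects / len(baseline_scores)

print(f"Direct attack defect rate: {jailbreak_defect_rate([0, 1, 0, 0], [0, 5, 4, 0]):.0%}")  # 50%
```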

articles/ai-foundry/concepts/model-catalog-content-safety.md

Lines changed: 4 additions & 4 deletions
@@ -1,7 +1,7 @@
 ---
-title: Content safety for models curated by Azure AI in the model catalog
+title: Guidelines & controls for models curated by Azure AI in the model catalog
 titleSuffix: Azure AI Foundry
-description: Learn about content safety for models deployed using serverless APIs, using Azure AI Foundry.
+description: Learn about Guidelines & controls for models deployed using serverless APIs, using Azure AI Foundry.
 manager: scottpolly
 ms.service: azure-ai-foundry
 ms.topic: conceptual
@@ -13,11 +13,11 @@ reviewer: ositanachi
 ms.custom:
 ---

-# Content safety for models curated by Azure AI in the model catalog
+# Guidelines & controls for models curated by Azure AI in the model catalog

 [!INCLUDE [feature-preview](../includes/feature-preview.md)]

-In this article, learn about content safety capabilities for models from the model catalog deployed using serverless APIs.
+In this article, learn about Guidelines & controls capabilities for models from the model catalog deployed using serverless APIs.

 ## Content filter defaults

articles/ai-foundry/concepts/models-featured.md

Lines changed: 1 addition & 1 deletion
@@ -364,5 +364,5 @@ For examples of how to use Stability AI models, see the following examples:
 - [Deploy models as serverless APIs](../how-to/deploy-models-serverless.md)
 - [Model catalog and collections in Azure AI Foundry portal](../how-to/model-catalog-overview.md)
 - [Region availability for models in serverless API endpoints](../how-to/deploy-models-serverless-availability.md)
-- [Content safety for models curated by Azure AI in the model catalog](model-catalog-content-safety.md)
+- [Guidelines & controls for models curated by Azure AI in the model catalog](model-catalog-content-safety.md)

articles/ai-foundry/how-to/deploy-models-gretel-navigator.md

Lines changed: 6 additions & 6 deletions
@@ -233,11 +233,11 @@ result = client.complete(
 ```

-### Apply content safety
+### Apply Guidelines and controls

-The Azure AI model inference API supports [Azure AI content safety](https://aka.ms/azureaicontentsafety). When you use deployments with Azure AI content safety turned on, inputs and outputs pass through an ensemble of classification models aimed at detecting and preventing the output of harmful content. The content filtering (preview) system detects and takes action on specific categories of potentially harmful content in both input prompts and output completions.
+The Azure AI model inference API supports [Azure AI Content Safety](https://aka.ms/azureaicontentsafety). When you use deployments with Azure AI Content Safety turned on, inputs and outputs pass through an ensemble of classification models aimed at detecting and preventing the output of harmful content. The content filtering (preview) system detects and takes action on specific categories of potentially harmful content in both input prompts and output completions.

-The following example shows how to handle events when the model detects harmful content in the input prompt and content safety is enabled.
+The following example shows how to handle events when the model detects harmful content in the input prompt and the filter is enabled.

 ```python
@@ -477,11 +477,11 @@ The following example request shows other parameters that you can specify in the
 }
 ```

-### Apply content safety
+### Apply Guidelines & controls

-The Azure AI model inference API supports [Azure AI content safety](https://aka.ms/azureaicontentsafety). When you use deployments with Azure AI content safety turned on, inputs and outputs pass through an ensemble of classification models aimed at detecting and preventing the output of harmful content. The content filtering (preview) system detects and takes action on specific categories of potentially harmful content in both input prompts and output completions.
+The Azure AI model inference API supports [Azure AI Content Safety](https://aka.ms/azureaicontentsafety). When you use deployments with Azure AI Content Safety turned on, inputs and outputs pass through an ensemble of classification models aimed at detecting and preventing the output of harmful content. The content filtering (preview) system detects and takes action on specific categories of potentially harmful content in both input prompts and output completions.

-The following example shows how to handle events when the model detects harmful content in the input prompt and content safety is enabled.
+The following example shows how to handle events when the model detects harmful content in the input prompt.

 ```json
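
Both hunks refer to the same handling pattern: when the filter blocks a prompt, the service rejects the request with an HTTP 400 whose body names the policy that fired. A minimal Python sketch of that handling, assuming `client` is an already-configured `azure-ai-inference` `ChatCompletionsClient`; the message text and error shape are illustrative:

```python
# Sketch: reacting when the content filter blocks a prompt. Assumes `client` is an
# existing azure.ai.inference ChatCompletionsClient for the deployment.
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.exceptions import HttpResponseError

try:
    result = client.complete(
        messages=[
            SystemMessage(content="You are a helpful assistant."),
            UserMessage(content="<prompt that may trigger the filter>"),
        ]
    )
    print(result.choices[0].message.content)
except HttpResponseError as ex:
    if ex.status_code == 400:
        # Filtered prompts come back as 400s; the body describes which policy fired.
        detail = ex.response.json()
        if isinstance(detail, dict) and "error" in detail:
            print(f"Request blocked ({detail['error']['code']}): {detail['error']['message']}")
        else:
            raise
    else:
        raise
```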

articles/ai-foundry/how-to/develop/evaluate-sdk.md

Lines changed: 3 additions & 3 deletions
@@ -508,7 +508,7 @@ Output:
 ```

-The result of the content safety evaluators for a query and response pair is a dictionary containing:
+The result of the Guidelines & controls evaluators for a query and response pair is a dictionary containing:

 - `{metric_name}` provides a severity label for that content risk ranging from Very low, Low, Medium, and High. To learn more about the descriptions of each content risk and severity scale, see [Evaluation and monitoring metrics for generative AI](../../concepts/evaluation-metrics-built-in.md).
 - `{metric_name}_score` has a range between 0 and 7 severity level that maps to a severity label given in `{metric_name}`.
@@ -570,12 +570,12 @@ We support evaluating vulnerability towards the following types of jailbreak att
 - **Direct attack jailbreak** (also known as UPIA or User Prompt Injected Attack) injects prompts in the user role turn of conversations or queries to generative AI applications.
 - **Indirect attack jailbreak** (also known as XPIA or cross domain prompt injected attack) injects prompts in the returned documents or context of the user's query to generative AI applications.

-*Evaluating direct attack* is a comparative measurement using the content safety evaluators as a control. It isn't its own AI-assisted metric. Run `ContentSafetyEvaluator` on two different, red-teamed datasets:
+*Evaluating direct attack* is a comparative measurement using the Azure AI Content Safety evaluators as a control. It isn't its own AI-assisted metric. Run `ContentSafetyEvaluator` on two different, red-teamed datasets:

 - Baseline adversarial test dataset.
 - Adversarial test dataset with direct attack jailbreak injections in the first turn.

-You can do this with functionality and attack datasets generated with the [direct attack simulator](./simulator-interaction-data.md) with the same randomization seed. Then you can evaluate jailbreak vulnerability by comparing results from content safety evaluators between the two test dataset's aggregate scores for each safety evaluator. A direct attack jailbreak defect is detected when there's presence of content harm response detected in the second direct attack injected dataset when there was none or lower severity detected in the first control dataset.
+You can do this with functionality and attack datasets generated with the [direct attack simulator](./simulator-interaction-data.md) with the same randomization seed. Then you can evaluate jailbreak vulnerability by comparing results from Azure AI Content Safety evaluators between the two test dataset's aggregate scores for each safety evaluator. A direct attack jailbreak defect is detected when there's presence of content harm response detected in the second direct attack injected dataset when there was none or lower severity detected in the first control dataset.

 *Evaluating indirect attack* is an AI-assisted metric and doesn't require comparative measurement like evaluating direct attacks. Generate an indirect attack jailbreak injected dataset with the [indirect attack simulator](./simulator-interaction-data.md) then run evaluations with the `IndirectAttackEvaluator`.
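
For orientation on the `{metric_name}` / `{metric_name}_score` pattern described above, here is a sketch of a single query-response evaluation with one safety evaluator from `azure-ai-evaluation`; the project values are placeholders and the output values are illustrative:

```python
# Sketch only: run one Azure AI Content Safety evaluator and read the
# {metric_name} / {metric_name}_score keys from its result dictionary.
from azure.ai.evaluation import ViolenceEvaluator
from azure.identity import DefaultAzureCredential

azure_ai_project = {
    "subscription_id": "<subscription-id>",
    "resource_group_name": "<resource-group>",
    "project_name": "<project-name>",
}
violence_eval = ViolenceEvaluator(credential=DefaultAzureCredential(), azure_ai_project=azure_ai_project)

result = violence_eval(query="What is the capital of France?", response="Paris.")
# Expected shape (values illustrative):
# {"violence": "Very low", "violence_score": 0, "violence_reason": "..."}
print(result["violence"], result["violence_score"])
```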

articles/ai-foundry/how-to/develop/simulator-interaction-data.md

Lines changed: 1 addition & 1 deletion
@@ -408,7 +408,7 @@ We support evaluating vulnerability towards the following types of jailbreak att
 - **Direct attack jailbreak** (also known as UPIA or User Prompt Injected Attack) injects prompts in the user role turn of conversations or queries to generative AI applications.
 - **Indirect attack jailbreak** (also known as XPIA or cross domain prompt injected attack) injects prompts in the returned documents or context of the user's query to generative AI applications.

-*Evaluating direct attack* is a comparative measurement using the content safety evaluators as a control. It isn't its own AI-assisted metric. Run `ContentSafetyEvaluator` on two different, red-teamed datasets generated by `AdversarialSimulator`:
+*Evaluating direct attack* is a comparative measurement using the Azure AI Content Safety evaluators as a control. It isn't its own AI-assisted metric. Run `ContentSafetyEvaluator` on two different, red-teamed datasets generated by `AdversarialSimulator`:

 - Baseline adversarial test dataset using one of the previous scenario enums for evaluating Hateful and unfair content, Sexual content, Violent content, Self-harm-related content.
 - Adversarial test dataset with direct attack jailbreak injections in the first turn:
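
A rough sketch of the workflow this hunk describes, scoring both datasets with the `evaluate` entry point from `azure-ai-evaluation`; the file names and project values are placeholders, and the exact aggregate key names in the results may differ:

```python
# Sketch: score the baseline and the direct-attack-injected datasets with the
# same ContentSafetyEvaluator, then compare the aggregate safety metrics.
from azure.ai.evaluation import ContentSafetyEvaluator, evaluate
from azure.identity import DefaultAzureCredential

azure_ai_project = {
    "subscription_id": "<subscription-id>",
    "resource_group_name": "<resource-group>",
    "project_name": "<project-name>",
}
content_safety = ContentSafetyEvaluator(credential=DefaultAzureCredential(), azure_ai_project=azure_ai_project)

baseline = evaluate(data="baseline_dataset.jsonl", evaluators={"content_safety": content_safety})
injected = evaluate(data="direct_attack_dataset.jsonl", evaluators={"content_safety": content_safety})

# Jailbreak vulnerability shows up as aggregate scores that worsen only in the injected run.
print(baseline["metrics"])
print(injected["metrics"])
```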

articles/ai-foundry/how-to/evaluation-github-action.md

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@ author: lgayhardt
 This GitHub Action enables offline evaluation of AI models and agents within your CI/CD pipelines. It's designed to streamline the evaluation process, allowing you to assess model performance and make informed decisions before deploying to production.

-Offline evaluation involves testing AI models and agents using test datasets to measure their performance on various quality and safety metrics such as fluency, coherence, and content safety. After you select a model in the [Azure AI Model Catalog](https://azure.microsoft.com/products/ai-model-catalog?msockid=1f44c87dd9fa6d1e257fdd6dd8406c42) or [GitHub Model marketplace](https://github.com/marketplace/models), offline pre-production evaluation is crucial for AI application validation during integration testing. This process allows developers to identify potential issues and make improvements before deploying the model or application to production, such as when creating and updating agents.
+Offline evaluation involves testing AI models and agents using test datasets to measure their performance on various quality and safety metrics such as fluency, coherence, and appropriateness. After you select a model in the [Azure AI Model Catalog](https://azure.microsoft.com/products/ai-model-catalog?msockid=1f44c87dd9fa6d1e257fdd6dd8406c42) or [GitHub Model marketplace](https://github.com/marketplace/models), offline pre-production evaluation is crucial for AI application validation during integration testing. This process allows developers to identify potential issues and make improvements before deploying the model or application to production, such as when creating and updating agents.

 [!INCLUDE [features](../includes/evaluation-github-action-azure-devops-features.md)]

articles/ai-foundry/how-to/fine-tune-serverless.md

Lines changed: 1 addition & 1 deletion
@@ -754,7 +754,7 @@ The training data used is the same as demonstrated in the SDK notebook. The CLI
 ## Content filtering

-Models deployed as a service with pay-as-you-go billing are protected by Azure AI Content Safety. When deployed to real-time endpoints, you can opt out of this capability. With Azure AI content safety enabled, both the prompt and completion pass through an ensemble of classification models aimed at detecting and preventing the output of harmful content. The content filtering system detects and takes action on specific categories of potentially harmful content in both input prompts and output completions. Learn more about [Azure AI Content Safety](../concepts/content-filtering.md).
+Models deployed as a service with pay-as-you-go billing are protected by Azure AI Content Safety. When deployed to real-time endpoints, you can opt out of this capability. With Azure AI Content Safety enabled, both the prompt and completion pass through an ensemble of classification models aimed at detecting and preventing the output of harmful content. The content filtering system detects and takes action on specific categories of potentially harmful content in both input prompts and output completions. Learn more about [Azure AI Content Safety](../concepts/content-filtering.md).

 ## Next steps
