articles/ai-services/openai/concepts/content-filter.md (+20 −20)
@@ -41,12 +41,12 @@ Text and image models support Drugs as an additional classification. This category...
|--------|-----------|
| Hate and Fairness | Hate and fairness-related harms refer to any content that attacks or uses discriminatory language with reference to a person or Identity group based on certain differentiating attributes of these groups. <br><br>This includes, but is not limited to:<ul><li>Race, ethnicity, nationality</li><li>Gender identity groups and expression</li><li>Sexual orientation</li><li>Religion</li><li>Personal appearance and body size</li><li>Disability status</li><li>Harassment and bullying</li></ul> |
| Sexual | Sexual describes language related to anatomical organs and genitals, romantic relationships and sexual acts, acts portrayed in erotic or affectionate terms, including those portrayed as an assault or a forced sexual violent act against one’s will. <br><br> This includes but is not limited to:<ul><li>Vulgar content</li><li>Prostitution</li><li>Nudity and Pornography</li><li>Abuse</li><li>Child exploitation, child abuse, child grooming</li></ul> |
-| Violence | Violence describes language related to physical actions intended to hurt, injure, damage, or kill someone or something; describes weapons, guns and related entities. <br><br>This includes, but is not limited to: <ul><li>Weapons</li><li>Bullying and intimidation</li><li>Terrorist and violent extremism</li><li>Stalking</li></ul> |
-| Self-Harm | Self-harm describes language related to physical actions intended to purposely hurt, injure, damage one’s body or kill oneself. <br><br> This includes, but is not limited to: <ul><li>Eating Disorders</li><li>Bullying and intimidation</li></ul> |
+| Violence | Violence describes language related to physical actions intended to hurt, injure, damage, or kill someone or something; describes weapons, guns and related entities. <br><br>This includes, but isn't limited to: <ul><li>Weapons</li><li>Bullying and intimidation</li><li>Terrorist and violent extremism</li><li>Stalking</li></ul> |
+| Self-Harm | Self-harm describes language related to physical actions intended to purposely hurt, injure, damage one’s body or kill oneself. <br><br> This includes, but isn't limited to: <ul><li>Eating Disorders</li><li>Bullying and intimidation</li></ul> |
| Protected Material for Text<sup>*</sup> | Protected material text describes known text content (for example, song lyrics, articles, recipes, and selected web content) that can be outputted by large language models.
| Protected Material for Code | Protected material code describes source code that matches a set of source code from public repositories, which can be outputted by large language models without proper citation of source repositories.
-<sup>*</sup> If you are an owner of text material and want to submit text content for protection, please [file a request](https://aka.ms/protectedmaterialsform).
+<sup>*</sup> If you're an owner of text material and want to submit text content for protection, [file a request](https://aka.ms/protectedmaterialsform).
## Prompt Shields
@@ -91,16 +91,16 @@ The default content filtering configuration for the GPT model series is set to filter...
| Severity filtered | Configurable for prompts | Configurable for completions | Descriptions |
-| Low, medium, high | Yes | Yes | Strictest filtering configuration. Content detected at severity levels low, medium and high is filtered.|
+| Low, medium, high | Yes | Yes | Strictest filtering configuration. Content detected at severity levels low, medium, and high is filtered.|
| Medium, high | Yes | Yes | Content detected at severity level low isn't filtered, content at medium and high is filtered.|
| High | Yes| Yes | Content detected at severity levels low and medium isn't filtered. Only content at severity level high is filtered. Requires approval<sup>1</sup>.|
| No filters | If approved<sup>1</sup>| If approved<sup>1</sup>| No content is filtered regardless of severity level detected. Requires approval<sup>1</sup>.|
-<sup>1</sup> For Azure OpenAI models, only customers who have been approved for modified content filtering have full content filtering control and can turn content filters off. Apply for modified content filters via this form: [Azure OpenAI Limited Access Review: Modified Content Filters](https://ncv.microsoft.com/uEfCgnITdR)
+<sup>1</sup> For Azure OpenAI models, only customers who have been approved for modified content filtering have full content filtering control and can turn off content filters. Apply for modified content filters via this form: [Azure OpenAI Limited Access Review: Modified Content Filters](https://ncv.microsoft.com/uEfCgnITdR)
This preview feature is available for the following Azure OpenAI models:
Content filtering configurations are created within a Resource in Azure AI Studio, and can be associated with Deployments. [Learn more about configurability here](../how-to/content-filters.md).
@@ -112,8 +112,8 @@ Customers are responsible for ensuring that applications integrating Azure OpenAI...
When the content filtering system detects harmful content, you receive either an error on the API call if the prompt was deemed inappropriate, or the `finish_reason` on the response will be `content_filter` to signify that some of the completion was filtered. When building your application or system, you'll want to account for these scenarios where the content returned by the Completions API is filtered, which might result in content that is incomplete. How you act on this information will be application specific. The behavior can be summarized in the following points:
- Prompts that are classified at a filtered category and severity level will return an HTTP 400 error.
-- Non-streaming completions calls won't return any content when the content is filtered. The `finish_reason` value will be set to content_filter. In rare cases with longer responses, a partial result can be returned. In these cases, the `finish_reason` will be updated.
-- For streaming completions calls, segments will be returned back to the user as they're completed. The service will continue streaming until either reaching a stop token, length, or when content that is classified at a filtered category and severity level is detected.
+- Non-streaming completions calls won't return any content when the content is filtered. The `finish_reason` value is set to content_filter. In rare cases with longer responses, a partial result can be returned. In these cases, the `finish_reason` is updated.
+- For streaming completions calls, segments are returned back to the user as they're completed. The service continues streaming until either reaching a stop token, length, or when content that is classified at a filtered category and severity level is detected.
### Scenario: You send a non-streaming completions call asking for multiple outputs; no content is classified at a filtered category and severity level
@@ -224,7 +224,7 @@ The table below outlines the various ways content filtering can appear:
|**HTTP Response Code**|**Response behavior**|
|------------|------------------------|
-|200|In this case, the call will stream back with the full generation and `finish_reason` will be either 'length' or 'stop' for each generated response.|
+|200|In this case, the call streams back with the full generation and `finish_reason` will be either 'length' or 'stop' for each generated response.|
**Example request payload:**
@@ -337,7 +337,7 @@ The table below outlines the various ways content filtering can appear:
When annotations are enabled as shown in the code snippet below, the following information is returned via the API for the categories hate and fairness, sexual, violence, and self-harm:
-For details on the inference REST API endpoints for Azure OpenAI and how to create Chat and Completions please follow [Azure OpenAI Service REST API reference guidance](../reference.md). Annotations are returned for all scenarios when using any preview API version starting from `2023-06-01-preview`, as well as the GA API version `2024-02-01`.
+For details on the inference REST API endpoints for Azure OpenAI and how to create Chat and Completions, follow [Azure OpenAI Service REST API reference guidance](../reference.md). Annotations are returned for all scenarios when using any preview API version starting from `2023-06-01-preview`, as well as the GA API version `2024-02-01`.
### Example scenario: An input prompt containing content that is classified at a filtered category and severity level is sent to the completions API
@@ -781,7 +781,7 @@ For enhanced detection capabilities, prompts should be formatted according to the...
The Chat Completion API is structured by definition. It consists of a list of messages, each with an assigned role.
-The safety system will parse this structured format and apply the following behavior:
+The safety system parses this structured format and applies the following behavior:
- On the latest “user” content, the following categories of RAI Risks will be detected:
- Hate
- Sexual
@@ -800,7 +800,7 @@ This is an example message array:
### Embedding documents in your prompt
-In addition to detection on last user content, Azure OpenAI also supports the detection of specific risks inside context documents via Prompt Shields – Indirect Prompt Attack Detection. You should identify parts of the input that are a document (e.g. retrieved website, email, etc.) with the following document delimiter.
+In addition to detection on last user content, Azure OpenAI also supports the detection of specific risks inside context documents via Prompt Shields – Indirect Prompt Attack Detection. You should identify parts of the input that are a document (for example, retrieved website, email, etc.) with the following document delimiter.
```
<documents>
@@ -812,7 +812,7 @@ When you do so, the following options are available for detection on tagged documents...
- On each tagged “document” content, detect the following categories:
- Indirect attacks (optional)
-Here is an example chat completion messages array:
+Here's an example chat completion messages array:
```json
{"role": "system", "content": "Provide some context and/or instructions to the model, including document context. \"\"\" <documents>\n*insert your document content here*\n<\\documents> \"\"\""},
@@ -848,21 +848,21 @@ The escaped text in a chat completion context would read:
## Content streaming
-This section describes the Azure OpenAI content streaming experience and options. Customers have the option to receive content from the API as it's generated, instead of waiting for chunks of content that have been verified to pass your content filters.
+This section describes the Azure OpenAI content streaming experience and options. Customers can receive content from the API as it's generated, instead of waiting for chunks of content that have been verified to pass your content filters.
### Default
The content filtering system is integrated and enabled by default for all customers. In the default streaming scenario, completion content is buffered, the content filtering system runs on the buffered content, and – depending on the content filtering configuration – content is either returned to the user if it doesn't violate the content filtering policy (Microsoft's default or a custom user configuration), or it’s immediately blocked and returns a content filtering error, without returning the harmful completion content. This process is repeated until the end of the stream. Content is fully vetted according to the content filtering policy before it's returned to the user. Content isn't returned token-by-token in this case, but in “content chunks” of the respective buffer size.
### Asynchronous Filter
-Customers can choose the Asynchronous Filter as an additional option, providing a new streaming experience. In this case, content filters are run asynchronously, and completion content is returned immediately with a smooth token-by-token streaming experience. No content is buffered, which allows for a fast streaming experience with zero latency associated with content safety.
+Customers can choose the Asynchronous Filter as an extra option, providing a new streaming experience. In this case, content filters are run asynchronously, and completion content is returned immediately with a smooth token-by-token streaming experience. No content is buffered, which allows for a fast streaming experience with zero latency associated with content safety.
-Customers must be aware that while the feature improves latency, it's a trade-off against the safety and real-time vetting of smaller sections of model output. Because content filters are run asynchronously, content moderation messages and policy violation signals are delayed, which means some sections of harmful content that would otherwise have been filtered immediately could be displayed to the user.
+Customers must understand that while the feature improves latency, it's a trade-off against the safety and real-time vetting of smaller sections of model output. Because content filters are run asynchronously, content moderation messages and policy violation signals are delayed, which means some sections of harmful content that would otherwise have been filtered immediately could be displayed to the user.
-**Annotations**: Annotations and content moderation messages are continuously returned during the stream. We strongly recommend you consume annotations in your app and implement additional AI content safety mechanisms, such as redacting content or returning additional safety information to the user.
+**Annotations**: Annotations and content moderation messages are continuously returned during the stream. We strongly recommend you consume annotations in your app and implement other AI content safety mechanisms, such as redacting content or returning other safety information to the user.
-**Content filtering signal**: The content filtering error signal is delayed. In case of a policy violation, it’s returned as soon as it’s available, and the stream is stopped. The content filtering signal is guaranteed within a ~1,000-character window of the policy-violating content.
+**Content filtering signal**: The content filtering error signal is delayed. If there is a policy violation, it’s returned as soon as it’s available, and the stream is stopped. The content filtering signal is guaranteed within a ~1,000-character window of the policy-violating content.
**Customer Copyright Commitment**: Content that is retroactively flagged as protected material may not be eligible for Customer Copyright Commitment coverage.
@@ -960,7 +960,7 @@ data: {
#### Sample response stream (passes filters)
-Below is a real chat completion response using Asynchronous Filter. Note how the prompt annotations aren't changed, completion tokens are sent without annotations, and new annotation messages are sent without tokens—they are instead associated with certain content filter offsets.
+Below is a real chat completion response using Asynchronous Filter. Note how the prompt annotations aren't changed, completion tokens are sent without annotations, and new annotation messages are sent without tokens—they're instead associated with certain content filter offsets.