articles/ai-services/content-safety/concepts/custom-categories.md
+35 -1 (35 additions, 1 deletion)
@@ -16,6 +16,38 @@ ms.author: pafarley
16
16
17
17
Azure AI Content Safety lets you create and manage your own content moderation categories for enhanced moderation and filtering that matches your specific policies or use cases.
The training pipeline is designed to leverage a combination of universal data assets, user-provided inputs, and advanced GPT model fine-tuning techniques to produce high-quality models tailored to specific tasks.
24
+
#### Data Assets
25
+
Filtered Universal Data: This component gathers datasets from multiple domains to create a comprehensive and diverse dataset collection. The goal is to have a robust data foundation that provides a variety of contexts for model training.
26
+
#### User Inputs
27
+
Customer Task Metadata: Metadata provided by customers, which defines the specific requirements and context of the task they wish the model to perform.
28
+
Customer Demonstrations: Sample demonstrations provided by customers that illustrate the expected output or behavior for the model. These demonstrations help optimize the model’s response based on real-world expectations.
29
+
30
+
#### Optimized Customer Prompt
31
+
Based on the customer metadata and demonstrations, an optimized prompt is generated. This prompt refines the inputs provided to the model, aligning them closely with customer needs and enhancing the model’s task performance.
32
+
33
+
#### GPTX Synthetic Task-Specific Dataset
34
+
Using the optimized prompt and filtered universal data, a synthetic, task-specific dataset is created. This dataset is tailored to the specific task requirements, enabling the model to understand and learn the desired behaviors and patterns.
35
+
### Model Training and Fine-Tuning
36
+
37
+
Model Options: The pipeline supports multiple language models (LMs), including Zcode, SLM, or any other language model suitable for the task.
38
+
Task-Specific Fine-Tuned Model: The selected language model is fine-tuned on the synthetic task-specific dataset to produce a model that is highly optimized for the specific task.
39
+
### User Outputs
40
+
41
+
ONNX Model: The fine-tuned model is converted to the ONNX (Open Neural Network Exchange) format, ensuring compatibility and efficiency for deployment.
42
+
Deployment: The ONNX model is deployed, enabling users to make inference calls and access the model’s predictions. This deployment step ensures that the model is ready for production use in customer applications.
43
+
### Key Features of the Training Pipeline
44
+
45
+
- Task Specificity: The pipeline allows for the creation of models finely tuned to specific customer tasks, thanks to the integration of customer metadata and demonstrations.
46
+
- Scalability and Flexibility: The pipeline supports multiple language models, providing flexibility in choosing the model architecture best suited to the task.
47
+
- Efficiency in Deployment: The conversion to ONNX format ensures that the final model is lightweight and efficient, optimized for deployment environments.
48
+
- Continuous Improvement: By using synthetic datasets generated from diverse universal data sources, the pipeline can continuously improve model quality and applicability across various domains.
49
+
50
+
19
51
## Types of customization
20
52
21
53
There are multiple ways to define and use custom categories, which are detailed and compared in this section.
@@ -49,7 +81,9 @@ This implementation works on text content and image content.
The Azure AI Content Safety custom categories feature uses a multi-step process for creating, training, and using custom content classification models. Here's a look at the workflow:
articles/ai-services/content-safety/includes/storage-account-access.md
+3 -1 (3 additions, 1 deletion)
@@ -10,7 +10,9 @@ ms.author: pafarley
10
10
---
11
11
12
12
13
-
Next, you need to give your Content Safety resource access to read from the Azure Storage resource. Enable system-assigned Managed identity for the Azure AI Content Safety instance and assign the role of **Storage Blob Data Contributor/Owner/Reader** to the identity:
13
+
Next, you need to give your Content Safety resource access to read from the Azure Storage resource. Enable system-assigned Managed identity for the Azure AI Content Safety instance and assign the role of **Storage Blob Data Contributor/Owner** to the identity:
14
+
> [!IMPORTANT]
15
+
> **Only the Storage Blob Data Contributor and Storage Blob Data Owner roles are valid for this step.**
14
16
15
17
1. Enable managed identity for the Azure AI Content Safety instance.
Follow this guide to use the Azure AI Content Safety custom categories (standard) REST API to create your own content categories for your use case and train Azure AI Content Safety to detect them in new text content.
@@ -25,6 +26,13 @@ For more information on Custom categories, see the [Custom categories concept pa
25
26
>
26
27
> The end-to-end execution of custom category training can take roughly five to ten hours. Plan your moderation pipeline accordingly.
27
28
29
+
30
+
## Custom categories (standard mode) (preview) User Flow
> Every time you change your category name, definition, or samples, a new version is created. You can use the version number to trace back to previous versions. Make a note of this version number: it is required in the URL for the next step, training custom categories.
81
+
### Get new category version
82
+
83
+
```bash
84
+
curl -X PUT "<your_endpoint>/contentsafety/text/categories/survival-advice?api-version=2024-09-15-preview" \
63
85
-H "Ocp-Apim-Subscription-Key: <your_api_key>" \
64
86
-H "Content-Type: application/json" \
65
87
-d "{
@@ -71,10 +93,10 @@ curl -X PUT "<your_endpoint>/contentsafety/text/categories/survival-advice?api-v
71
93
72
94
### Start the category build process:
73
95
74
-
Replace `<your_api_key>` and `<your_endpoint>` with your own values. Allow enough time for model training: the end-to-end execution of custom category training can take from around five hours to ten hours. Plan your moderation pipeline accordingly. After you receive the response, store the operation ID (referred to as `id`) in a temporary location. This ID will be necessary for retrieving the build status using the **Get status** API in the next section.
96
+
Replace `<your_api_key>` and `<your_endpoint>` with your own values, and **append the version number you obtained in the previous step** as the `version` query parameter. Allow enough time for model training: end-to-end custom category training can take roughly five to ten hours, so plan your moderation pipeline accordingly. After you receive the response, store the operation ID (returned as `id`) in a temporary location. You need this ID to retrieve the build status with the **Get status** API in the next section.
75
97
76
98
```bash
77
-
curl -X POST "<your_endpoint>/contentsafety/text/categories/survival-advice:build?api-version=2024-02-15-preview" \
99
+
curl -X POST "<your_endpoint>/contentsafety/text/categories/survival-advice:build?api-version=2024-09-15-preview&version={version}" \
78
100
-H "Ocp-Apim-Subscription-Key: <your_api_key>" \
79
101
-H "Content-Type: application/json"
80
102
```
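As a minimal sketch of how the version number might be threaded into the build request, the helpers below assemble the URL and issue the call. The function names `build_url` and `start_build` and their argument order are hypothetical, introduced only for illustration; the path, `api-version`, and `version` query parameter are taken from the command above.

```bash
# Hypothetical helper: assemble the build URL from endpoint, category name, and version.
build_url() {
  endpoint="${1%/}"   # strip one trailing slash, if present
  category="$2"
  version="$3"
  printf '%s/contentsafety/text/categories/%s:build?api-version=2024-09-15-preview&version=%s\n' \
    "$endpoint" "$category" "$version"
}

# Hypothetical wrapper: start the build and print the service's JSON response.
# Arguments: endpoint, category name, version, API key.
start_build() {
  curl -sS -X POST "$(build_url "$1" "$2" "$3")" \
    -H "Ocp-Apim-Subscription-Key: $4" \
    -H "Content-Type: application/json"
}

# Usage (replace the placeholders with real values before running):
# start_build "<your_endpoint>" "survival-advice" "<version>" "<your_api_key>"
```

Keeping URL assembly separate from the network call makes it easy to print and inspect the final URL before starting a build that can run for hours.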
@@ -83,7 +105,7 @@ curl -X POST "<your_endpoint>/contentsafety/text/categories/survival-advice:buil
83
105
To retrieve the status, take the `id` returned by the previous API response and place it in the path of the request below.
84
106
85
107
```bash
86
-
curl -X GET "<your_endpoint>/contentsafety/text/categories/operations/<id>?api-version=2024-02-15-preview" \
108
+
curl -X GET "<your_endpoint>/contentsafety/text/categories/operations/<id>?api-version=2024-09-15-preview" \
87
109
-H "Ocp-Apim-Subscription-Key: <your_api_key>" \
88
110
-H "Content-Type: application/json"
89
111
```
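Because training can take hours, the status call is usually repeated until the build finishes. The following is a rough sketch of such a polling loop, not the service's prescribed method. It assumes the response is JSON with a top-level `status` string and that `Succeeded` and `Failed` are the terminal values; neither is shown in this guide, so verify both against a real response. The helper names and the 600-second default interval are likewise hypothetical.

```bash
# ASSUMPTION: the operation response is JSON containing a "status" string field.
# Reads JSON on stdin and prints the value of the first "status" field found.
extract_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Hypothetical polling loop.
# Arguments: endpoint, API key, operation id, optional interval in seconds.
wait_for_build() {
  endpoint="${1%/}"
  api_key="$2"
  op_id="$3"
  interval="${4:-600}"
  while :; do
    status="$(curl -sS -X GET \
      "$endpoint/contentsafety/text/categories/operations/$op_id?api-version=2024-09-15-preview" \
      -H "Ocp-Apim-Subscription-Key: $api_key" \
      -H "Content-Type: application/json" | extract_status)"
    echo "status: ${status:-unknown}"
    case "$status" in
      Succeeded|Failed) break ;;   # ASSUMED terminal state names
    esac
    sleep "$interval"
  done
}

# Usage (replace the placeholders with real values before running):
# wait_for_build "<your_endpoint>" "<your_api_key>" "<id>" 600
```

The `sed` expression is a deliberately simple stand-in for a JSON parser; it only works when the field and its value sit on the same line.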
@@ -93,7 +115,7 @@ curl -X GET "<your_endpoint>/contentsafety/text/categories/operations/<id>?api-v
93
115
Run the following command to analyze text with your customized category. Replace `<your_api_key>` and `<your_endpoint>` with your own values.
94
116
95
117
```bash
96
-
curl -X POST "<your_endpoint>/contentsafety/text:analyzeCustomCategory?api-version=2024-02-15-preview" \
118
+
curl -X POST "<your_endpoint>/contentsafety/text:analyzeCustomCategory?api-version=2024-09-15-preview" \
97
119
-H "Ocp-Apim-Subscription-Key: <your_api_key>" \
98
120
-H "Content-Type: application/json" \
99
121
-d "{
@@ -132,7 +154,7 @@ You can create a new category with *category name*, *definition* and *sample_blo
articles/ai-services/content-safety/quickstart-groundedness.md
+5 -5 (5 additions, 5 deletions)
@@ -137,7 +137,7 @@ The parameters in the request body are defined in this table:
137
137
| - `query`| (Optional) This represents the question in a QnA task. Character limit: 7,500. | String |
138
138
|**text**| (Required) The LLM output text to be checked. Character limit: 7,500. | String |
139
139
|**groundingSources**| (Required) Uses an array of grounding sources to validate AI-generated text. See [Input requirements](./overview.md#input-requirements) for limits. | String array |
140
-
|**reasoning**| (Optional) Specifies whether to use the reasoning feature. The default value is `false`. If `true`, you need to bring your own Azure OpenAI GPT-4 Turbo (1106-preview) resources to provide an explanation. Be careful: using reasoning increases the processing time.| Boolean |
140
+
|**reasoning**| (Optional) Specifies whether to use the reasoning feature. The default value is `false`. If `true`, you need to bring your own Azure OpenAI GPT-4o (versions 0513 and 0806) resources to provide an explanation. Be careful: using reasoning increases the processing time.| Boolean |
141
141
142
142
### Interpret the API response
143
143
@@ -171,7 +171,7 @@ The Groundedness detection API provides the option to include _reasoning_ in the
171
171
### Connect your own GPT deployment
172
172
173
173
> [!TIP]
174
-
> We only support **Azure OpenAI GPT-4 Turbo (1106-preview)** resources and do not support other GPT types. You have the flexibility to deploy your GPT-4 Turbo (1106-preview) resources in any region. However, to minimize potential latency and avoid any geographical boundary data privacy and risk concerns, we recommend situating them in the same region as your content safety resources. For comprehensive details on data privacy, refer to the [Data, privacy and security guidelines for Azure OpenAI Service](/legal/cognitive-services/openai/data-privacy) and [Data, privacy, and security for Azure AI Content Safety](/legal/cognitive-services/content-safety/data-privacy?context=%2Fazure%2Fai-services%2Fcontent-safety%2Fcontext%2Fcontext).
174
+
> We only support **Azure OpenAI GPT-4o (versions 0513 and 0806)** resources and do not support other GPT types. You can deploy your Azure OpenAI GPT-4o (versions 0513 and 0806) resources in any region. However, to minimize potential latency and avoid data privacy and risk concerns related to geographical boundaries, we recommend placing them in the same region as your content safety resources. For comprehensive details on data privacy, refer to the [Data, privacy and security guidelines for Azure OpenAI Service](/legal/cognitive-services/openai/data-privacy) and [Data, privacy, and security for Azure AI Content Safety](/legal/cognitive-services/content-safety/data-privacy?context=%2Fazure%2Fai-services%2Fcontent-safety%2Fcontext%2Fcontext).
175
175
176
176
To use your Azure OpenAI GPT-4o (versions 0513 and 0806) resource to enable the reasoning feature, use Managed Identity to allow your Content Safety resource to access the Azure OpenAI resource:
177
177
@@ -296,7 +296,7 @@ The parameters in the request body are defined in this table:
296
296
| **groundingSources** | (Required) Uses an array of grounding sources to validate AI-generated text. See [Input requirements](./overview.md#input-requirements) for limits. | String array |
297
297
| **reasoning** | (Optional) Set to `true`, the service uses Azure OpenAI resources to provide an explanation. Be careful: using reasoning increases the processing time and incurs extra fees.| Boolean |
298
298
| **llmResource** | (Required) If you want to use your own Azure OpenAI GPT-4o (versions 0513 and 0806) resource to enable reasoning, add this field and include the subfields for the resources used. | String |
299
-
| - `resourceType `| Specifies the type of resource being used. Currently it only allows `AzureOpenAI`. We only support Azure OpenAI GPT-4 Turbo (1106-preview) resources and do not support other GPT types. | Enum|
299
+
| - `resourceType`| Specifies the type of resource being used. Currently it only allows `AzureOpenAI`. We only support Azure OpenAI GPT-4o (versions 0513 and 0806) resources and do not support other GPT types. | Enum|
300
300
| - `azureOpenAIEndpoint`| Your endpoint URL for Azure OpenAI service. | String |
301
301
| - `azureOpenAIDeploymentName` | The name of the specific GPT deployment to use. | String|
302
302
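To make the relationship between `reasoning` and the `llmResource` subfields concrete, here is a minimal sketch that prints a request body with reasoning enabled. The field names come from the parameter table above; the helper name `reasoning_body` and every value are hypothetical placeholders, and a real request may need further fields that this excerpt does not show.

```bash
# Hypothetical helper: print a request body that enables reasoning.
# Arguments: Azure OpenAI endpoint URL, GPT deployment name.
# All text values below are placeholders, not real data.
reasoning_body() {
  aoai_endpoint="$1"
  deployment="$2"
  cat <<EOF
{
  "text": "<LLM output text to check>",
  "groundingSources": ["<grounding source text>"],
  "reasoning": true,
  "llmResource": {
    "resourceType": "AzureOpenAI",
    "azureOpenAIEndpoint": "$aoai_endpoint",
    "azureOpenAIDeploymentName": "$deployment"
  }
}
EOF
}

# Usage: save the body to a file and pass it to curl with -d @body.json
# reasoning_body "<your_OpenAI_endpoint>" "<your_deployment_name>" > body.json
```

Writing the body to a file avoids the quote escaping that an inline `-d "{...}"` argument requires.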
@@ -353,7 +353,7 @@ The groundedness detection API includes a correction feature that automatically
353
353
### Connect your own GPT deployment
354
354
355
355
> [!TIP]
356
-
> Currently, the correction feature supports only **Azure OpenAI GPT-4 Turbo (1106-preview)** resources. To minimize latency and adhere to data privacy guidelines, it's recommended to deploy your GPT-4 Turbo (1106-preview) resources in the same region as your content safety resources. For more details on data privacy, please refer to the [Data, privacy and security guidelines for Azure OpenAI Service](/legal/cognitive-services/openai/data-privacy?context=/azure/ai-services/openai/context/context)
356
+
> Currently, the correction feature supports only **Azure OpenAI GPT-4o (versions 0513 and 0806)** resources. To minimize latency and adhere to data privacy guidelines, we recommend deploying your Azure OpenAI GPT-4o (versions 0513 and 0806) resources in the same region as your content safety resources. For more details on data privacy, refer to the [Data, privacy and security guidelines for Azure OpenAI Service](/legal/cognitive-services/openai/data-privacy?context=/azure/ai-services/openai/context/context)
357
357
and [Data, privacy, and security for Azure AI Content Safety](/legal/cognitive-services/content-safety/data-privacy?context=/azure/ai-services/content-safety/context/context).
358
358
359
359
To use your Azure OpenAI GPT-4o (versions 0513 and 0806) resource to enable the correction feature, use Managed Identity to allow your Content Safety resource to access the Azure OpenAI resource. Follow the steps in the [earlier section](#connect-your-own-gpt-deployment) to set up the Managed Identity.
@@ -450,7 +450,7 @@ The parameters in the request body are defined in this table:
450
450
|**groundingSources**| (Required) Uses an array of grounding sources to validate AI-generated text. See [Input requirements](./overview.md#input-requirements) for limits. | String Array |
451
451
|**correction**| (Optional) Set to `true`, the service uses Azure OpenAI resources to provide the corrected text, ensuring consistency with the grounding sources. Be careful: using correction increases the processing time and incurs extra fees.| Boolean |
452
452
|**llmResource**| (Required) If you want to use your own Azure OpenAI GPT-4o (versions 0513 and 0806) resource to enable correction, add this field and include the subfields for the resources used. | String |
453
-
| - `resourceType `| Specifies the type of resource being used. Currently it only allows `AzureOpenAI`. We only support Azure OpenAI GPT-4 Turbo (1106-preview) resources and do not support other GPT types. | Enum|
453
+
| - `resourceType`| Specifies the type of resource being used. Currently it only allows `AzureOpenAI`. We only support Azure OpenAI GPT-4o (versions 0513 and 0806) resources and do not support other GPT types. | Enum|
454
454
| - `azureOpenAIEndpoint`| Your endpoint URL for Azure OpenAI service. | String |
455
455
| - `azureOpenAIDeploymentName`| The name of the specific GPT deployment to use. | String|