Commit 66cda23

Merge pull request #1107 from MicrosoftDocs/main
10/29 11:00 AM IST Publish
2 parents d84fb33 + 2c9fac4 commit 66cda23

6 files changed (+106 additions, −25 deletions)

articles/ai-services/content-safety/concepts/custom-categories.md

Lines changed: 35 additions & 1 deletion
@@ -16,6 +16,38 @@ ms.author: pafarley
Azure AI Content Safety lets you create and manage your own content moderation categories for enhanced moderation and filtering that matches your specific policies or use cases.

## Custom categories training pipeline overview

![Diagram of the custom categories training pipeline.](https://github.com/user-attachments/assets/2e097136-0e37-4b5e-ba59-cafcfd733d72)

### Pipeline components

The training pipeline combines universal data assets, user-provided inputs, and GPT model fine-tuning techniques to produce high-quality models tailored to specific tasks.

#### Data assets

**Filtered universal data**: This component gathers datasets from multiple domains to create a comprehensive and diverse dataset collection, providing a robust data foundation with a variety of contexts for model training.

#### User inputs

**Customer task metadata**: Metadata provided by customers that defines the specific requirements and context of the task they want the model to perform.

**Customer demonstrations**: Sample demonstrations provided by customers that illustrate the expected output or behavior of the model. These demonstrations help optimize the model's responses based on real-world expectations.

#### Optimized customer prompt

Based on the customer metadata and demonstrations, an optimized prompt is generated. This prompt refines the inputs provided to the model, aligning it closely with customer needs and enhancing the model's task performance.

#### GPTX synthetic task-specific dataset

Using the optimized prompt and filtered universal data, a synthetic, task-specific dataset is created. This dataset is tailored to the specific task requirements, enabling the model to understand and learn the desired behaviors and patterns.
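As a rough illustration of this step, the following sketch pairs universal-data texts with an optimized prompt and a labeling function. It is hypothetical: the service's actual pipeline is not public, and the function and field names here are illustrative, with a simple keyword rule standing in for the GPT model that would judge each text against the category definition.

```python
# Hypothetical sketch of synthetic-dataset assembly; not the actual service code.
def build_synthetic_examples(optimized_prompt, universal_texts, label_fn):
    """Pair each universal-data text with the optimized prompt and a label.

    label_fn stands in for the GPT model that would decide whether a text
    matches the custom category definition.
    """
    dataset = []
    for text in universal_texts:
        dataset.append({
            "prompt": optimized_prompt,
            "text": text,
            "label": label_fn(text),  # 1 = matches the category, 0 = does not
        })
    return dataset

examples = build_synthetic_examples(
    "Classify whether the text gives survival advice.",
    ["Boil water before drinking it in the wild.", "The weather is nice today."],
    lambda t: 1 if "survival" in t.lower() or "wild" in t.lower() else 0,
)
```

In the real pipeline, the resulting labeled records would then feed the fine-tuning stage described next.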
35+
### Model training and fine-tuning

#### Model options

The pipeline supports multiple language models, including Zcode, SLM, or any other language model (LM) suitable for the task.

**Task-specific fine-tuned model**: The selected language model is fine-tuned on the synthetic task-specific dataset to produce a model that is highly optimized for the specific task.

#### User outputs

**ONNX model**: The fine-tuned model is converted to the ONNX (Open Neural Network Exchange) format, ensuring compatibility and efficiency for deployment.

**Deployment**: The ONNX model is deployed, enabling users to make inference calls and access the model's predictions. This step ensures that the model is ready for production use in customer applications.
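Once deployed, the model is consumed through the Content Safety REST API rather than called directly. As a minimal sketch, the following builds (but doesn't send) an inference request to the `analyzeCustomCategory` endpoint used later in the quickstart; the endpoint and key values are placeholders.

```python
import json

def build_inference_request(endpoint, api_key, text, category_name, version=1):
    """Assemble the URL, headers, and JSON body for a custom-category
    analysis call. Nothing is sent; this only shows the request shape."""
    url = f"{endpoint}/contentsafety/text:analyzeCustomCategory?api-version=2024-09-15-preview"
    headers = {
        "Ocp-Apim-Subscription-Key": api_key,
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "text": text,
        "categoryName": category_name,
        "version": version,
    })
    return url, headers, body

url, headers, body = build_inference_request(
    "https://<your-endpoint>.cognitiveservices.azure.com",
    "<your_api_key>",
    "Bring a water filter when hiking.",
    "survival-advice",
)
```

In practice you would pass these three values to `requests.post(url, headers=headers, data=body)`, as the Python quickstart section shows.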

### Key features of the training pipeline

- **Task specificity**: The pipeline creates models finely tuned to specific customer tasks, thanks to the integration of customer metadata and demonstrations.
- **Scalability and flexibility**: The pipeline supports multiple language models, providing flexibility in choosing the model architecture best suited to the task.
- **Efficiency in deployment**: The conversion to ONNX format ensures that the final model is lightweight and efficient, optimized for deployment environments.
- **Continuous improvement**: By using synthetic datasets generated from diverse universal data sources, the pipeline can continuously improve model quality and applicability across various domains.

## Types of customization

There are multiple ways to define and use custom categories, which are detailed and compared in this section.
@@ -49,7 +81,9 @@ This implementation works on text content and image content.
## How it works

### [Custom categories (standard) API](#tab/standard)

![Diagram of the custom categories (standard) API workflow.](https://github.com/user-attachments/assets/5c377ec4-379b-4b41-884c-13524ca126d0)

The Azure AI Content Safety custom categories feature uses a multi-step process for creating, training, and using custom content classification models. Here's a look at the workflow:

articles/ai-services/content-safety/includes/storage-account-access.md

Lines changed: 3 additions & 1 deletion
@@ -10,7 +10,9 @@ ms.author: pafarley
---

Next, you need to give your Content Safety resource access to read from the Azure Storage resource. Enable system-assigned managed identity for the Azure AI Content Safety instance and assign the **Storage Blob Data Contributor/Owner** role to the identity:

> [!IMPORTANT]
> Only **Storage Blob Data Contributor** or **Storage Blob Data Owner** are valid roles to proceed.

1. Enable managed identity for the Azure AI Content Safety instance.

articles/ai-services/content-safety/quickstart-custom-categories.md

Lines changed: 31 additions & 9 deletions
@@ -11,6 +11,7 @@ ms.date: 07/03/2024
ms.author: pafarley
---

# Quickstart: Custom categories (standard mode) (preview)

Follow this guide to use Azure AI Content Safety Custom categories (standard) REST API to create your own content categories for your use case and train Azure AI Content Safety to detect them in new text content.
@@ -25,6 +26,13 @@ For more information on Custom categories, see the [Custom categories concept pa
>
> The end-to-end execution of custom category training can take from around five to ten hours. Plan your moderation pipeline accordingly.

## Custom categories (standard mode) (preview) user flow

![Diagram of the custom categories (standard mode) user flow.](https://github.com/user-attachments/assets/2a510f8c-9f88-461e-9082-64d5e05ce13a)

## Prerequisites

3038
* An Azure subscription - [Create one for free](https://azure.microsoft.com/free/cognitive-services/)
@@ -59,7 +67,21 @@ In the command below, replace `<your_api_key>`, `<your_endpoint>`, and other nec
### Create new category version

```bash
curl -X PUT "<your_endpoint>/contentsafety/text/categories/survival-advice?api-version=2024-09-15-preview" \
 -H "Ocp-Apim-Subscription-Key: <your_api_key>" \
 -H "Content-Type: application/json" \
 -d "{
 \"categoryName\": \"survival-advice\",
 \"definition\": \"text prompts about survival advice in camping/wilderness situations\",
 \"sampleBlobUrl\": \"https://<your-azure-storage-url>/example-container/survival-advice.jsonl\"
}"
```
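Before calling this API, you need the sample file that `sampleBlobUrl` points to. The sketch below writes a JSON Lines file of sample texts; the exact schema (one JSON object with a `text` field per line) is an assumption here, so verify it against the current service documentation before uploading.

```python
import json

# Illustrative sample texts for the hypothetical "survival-advice" category.
samples = [
    "What kind of knife should I carry for backcountry camping?",
    "How do I purify stream water when hiking?",
]

# Write one JSON object per line (JSON Lines format).
with open("survival-advice.jsonl", "w", encoding="utf-8") as f:
    for s in samples:
        f.write(json.dumps({"text": s}) + "\n")
```

You can then upload the file to your storage container (for example, with `az storage blob upload`) so that the URL in `sampleBlobUrl` resolves to it.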
> [!TIP]
> Every time you change your category name, definition, or samples, a new version is created. You can use the version number to trace back to previous versions. Remember this version number, as it's required in the URL for the next step: training custom categories.

### Get new category version

```bash
curl -X GET "<your_endpoint>/contentsafety/text/categories/survival-advice?api-version=2024-09-15-preview" \
 -H "Ocp-Apim-Subscription-Key: <your_api_key>" \
 -H "Content-Type: application/json"
```
@@ -71,10 +93,10 @@ curl -X PUT "<your_endpoint>/contentsafety/text/categories/survival-advice?api-v
### Start the category build process

Replace `<your_api_key>` and `<your_endpoint>` with your own values, and **append the version number you obtained from the last step**. Allow enough time for model training: the end-to-end execution of custom category training can take from around five to ten hours. Plan your moderation pipeline accordingly. After you receive the response, store the operation ID (referred to as `id`) in a temporary location. This ID is necessary for retrieving the build status using the **Get status** API in the next section.

```bash
curl -X POST "<your_endpoint>/contentsafety/text/categories/survival-advice:build?api-version=2024-09-15-preview&version={version}" \
 -H "Ocp-Apim-Subscription-Key: <your_api_key>" \
 -H "Content-Type: application/json"
```
@@ -83,7 +105,7 @@ curl -X POST "<your_endpoint>/contentsafety/text/categories/survival-advice:buil
To retrieve the status, use the `id` obtained from the previous API response and place it in the path of the following API.

```bash
curl -X GET "<your_endpoint>/contentsafety/text/categories/operations/<id>?api-version=2024-09-15-preview" \
 -H "Ocp-Apim-Subscription-Key: <your_api_key>" \
 -H "Content-Type: application/json"
```
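Because training can take hours, the status call is typically polled rather than issued once. The following is a minimal polling sketch: `get_status` is stubbed here and would in practice wrap the Get status call above, and the terminal state names are assumptions rather than documented values.

```python
import time

def wait_for_build(get_status, poll_seconds=0, max_polls=10):
    """Poll a status function until the build reaches a terminal state.

    get_status is any zero-argument callable returning the current status
    string; the terminal states used here are illustrative.
    """
    for _ in range(max_polls):
        status = get_status()
        if status in ("Succeeded", "Failed"):
            return status
        time.sleep(poll_seconds)
    return "TimedOut"

# Stub simulating a build that finishes on the third check.
responses = iter(["NotStarted", "Running", "Succeeded"])
result = wait_for_build(lambda: next(responses))
```

Given the five-to-ten-hour training window, a real poll interval of several minutes (rather than seconds) is more appropriate.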
@@ -93,7 +115,7 @@ curl -X GET "<your_endpoint>/contentsafety/text/categories/operations/<id>?api-v
Run the following command to analyze text with your customized category. Replace `<your_api_key>` and `<your_endpoint>` with your own values.

```bash
curl -X POST "<your_endpoint>/contentsafety/text:analyzeCustomCategory?api-version=2024-09-15-preview" \
 -H "Ocp-Apim-Subscription-Key: <your_api_key>" \
 -H "Content-Type: application/json" \
 -d "{
@@ -132,7 +154,7 @@ You can create a new category with *category name*, *definition* and *sample_blo
```python
def create_new_category_version(category_name, definition, sample_blob_url):
    url = f"{ENDPOINT}/contentsafety/text/categories/{category_name}?api-version=2024-09-15-preview"
    data = {
        "categoryName": category_name,
        "definition": definition,
@@ -156,7 +178,7 @@ You can start the category build process with the *category name* and *version n
```python
def trigger_category_build_process(category_name, version):
    url = f"{ENDPOINT}/contentsafety/text/categories/{category_name}:build?api-version=2024-09-15-preview&version={version}"
    response = requests.post(url, headers=headers)
    return response.status_code
```

@@ -174,7 +196,7 @@ To retrieve the status, utilize the `id` obtained from the previous response.
```python
def get_build_status(id):
    url = f"{ENDPOINT}/contentsafety/text/categories/operations/{id}?api-version=2024-09-15-preview"
    response = requests.get(url, headers=headers)
    return response.status_code
```

@@ -192,7 +214,7 @@ You need to specify the *category name* and the *version number* (optional; the
```python
def analyze_text_with_customized_category(text, category_name, version):
    url = f"{ENDPOINT}/contentsafety/text:analyzeCustomCategory?api-version=2024-09-15-preview"
    data = {
        "text": text,
        "categoryName": category_name,

articles/ai-services/content-safety/quickstart-groundedness.md

Lines changed: 5 additions & 5 deletions
@@ -137,7 +137,7 @@ The parameters in the request body are defined in this table:
| - `query` | (Optional) This represents the question in a QnA task. Character limit: 7,500. | String |
| **text** | (Required) The LLM output text to be checked. Character limit: 7,500. | String |
| **groundingSources** | (Required) Uses an array of grounding sources to validate AI-generated text. See [Input requirements](./overview.md#input-requirements) for limits. | String array |
| **reasoning** | (Optional) Specifies whether to use the reasoning feature. The default value is `false`. If `true`, you need to bring your own Azure OpenAI GPT4o (0513, 0806 version) resource to provide an explanation. Be careful: using reasoning increases the processing time.| Boolean |

### Interpret the API response


@@ -171,7 +171,7 @@ The Groundedness detection API provides the option to include _reasoning_ in the
### Connect your own GPT deployment

> [!TIP]
> We only support **Azure OpenAI GPT4o (0513, 0806 version)** resources and do not support other GPT types. You have the flexibility to deploy your Azure OpenAI GPT4o (0513, 0806 version) resources in any region. However, to minimize potential latency and avoid any geographical boundary data privacy and risk concerns, we recommend situating them in the same region as your content safety resources. For comprehensive details on data privacy, refer to the [Data, privacy and security guidelines for Azure OpenAI Service](/legal/cognitive-services/openai/data-privacy) and [Data, privacy, and security for Azure AI Content Safety](/legal/cognitive-services/content-safety/data-privacy?context=%2Fazure%2Fai-services%2Fcontent-safety%2Fcontext%2Fcontext).

To use your Azure OpenAI GPT4o (0513, 0806 version) resource to enable the reasoning feature, use Managed Identity to allow your Content Safety resource to access the Azure OpenAI resource:

@@ -296,7 +296,7 @@ The parameters in the request body are defined in this table:
| **groundingSources** | (Required) Uses an array of grounding sources to validate AI-generated text. See [Input requirements](./overview.md#input-requirements) for limits. | String array |
| **reasoning** | (Optional) Set to `true`, the service uses Azure OpenAI resources to provide an explanation. Be careful: using reasoning increases the processing time and incurs extra fees.| Boolean |
| **llmResource** | (Required) If you want to use your own Azure OpenAI GPT4o (0513, 0806 version) resource to enable reasoning, add this field and include the subfields for the resources used. | String |
| - `resourceType` | Specifies the type of resource being used. Currently it only allows `AzureOpenAI`. We only support Azure OpenAI GPT4o (0513, 0806 version) resources and do not support other GPT types. | Enum|
| - `azureOpenAIEndpoint` | Your endpoint URL for Azure OpenAI service. | String |
| - `azureOpenAIDeploymentName` | The name of the specific GPT deployment to use. | String|
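Putting the parameters above together, a request body might look like the following sketch. The endpoint and deployment name are placeholders, and the `domain`, `task`, and `qna` field values shown here are assumptions drawn from memory of the groundedness quickstart samples, so verify them against the samples before use.

```python
import json

# Illustrative groundedness request body with reasoning enabled; built but
# not sent. Placeholder values are marked with angle brackets.
body = {
    "domain": "Generic",
    "task": "QnA",
    "qna": {"query": "What is the capital of France?"},
    "text": "Paris is the capital of France.",
    "groundingSources": ["France's capital city is Paris."],
    "reasoning": True,
    "llmResource": {
        "resourceType": "AzureOpenAI",
        "azureOpenAIEndpoint": "https://<your-aoai-resource>.openai.azure.com/",
        "azureOpenAIDeploymentName": "<your-gpt4o-deployment>",
    },
}
payload = json.dumps(body)
```

With `reasoning` set to `true`, omitting `llmResource` would cause the request to fail, since the service needs your own deployment to generate the explanation.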
@@ -353,7 +353,7 @@ The groundedness detection API includes a correction feature that automatically
### Connect your own GPT deployment

> [!TIP]
> Currently, the correction feature supports only **Azure OpenAI GPT4o (0513, 0806 version)** resources. To minimize latency and adhere to data privacy guidelines, we recommend deploying your Azure OpenAI GPT4o (0513, 0806 version) resources in the same region as your content safety resources. For more details on data privacy, refer to the [Data, privacy and security guidelines for Azure OpenAI Service](/legal/cognitive-services/openai/data-privacy?context=/azure/ai-services/openai/context/context) and [Data, privacy, and security for Azure AI Content Safety](/legal/cognitive-services/content-safety/data-privacy?context=/azure/ai-services/content-safety/context/context).

To use your Azure OpenAI GPT4o (0513, 0806 version) resource for enabling the correction feature, use Managed Identity to allow your Content Safety resource to access the Azure OpenAI resource. Follow the steps in the [earlier section](#connect-your-own-gpt-deployment) to set up the Managed Identity.
@@ -450,7 +450,7 @@ The parameters in the request body are defined in this table:
| **groundingSources** | (Required) Uses an array of grounding sources to validate AI-generated text. See [Input requirements](./overview.md#input-requirements) for limits. | String array |
| **correction** | (Optional) Set to `true`, the service uses Azure OpenAI resources to provide the corrected text, ensuring consistency with the grounding sources. Be careful: using correction increases the processing time and incurs extra fees.| Boolean |
| **llmResource** | (Required) If you want to use your own Azure OpenAI GPT4o (0513, 0806 version) resource to enable reasoning, add this field and include the subfields for the resources used. | String |
| - `resourceType` | Specifies the type of resource being used. Currently it only allows `AzureOpenAI`. We only support Azure OpenAI GPT4o (0513, 0806 version) resources and do not support other GPT types. | Enum|
| - `azureOpenAIEndpoint` | Your endpoint URL for Azure OpenAI service. | String |
| - `azureOpenAIDeploymentName` | The name of the specific GPT deployment to use. | String|

articles/ai-studio/how-to/develop/trace-local-sdk.md

Lines changed: 2 additions & 2 deletions
@@ -9,8 +9,8 @@ ms.custom:
ms.topic: how-to
ms.date: 5/21/2024
ms.reviewer: chenlujiao
ms.author: lagayhar
author: lgayhardt
---

# How to trace your application with prompt flow SDK | Azure AI Studio
