Merge pull request #1107 from MicrosoftDocs/main

PhilKang0704 · web-flow · commit 66cda234be73 · 2024-10-29T13:30:21.000+08:00
10/29 11:00 AM IST Publish
diff --git a/articles/ai-services/content-safety/concepts/custom-categories.md b/articles/ai-services/content-safety/concepts/custom-categories.md
@@ -16,6 +16,38 @@ ms.author: pafarley
 
 Azure AI Content Safety lets you create and manage your own content moderation categories for enhanced moderation and filtering that matches your specific policies or use cases.
 
+## Custom categories Training Pipeline Overview
+![image](https://github.com/user-attachments/assets/2e097136-0e37-4b5e-ba59-cafcfd733d72)
+
+### Pipeline Components
+The training pipeline is designed to leverage a combination of universal data assets, user-provided inputs, and advanced GPT model fine-tuning techniques to produce high-quality models tailored to specific tasks.
+#### Data Assets
+Filtered Universal Data: This component gathers datasets from multiple domains to create a comprehensive and diverse dataset collection. The goal is to have a robust data foundation that provides a variety of contexts for model training.
+User Inputs
+Customer Task Metadata: Metadata provided by customers, which defines the specific requirements and context of the task they wish the model to perform.
+Customer Demonstrations: Sample demonstrations provided by customers that illustrate the expected output or behavior for the model. These demonstrations help optimize the model’s response based on real-world expectations.
+
+#### Optimized Customer Prompt
+Based on the customer metadata and demonstrations, an optimized prompt is generated. This prompt refines the inputs provided to the model, aligning it closely with customer needs and enhancing the model’s task performance.
+
+#### GPTX Synthetic Task-Specific Dataset
+Using the optimized prompt and filtered universal data, a synthetic, task-specific dataset is created. This dataset is tailored to the specific task requirements, enabling the model to understand and learn the desired behaviors and patterns.
+### Model Training and Fine-Tuning
+
+#### Model Options: The pipeline supports multiple language models (LM), including Zcode, SLM, or any other language model (LM) suitable for the task.
+Task-Specific Fine-Tuned Model: The selected language model is fine-tuned on the synthetic task-specific dataset to produce a model that is highly optimized for the specific task.
+User Outputs
+
+#### ONNX Model: The fine-tuned model is converted into an ONNX (Open Neural Network Exchange) model format, ensuring compatibility and efficiency for deployment.
+Deployment: The ONNX model is deployed, enabling users to make inference calls and access the model’s predictions. This deployment step ensures that the model is ready for production use in customer applications.
+Key Features of the Training Pipeline
+
+#### Task Specificity: The pipeline allows for the creation of models finely tuned to specific customer tasks, thanks to the integration of customer metadata and demonstrations.
+- Scalability and Flexibility: The pipeline supports multiple language models, providing flexibility in choosing the model architecture best suited to the task.
+- Efficiency in Deployment: The conversion to ONNX format ensures that the final model is lightweight and efficient, optimized for deployment environments.
+- Continuous Improvement: By using synthetic datasets generated from diverse universal data sources, the pipeline can continuously improve model quality and applicability across various domains.
+
+
 ## Types of customization
 
 There are multiple ways to define and use custom categories, which are detailed and compared in this section.
@@ -49,7 +81,9 @@ This implementation works on text content and image content.
 
 ## How it works
 
-#### [Custom categories (standard) API](#tab/standard)
+### [Custom categories (standard) API](#tab/standard)
+![image](https://github.com/user-attachments/assets/5c377ec4-379b-4b41-884c-13524ca126d0)
+
 
 The Azure AI Content Safety custom categories feature uses a multi-step process for creating, training, and using custom content classification models. Here's a look at the workflow:
 
diff --git a/articles/ai-services/content-safety/includes/storage-account-access.md b/articles/ai-services/content-safety/includes/storage-account-access.md
@@ -10,7 +10,9 @@ ms.author: pafarley
 ---
 
 
-Next, you need to give your Content Safety resource access to read from the Azure Storage resource. Enable system-assigned Managed identity for the Azure AI Content Safety instance and assign the role of **Storage Blob Data Contributor/Owner/Reader** to the identity:
+Next, you need to give your Content Safety resource access to read from the Azure Storage resource. Enable system-assigned Managed identity for the Azure AI Content Safety instance and assign the role of **Storage Blob Data Contributor/Owner** to the identity:
+> [!IMPORTANT]
+> **Only Storage Blob Data Contributor or Storage Blob Data Owner are valid roles to proceed.**
 
 1. Enable managed identity for the Azure AI Content Safety instance. 
 
diff --git a/articles/ai-services/content-safety/quickstart-custom-categories.md b/articles/ai-services/content-safety/quickstart-custom-categories.md
@@ -11,6 +11,7 @@ ms.date: 07/03/2024
 ms.author: pafarley
 ---
 
+
 # Quickstart: Custom categories (standard mode) (preview)
 
 Follow this guide to use Azure AI Content Safety Custom categories (standard) REST API to create your own content categories for your use case and train Azure AI Content Safety to detect them in new text content.
@@ -25,6 +26,13 @@ For more information on Custom categories, see the [Custom categories concept pa
 >
 > The end-to-end execution of custom category training can take from around five hours to ten hours. Plan your moderation pipeline accordingly.
 
+
+## Custom categories (standard mode) (preview) User Flow
+
+![image](https://github.com/user-attachments/assets/2a510f8c-9f88-461e-9082-64d5e05ce13a)
+
+
+
 ## Prerequisites
 
 * An Azure subscription - [Create one for free](https://azure.microsoft.com/free/cognitive-services/)
@@ -59,7 +67,21 @@ In the command below, replace `<your_api_key>`, `<your_endpoint>`, and other nec
 ### Create new category version
 
 ```bash
-curl -X PUT "<your_endpoint>/contentsafety/text/categories/survival-advice?api-version=2024-02-15-preview" \
+curl -X PUT "<your_endpoint>/contentsafety/text/categories/survival-advice?api-version=2024-09-15-preview" \
+     -H "Ocp-Apim-Subscription-Key: <your_api_key>" \
+     -H "Content-Type: application/json" \
+     -d "{
+            \"categoryName\": \"survival-advice\",
+            \"definition\": \"text prompts about survival advice in camping/wilderness situations\",
+            \"sampleBlobUrl\": \"https://<your-azure-storage-url>/example-container/survival-advice.jsonl\"
+        }"
+```
+> [!TIP]
+> Every time you change your category name, definition or samples, a new version will be created. You can use the version number to trace back to previous versions. Please remember this version number, as it will be required in the URL for the next step- training custom categories.
+### Get new category version
+
+```bash
+curl -X PUT "<your_endpoint>/contentsafety/text/categories/survival-advice?api-version=2024-09-15-preview" \
      -H "Ocp-Apim-Subscription-Key: <your_api_key>" \
      -H "Content-Type: application/json" \
      -d "{
@@ -71,10 +93,10 @@ curl -X PUT "<your_endpoint>/contentsafety/text/categories/survival-advice?api-v
 
 ### Start the category build process:
 
-Replace `<your_api_key>` and `<your_endpoint>` with your own values. Allow enough time for model training: the end-to-end execution of custom category training can take from around five hours to ten hours. Plan your moderation pipeline accordingly. After you receive the response, store the operation ID (referred to as `id`) in a temporary location. This ID will be necessary for retrieving the build status using the **Get status** API in the next section.
+Replace <your_api_key> and <your_endpoint> with your own values, and also **append the version number you obtained from the last step.** Allow enough time for model training: the end-to-end execution of custom category training can take from around five hours to ten hours. Plan your moderation pipeline accordingly. After you receive the response, store the operation ID (referred to as `id`) in a temporary location. This ID will be necessary for retrieving the build status using the **Get status** API in the next section.
 
 ```bash
-curl -X POST "<your_endpoint>/contentsafety/text/categories/survival-advice:build?api-version=2024-02-15-preview" \
+curl -X POST "<your_endpoint>/contentsafety/text/categories/survival-advice:build?api-version=2024-09-15-preview**&version={version}**" \
      -H "Ocp-Apim-Subscription-Key: <your_api_key>" \
      -H "Content-Type: application/json"
 ```
@@ -83,7 +105,7 @@ curl -X POST "<your_endpoint>/contentsafety/text/categories/survival-advice:buil
 To retrieve the status, utilize the `id` obtained from the previous API response and place it in the path of the API below.
 
 ```bash
-curl -X GET "<your_endpoint>/contentsafety/text/categories/operations/<id>?api-version=2024-02-15-preview" \
+curl -X GET "<your_endpoint>/contentsafety/text/categories/operations/<id>?api-version=2024-09-15-preview" \
      -H "Ocp-Apim-Subscription-Key: <your_api_key>" \
      -H "Content-Type: application/json"
 ```
@@ -93,7 +115,7 @@ curl -X GET "<your_endpoint>/contentsafety/text/categories/operations/<id>?api-v
 Run the following command to analyze text with your customized category. Replace `<your_api_key>` and `<your_endpoint>` with your own values.
 
 ```bash
-curl -X POST "<your_endpoint>/contentsafety/text:analyzeCustomCategory?api-version=2024-02-15-preview" \
+curl -X POST "<your_endpoint>/contentsafety/text:analyzeCustomCategory?api-version=2024-09-15-preview" \
      -H "Ocp-Apim-Subscription-Key: <your_api_key>" \
      -H "Content-Type: application/json" \
      -d "{
@@ -132,7 +154,7 @@ You can create a new category with *category name*, *definition* and *sample_blo
 
 ```python
 def create_new_category_version(category_name, definition, sample_blob_url):
-    url = f"{ENDPOINT}/contentsafety/text/categories/{category_name}?api-version=2024-02-15-preview"
+    url = f"{ENDPOINT}/contentsafety/text/categories/{category_name}?api-version=2024-09-15-preview"
     data = {
         "categoryName": category_name,
         "definition": definition,
@@ -156,7 +178,7 @@ You can start the category build process with the *category name* and *version n
 
 ```python
 def trigger_category_build_process(category_name, version):
-    url = f"{ENDPOINT}/contentsafety/text/categories/{category_name}:build?api-version=2024-02-15-preview&version={version}"
+    url = f"{ENDPOINT}/contentsafety/text/categories/{category_name}:build?api-version=2024-09-15-preview&version={version}"
     response = requests.post(url, headers=headers)
     return response.status_code
 
@@ -174,7 +196,7 @@ To retrieve the status, utilize the `id` obtained from the previous response.
 
 ```python
 def get_build_status(id):
-    url = f"{ENDPOINT}/contentsafety/text/categories/operations/{id}?api-version=2024-02-15-preview"
+    url = f"{ENDPOINT}/contentsafety/text/categories/operations/{id}?api-version=2024-09-15-preview"
     response = requests.get(url, headers=headers)
     return response.status_code
 
@@ -192,7 +214,7 @@ You need to specify the *category name* and the *version number* (optional; the
 
 ```python
 def analyze_text_with_customized_category(text, category_name, version):
-    url = f"{ENDPOINT}/contentsafety/text:analyzeCustomCategory?api-version=2024-02-15-preview"
+    url = f"{ENDPOINT}/contentsafety/text:analyzeCustomCategory?api-version=2024-09-15-preview"
     data = {
         "text": text,
         "categoryName": category_name,
diff --git a/articles/ai-services/content-safety/quickstart-groundedness.md b/articles/ai-services/content-safety/quickstart-groundedness.md
@@ -137,7 +137,7 @@ The parameters in the request body are defined in this table:
 | - `query`       | (Optional) This represents the question in a QnA task. Character limit: 7,500. | String  |
 | **text**   | (Required) The LLM output text to be checked. Character limit: 7,500. |  String  |
 | **groundingSources**  | (Required) Uses an array of grounding sources to validate AI-generated text. See [Input requirements](./overview.md#input-requirements) for limits. | String array    |
-| **reasoning**  | (Optional) Specifies whether to use the reasoning feature. The default value is `false`. If `true`, you need to bring your own Azure OpenAI GPT-4 Turbo (1106-preview) resources to provide an explanation. Be careful: using reasoning increases the processing time.| Boolean   |
+| **reasoning**  | (Optional) Specifies whether to use the reasoning feature. The default value is `false`. If `true`, you need to bring your own Azure OpenAI GPT4o (0513, 0806 version) to provide an explanation. Be careful: using reasoning increases the processing time.| Boolean   |
 
 ### Interpret the API response
 
@@ -171,7 +171,7 @@ The Groundedness detection API provides the option to include _reasoning_ in the
 ### Connect your own GPT deployment
 
 > [!TIP]
-> We only support **Azure OpenAI GPT-4 Turbo (1106-preview)** resources and do not support other GPT types. You have the flexibility to deploy your GPT-4 Turbo (1106-preview) resources in any region. However, to minimize potential latency and avoid any geographical boundary data privacy and risk concerns, we recommend situating them in the same region as your content safety resources. For comprehensive details on data privacy, refer to the [Data, privacy and security guidelines for Azure OpenAI Service](/legal/cognitive-services/openai/data-privacy) and [Data, privacy, and security for Azure AI Content Safety](/legal/cognitive-services/content-safety/data-privacy?context=%2Fazure%2Fai-services%2Fcontent-safety%2Fcontext%2Fcontext).
+> We only support **Azure OpenAI GPT4o (0513, 0806 version) ** resources and do not support other GPT types. You have the flexibility to deploy your Azure OpenAI GPT4o (0513, 0806 version) resources in any region. However, to minimize potential latency and avoid any geographical boundary data privacy and risk concerns, we recommend situating them in the same region as your content safety resources. For comprehensive details on data privacy, refer to the [Data, privacy and security guidelines for Azure OpenAI Service](/legal/cognitive-services/openai/data-privacy) and [Data, privacy, and security for Azure AI Content Safety](/legal/cognitive-services/content-safety/data-privacy?context=%2Fazure%2Fai-services%2Fcontent-safety%2Fcontext%2Fcontext).
 
 In order to use your Azure OpenAI GPT4o (0513, 0806 version) resource to enable the reasoning feature, use Managed Identity to allow your Content Safety resource to access the Azure OpenAI resource:
 
@@ -296,7 +296,7 @@ The parameters in the request body are defined in this table:
 | **groundingSources**  | (Required) Uses an array of grounding sources to validate AI-generated text. See [Input requirements](./overview.md#input-requirements) for limits, | String array    |
 | **reasoning**  | (Optional) Set to `true`, the service uses Azure OpenAI resources to provide an explanation. Be careful: using reasoning increases the processing time and incurs extra fees.| Boolean   |
 | **llmResource**  | (Required) If you want to use your own Azure OpenAI GPT4o (0513, 0806 version) resource to enable reasoning, add this field and include the subfields for the resources used. | String   |
-| - `resourceType `| Specifies the type of resource being used. Currently it only allows `AzureOpenAI`. We only support Azure OpenAI GPT-4 Turbo (1106-preview) resources and do not support other GPT types. | Enum|
+| - `resourceType `| Specifies the type of resource being used. Currently it only allows `AzureOpenAI`. We only support Azure OpenAI GPT4o (0513, 0806 version)  resources and do not support other GPT types. | Enum|
 | - `azureOpenAIEndpoint `| Your endpoint URL for Azure OpenAI service.  | String |
 | - `azureOpenAIDeploymentName` | The name of the specific GPT deployment to use. | String|
 
@@ -353,7 +353,7 @@ The groundedness detection API includes a correction feature that automatically
 ### Connect your own GPT deployment
 
 > [!TIP]
-> Currently, the correction feature supports only **Azure OpenAI GPT-4 Turbo (1106-preview)** resources. To minimize latency and adhere to data privacy guidelines, it's recommended to deploy your GPT-4 Turbo (1106-preview) resources in the same region as your content safety resources. For more details on data privacy, please refer to the [Data, privacy and security guidelines for Azure OpenAI Service](/legal/cognitive-services/openai/data-privacy?context=/azure/ai-services/openai/context/context)
+> Currently, the correction feature supports only **Azure OpenAI GPT4o (0513, 0806 version) ** resources. To minimize latency and adhere to data privacy guidelines, it's recommended to deploy your Azure OpenAI GPT4o (0513, 0806 version) in the same region as your content safety resources. For more details on data privacy, please refer to the [Data, privacy and security guidelines for Azure OpenAI Service](/legal/cognitive-services/openai/data-privacy?context=/azure/ai-services/openai/context/context)
  and [Data, privacy, and security for Azure AI Content Safety](/legal/cognitive-services/content-safety/data-privacy?context=/azure/ai-services/content-safety/context/context).
 
 To use your Azure OpenAI GPT4o (0513, 0806 version) resource for enabling the correction feature, use Managed Identity to allow your Content Safety resource to access the Azure OpenAI resource. Follow the steps in the [earlier section](#connect-your-own-gpt-deployment) to set up the Managed Identity.
@@ -450,7 +450,7 @@ The parameters in the request body are defined in this table:
 | **groundingSources**  | (Required) Uses an array of grounding sources to validate AI-generated text. See [Input requirements](./overview.md#input-requirements) for limits. | String Array    |
 | **correction**  | (Optional) Set to `true`, the service uses Azure OpenAI resources to provide the corrected text, ensuring consistency with the grounding sources. Be careful: using correction increases the processing time and incurs extra fees.| Boolean   |
 | **llmResource**  | (Required) If you want to use your own Azure OpenAI GPT4o (0513, 0806 version) resource to enable reasoning, add this field and include the subfields for the resources used. | String   |
-| - `resourceType `| Specifies the type of resource being used. Currently it only allows `AzureOpenAI`. We only support Azure OpenAI GPT-4 Turbo (1106-preview) resources and do not support other GPT types. | Enum|
+| - `resourceType `| Specifies the type of resource being used. Currently it only allows `AzureOpenAI`. We only support Azure OpenAI GPT4o (0513, 0806 version) resources and do not support other GPT types. | Enum|
 | - `azureOpenAIEndpoint `| Your endpoint URL for Azure OpenAI service.  | String |
 | - `azureOpenAIDeploymentName` | The name of the specific GPT deployment to use. | String|
 
diff --git a/articles/ai-studio/how-to/develop/trace-local-sdk.md b/articles/ai-studio/how-to/develop/trace-local-sdk.md
@@ -9,8 +9,8 @@ ms.custom:
 ms.topic: how-to
 ms.date: 5/21/2024
 ms.reviewer: chenlujiao
-ms.author: sgilley
-author: sdgilley
+ms.author: lagayhar  
+author: lgayhardt
 ---
 
 # How to trace your application with prompt flow SDK | Azure AI Studio
diff --git a/articles/search/resource-training.md b/articles/search/resource-training.md