Commit beebdfc

Merge pull request #475 from MicrosoftDocs/main
9/24/2024 PM Publish
2 parents: 62da9cb + cc1f32d

35 files changed: +139 −155 lines

articles/ai-studio/how-to/develop/evaluate-sdk.md

Lines changed: 12 additions & 13 deletions

@@ -9,9 +9,9 @@ ms.custom:
   - references_regions
 ms.topic: how-to
 ms.date: 09/24/2024
-ms.reviewer: dantaylo
-ms.author: eur
-author: eric-urban
+ms.reviewer: minthigpen
+ms.author: lagayhar
+author: lgayhardt
 ---

 # Evaluate with the Azure AI Evaluation SDK

@@ -87,17 +87,16 @@ When using AI-assisted performance and quality metrics, you must specify a GPT m
 You can run the built-in evaluators by importing the desired evaluator class. Ensure that you set your environment variables.

 ```python
 import os
-from promptflow.core import AzureOpenAIModelConfiguration

 # Initialize Azure OpenAI Connection with your environment variables
-model_config = AzureOpenAIModelConfiguration(
-    azure_endpoint=os.environ.get("AZURE_OPENAI_ENDPOINT"),
-    api_key=os.environ.get("AZURE_OPENAI_API_KEY"),
-    azure_deployment=os.environ.get("AZURE_OPENAI_DEPLOYMENT"),
-    api_version=os.environ.get("AZURE_OPENAI_API_VERSION"),
-)
+model_config = {
+    "azure_endpoint": os.environ.get("AZURE_OPENAI_ENDPOINT"),
+    "api_key": os.environ.get("AZURE_OPENAI_API_KEY"),
+    "azure_deployment": os.environ.get("AZURE_OPENAI_DEPLOYMENT"),
+    "api_version": os.environ.get("AZURE_OPENAI_API_VERSION"),
+}

-from azure.ai.evaluation.evaluators import RelevanceEvaluator
+from azure.ai.evaluation import RelevanceEvaluator

 # Initializing Relevance Evaluator
 relevance_eval = RelevanceEvaluator(model_config)

@@ -131,7 +130,7 @@ azure_ai_project = {
     "project_name": "<project_name>",
 }

-from azure.ai.evaluation.evaluators import ViolenceEvaluator
+from azure.ai.evaluation import ViolenceEvaluator

 # Initializing Violence Evaluator with project information
 violence_eval = ViolenceEvaluator(azure_ai_project)

@@ -329,7 +328,7 @@ After logging your custom evaluator to your AI Studio project, you can view it i
 After you spot-check your built-in or custom evaluators on a single row of data, you can combine multiple evaluators with the `evaluate()` API on an entire test dataset. In order to ensure the `evaluate()` can correctly parse the data, you must specify column mapping to map the column from the dataset to key words that are accepted by the evaluators. In this case, we specify the data mapping for `ground_truth`.

 ```python
-from azure.ai.evaluation.evaluate import evaluate
+from azure.ai.evaluation import evaluate

 result = evaluate(
     data="data.jsonl", # provide your data here
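The configuration change in this file replaces the `AzureOpenAIModelConfiguration` object with a plain dictionary. The new shape can be sketched in isolation, without the SDK itself; in this minimal illustration the environment-variable names match the snippet above, while the fallback values set with `setdefault` are hypothetical placeholders:

```python
import os

# Hypothetical placeholder values, for illustration only; in practice these
# environment variables are set from your Azure OpenAI resource.
os.environ.setdefault("AZURE_OPENAI_ENDPOINT", "https://example.openai.azure.com/")
os.environ.setdefault("AZURE_OPENAI_API_KEY", "<api-key>")
os.environ.setdefault("AZURE_OPENAI_DEPLOYMENT", "gpt-4")
os.environ.setdefault("AZURE_OPENAI_API_VERSION", "2024-02-15-preview")

# The new style: a plain dict instead of an AzureOpenAIModelConfiguration object.
model_config = {
    "azure_endpoint": os.environ.get("AZURE_OPENAI_ENDPOINT"),
    "api_key": os.environ.get("AZURE_OPENAI_API_KEY"),
    "azure_deployment": os.environ.get("AZURE_OPENAI_DEPLOYMENT"),
    "api_version": os.environ.get("AZURE_OPENAI_API_VERSION"),
}

# Fail fast if any required key is missing before handing the dict to an evaluator.
missing = [key for key, value in model_config.items() if not value]
assert not missing, f"Missing Azure OpenAI settings: {missing}"
```

Because the dictionary is ordinary Python, it can be validated or logged before it's passed to an evaluator such as `RelevanceEvaluator(model_config)`, which the object-based configuration made harder.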

articles/ai-studio/how-to/develop/simulator-interaction-data.md

Lines changed: 5 additions & 5 deletions

@@ -10,9 +10,9 @@ ms.custom:
   - references_regions
 ms.topic: how-to
 ms.date: 9/24/2024
-ms.reviewer: eur
-ms.author: eur
-author: eric-urban
+ms.reviewer: minthigpen
+ms.author: lagayhar
+author: lgayhardt
 ---

 # Generate synthetic and simulated data for evaluation

@@ -306,8 +306,8 @@ The `AdversarialSimulator` supports a range of scenarios, hosted in the service,
 | Text Rewrite | `ADVERSARIAL_REWRITE` | 1000 | Hateful and unfair content, Sexual content, Violent content, Self-harm-related content, Direct Attack (UPIA) Jailbreak |
 | Ungrounded Content Generation | `ADVERSARIAL_CONTENT_GEN_UNGROUNDED` | 496 | Groundedness |
 | Grounded Content Generation | `ADVERSARIAL_CONTENT_GEN_GROUNDED` | 475 | Groundedness |
-| Protected Material | `ADVERSARIAL_PROTECTED_MATERIAL` | 200 | Protected Material |
-| Indirect Attack (XPIA) Jailbreak | `ADVERSARIAL_INDIRECT_JAILBREAK` | 200 | Indirect Attack (XPIA) Jailbreak |
+| Protected Material | `ADVERSARIAL_PROTECTED_MATERIAL` | 306 | Protected Material |
+| Indirect Attack (XPIA) Jailbreak | `ADVERSARIAL_INDIRECT_JAILBREAK` | 100 | Indirect Attack (XPIA) Jailbreak |

 ### Simulating jailbreak attacks
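The commit updates the per-scenario simulation limits in the table above. Those limits can be captured as a small lookup for client-side validation; this is an illustrative sketch only (the `MAX_SIMULATIONS` dictionary and `clamp_simulation_count` helper are hypothetical, not part of the Azure AI Evaluation SDK), using the values as corrected in this commit:

```python
# Hypothetical lookup of the maximum supported simulations per adversarial
# scenario, taken from the updated table in this commit (not an SDK API).
MAX_SIMULATIONS = {
    "ADVERSARIAL_REWRITE": 1000,
    "ADVERSARIAL_CONTENT_GEN_UNGROUNDED": 496,
    "ADVERSARIAL_CONTENT_GEN_GROUNDED": 475,
    "ADVERSARIAL_PROTECTED_MATERIAL": 306,
    "ADVERSARIAL_INDIRECT_JAILBREAK": 100,
}

def clamp_simulation_count(scenario: str, requested: int) -> int:
    """Clamp a requested simulation count to the scenario's service-side limit."""
    limit = MAX_SIMULATIONS[scenario]
    return min(requested, limit)
```

For example, requesting 500 simulations for `ADVERSARIAL_INDIRECT_JAILBREAK` would be clamped to the scenario's limit of 100, while the same request for `ADVERSARIAL_REWRITE` stays at 500 because its limit is 1000.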

articles/ai-studio/toc.yml

Lines changed: 1 addition & 1 deletion

@@ -269,7 +269,7 @@ items:
   href: concepts/evaluation-improvement-strategies.md
 - name: Manually evaluate prompts in Azure AI Studio playground
   href: how-to/evaluate-prompts-playground.md
-- name: Generate adversarial simulations for safety evaluation
+- name: Generate synthetic and simulated data for evaluation
   href: how-to/develop/simulator-interaction-data.md
 - name: Evaluate with the Azure AI Evaluation SDK
   href: how-to/develop/evaluate-sdk.md

articles/machine-learning/how-to-create-image-labeling-projects.md

Lines changed: 24 additions & 25 deletions

@@ -8,7 +8,7 @@ ms.reviewer: vkann
 ms.service: azure-machine-learning
 ms.subservice: mldata
 ms.topic: how-to
-ms.date: 02/01/2024
+ms.date: 09/24/2024
 ms.custom: data4ml
 monikerRange: 'azureml-api-1 || azureml-api-2'
 # Customer intent: As a project manager, I want to set up a project to label images in the project. I want to enable machine learning-assisted labeling to help with the task.

@@ -28,7 +28,7 @@ You can also use the data labeling tool in Azure Machine Learning to [create a t

 Azure Machine Learning data labeling is a tool you can use to create, manage, and monitor data labeling projects. Use it to:

-- Coordinate data, labels, and team members to efficiently manage labeling tasks.
+- Coordinate data, labels, and team members to efficiently manage the labeling tasks.
 - Track progress and maintain the queue of incomplete labeling tasks.
 - Start and stop the project, and control the labeling progress.
 - Review and export the labeled data as an Azure Machine Learning dataset.

@@ -38,20 +38,20 @@ Azure Machine Learning data labeling is a tool you can use to create, manage, an

 Image data can be any file that has one of these file extensions:

-- *.jpg*
-- *.jpeg*
-- *.png*
-- *.jpe*
-- *.jfif*
-- *.bmp*
-- *.tif*
-- *.tiff*
-- *.dcm*
-- *.dicom*
+- `.jpg`
+- `.jpeg`
+- `.png`
+- `.jpe`
+- `.jfif`
+- `.bmp`
+- `.tif`
+- `.tiff`
+- `.dcm`
+- `.dicom`

 Each file is an item to be labeled.

-You can also use an MLTable data asset as input to an image labeling project, as long as the images in the table are one of the above formats. For more information, see [How to use MLTable data assets](./how-to-mltable.md).
+You can also use an `MLTable` data asset as input to an image labeling project, as long as the images in the table are one of the above formats. For more information, see [How to use `MLTable` data assets](./how-to-mltable.md).

 ## Prerequisites

@@ -76,10 +76,10 @@ You use these items to set up image labeling in Azure Machine Learning:
 * To apply only a *single label* to an image from a set of labels, select **Image Classification Multi-class**.
 * To apply *one or more* labels to an image from a set of labels, select **Image Classification Multi-label**. For example, a photo of a dog might be labeled with both *dog* and *daytime*.
 * To assign a label to each object within an image and add bounding boxes, select **Object Identification (Bounding Box)**.
-* To assign a label to each object within an image and draw a polygon around each object, select **Instance Segmentation (Polygon)**.
+* To assign a label to each object within an image and draw a polygon around each object, select **Polygon (Instance Segmentation)**.
 * To draw masks on an image and assign a label class at the pixel level, select **Semantic Segmentation (Preview)**.

-:::image type="content" source="media/how-to-create-labeling-projects/labeling-creation-wizard.png" alt-text="Screenshot that shows creating a labeling project to manage labeling.":::
+:::image type="content" source="media/how-to-create-labeling-projects/labeling-creation-wizard.png" alt-text="Screenshot that shows creating a labeling project to manage the labeling task.":::

 1. Select **Next** to continue.

@@ -98,7 +98,7 @@ You can also select **Create a dataset** to use an existing Azure datastore or t

 ### Data column mapping (preview)

-If you select an MLTable data asset, an additional **Data Column Mapping** step appears for you to specify the column that contains the image URLs.
+If you select an MLTable data asset, another **Data Column Mapping** step appears for you to specify the column that contains the image URLs.

 [!INCLUDE [mapping](includes/machine-learning-data-labeling-mapping.md)]

@@ -169,7 +169,7 @@ For bounding boxes, important questions include:
 * How should labelers handle an object that isn't the object class of interest but has visual similarities to a relevant object type?

 > [!NOTE]
-> Labelers can select the first nine labels by using number keys 1 through 9.
+> Labelers can select the first nine labels by using number keys 1 through 9. You might want to include this information in your instructions.

 ## Quality control (preview)

@@ -180,7 +180,7 @@ For bounding boxes, important questions include:

 ## Use ML-assisted data labeling

-To accelerate labeling tasks, on the **ML assisted labeling** page, you can trigger automatic machine learning models. Medical images (files that have a *.dcm* extension) aren't included in assisted labeling. If the project type is **Semantic Segmentation (Preview)**, ML-assisted labeling isn't available.
+To accelerate labeling tasks, on the **ML assisted labeling** page, you can trigger automatic machine learning models. Medical images (files that have a `.dcm` extension) aren't included in assisted labeling. If the project type is **Semantic Segmentation (Preview)**, ML-assisted labeling isn't available.

 At the start of your labeling project, the items are shuffled into a random order to reduce potential bias. However, the trained model reflects any biases that are present in the dataset. For example, if 80 percent of your items are of a single class, then approximately 80 percent of the data used to train the model lands in that class.

@@ -189,9 +189,9 @@ To enable assisted labeling, select **Enable ML assisted labeling** and specify
 ML-assisted labeling consists of two phases:

 * Clustering
-* Pre-labeling
+* Prelabeling

-The labeled data item count that's required to start assisted labeling isn't a fixed number. This number can vary significantly from one labeling project to another. For some projects, it's sometimes possible to see pre-label or cluster tasks after 300 items have been manually labeled. ML-assisted labeling uses a technique called *transfer learning*. Transfer learning uses a pre-trained model to jump-start the training process. If the classes of your dataset resemble the classes in the pre-trained model, pre-labels might become available after only a few hundred manually labeled items. If your dataset significantly differs from the data that's used to pre-train the model, the process might take more time.
+The labeled data item count that's needed to start assisted labeling isn't a fixed number. This number can vary significantly from one labeling project to another. For some projects, it's sometimes possible to see prelabel or cluster tasks after 300 items are manually labeled. ML-assisted labeling uses a technique called *transfer learning*. Transfer learning uses a pretrained model to jump-start the training process. If the classes of your dataset resemble the classes in the pretrained model, prelabels might become available after only a few hundred manually labeled items. If your dataset significantly differs from the data that's used to pretrain the model, the process might take more time.

 When you use consensus labeling, the consensus label is used for training.

@@ -208,11 +208,11 @@ After a machine learning model is trained on your manually labeled data, the mod

 The clustering phase doesn't appear for object detection models or text classification.

-### Pre-labeling
+### Prelabeling

-After you submit enough labels for training, either a classification model predicts tags or an object detection model predicts bounding boxes. The labeler now sees pages that contain predicted labels already present on each item. For object detection, predicted boxes are also shown. The task involves reviewing these predictions and correcting any incorrectly labeled images before page submission.
+After you submit enough labels for training, either a classification model predicts tags, or an object detection model predicts bounding boxes. The labeler now sees pages that contain predicted labels already present on each item. For object detection, predicted boxes are also shown. The task involves reviewing these predictions and correcting any incorrectly labeled images before page submission.

-After a machine learning model is trained on your manually labeled data, the model is evaluated on a test set of manually labeled items. The evaluation helps determine the model's accuracy at different confidence thresholds. The evaluation process sets a confidence threshold beyond which the model is accurate enough to show pre-labels. The model is then evaluated against unlabeled data. Items with predictions that are more confident than the threshold are used for pre-labeling.
+After a machine learning model is trained on your manually labeled data, the model is evaluated on a test set of manually labeled items. The evaluation helps determine the model's accuracy at different confidence thresholds. The evaluation process sets a confidence threshold beyond which the model is accurate enough to show prelabels. The model is then evaluated against unlabeled data. Items with predictions that are more confident than the threshold are used for prelabeling.

 ## Initialize the image labeling project

@@ -224,8 +224,7 @@ After a machine learning model is trained on your manually labeled data, the mod
 [!INCLUDE [troubleshoot](includes/machine-learning-data-labeling-troubleshoot.md)]

-## Next steps
+## Related content

-<!-- * [Tutorial: Create your first image classification labeling project](tutorial-labeling.md). -->
 * [Manage labeling projects](how-to-manage-labeling-projects.md)
 * [How to tag images](how-to-label-data.md)
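The prelabeling selection rule described in this article's diff (only predictions more confident than the evaluated threshold are shown as prelabels) can be sketched as a simple filter. This is an illustrative sketch, not Azure Machine Learning code; the item names, labels, and threshold value are all hypothetical:

```python
# Illustrative sketch of the prelabeling selection rule: only predictions
# more confident than the evaluated threshold become prelabels. Names and
# values here are hypothetical, not Azure Machine Learning APIs.
CONFIDENCE_THRESHOLD = 0.8  # determined during evaluation on the labeled test set

predictions = [
    {"item": "img_001.jpg", "label": "dog", "confidence": 0.95},
    {"item": "img_002.jpg", "label": "cat", "confidence": 0.55},
    {"item": "img_003.jpg", "label": "dog", "confidence": 0.83},
]

# Items above the threshold are surfaced to labelers as prelabels for review;
# the rest stay in the manual labeling queue.
prelabels = [p for p in predictions if p["confidence"] > CONFIDENCE_THRESHOLD]
manual_queue = [p for p in predictions if p["confidence"] <= CONFIDENCE_THRESHOLD]
```

With this data, `img_001.jpg` and `img_003.jpg` clear the 0.8 threshold and appear as prelabels, while `img_002.jpg` remains in the manual queue; labelers then review and correct the prelabels before page submission.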

articles/machine-learning/includes/machine-learning-data-labeling-initialize.md

Lines changed: 2 additions & 2 deletions

@@ -2,11 +2,11 @@
 author: sgilley
 ms.service: azure-machine-learning
 ms.topic: include
-ms.date: 10/21/2021
+ms.date: 09/24/2024
 ms.author: sdgilley
 ---

 After the labeling project is initialized, some aspects of the project are immutable. You can't change the task type or dataset. You *can* modify labels and the URL for the task description. Carefully review the settings before you create the project. After you submit the project, you return to the **Data Labeling** overview page, which shows the project as **Initializing**.

 > [!NOTE]
-> This page might not automatically refresh. After a pause, manually refresh the page to see the project's status as **Created**.
+> The overview page might not automatically refresh. After a pause, manually refresh the page to see the project's status as **Created**.

articles/machine-learning/includes/machine-learning-data-labeling-refresh.md

Lines changed: 2 additions & 2 deletions

@@ -2,7 +2,7 @@
 author: sgilley
 ms.service: azure-machine-learning
 ms.topic: include
-ms.date: 12/08/2021
+ms.date: 09/24/2024
 ms.author: sdgilley
 ---

@@ -15,7 +15,7 @@ Select **Enable incremental refresh at regular intervals** when you want your pr
 Clear the selection if you don't want new files in the datastore to automatically be added to your project.

 > [!IMPORTANT]
-> Don't create a new version for the dataset you want to update. If you do, the updates won't be seen because the data labeling project is pinned to the initial version. Instead, use [Azure Storage Explorer](https://azure.microsoft.com/features/storage-explorer/) to modify your data in the appropriate folder in Blob Storage.
+> When incremental refresh is enabled, don't create a new version for the dataset you want to update. If you do, the updates won't be seen because the data labeling project is pinned to the initial version. Instead, use [Azure Storage Explorer](https://azure.microsoft.com/features/storage-explorer/) to modify your data in the appropriate folder in Blob Storage.
 >
 > Also, don't remove data. Removing data from the dataset your project uses causes an error in the project.

articles/search/hybrid-search-how-to-query.md

Lines changed: 5 additions & 5 deletions

@@ -28,7 +28,7 @@ To improve relevance, use these parameters:

 + A search index containing `searchable` vector and nonvector fields. See [Create an index](search-how-to-create-search-index.md) and [Add vector fields to a search index](vector-search-how-to-create-index.md).

-+ (Optional) If you want [semantic ranking](semantic-how-to-configure.md), your search service must be Basic tier or higher, with [semantic ranking enabled](semantic-how-to-enable-disable.md).
++ (Optional) If you want the [semantic ranker](semantic-search-overview.md), your search service must be Basic tier or higher, with [semantic ranker enabled](semantic-how-to-enable-disable.md).

 + (Optional) If you want text-to-vector conversion of a query string, [create and assign a vectorizer](vector-search-how-to-configure-vectorizer.md) to vector fields in the search index.

@@ -167,7 +167,7 @@ api-key: {{admin-api-key}}

 ## Semantic hybrid search

-Assuming that you [enabled semantic ranking](semantic-how-to-enable-disable.md) and your index definition includes a [semantic configuration](semantic-how-to-query-request.md), you can formulate a query that includes vector search and keyword search, with semantic ranking over the merged result set. Optionally, you can add captions and answers.
+Assuming that you [enabled semantic ranker](semantic-how-to-enable-disable.md) and your index definition includes a [semantic configuration](semantic-how-to-query-request.md), you can formulate a query that includes vector search and keyword search, with semantic ranking over the merged result set. Optionally, you can add captions and answers.

 ```http
 POST https://{{search-service-name}}.search.windows.net/indexes/{{index-name}}/docs/search?api-version=2024-07-01

@@ -200,7 +200,7 @@ api-key: {{admin-api-key}}

 **Key points:**

-+ Semantic ranking accepts up to 50 results from the merged response.
++ Semantic ranker accepts up to 50 results from the merged response.

 + "queryType" and "semanticConfiguration" are required.

@@ -353,9 +353,9 @@ Both "k" and "top" are optional. Unspecified, the default number of results in a
 > [!NOTE]
 > The semantic ranker can take up to 50 results.

-If you're using semantic ranking in 2024-05-01-preview API, it's a best practice to set "k" and "maxTextRecallSize" to sum to at least 50 total. You can then restrict the results returned to the user with the "top" parameter.
+If you're using semantic ranker in 2024-05-01-preview API, it's a best practice to set "k" and "maxTextRecallSize" to sum to at least 50 total. You can then restrict the results returned to the user with the "top" parameter.

-If you're using semantic ranking in previous APIs do the following:
+If you're using semantic ranker in previous APIs do the following:

 + if doing keyword-only search (no vector) set "top" to 50
 + if doing hybrid search set "k" to 50, to ensure that the semantic ranker gets at least 50 results.
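The guidance above (set "k" and "maxTextRecallSize" so they sum to at least 50, because the semantic ranker takes up to 50 results) can be expressed as a simple client-side check. This is an illustrative sketch only; the `check_hybrid_query` helper is hypothetical and not part of the Azure AI Search client library:

```python
# Illustrative check of the hybrid-query guidance above: the semantic ranker
# takes up to 50 results, so "k" (vector candidates) plus "maxTextRecallSize"
# (keyword candidates) should sum to at least 50. Hypothetical helper, not an
# Azure AI Search SDK API.
SEMANTIC_RANKER_CAPACITY = 50

def check_hybrid_query(k: int, max_text_recall_size: int, top: int) -> list[str]:
    """Return warnings when a hybrid query underfeeds the semantic ranker."""
    warnings = []
    candidates = k + max_text_recall_size
    if candidates < SEMANTIC_RANKER_CAPACITY:
        warnings.append(
            f"k + maxTextRecallSize = {candidates} < {SEMANTIC_RANKER_CAPACITY}; "
            "the semantic ranker may rank too few candidates."
        )
    if top > candidates:
        warnings.append("top exceeds the number of candidates available to return.")
    return warnings
```

For example, `check_hybrid_query(k=30, max_text_recall_size=20, top=10)` returns no warnings because 30 + 20 meets the 50-result capacity, while a query with `k=10` and `max_text_recall_size=10` would be flagged.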
