
Commit 71f82b0

Updates for clarity based on Paul's feedback
1 parent 55d434e commit 71f82b0

1 file changed

+15
-39
lines changed

articles/ai-services/content-understanding/best-practices.md

Lines changed: 15 additions & 39 deletions
@@ -5,20 +5,20 @@ This document provides guidance on how to effectively use Content Understanding
55

66
---
77

8-
## Use Field Descriptions to Guide Output
9-
When defining a schema, it is essential to provide detailed field descriptions and examples. Clear descriptions and relevant examples guide the model to focus on the correct information, improving the accuracy of the output.
8+
## Use field descriptions to guide output
9+
When defining a schema, it is essential to provide detailed field descriptions. Clear and concise descriptions guide the model to focus on the correct information, improving the accuracy of the output.
1010

1111
### Example 1:
1212
If you want to extract the date from an invoice, in addition to naming the field `"Date"`, provide a description such as:
1313
> **"The date when the invoice was issued, typically found at the top right corner of the document."**
1414
1515
### Example 2:
1616
Suppose you want to extract the `"Customer Name"` from an invoice. Your description might read:
17-
> **"It should be the name of the business or person, but not the entire mailing address. The name of the customer or client to whom this invoice is addressed, usually located near the billing address."**
17+
> **"The name of the customer or client to whom this invoice is addressed, usually located near the billing address. It should be the name of the business or person, but not the entire mailing address."**
1818
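As an illustration of the two examples above, here is a minimal sketch of how field descriptions might be attached to a schema and checked before use. The dictionary layout (`fields`, `type`, `description`) and the helper `fields_missing_description` are assumptions made for this example, not a confirmed Content Understanding schema format; only the description text comes from the document.

```python
import json

# Hypothetical schema layout: the "fields", "type", and "description" keys are
# assumptions for illustration, not a confirmed API contract.
invoice_schema = {
    "fields": {
        "Date": {
            "type": "date",
            "description": (
                "The date when the invoice was issued, typically found at "
                "the top right corner of the document."
            ),
        },
        "Customer Name": {
            "type": "string",
            "description": (
                "The name of the customer or client to whom this invoice is "
                "addressed, usually located near the billing address. It "
                "should be the name of the business or person, but not the "
                "entire mailing address."
            ),
        },
    }
}


def fields_missing_description(schema, min_length=20):
    """Return the names of fields whose description is absent or very short."""
    return [
        name
        for name, spec in schema["fields"].items()
        if len(spec.get("description", "").strip()) < min_length
    ]


# Serialize the schema for inspection and report fields that need attention.
print(json.dumps(invoice_schema, indent=2))
print("Fields needing a description:", fields_missing_description(invoice_schema))
```

A check like this could run before an analyzer is created, so that fields without guidance are caught early rather than discovered through poor output.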
1919
---
2020

21-
## Fix Mistakes by Editing Field Descriptions
21+
## Fix mistakes by editing field descriptions
2222
If the system’s output isn’t meeting expectations, the first thing to try is refining and updating the field descriptions. By clarifying the context and being more explicit about what you need, you reduce ambiguity and improve accuracy.
2323

2424
### Example:
@@ -29,14 +29,14 @@ This extra context guides the model to the right location in the document.
2929

3030
---
3131

32-
## Use Classification Fields for Specific Outputs
32+
## Use classification fields for specific outputs
3333
When you need the system to choose from a set of predefined options (e.g., document type, product category, or status), use classification fields. When there's ambiguity with the options, provide clear descriptions for each option, enabling the model to categorize the data accurately.
3434

3535
### Example 1:
36-
If you need to classify documents as either `"Invoice"`, `"Claim"`, or `"Report"`, create a classification field with these words as class names.
36+
If you need to classify documents as either `"Invoice"`, `"Claim"`, or `"Report"`, create a classification field with these words as category names.
3737

3838
### Example 2:
39-
When processing product images, you might need to assign them to categories like `"Alcoholic Drinks"`, `"Soft Drinks"`, `"Snacks"`, and `"Dairy Products"`. Since some items may appear similar, providing precise definitions for close-call cases can help. For example:
39+
When processing product images, you might need to assign them to categories like `"AlcoholicDrinks"`, `"SoftDrinks"`, `"Snacks"`, and `"DairyProducts"`. Since some items may appear similar, providing precise definitions for close-call cases can help. For example:
4040

4141
- **`"Alcoholic Drinks"`**: Beverages containing alcohol, such as beer, wine, and spirits. This excludes soft drinks or non-alcoholic beverages.
4242
- **`"Soft Drinks"`**: Carbonated non-alcoholic beverages, such as soda and sparkling water. This does not include juices or alcoholic drinks.
@@ -45,52 +45,28 @@ By clearly defining each category, you ensure that the system correctly classifi
4545
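The advice to describe each category can be sketched as a small pre-flight check. This is only an illustration: the `categories` list and `category_descriptions` mapping are hypothetical structures, and the two descriptions are the ones quoted above. Descriptions not shown in this diff are deliberately left out rather than guessed.

```python
# Category names as used in the updated text of this commit.
categories = ["AlcoholicDrinks", "SoftDrinks", "Snacks", "DairyProducts"]

# Hypothetical mapping from category name to its description. Only the two
# descriptions visible in this diff are filled in.
category_descriptions = {
    "AlcoholicDrinks": (
        "Beverages containing alcohol, such as beer, wine, and spirits. "
        "This excludes soft drinks or non-alcoholic beverages."
    ),
    "SoftDrinks": (
        "Carbonated non-alcoholic beverages, such as soda and sparkling "
        "water. This does not include juices or alcoholic drinks."
    ),
}


def categories_without_description(names, descriptions):
    """Return category names that have no non-empty description."""
    return [name for name in names if not descriptions.get(name, "").strip()]


missing = categories_without_description(categories, category_descriptions)
print("Categories still needing a description:", missing)
```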

4646
---
4747

48-
## Use Confidence Scores to Determine When Human Review is Needed
48+
## Use confidence scores to determine when human review is needed
4949
Confidence scores help you decide when to involve human reviewers. Customers can interpret confidence scores using thresholds to decide which results need more review, minimizing the risk of errors.
5050

5151
### Example:
52-
For an Invoice review use case, if a key extracted field like `"Total Invoice Amount"` has a confidence score under **80%**, the system routes that document for manual review. This ensures that critical fields like invoice totals or legal statements are verified by a human when necessary.
52+
For an Invoice review use case, if a key extracted field like `"TotalInvoiceAmount"` has a confidence score under **0.80**, route that document to manual review. This ensures that critical fields like invoice totals or legal statements are verified by a human when necessary.
5353

54-
You might set different confidence thresholds based on the type of field. For instance, a lower threshold for a `"comments"` field that’s less critical and a higher one for `"contract termination date"` to ensure no mistakes.
54+
You might set different confidence thresholds based on the type of field. For instance, a lower threshold for a `"Comments"` field that’s less critical and a higher one for `"ContractTerminationDate"` to ensure no mistakes.
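A minimal sketch of per-field thresholding follows. It assumes each extracted field is available as a dictionary with a `confidence` value between 0 and 1. That shape, the threshold values other than 0.80, and the helper names are assumptions for illustration rather than the service's actual output format.

```python
# Hypothetical thresholds: only 0.80 for the invoice total comes from the
# text above; the other values are illustrative.
DEFAULT_THRESHOLD = 0.80
FIELD_THRESHOLDS = {
    "TotalInvoiceAmount": 0.80,
    "ContractTerminationDate": 0.95,  # critical field, stricter threshold
    "Comments": 0.50,                 # less critical field, looser threshold
}


def fields_needing_review(fields):
    """Return the names of fields whose confidence is below their threshold."""
    flagged = []
    for name, result in fields.items():
        threshold = FIELD_THRESHOLDS.get(name, DEFAULT_THRESHOLD)
        confidence = result.get("confidence")
        # Treat a missing score as low confidence so it is never silently accepted.
        if confidence is None or confidence < threshold:
            flagged.append(name)
    return flagged


def needs_human_review(fields):
    """Route the whole document to manual review if any field is flagged."""
    return bool(fields_needing_review(fields))


# Example with assumed field results.
sample = {
    "TotalInvoiceAmount": {"value": 120.50, "confidence": 0.72},
    "Comments": {"value": "Net 30", "confidence": 0.60},
}
print(fields_needing_review(sample))
print(needs_human_review(sample))
```

Keeping the thresholds in a single mapping makes it easy to tune them per field as review data accumulates.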
5555

5656
---
5757

58-
## Reduce Errors by Narrowing Language Selection for Audio and Video
59-
When working with audio and video content, selecting a narrow set of languages for transcription significantly reduces errors. The more languages you include, the more the system has to guess which language is being spoken, which can lead to increased misclassifications.
58+
## Reduce errors by narrowing language selection for audio and video
59+
When working with audio and video content, selecting a narrow set of languages for transcription can potentially reduce errors. The more languages you include, the more the system has to guess which language is being spoken, which may increase misrecognition.
6060

6161
### Example 1:
62-
If your content is only in English and Spanish, configure your transcription to those two languages only. Avoid adding options like French or Arabic unless you truly need them.
62+
If you are certain that the content contains only English and Spanish, configuring transcription for just these two languages may improve quality. But if the content unexpectedly includes other languages, this configuration may degrade overall quality.
6363

6464
### Example 2:
6565
If you have conference calls that occasionally include Portuguese speakers, add Portuguese only for those meetings. For all other calls, stick to English if that’s what you expect 90% of the time.
6666

6767
---
6868

69-
## Transcript, OCR Text, and Speaker Data Don’t Require Fields
70-
By default, Content Extraction information such as transcripts, OCR results, and video key frames can be accessed directly for immediate review or custom processing. Fields can be used when advanced transformations are needed (e.g., summarizing transcripts, identifying entities, or extracting specific items from OCR). Each field can instruct the system to extract or generate the content you need.
71-
72-
---
73-
74-
## Connect Analyzer to an Existing Content Understanding Project via REST API
75-
To connect an analyzer created through the REST API to an existing Content Understanding project, add the following `tags` to the analyzer JSON:
76-
77-
```json
78-
"projectId": "<add your project's ID here! You can find the projectId on the Azure Portal in the Overview of the project resource or by reading any analyzer you created via the project in AI Foundry through the REST API>",
79-
"templateId": "postCallAnalytics-2024-12-01"
80-
```
81-
82-
### Example:
83-
```json
84-
"tags": {"projectId": "1232abcdef1234","templateId": "postCallAnalytics-2024-12-01"}
85-
```
86-
87-
### Available Template IDs:
88-
| Modality | Template IDs |
89-
|-----------|-------------|
90-
| **Audio** | postCallAnalytics-2024-12-01, conversationAnalysis-2024-12-01 |
91-
| **Text** | text-2024-12-01 |
92-
| **Image** | image-2024-12-01 |
93-
| **Document** | - |
94-
| **Video** | - |
69+
## Transcript, OCR text, and speaker data do not require fields
70+
By default, Content Extraction information such as transcripts, OCR results, and video key frames can be accessed directly from the analyzer output for immediate review or custom processing. There is no need to define a field in the schema for these items. Fields can be used when additional processing is needed (e.g., summarizing transcripts, identifying entities, or extracting specific items from OCR). Each field can instruct the system to extract or generate the content you need.
9571

9672
---
