Commit ea80b70

committed
add joe content
1 parent c7ebf62 commit ea80b70

1 file changed

+19
-16
lines changed

articles/ai-services/content-understanding/concepts/best-practices.md

Lines changed: 19 additions & 16 deletions
@@ -12,56 +12,59 @@ ms.date: 02/24/2025
1212

1313
# Best Practices for Content Understanding
1414

15-
Azure AI Content Understanding in an innovative Generative AI service designed to facilitate the precise and accurate analysis of extensive data sets. The service proficiently processes various content modalities, including documents, images, videos, and audio, transforming them into a user-specified output format. This document provides guidance and best practices to effectively utilize Content Understanding for your data processing and analysis requirements.
15+
Azure AI Content Understanding is an innovative Generative AI service designed to facilitate the precise and accurate analysis of extensive data sets. The service processes various content modalities, including documents, images, videos, and audio, transforming them into user-specified output formats.
1616

17+
This document provides guidance and best practices to effectively utilize Content Understanding for your data processing and analysis requirements.
18+
19+
---
1720

1821
## Use field descriptions to guide output
1922

2023
When defining a schema, it's essential to provide detailed field descriptions. Clear and concise descriptions guide the model to focus on the correct information, improving the accuracy of the output.
2124

2225
#####   ***Example***
2326

24-
* If you want to extract the date from an invoice, in addition to naming the field `"Date"`, provide a description such as:
27+
* If you want to extract the date from an invoice, in addition to naming the field `Date`, provide a description such as:
2528

2629

27-
> **"The date when the invoice was issued, typically found at the top right corner of the document."**
30+
> `The date when the invoice was issued, typically found at the top right corner of the document.`
2831
2932

3033
#####   ***Example***
3134

32-
* Suppose you want to extract the `"Customer Name"` from an invoice. Your description might read:
35+
* Suppose you want to extract the `Customer Name` from an invoice. Your description might read:
3336

34-
**"The name of the customer or client to whom this invoice is addressed, usually located near the billing address. It should be the name of the business or person, but not the entire mailing address."**
37+
> `The name of the customer or client to whom this invoice is addressed, usually located near the billing address. It should be the name of the business or person, but not the entire mailing address.`
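
The two descriptions above could sit in a schema roughly as follows. This is a minimal sketch, not the service's confirmed schema: the key names (`fields`, `type`, `description`), the `CustomerName` spelling, and the `fields_missing_description` helper are assumptions made for illustration. The helper flags fields whose description is missing or too short to guide the model.

```python
# Hypothetical schema layout: key names are assumptions, not the confirmed API.
invoice_schema = {
    "fields": {
        "Date": {
            "type": "date",
            "description": (
                "The date when the invoice was issued, typically found at "
                "the top right corner of the document."
            ),
        },
        "CustomerName": {
            "type": "string",
            "description": (
                "The name of the customer or client to whom this invoice is "
                "addressed, usually located near the billing address. It "
                "should be the name of the business or person, but not the "
                "entire mailing address."
            ),
        },
    }
}


def fields_missing_description(schema, min_words=5):
    """Return names of fields whose description is absent or very short."""
    return [
        name
        for name, spec in schema["fields"].items()
        if len(spec.get("description", "").split()) < min_words
    ]


print(fields_missing_description(invoice_schema))
```

A check like this can run before a schema is submitted, so that bare field names without guidance are caught early.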
3538
---
3639

3740
## Correct mistakes by editing field descriptions
3841

39-
If the system's output isn't meeting expectations, the first thing to try is refining and updating the field descriptions. By clarifying the context and being more explicit about what you need, you reduce ambiguity and improve accuracy.
42+
If the system's output isn't meeting expectations, the first step is to refine and update the field descriptions. Clarifying the context and being more explicit about what you need reduces ambiguity and improves accuracy.
4043

4144
#####   ***Example***
4245

43-
* If the `"shipping date"` field generated inconsistent or incorrect extraction, often after a "Dispatch Date" label, update it to something more precise like:
46+
* If the `Shipping date` field produces inconsistent or incorrect extractions, often when the value follows a `Dispatch Date` label, update the description to something more precise, like:
4447

45-
**"The date when the products were shipped, typically found below the item list. It may also be labeled something similar like Delivery Date or Dispatch Date. Dates should typically have a format like 1/23/2024 or 01-04-2025."**
48+
> `The date when the products were shipped, typically found below the item list. It may also be labeled something similar like Delivery Date or Dispatch Date. Dates should typically have a format like 1/23/2024 or 01-04-2025.`
4649
4750
* This extra context guides the model to the right location in the document.
4851

4952

5053
## Use classification fields for specific outputs
5154

52-
When you need the system to choose from a set of predefined options (for example, document type, product category, or status), use classification fields. When there's ambiguity with the options, provide clear descriptions for each option, enabling the model to categorize the data accurately.
55+
When you need the system to choose from a set of predefined options, for example, document type, product category, or status, use classification fields. Where there's ambiguity with the options, provide clear descriptions for each option, enabling the model to categorize the data accurately.
5356

5457
#####   ***Example***
5558

56-
* If you need to classify documents as either `"Invoice"`, `"Claim"`, or `"Report"`, create a classification field with these words as category names.
59+
* If you need to classify documents as either `Invoice`, `Claim`, or `Report`, create a classification field with these words as category names.
5760

5861
#####   ***Example***
5962

60-
* When processing product images, you might need to assign them to categories like `"AlcoholicDrinks"`, `"SoftDrinks"`, `"Snacks"`, and `"DairyProducts"`. Since some items can appear similar, providing precise definitions for close-call cases can help. For example:
63+
* When processing product images, you might need to assign them to categories like `AlcoholicDrinks`, `SoftDrinks`, `Snacks`, and `DairyProducts`. Since some items can appear similar, providing precise definitions for close-call cases can help. For example:
6164

62-
* **`"Alcoholic Drinks"`**: Beverages containing alcohol, such as beer, wine, and spirits. This category excludes soft drinks or other nonalcoholic beverages.
65+
* **`Alcoholic Drinks`**: Beverages containing alcohol, such as beer, wine, and spirits. This category excludes soft drinks or other nonalcoholic beverages.
6366

64-
* **`"Soft Drinks"`**: Carbonated nonalcoholic beverages, such as soda and sparkling water. This category doesn't include juices or alcoholic drinks.
67+
* **`Soft Drinks`**: Carbonated nonalcoholic beverages, such as soda and sparkling water. This category doesn't include juices or alcoholic drinks.
6568

6669
* By clearly defining each category, you ensure that the system correctly classifies products while minimizing misclassification.
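
As a rough sketch of the product-image example, a classification field can pair each category name with its definition. The key names used here (`method`, `enum`, `enumDescriptions`) and the `build_classification_field` helper are assumptions for illustration; check the service's schema reference for the exact shape. Only the two categories defined in the text are included.

```python
def build_classification_field(categories):
    """Build a hypothetical classification field from {category: description}.

    Raises ValueError if any category lacks a description, because
    undefined categories are where close-call misclassifications occur.
    """
    missing = [name for name, text in categories.items() if not text.strip()]
    if missing:
        raise ValueError(f"Categories without descriptions: {missing}")
    return {
        "type": "string",
        "method": "classify",  # assumed key and value
        "enum": list(categories),  # assumed key
        "enumDescriptions": dict(categories),  # assumed key
    }


product_categories = {
    "AlcoholicDrinks": (
        "Beverages containing alcohol, such as beer, wine, and spirits. "
        "This category excludes soft drinks or other nonalcoholic beverages."
    ),
    "SoftDrinks": (
        "Carbonated nonalcoholic beverages, such as soda and sparkling "
        "water. This category doesn't include juices or alcoholic drinks."
    ),
}

product_field = build_classification_field(product_categories)
print(product_field["enum"])
```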
6770

@@ -71,17 +74,17 @@ Confidence scores help you decide when to involve human reviewers. Customers can
7174

7275
#####   ***Example***
7376

74-
* For an invoice review use case, if a key extracted field like `"TotalInvoiceAmount"` has a confidence score under **0.80**, route that document to manual review. This helps ensure that a human verifies critical fields like invoice totals or legal statements when necessary.
77+
* For an invoice review use case, if a key extracted field like `TotalInvoiceAmount` has a confidence score under **0.80**, route that document to manual review. This helps ensure that a human verifies critical fields like invoice totals or legal statements when necessary.
7578

76-
* You might set different confidence thresholds based on the type of field. For instance, a lower threshold for a `"Comments"` field that's less critical and a higher one for `"ContractTerminationDate"` to ensure no mistakes.
79+
* You might set different confidence thresholds based on the type of field. For instance, use a lower threshold for a less critical `Comments` field and a higher one for `ContractTerminationDate`, where mistakes are costly.
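
The routing rule described above can be sketched in a few lines. The 0.80 default comes from the example; the per-field thresholds (0.50 and 0.95), the sample scores, and the `fields_needing_review` helper are illustrative assumptions, and the input is assumed to be a plain mapping of field name to confidence score rather than the service's actual response format.

```python
DEFAULT_THRESHOLD = 0.80  # from the invoice example above

# Illustrative per-field overrides (assumed values).
FIELD_THRESHOLDS = {
    "Comments": 0.50,  # less critical, tolerate lower confidence
    "ContractTerminationDate": 0.95,  # critical, require high confidence
}


def fields_needing_review(confidences):
    """Return the sorted names of fields whose confidence is below threshold.

    `confidences` maps field name to a confidence score between 0 and 1.
    """
    return sorted(
        name
        for name, score in confidences.items()
        if score < FIELD_THRESHOLDS.get(name, DEFAULT_THRESHOLD)
    )


sample = {
    "TotalInvoiceAmount": 0.72,
    "Comments": 0.60,
    "ContractTerminationDate": 0.90,
}
flagged = fields_needing_review(sample)
if flagged:
    print("Route to manual review:", flagged)
```

With these sample scores, `TotalInvoiceAmount` and `ContractTerminationDate` fall below their thresholds, while `Comments` passes because its threshold is lower.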
7780

7881
## Reduce errors by narrowing language selection for audio and video
7982

8083
When you're working with audio and video content, selecting a narrow set of languages for transcription can reduce errors. The more languages you include, the more the system has to guess which language is being spoken, which can increase misrecognition.
8184

8285
#####   ***Example***
8386

84-
* If you're certain that the content only contains English and Spanish, configuring your transcription to these two languages only can improve quality. But if the content accidentally includes other languages, such configuration can actually degrade overall quality.
87+
* If you're certain that the content only contains English and Spanish, configuring your transcription to only these two languages can improve quality. But if the content accidentally includes other languages, such configuration can actually degrade overall quality.
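
A small helper can keep the language list narrow and explicit. This is only a sketch: the `locales` key and the `build_transcription_config` helper are assumptions for illustration, not a confirmed configuration property, and `en-US` and `es-ES` are example locale tags for English and Spanish.

```python
def build_transcription_config(expected_locales):
    """Build a hypothetical transcription config limited to known languages.

    Duplicates are dropped while keeping the original order. An empty list
    is rejected so that the caller decides explicitly which languages apply.
    """
    locales = list(dict.fromkeys(expected_locales))
    if not locales:
        raise ValueError("Provide at least one expected locale.")
    return {"locales": locales}  # assumed key name


config = build_transcription_config(["en-US", "es-ES", "en-US"])
print(config)
```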
8588

8689

8790
## Transcript, document text, and speaker data don't require fields
