Commit 9dc93d7

[SAMPLE-DOC] update sample description
Author: Chien Yuan Chang
1 parent 39bc27b commit 9dc93d7

56 files changed, +462 -232 lines changed


sdk/contentunderstanding/azure-ai-contentunderstanding/samples/async_samples/sample_analyze_binary_async.py

Lines changed: 21 additions & 8 deletions
@@ -8,8 +8,10 @@
 FILE: sample_analyze_binary_async.py
 
 DESCRIPTION:
-This sample demonstrates how to analyze a PDF file from disk using the `prebuilt-documentSearch`
-analyzer (async version).
+This sample demonstrates how to analyze a PDF file from disk using the prebuilt-documentSearch
+analyzer.
+
+## About analyzing documents from binary data
 
 One of the key values of Content Understanding is taking a content file and extracting the content
 for you in one call. The service returns an AnalyzeResult that contains an array of MediaContent
@@ -20,13 +22,24 @@
 This sample focuses on document analysis. For prebuilt RAG analyzers covering images, audio, and
 video, see sample_analyze_url_async.py.
 
-The prebuilt-documentSearch analyzer transforms unstructured documents into structured, machine-
-readable data optimized for RAG scenarios. It generates rich GitHub Flavored Markdown that preserves
-document structure and can include structured text, tables (in HTML format), charts and diagrams,
-mathematical formulas, hyperlinks, barcodes, annotations, and page metadata.
+## Prebuilt analyzers
+
+Content Understanding provides prebuilt RAG analyzers (the prebuilt-*Search analyzers, such as
+prebuilt-documentSearch) that return markdown and a one-paragraph Summary for each content item,
+making them useful for retrieval-augmented generation (RAG) and other downstream applications:
+
+- prebuilt-documentSearch - Extracts content from documents (PDF, images, Office documents) with
+layout preservation, table detection, figure analysis, and structured markdown output.
+Optimized for RAG scenarios.
+- prebuilt-audioSearch - Transcribes audio content with speaker diarization, timing information,
+and conversation summaries. Supports multilingual transcription.
+- prebuilt-videoSearch - Analyzes video content with visual frame extraction, audio transcription,
+and structured summaries. Provides temporal alignment of visual and audio content.
+- prebuilt-imageSearch - Analyzes standalone images and returns a one-paragraph Summary of the
+image content. For images that contain text (including hand-written text), use
+prebuilt-documentSearch.
 
-For documents that contain images with hand-written text, the prebuilt-documentSearch analyzer
-includes OCR capabilities by default.
+This sample uses prebuilt-documentSearch to extract structured content from PDF documents.
 
 USAGE:
 python sample_analyze_binary_async.py
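
The analyzer list added in this file amounts to a routing rule from input type to a prebuilt-*Search analyzer. Below is a minimal plain-Python sketch of that rule. The helper name `pick_rag_analyzer` and the extension sets are illustrative assumptions, not part of the SDK and not a list of formats the service supports; only the analyzer IDs come from the description above.

```python
from pathlib import Path

# Extension groups are illustrative assumptions, not an official list of supported formats.
DOCUMENT_EXTS = {".pdf", ".docx", ".pptx", ".xlsx"}
IMAGE_EXTS = {".png", ".jpg", ".jpeg"}
AUDIO_EXTS = {".wav", ".mp3"}
VIDEO_EXTS = {".mp4", ".mov"}


def pick_rag_analyzer(file_name: str, image_has_text: bool = False) -> str:
    """Return the prebuilt-*Search analyzer ID suited to a file (hypothetical helper)."""
    ext = Path(file_name).suffix.lower()
    if ext in DOCUMENT_EXTS:
        return "prebuilt-documentSearch"
    if ext in IMAGE_EXTS:
        # Images that contain text (including handwriting) go to the document analyzer.
        return "prebuilt-documentSearch" if image_has_text else "prebuilt-imageSearch"
    if ext in AUDIO_EXTS:
        return "prebuilt-audioSearch"
    if ext in VIDEO_EXTS:
        return "prebuilt-videoSearch"
    raise ValueError(f"No analyzer mapping for extension {ext!r}")
```

The returned ID would then be passed as the analyzer ID to whichever analyze call the sample uses.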

sdk/contentunderstanding/azure-ai-contentunderstanding/samples/async_samples/sample_analyze_configs_async.py

Lines changed: 3 additions & 3 deletions
@@ -9,11 +9,11 @@
 
 DESCRIPTION:
 This sample demonstrates how to extract additional features from documents such as charts,
-hyperlinks, formulas, and annotations using the `prebuilt-documentSearch` analyzer, which has
+hyperlinks, formulas, and annotations using the prebuilt-documentSearch analyzer, which has
 formulas, layout, and OCR enabled by default.
 
 ABOUT ANALYSIS CONFIGS:
-The `prebuilt-documentSearch` analyzer has the following configurations enabled by default:
+The prebuilt-documentSearch analyzer has the following configurations enabled by default:
 - ReturnDetails: true - Returns detailed information about document elements
 - EnableOcr: true - Performs OCR on documents
 - EnableLayout: true - Extracts layout information (tables, figures, hyperlinks, annotations)
@@ -34,7 +34,7 @@
 the analyzer.
 
 PREREQUISITES:
-To get started you'll need a **Microsoft Foundry resource**. See sample_update_defaults.py
+To get started you'll need a Microsoft Foundry resource. See sample_update_defaults.py
 for setup guidance.
 
 USAGE:

sdk/contentunderstanding/azure-ai-contentunderstanding/samples/async_samples/sample_analyze_invoice_async.py

Lines changed: 26 additions & 4 deletions
@@ -8,18 +8,40 @@
 FILE: sample_analyze_invoice_async.py
 
 DESCRIPTION:
-Analyze an invoice using prebuilt analyzer (async version)
+This sample demonstrates how to analyze an invoice from a URL using the prebuilt-invoice analyzer
+and extract structured fields from the result.
+
+## About analyzing invoices
+
+Content Understanding provides a rich set of prebuilt analyzers that are ready to use without any
+configuration. These analyzers are powered by knowledge bases of thousands of real-world document
+examples, enabling them to understand document structure and adapt to variations in format and
+content.
+
+Prebuilt analyzers are ideal for:
+- Content ingestion in search and retrieval-augmented generation (RAG) workflows
+- Intelligent document processing (IDP) to extract structured data from common document types
+- Agentic flows as tools for extracting structured representations from input files
+
+### The prebuilt-invoice analyzer
+
+The prebuilt-invoice analyzer is a domain-specific analyzer optimized for processing invoices,
+utility bills, sales orders, and purchase orders. It automatically extracts structured fields
+including:
 
-This sample demonstrates how to analyze an invoice from a URL using the `prebuilt-invoice` analyzer
-and extract structured fields from the result. The prebuilt-invoice analyzer automatically extracts
-structured fields including:
 - Customer/Vendor information: Name, address, contact details
 - Invoice metadata: Invoice number, date, due date, purchase order number
 - Line items: Description, quantity, unit price, total for each item
 - Financial totals: Subtotal, tax amount, shipping charges, total amount
 - Payment information: Payment terms, payment method, remittance address
 
 The analyzer works out of the box with various invoice formats and requires no configuration.
+It's part of the financial documents category of prebuilt analyzers, which also includes:
+- prebuilt-receipt - Sales receipts from retail and dining establishments
+- prebuilt-creditCard - Credit card statements
+- prebuilt-bankStatement.us - US bank statements
+- prebuilt-check.us - US bank checks
+- prebuilt-creditMemo - Credit memos and refund documents
 
 USAGE:
 python sample_analyze_invoice_async.py

sdk/contentunderstanding/azure-ai-contentunderstanding/samples/async_samples/sample_analyze_url_async.py

Lines changed: 14 additions & 6 deletions
@@ -9,18 +9,26 @@
 
 DESCRIPTION:
 Another great value of Content Understanding is its rich set of prebuilt analyzers. Great examples
-of these are the RAG analyzers that work for all modalities (prebuilt-documentSearch, prebuilt-imageSearch,
-prebuilt-audioSearch, and prebuilt-videoSearch).
+of these are the RAG analyzers that work for all modalities (prebuilt-documentSearch,
+prebuilt-imageSearch, prebuilt-audioSearch, and prebuilt-videoSearch). This sample demonstrates
+these RAG analyzers. Many more prebuilt analyzers are available (for example, prebuilt-invoice);
+see the invoice sample or the prebuilt analyzer documentation to explore the full list.
 
-This sample demonstrates these RAG analyzers with URL inputs. Content Understanding supports both
-local binary inputs (see sample_analyze_binary_async.py) and URL inputs across all modalities.
+## About analyzing URLs across modalities
+
+Content Understanding supports both local binary inputs (see sample_analyze_binary_async.py) and URL
+inputs across all modalities. This sample focuses on prebuilt RAG analyzers (the prebuilt-*Search
+analyzers, such as prebuilt-documentSearch) with URL inputs.
 
 Important: For URL inputs, use begin_analyze() with AnalyzeInput objects that wrap the URL.
-For binary data (local files), use begin_analyze_binary() instead.
+For binary data (local files), use begin_analyze_binary() instead. This sample demonstrates
+begin_analyze() with URL inputs.
 
 Documents, HTML, and images with text are returned as DocumentContent (derived from MediaContent),
 while audio and video are returned as AudioVisualContent (also derived from MediaContent). These
-prebuilt RAG analyzers return markdown and a one-paragraph Summary for each content item.
+prebuilt RAG analyzers return markdown and a one-paragraph Summary for each content item;
+prebuilt-videoSearch can return multiple segments, so iterate over all contents rather than just
+the first.
 
 USAGE:
 python sample_analyze_url_async.py
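
The advice to iterate over all contents rather than reading only the first can be sketched without the SDK. The classes below are local stand-ins that only borrow the type names from the description. The `markdown` and `summary` attribute names and the helper `describe_contents` are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MediaContent:
    """Local stand-in for the SDK base type; attribute names are assumptions."""
    markdown: str
    summary: str


@dataclass
class DocumentContent(MediaContent):
    """Stand-in for document, HTML, and image-with-text results."""


@dataclass
class AudioVisualContent(MediaContent):
    """Stand-in for audio and video results (video may yield one item per segment)."""


def describe_contents(contents: List[MediaContent]) -> List[str]:
    """Summarize every content item instead of reading only contents[0]."""
    lines = []
    for index, content in enumerate(contents):
        kind = "audio/video" if isinstance(content, AudioVisualContent) else "document"
        lines.append(f"[{index}] {kind}: {content.summary}")
    return lines
```

With a two-segment video result, a loop like this reports both segments, whereas indexing `contents[0]` would silently drop the second.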

sdk/contentunderstanding/azure-ai-contentunderstanding/samples/async_samples/sample_copy_analyzer_async.py

Lines changed: 4 additions & 4 deletions
@@ -9,13 +9,13 @@
 
 DESCRIPTION:
 This sample demonstrates how to copy an analyzer from source to target within the same
-resource using the copy_analyzer API. This is useful for creating copies of analyzers
-for testing, staging, or production deployment.
+Microsoft Foundry resource using the begin_copy_analyzer API. This is useful for
+creating copies of analyzers for testing, staging, or production deployment.
 
-The copy_analyzer API allows you to copy an analyzer within the same Azure resource:
+About copying analyzers
+The begin_copy_analyzer API allows you to copy an analyzer within the same Azure resource:
 - Same-resource copy: Copies an analyzer from one ID to another within the same resource
 - Exact copy: The target analyzer is an exact copy of the source analyzer
-- Use cases: Testing, staging, production deployment, versioning
 
 Note: For cross-resource copying (copying between different Azure resources or subscriptions),
 use the grant_copy_auth sample instead.

sdk/contentunderstanding/azure-ai-contentunderstanding/samples/async_samples/sample_create_analyzer_async.py

Lines changed: 29 additions & 8 deletions
@@ -9,17 +9,38 @@
 
 DESCRIPTION:
 This sample demonstrates how to create a custom analyzer with a field schema to extract
-structured data from documents.
+structured data from documents. While this sample shows document modalities, custom analyzers
+can also be created for video, audio, and image content. The same concepts apply across all
+modalities.
 
-Custom analyzers allow you to:
+## About custom analyzers
+
+Custom analyzers allow you to define a field schema that specifies what structured data to
+extract from documents. You can:
 - Define custom fields (string, number, date, object, array)
-- Specify extraction methods:
-- extract: Values are extracted as they appear in the content (literal text extraction)
-- generate: Values are generated freely based on the content using AI models
-- classify: Values are classified against a predefined set of categories
-- Use prebuilt analyzers as a base (prebuilt-document, prebuilt-audio, prebuilt-video, prebuilt-image)
+- Specify extraction methods to control how field values are extracted:
+- generate - Values are generated freely based on the content using AI models (best for
+complex or variable fields requiring interpretation)
+- classify - Values are classified against a predefined set of categories (best when using
+enum with a fixed set of possible values)
+- extract - Values are extracted as they appear in the content (best for literal text
+extraction from specific locations). Note: This method is only available for document
+content. Requires estimateSourceAndConfidence to be set to true for the field.
+
+When not specified, the system automatically determines the best method based on the field
+type and description.
+- Use prebuilt analyzers as a base. Supported base analyzers include:
+- prebuilt-document - for document-based custom analyzers
+- prebuilt-audio - for audio-based custom analyzers
+- prebuilt-video - for video-based custom analyzers
+- prebuilt-image - for image-based custom analyzers
 - Configure analysis options (OCR, layout, formulas)
-- Enable source and confidence tracking for extracted field values
+- Enable source and confidence tracking: Set estimateFieldSourceAndConfidence to true at the
+analyzer level (in ContentAnalyzerConfig) or estimateSourceAndConfidence to true at the field
+level to get source location (page number, bounding box) and confidence scores for extracted
+field values. This is required for fields with method = extract and is useful for validation,
+quality assurance, debugging, and highlighting source text in user interfaces. Field-level
+settings override analyzer-level settings.
 
 USAGE:
 python sample_create_analyzer_async.py
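
The interaction between the extraction method and source/confidence tracking described above can be checked before an analyzer is created. The following is a sketch over plain dictionaries, not the SDK's model classes. The helper names and the dictionary shape are hypothetical, and it assumes that an analyzer-level default of true also satisfies the requirement for `extract` fields unless a field-level value overrides it.

```python
from typing import Any, Dict, List, Optional

VALID_METHODS = {"generate", "classify", "extract"}


def effective_source_tracking(field: Dict[str, Any], analyzer_default: bool) -> bool:
    """Field-level estimateSourceAndConfidence overrides the analyzer-level setting."""
    field_value: Optional[bool] = field.get("estimateSourceAndConfidence")
    return analyzer_default if field_value is None else field_value


def validate_fields(fields: Dict[str, Dict[str, Any]], analyzer_default: bool = False) -> List[str]:
    """Return human-readable problems found in a field schema (empty list means none)."""
    problems = []
    for name, field in fields.items():
        method = field.get("method")  # None means the service picks the method itself
        if method is not None and method not in VALID_METHODS:
            problems.append(f"{name}: unknown method {method!r}")
        if method == "extract" and not effective_source_tracking(field, analyzer_default):
            problems.append(f"{name}: method 'extract' needs source and confidence tracking")
    return problems
```

Running the check on a schema where one `extract` field lacks the setting returns a single problem for that field; enabling the analyzer-level default clears it.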

sdk/contentunderstanding/azure-ai-contentunderstanding/samples/async_samples/sample_create_classifier_async.py

Lines changed: 25 additions & 8 deletions
@@ -8,14 +8,31 @@
 FILE: sample_create_classifier_async.py
 
 DESCRIPTION:
-This sample demonstrates how to create a classifier analyzer to categorize documents and
-use it to analyze documents with and without automatic segmentation.
-
-Classifiers are a type of custom analyzer that categorize documents into predefined categories.
-They're useful for:
-- Document routing: Automatically route documents to the right processing pipeline
-- Content organization: Organize large document collections by type
-- Multi-document processing: Process files containing multiple document types by segmenting them
+This sample demonstrates how to create a classifier analyzer to categorize documents and use it
+to analyze documents with and without automatic segmentation.
+
+## About classifiers
+
+Classifiers are a type of custom analyzer that create classification workflows to categorize
+documents into predefined custom categories using ContentCategories. They allow you to perform
+classification and content extraction as part of a single API call. Classifiers are useful for:
+- Content organization: Organize large document collections by type through categorization
+- Data routing (optional): Optionally route your data to specific custom analyzers based on
+category, ensuring your data is routed to the best analyzer for processing when needed
+- Multi-document processing: Process files containing multiple document types by automatically
+segmenting them
+
+Classifiers use custom categories to define the types of documents they can identify. Each
+category has a Description that helps the AI model understand what documents belong to that
+category. You can define up to 200 category names and descriptions. You can include an "other"
+category to handle unmatched content; otherwise, all files are forced to be classified into one
+of your defined categories.
+
+The enable_segment property in the analyzer configuration controls whether multi-document files
+are split into segments:
+- enable_segment = False: Classifies the entire file as a single category (classify only)
+- enable_segment = True: Automatically splits the file into segments by category (classify and
+segment)
 
 USAGE:
 python sample_create_classifier_async.py
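
The category rules described above (a description per category, at most 200 categories, an optional "other" catch-all, and the enable_segment switch) can be expressed as a small pre-flight check. This is a sketch over plain dictionaries. The function name `build_classifier_config` and the returned shape are illustrative assumptions and do not mirror the SDK's ContentCategories model.

```python
from typing import Dict

MAX_CATEGORIES = 200  # limit stated in the sample description


def build_classifier_config(categories: Dict[str, str], enable_segment: bool = False,
                            add_other: bool = True) -> dict:
    """Assemble a plain-dict classifier config (illustrative shape, not the SDK model)."""
    merged = dict(categories)
    if add_other and "other" not in merged:
        # Without a catch-all, every file is forced into one of the defined categories.
        merged["other"] = "Content that does not match any other category"
    if len(merged) > MAX_CATEGORIES:
        raise ValueError(f"{len(merged)} categories exceeds the limit of {MAX_CATEGORIES}")
    missing = [name for name, description in merged.items() if not description.strip()]
    if missing:
        raise ValueError(f"Categories without a description: {missing}")
    return {
        "enable_segment": enable_segment,
        "categories": {name: {"description": text} for name, text in merged.items()},
    }
```

Passing `enable_segment=True` corresponds to the classify-and-segment mode; the default corresponds to classify only.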

sdk/contentunderstanding/azure-ai-contentunderstanding/samples/async_samples/sample_delete_result_async.py

Lines changed: 8 additions & 4 deletions
@@ -12,10 +12,14 @@
 This is useful for removing temporary or sensitive analysis results immediately, rather
 than waiting for automatic deletion after 24 hours.
 
-Analysis results are stored temporarily and can be deleted using the delete_result API:
-- Immediate deletion: Results are marked for deletion and permanently removed
-- Automatic deletion: Results are automatically deleted after 24 hours if not manually deleted
-- Operation ID required: You need the operation ID from the analysis operation to delete
+About deleting results:
+Analysis results from analyze or begin_analyze are automatically deleted after 24 hours.
+However, you may want to delete results earlier in certain cases:
+- Remove sensitive data immediately: Ensure sensitive information is not retained longer than necessary
+- Comply with data retention policies: Meet requirements for data deletion
+
+To delete results earlier than the 24-hour automatic deletion, use delete_result.
+This method requires the operation ID from the analysis operation.
 
 Important: Once deleted, results cannot be recovered. Make sure you have saved any data
 you need before deleting.
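
The decision of whether to call delete_result early can be reduced to a comparison against the 24-hour window. The sketch below is plain Python with hypothetical helper names. It assumes the window is measured from the moment the analysis finishes, which the description does not state.

```python
from datetime import datetime, timedelta

RESULT_RETENTION = timedelta(hours=24)  # automatic deletion window from the description


def auto_delete_time(analysis_finished_at: datetime) -> datetime:
    """Estimate when the service would remove the result on its own (assumed start point)."""
    return analysis_finished_at + RESULT_RETENTION


def should_delete_early(contains_sensitive_data: bool, policy_max_age: timedelta) -> bool:
    """Delete early for sensitive data or when policy allows less than the automatic window."""
    return contains_sensitive_data or policy_max_age < RESULT_RETENTION
```

When `should_delete_early` returns true, save anything still needed first, since deleted results cannot be recovered.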

sdk/contentunderstanding/azure-ai-contentunderstanding/samples/async_samples/sample_get_analyzer_async.py

Lines changed: 9 additions & 5 deletions
@@ -11,14 +11,18 @@
 This sample demonstrates how to retrieve information about analyzers, including prebuilt
 analyzers and custom analyzers.
 
-The get_analyzer method allows you to retrieve detailed information about any analyzer:
-- Prebuilt analyzers: System-provided analyzers like prebuilt-documentSearch, prebuilt-invoice
+## About getting analyzer information
+
+The get_analyzer method allows you to retrieve detailed information about any analyzer,
+including:
+- Prebuilt analyzers: System-provided analyzers like prebuilt-documentSearch, prebuilt-invoice,
+etc.
 - Custom analyzers: Analyzers you've created with custom field schemas or classifiers
 
 This is useful for:
-- Verifying analyzer configuration
-- Inspecting prebuilt analyzers to learn about their capabilities
-- Debugging analyzer behavior
+- Verifying analyzer configuration: Check the current state of an analyzer
+- Inspecting prebuilt analyzers: Learn about available prebuilt analyzers and their capabilities
+- Debugging: Understand why an analyzer behaves a certain way
 
 USAGE:
 python sample_get_analyzer_async.py

sdk/contentunderstanding/azure-ai-contentunderstanding/samples/async_samples/sample_get_result_file_async.py

Lines changed: 4 additions & 4 deletions
@@ -9,18 +9,18 @@
 
 DESCRIPTION:
 This sample demonstrates how to retrieve result files (such as keyframe images) from a
-video analysis operation using the `get_result_file` API.
+video analysis operation using the get_result_file API.
 
 About result files:
 When analyzing video content, the Content Understanding service can generate result files such as:
 - Keyframe images: Extracted frames from the video at specific timestamps
 - Other result files: Additional files generated during analysis
 
-The `get_result_file` API allows you to retrieve these files using:
+The get_result_file API allows you to retrieve these files using:
 - Operation ID: Extracted from the analysis operation
 - File path: The path to the specific result file. In the recording, keyframes were accessed
-with paths like `keyframes/733` and `keyframes/9000`, following the
-`keyframes/{frameTimeMs}` pattern.
+with paths like keyframes/733 and keyframes/9000, following the
+keyframes/{frameTimeMs} pattern.
 
 USAGE:
 python sample_get_result_file_async.py
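
The keyframes/{frameTimeMs} pattern mentioned above is simple enough to build and parse with two small helpers. Both function names are hypothetical and exist only for this sketch; the resulting string is what would be supplied as the file path alongside the operation ID.

```python
def keyframe_path(frame_time_ms: int) -> str:
    """Build a result-file path following the keyframes/{frameTimeMs} pattern."""
    if frame_time_ms < 0:
        raise ValueError("frame_time_ms must be non-negative")
    return f"keyframes/{frame_time_ms}"


def parse_keyframe_path(path: str) -> int:
    """Recover the frame time in milliseconds from a keyframes/{frameTimeMs} path."""
    prefix, _, value = path.partition("/")
    if prefix != "keyframes" or not value.isdigit():
        raise ValueError(f"Not a keyframe path: {path!r}")
    return int(value)
```

For the two paths quoted in the description, `keyframe_path(733)` and `keyframe_path(9000)` reproduce them exactly.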
