
Commit fa05d68

Add content extraction sample (#9)
1 parent 682af5e commit fa05d68

6 files changed: 290 additions, 1 deletion
analyzer_templates/audio_transcript.json

Lines changed: 9 additions & 0 deletions

```json
{
  "scenario": "conversation",
  "description": "Sample audio transcript analyzer",
  "config": {
    "returnDetails": true,
    "locales": ["en-US"]
  },
  "fieldSchema": {}
}
```
analyzer_templates/content_document.json

Lines changed: 5 additions & 0 deletions

```json
{
  "description": "Sample document content analyzer",
  "scenario": "document",
  "fieldSchema": {}
}
```

analyzer_templates/content_video.json

Lines changed: 22 additions & 0 deletions

```json
{
  "description": "Sample video content analyzer",
  "scenario": "videoShot",
  "config": {
    "returnDetails": true,
    "locales": [
      "en-US",
      "es-ES",
      "es-MX",
      "fr-FR",
      "hi-IN",
      "it-IT",
      "ja-JP",
      "ko-KR",
      "pt-BR",
      "zh-CN"
    ],
    "enableFace": false
  },
  "fieldSchema": {}
}
```

data/audio.wav

Lines changed: 3 additions & 0 deletions

```
version https://git-lfs.github.com/spec/v1
oid sha256:40a5ace7beda05bb1e152ff1561792f96f34b10e9c0648febe45a56764a56082
size 5056978
```

infra/main.bicep

Lines changed: 1 addition & 1 deletion

```diff
@@ -29,7 +29,7 @@ var principalType = empty(runningOnGitHub) ? 'User' : 'ServicePrincipal'
 var uniqueId = toLower(uniqueString(subscription().id, environmentName, location))
 var resourcePrefix = '${environmentName}${uniqueId}'
-var tags = {
+var tags = {
   'azd-env-name': environmentName
   owner: 'azure-ai-sample'
 }
```

notebooks/content_extraction.ipynb

Lines changed: 250 additions & 0 deletions
# Extract Content from Your File

This notebook demonstrates how you can use the Content Understanding API to extract semantic content from multimodal files.

## Prerequisites

1. Ensure the Azure AI service is configured following these [steps](../README.md#configure-azure-ai-service-resource).
2. Install the required packages to run the sample.

```python
%pip install -r ../requirements.txt
```

## Create Azure AI Content Understanding Client

> The [AzureContentUnderstandingClient](../python/content_understanding_client.py) is a utility class containing functions to interact with the Content Understanding API. Until the Content Understanding SDK is officially released, it can be regarded as a lightweight SDK.

```python
import logging
import json
import os
import sys
from dotenv import find_dotenv, load_dotenv

load_dotenv(find_dotenv())
logging.basicConfig(level=logging.INFO)

AZURE_AI_ENDPOINT = os.getenv("AZURE_AI_ENDPOINT")
AZURE_AI_API_VERSION = os.getenv("AZURE_AI_API_VERSION", "2024-12-01-preview")

# Import the utility package from the python samples root directory
py_samples_root_dir = os.path.abspath(os.path.join(os.getcwd(), ".."))
sys.path.append(py_samples_root_dir)
from python.content_understanding_client import AzureContentUnderstandingClient

client = AzureContentUnderstandingClient(
    endpoint=AZURE_AI_ENDPOINT,
    api_version=AZURE_AI_API_VERSION,
    x_ms_useragent="azure-ai-content-understanding-python/content_extraction",
)
```
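An optional sanity check before calling the service can surface configuration problems early. This is a minimal sketch using the `AZURE_AI_ENDPOINT` variable read above; the error message itself is illustrative, not part of the sample.

```python
# Optional: fail fast if the endpoint was not found in the environment or the
# .env file, so later cells do not fail with a less obvious connection error.
if not AZURE_AI_ENDPOINT:
    raise RuntimeError(
        "AZURE_AI_ENDPOINT is not set; configure it as described in "
        "../README.md#configure-azure-ai-service-resource"
    )
```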
## Document Content

The Content Understanding API extracts all textual content from a specified document file. In addition to text extraction, it performs a comprehensive layout analysis to identify and categorize tables and figures within the document. The output is presented in structured markdown, ensuring clarity and ease of interpretation. (A sketch for pulling the markdown out of the result follows the sample output below.)

```python
import uuid

ANALYZER_ID = "content-doc-sample-" + str(uuid.uuid4())
ANALYZER_TEMPLATE_FILE = '../analyzer_templates/content_document.json'
ANALYZER_SAMPLE_FILE = '../data/purchase_order.jpg'

# Create analyzer
response = client.begin_create_analyzer(ANALYZER_ID, analyzer_template_path=ANALYZER_TEMPLATE_FILE)
result = client.poll_result(response)

# Analyze file
response = client.begin_analyze(ANALYZER_ID, file_location=ANALYZER_SAMPLE_FILE)
result = client.poll_result(response)

print(json.dumps(result, indent=2))
```

Logged output (stderr):

```
INFO:python.content_understanding_client:Analyzer field-extraction-sample-97d1d17d-29b6-4af1-9078-00650666fda1 create request accepted.
INFO:python.content_understanding_client:Request result is ready after 0.00 seconds.
INFO:python.content_understanding_client:Analyzing file ../data/purchase_order.jpg with analyzer: field-extraction-sample-97d1d17d-29b6-4af1-9078-00650666fda1
INFO:python.content_understanding_client:Request cc64dcc4-0797-45d5-b18e-49c77b5b1122 in progress ...
INFO:python.content_understanding_client:Request cc64dcc4-0797-45d5-b18e-49c77b5b1122 in progress ...
INFO:python.content_understanding_client:Request result is ready after 4.37 seconds.
```

Analysis result (stdout):

```json
{
  "id": "cc64dcc4-0797-45d5-b18e-49c77b5b1122",
  "status": "Succeeded",
  "result": {
    "analyzerId": "field-extraction-sample-97d1d17d-29b6-4af1-9078-00650666fda1",
    "apiVersion": "2024-12-01-preview",
    "createdAt": "2024-12-10T06:50:03Z",
    "warnings": [],
    "contents": [
      {
        "markdown": "Purchase Order\n\n\n# Hero Limited\n\nCompany Phone: 555-348-6512\nWebsite: www.herolimited.com\nEmail:\[email protected]\n\nPurchase Order\n\nDated As: 12/20/2020\nPurchase Order #: 948284\n\nShipped To\n\nVendor Name: Hillary Swank\nCompany Name: Higgly Wiggly Books\nAddress: 938 NE Burner Road\nBoulder City, CO 92848\nPhone: 938-294-2949\n\nShipped From\n\nName: Bernie Sanders\nCompany Name: Jupiter Book Supply\nAddress: 383 N Kinnick Road\nSeattle, WA 38383\n\nPhone: 932-299-0292\n\n\n<table>\n<tr>\n<th>Details</th>\n<th>Quantity</th>\n<th>Unit Price</th>\n<th>Total</th>\n</tr>\n<tr>\n<td>Bindings</td>\n<td>20</td>\n<td>1.00</td>\n<td>20.00</td>\n</tr>\n<tr>\n<td>Covers Small</td>\n<td>20</td>\n<td>1.00</td>\n<td>20.00</td>\n</tr>\n<tr>\n<td>Feather Bookmark</td>\n<td>20</td>\n<td>5.00</td>\n<td>100.00</td>\n</tr>\n<tr>\n<td>Copper Swirl Marker</td>\n<td>20</td>\n<td>5.00</td>\n<td>100.00</td>\n</tr>\n</table>\n\n\n<table>\n<tr>\n<td>SUBTOTAL</td>\n<td>$140.00</td>\n</tr>\n<tr>\n<td>TAX</td>\n<td>$4.00</td>\n</tr>\n<tr>\n<td>TOTAL</td>\n<td>$144.00</td>\n</tr>\n</table>\n\n\nBernie Sanders\n\nBernie Sanders\nManager\n\nAdditional Notes:\n\nDo not Jostle Box. Unpack carefully. Enjoy.\n\nJupiter Book Supply will refund you 50% per book if returned within 60 days of reading and\n\noffer you 25% off you next total purchase.\n",
        "kind": "document",
        "startPageNumber": 1,
        "endPageNumber": 1,
        "unit": "pixel",
        "pages": [
          {
            "pageNumber": 1,
            "angle": 0.05652412,
            "width": 1700,
            "height": 2200
          }
        ]
      }
    ]
  }
}
```
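The extracted markdown can be pulled straight out of the response shown above. A minimal sketch, assuming `result` has the shape printed by the previous cell:

```python
# Extract the markdown produced for each document content entry. The key path
# follows the sample output above ("result" -> "contents" -> "markdown").
for content in result.get("result", {}).get("contents", []):
    if content.get("kind") == "document":
        pages = f"pages {content['startPageNumber']}-{content['endPageNumber']}"
        print(f"--- {pages} ---")
        print(content["markdown"])
```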
## Audio Content

The API output facilitates detailed analysis of spoken language, allowing developers to use the data for applications such as voice recognition, customer service analytics, and conversational AI. The structure of the output makes it easy to extract and analyze different components of the conversation for further processing or insights (a parsing sketch follows the code cell below).

1. Speaker Identification: Each phrase is attributed to a specific speaker (in this case, "Speaker 2"), which keeps conversations with multiple participants unambiguous.
2. Timing Information: Each transcription includes precise timing data:
   - startTimeMs: the time (in milliseconds) when the phrase begins.
   - endTimeMs: the time (in milliseconds) when the phrase ends.
   This information is crucial for applications like video subtitles, allowing synchronization between the audio and the text.
3. Text Content: The actual spoken text is provided, which in this instance is "Thank you for calling Woodgrove Travel." This is the main content of the transcription.
4. Confidence Score: Each transcription phrase includes a confidence score (0.933 in this case), indicating the likelihood that the transcription is accurate. A higher score suggests greater reliability.
5. Word-Level Breakdown: The transcription is further broken down into individual words, each with its own timing information. This allows for detailed analysis of speech patterns and is useful for applications such as language processing or speech recognition improvement.
6. Locale Specification: The locale is specified as "en-US", indicating that the transcription is in American English. This is important for ensuring that the transcription algorithms account for regional dialects and pronunciations.

```python
ANALYZER_ID = "content-audio-sample-" + str(uuid.uuid4())
ANALYZER_TEMPLATE_FILE = '../analyzer_templates/audio_transcript.json'
ANALYZER_SAMPLE_FILE = '../data/audio.wav'

# Create analyzer
response = client.begin_create_analyzer(ANALYZER_ID, analyzer_template_path=ANALYZER_TEMPLATE_FILE)
result = client.poll_result(response)

# Analyze file
response = client.begin_analyze(ANALYZER_ID, file_location=ANALYZER_SAMPLE_FILE)
result = client.poll_result(response)

print(json.dumps(result, indent=2))
```
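A minimal sketch of turning the transcript into subtitle-style lines. The per-phrase timing keys (`startTimeMs`, `endTimeMs`) come from the description above; the container key `transcriptPhrases` and the `speaker` and `text` keys are assumptions to verify against the JSON printed by the cell above.

```python
def format_ms(ms: int) -> str:
    """Render a millisecond offset as mm:ss.mmm."""
    seconds, millis = divmod(int(ms), 1000)
    minutes, seconds = divmod(seconds, 60)
    return f"{minutes:02d}:{seconds:02d}.{millis:03d}"

for content in result.get("result", {}).get("contents", []):
    # "transcriptPhrases", "speaker", and "text" are assumed key names; adjust
    # them to match the actual output of the audio analyzer.
    for phrase in content.get("transcriptPhrases", []):
        start = format_ms(phrase["startTimeMs"])
        end = format_ms(phrase["endTimeMs"])
        print(f"[{start} --> {end}] {phrase.get('speaker', 'unknown')}: {phrase.get('text', '')}")
```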
## Video Content

The video output provides detailed information about audiovisual content, specifically video shots. Its key features are listed below (a parsing sketch follows the code cell at the end of this section).

1. Shot Information: Each shot is defined by a start and end time, along with a unique identifier. For example, shot 0:0.0 to 0:2.800 includes a transcript and key frames.
2. Transcript: The output includes a transcript of the audio, formatted in WEBVTT, which allows for easy synchronization with the video. It captures spoken content and specifies the timing of the dialogue.
3. Key Frames: A series of key frames (images) represents important moments in the video shot, allowing users to visualize the content at specific timestamps.
4. Description: Each shot is accompanied by a description, providing context about the visuals without requiring the viewer to watch the video.
5. Audiovisual Metadata: Details about the video, such as its dimensions (width and height), type (audiovisual), and the presence of key frame timestamps, are included.
6. Transcript Phrases: The output includes specific phrases from the transcript, along with timing and speaker information, enhancing usability for applications like closed captioning or search.

```python
ANALYZER_ID = "content-video-sample-" + str(uuid.uuid4())
ANALYZER_TEMPLATE_FILE = '../analyzer_templates/content_video.json'
ANALYZER_SAMPLE_FILE = '../data/video.mp4'

# Create analyzer
response = client.begin_create_analyzer(ANALYZER_ID, analyzer_template_path=ANALYZER_TEMPLATE_FILE)
result = client.poll_result(response)

# Analyze file
response = client.begin_analyze(ANALYZER_ID, file_location=ANALYZER_SAMPLE_FILE)
result = client.poll_result(response)

print(json.dumps(result, indent=2))
```
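A minimal sketch of walking the per-shot results. The commit does not include a saved video output, so every key name below (`startTimeMs`, `endTimeMs`, `markdown`) is an assumption to check against the JSON printed by the cell above.

```python
# Sketch only: key names are assumptions about the videoShot result shape.
for content in result.get("result", {}).get("contents", []):
    start = content.get("startTimeMs")
    end = content.get("endTimeMs")
    print(f"Shot: {start} ms --> {end} ms")
    # The per-shot markdown is expected to carry the WEBVTT transcript,
    # description, and key frame references described above.
    print(content.get("markdown", ""))
```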
The remaining lines of the notebook are standard Jupyter scaffolding: the metadata declares a Python 3 kernel (ipython, Python 3.8.10), and the file uses nbformat 4 (minor version 2).
