
Commit 61bf1b6

committed
Updated to reflect Paul's feedback on the elements docs of the content understanding service.
1 parent 5c08956 commit 61bf1b6

2 files changed: +191 -155 lines changed


articles/ai-services/content-understanding/document/elements.md

Lines changed: 159 additions & 116 deletions
@@ -22,11 +22,18 @@ ms.custom:
## Overview

- Azure AI Content Understanding's document analysis capabilities help you transform unstructured document data into structured, machine-readable information. By precisely identifying and extracting document elements while preserving their structural relationships, you can build powerful document processing workflows for a wide range of applications.
+ Azure AI Content Understanding's analysis capabilities help you transform unstructured data into structured, machine-readable information. By precisely identifying and extracting elements while preserving their structural relationships, you can build powerful processing workflows for a wide range of applications.

- This article explains the document analysis features that enable you to extract meaningful content from your documents, preserve document structures, and unlock the full potential of your document data.
+ The `contents` object with kind "document" supports output for a range of input file types, including document, image, text, and structured files. These outputs enable you to extract meaningful content from your files, preserve document structures, and unlock the full potential of your data.

- This document provides examples for **document file types** including `.pdf`, `.tiff`, `.jpg`, `.png`, `.bmp`, `.heif`, `.docx`, `.xlsx`, `.pptx`, `.txt`, `.html`, `.md`, `.rtf`, `.eml`, `.msg`, and `.xml` files. For complete details about supported file types, file size limits, and other constraints, see [service quotas and limits](../service-limits.md#analyzers).
+ The **document content kind** includes output for input file types such as:
+ - **Documents**: PDFs, Word documents, PowerPoint presentations, Excel spreadsheets
+ - **Figures**: Photos, scanned documents, charts, diagrams
+ - **Text files**: Plain text, HTML, Markdown, RTF
+ - **Structured content**: XML, JSON, CSV, TSV files
+ - **Email**: EML and MSG message formats
+
+ For complete details about supported file types, file size limits, and other constraints, see [service quotas and limits](../service-limits.md#analyzers).

## JSON response structure

@@ -92,12 +99,16 @@ A `word` is a content element composed of a sequence of characters. Content Unde
**JSON example:**
```json
{
-   "content": "Example",
-   "span": {
-     "length": 7
-   },
-   "confidence": 0.992,
-   "source": "D(1,1.265,1.0836,2.4972,1.0816,2.4964,1.4117,1.2645,1.4117)"
+   "words": [
+     {
+       "content": "Example",
+       "span": {
+         "length": 7
+       },
+       "confidence": 0.992,
+       "source": "D(1,1.265,1.0836,2.4972,1.0816,2.4964,1.4117,1.2645,1.4117)"
+     }
+   ]
}
```
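To make this shape concrete, here's a minimal sketch of consuming a saved analyze result, assuming the response JSON was written to a local file and that the document content is the first entry in `contents` (the file name and access path are illustrative, following the structure described in this article):

```python
import json

# Minimal sketch: load a saved analyze result and print high-confidence words.
# Field names follow the JSON examples in this article; the file name is illustrative.
with open("analyze_result.json", encoding="utf-8") as f:
    result = json.load(f)

content = result["contents"][0]  # document content object (kind "document")
for page in content.get("pages", []):
    for word in page.get("words", []):
        if word.get("confidence", 0.0) >= 0.9:
            print(page["pageNumber"], word["content"], word["confidence"])
```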

@@ -112,43 +123,57 @@ Content Understanding detects check marks inside table cell as selection marks i
**JSON example:**
```json
{
-   "content": "",
-   "span": {
-     "length": 1
-   },
-   "confidence": 0.983,
-   "source": "D(1,1.258,2.7952,1.3705,2.7949,1.371,2.9098,1.2575,2.9089)"
+   "words": [
+     {
+       "content": "",
+       "span": {
+         "length": 1
+       },
+       "confidence": 0.983,
+       "source": "D(1,1.258,2.7952,1.3705,2.7949,1.371,2.9098,1.2575,2.9089)"
+     }
+   ]
}
```

:::image type="content" source="../media/document/selection-marks.png" alt-text="Screenshot of detected selection marks.":::

#### Barcodes

- A `barcode` is a content element that describes both linear (ex. UPC, EAN) and 2D (ex. QR, MaxiCode) barcodes. Content Understanding represents barcodes using its detected type and extracted value. The following barcode formats are currently accepted:
-
- * `QR Code`
- * `Code 39`
- * `Code 93`
- * `Code 128`
- * `UPC (UPC-A & UPC-E)`
- * `PDF417`
- * `EAN-8`
- * `EAN-13`
- * `Codabar`
- * `Databar`
- * `Databar (expanded)`
- * `ITF`
- * `Data Matrix`
+ A `barcode` is a content element that describes both linear (for example, UPC, EAN) and 2D (for example, QR, MaxiCode) barcodes. Content Understanding represents barcodes using their detected type and extracted value. The following barcode formats are currently supported:
+
+ | Barcode type | Description |
+ |--------------|-------------|
+ | `QRCode` | QR code, as defined in ISO/IEC 18004:2015 |
+ | `PDF417` | PDF417, as defined in ISO 15438 |
+ | `UPCA` | GS1 12-digit Universal Product Code |
+ | `UPCE` | GS1 6-digit Universal Product Code |
+ | `Code39` | Code 39 barcode, as defined in ISO/IEC 16388:2007 |
+ | `Code128` | Code 128 barcode, as defined in ISO/IEC 15417:2007 |
+ | `EAN8` | GS1 8-digit International Article Number (European Article Number) |
+ | `EAN13` | GS1 13-digit International Article Number (European Article Number) |
+ | `DataBar` | GS1 DataBar barcode |
+ | `Code93` | Code 93 barcode, as defined in ANSI/AIM BC5-1995 |
+ | `Codabar` | Codabar barcode, as defined in ANSI/AIM BC3-1995 |
+ | `DataBarExpanded` | GS1 DataBar Expanded barcode |
+ | `ITF` | Interleaved 2 of 5 (ITF) barcode, as defined in ANSI/AIM BC2-1995 |
+ | `MicroQRCode` | Micro QR code, as defined in ISO/IEC 23941:2022 |
+ | `Aztec` | Aztec code, as defined in ISO/IEC 24778:2008 |
+ | `DataMatrix` | Data Matrix code, as defined in ISO/IEC 16022:2006 |
+ | `MaxiCode` | MaxiCode, as defined in ISO/IEC 16023:2000 |

**JSON example:**
```json
{
-   "kind": "code39",
-   "value": "Hello World",
-   "source": "D(1,2.5738,4.8186,3.8617,4.8153,3.8621,4.9894,2.5743,4.9928)",
-   "span": {"offset": 192, "length": 10 }
-   "confidence": 0.977
+   "barcodes": [
+     {
+       "kind": "Code39",
+       "value": "Hello World",
+       "source": "D(1,2.5738,4.8186,3.8617,4.8153,3.8621,4.9894,2.5743,4.9928)",
+       "span": {"offset": 192, "length": 10 },
+       "confidence": 0.977
+     }
+   ]
}
```

@@ -159,31 +184,39 @@ A `formula` is a content element representing mathematical expressions in the do
**JSON example:**
```json
{
-   "confidence": 0.708,
-   "source": "D(1,3.4282,7.0195,4.0452,7.0307,4.0425,7.1803,3.4255,7.1691)",
-   "span": {
-     "offset": 394,
-     "length": 51
-   }
+   "formulas": [
+     {
+       "confidence": 0.708,
+       "source": "D(1,3.4282,7.0195,4.0452,7.0307,4.0425,7.1803,3.4255,7.1691)",
+       "span": {
+         "offset": 394,
+         "length": 51
+       }
+     }
+   ]
}
```

- #### Images
+ #### Figures

- An `image` is a content element that represents an embedded image, figure, or chart in the document. Content Understanding extracts any embedded text from the images, and any associated captions and footnotes.
+ A `figure` is a content element that represents an embedded image, figure, or chart in the document. Content Understanding extracts any embedded text from figures, along with any associated captions and footnotes.

**JSON example:**
```json
{
-   "source": "D(2,1.3465,1.8481,3.4788,1.8484,3.4779,3.8286,1.3456,3.8282)",
-   "span": {
-     "offset": 658,
-     "length": 42
-   },
-   "elements": [
-     "/paragraphs/14"
-   ],
-   "id": "2.1"
+   "figures": [
+     {
+       "source": "D(2,1.3465,1.8481,3.4788,1.8484,3.4779,3.8286,1.3456,3.8282)",
+       "span": {
+         "offset": 658,
+         "length": 42
+       },
+       "elements": [
+         "/paragraphs/14"
+       ],
+       "id": "2.1"
+     }
+   ]
}
```

@@ -201,30 +234,24 @@ A `page` is a grouping of content that typically corresponds to one side of a sh
**JSON example:**
```json
{
-   "pageNumber": 1,
-   "angle": 0.0739153,
-   "width": 8.5,
-   "height": 11,
-   "spans": [
-     {
-       "offset": 0,
-       "length": 620
-     } ],
-   "words": [ /* array of word objects */ ],
-   "barcodes": [
+   "pages": [
      {
-       "kind": "qrCode",
-       "value": "Hello World",
-       "source": "D(1,2.5738,4.8186,3.8617,4.8153,3.8621,4.9894,2.5743,4.9928)",
-       "span": {
-         "offset": 192,
-         "length": 10
-       },
-       "confidence": 0.977
+       "pageNumber": 1,
+       "angle": 0.0739153,
+       "width": 8.5,
+       "height": 11,
+       "spans": [
+         {
+           "offset": 0,
+           "length": 620
+         }
+       ],
+       "words": [ /* array of word objects */ ],
+       "barcodes": [ /* details of barcodes */ ],
+       "lines": [ /* array of line objects */ ],
+       "formulas": [ /* array of formula objects */ ]
      }
-   ],
-   "lines": [ /* array of line objects */ ],
-   "formulas": [ /* array of formula objects */ ]
+   ]
}
```

@@ -235,13 +262,17 @@ A `paragraph` is an ordered sequence of lines that form a logical unit. Typicall
**JSON example:**
```json
{
-   "role": "title",
-   "content": "Example Document",
-   "source": "D(1,1.264,1.0836,4.1584,1.0795,4.1589,1.4083,1.2644,1.4124)",
-   "span": {
-     "offset": 0,
-     "length": 18
-   }
+   "paragraphs": [
+     {
+       "role": "title",
+       "content": "Example Document",
+       "source": "D(1,1.264,1.0836,4.1584,1.0795,4.1589,1.4083,1.2644,1.4124)",
+       "span": {
+         "offset": 0,
+         "length": 18
+       }
+     }
+   ]
}
```

@@ -252,12 +283,16 @@ A `line` is an ordered sequence of consecutive content elements, often separated
**JSON example:**
```json
{
-   "content": "Example Document",
-   "source": "D(1,1.264,1.0836,4.1583,1.0795,4.1589,1.4083,1.2645,1.4117)",
-   "span": {
-     "offset": 0,
-     "length": 16
-   }
+   "lines": [
+     {
+       "content": "Example Document",
+       "source": "D(1,1.264,1.0836,4.1583,1.0795,4.1589,1.4083,1.2645,1.4117)",
+       "span": {
+         "offset": 0,
+         "length": 16
+       }
+     }
+   ]
}
```

@@ -287,28 +322,32 @@ A table might span across consecutive pages of a document. In this situation, ta
**JSON example:**
```json
{
-   "rowCount": 6,
-   "columnCount": 2,
-   "cells": [
+   "tables": [
      {
-       "kind": "columnHeader",
-       "rowIndex": 0,
-       "columnIndex": 0,
-       "rowSpan": 1,
-       "columnSpan": 1,
-       "content": "Category",
-       "source": "D(2,1.1674,5.0483,4.1733,5.0546,4.1733,5.2358,1.1674,5.2358)",
+       "rowCount": 6,
+       "columnCount": 2,
+       "cells": [
+         {
+           "kind": "columnHeader",
+           "rowIndex": 0,
+           "columnIndex": 0,
+           "rowSpan": 1,
+           "columnSpan": 1,
+           "content": "Category",
+           "source": "D(2,1.1674,5.0483,4.1733,5.0546,4.1733,5.2358,1.1674,5.2358)",
+           "span": {
+             "offset": 798,
+             "length": 8
+           }
+         }
+       ],
+       "source": "D(2,1.1566,5.0425,7.1855,5.0428,7.1862,6.1853,1.1574,6.1858)",
        "span": {
-         "offset": 798,
-         "length": 8
+         "offset": 781,
+         "length": 280
        }
      }
-   ],
-   "source": "D(2,1.1566,5.0425,7.1855,5.0428,7.1862,6.1853,1.1574,6.1858)",
-   "span": {
-     "offset": 781,
-     "length": 280
-   }
+   ]
}
```
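Because `cells` is a flat array addressed by `rowIndex` and `columnIndex`, a small helper can rebuild it into a row-by-column grid. The following is an illustrative sketch against the table shape shown above; the helper name is hypothetical, and spanning cells simply repeat their content:

```python
# Illustrative sketch: flatten one entry from the "tables" array into a 2D grid
# of cell contents. Cells that span multiple rows or columns repeat their content.
def table_to_grid(table: dict) -> list[list[str]]:
    grid = [["" for _ in range(table["columnCount"])] for _ in range(table["rowCount"])]
    for cell in table["cells"]:
        row_span = cell.get("rowSpan", 1)
        col_span = cell.get("columnSpan", 1)
        for r in range(cell["rowIndex"], cell["rowIndex"] + row_span):
            for c in range(cell["columnIndex"], cell["columnIndex"] + col_span):
                grid[r][c] = cell.get("content", "")
    return grid
```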

@@ -321,13 +360,17 @@ A `section` is a logical grouping of related content elements that form a hierar
**JSON example:**
```json
{
-   "span": {
-     "offset": 113,
-     "length": 77
-   },
-   "elements": [
-     "/paragraphs/3",
-     "/paragraphs/4"
+   "sections": [
+     {
+       "span": {
+         "offset": 113,
+         "length": 77
+       },
+       "elements": [
+         "/paragraphs/3",
+         "/paragraphs/4"
+       ]
+     }
    ]
}
```
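The `elements` array refers to other content elements by path. The following illustrative sketch resolves such a reference against the parsed content object, assuming the two-segment `/{collection}/{index}` form shown in these examples; the helper name is hypothetical:

```python
# Illustrative sketch: resolve an element reference such as "/paragraphs/3"
# against the parsed document content dict, assuming the /{collection}/{index} form.
def resolve_element(content: dict, ref: str) -> dict:
    collection, index = ref.lstrip("/").split("/")
    return content[collection][int(index)]

# Example: resolve_element(content, "/paragraphs/3") returns the fourth paragraph object.
```
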
@@ -346,23 +389,23 @@ The `source` property describes the visual position of the element in the file u
* Bounding polygon: `D({pageNumber},{x1},{y1},{x2},{y2},{x3},{y3},{x4},{y4})`
* Axis-aligned bounding box: `D({pageNumber},{left},{top},{width},{height})`

- Page numbers are 1-indexed. The bounding polygon describes a sequence of points, clockwise from the left relative to the natural orientation of the element. For quadrilaterals, the points represent the top-left, top-right, bottom-right, and bottom-left corners. Each point represents the **x**, **y** coordinate in the length unit specified by the `unit` property. In general, the unit of measure for images is pixels while PDFs use inches.
+ Page numbers are one-indexed. The bounding polygon describes a sequence of points, clockwise from the left relative to the natural orientation of the element. For quadrilaterals, the points represent the top-left, top-right, bottom-right, and bottom-left corners. Each point represents the **x**, **y** coordinate in the length unit specified by the `unit` property. In general, the unit of measure for images is pixels while PDFs use inches.

:::image type="content" source="../media/document/bounding-regions.png" alt-text="Screenshot of detected bounding regions.":::

> [!NOTE]
- > Currently, Content Understanding only returns 4-point quadrilaterals as bounding polygons. Future versions might return different number of points to describe more complex shapes, such as curved lines or nonrectangular images. Currently, source is only returned for elements from rendered files (pdf/image).
+ > Currently, Content Understanding only returns four-point quadrilaterals as bounding polygons. Future versions might return a different number of points to describe more complex shapes, such as curved lines or nonrectangular images. Currently, `source` is only returned for elements from rendered files (PDF/image).
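As an illustration of working with this format, the following sketch parses a bounding-polygon `source` string into its page number and `(x, y)` points; the helper name is hypothetical, and it assumes the polygon form rather than the axis-aligned box form:

```python
# Illustrative sketch: parse a bounding-polygon source string such as
# "D(1,2.5738,4.8186,3.8617,4.8153,3.8621,4.9894,2.5743,4.9928)" into its
# page number and (x, y) points. Units follow the content's `unit` property.
def parse_polygon_source(source: str) -> tuple[int, list[tuple[float, float]]]:
    values = source.removeprefix("D(").removesuffix(")").split(",")
    page_number = int(values[0])
    coords = [float(v) for v in values[1:]]
    points = list(zip(coords[0::2], coords[1::2]))  # [(x1, y1), (x2, y2), ...]
    return page_number, points
```
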
## Next steps

* Try processing your document content using Content Understanding in [Azure AI Foundry](https://aka.ms/cu-landing).
- * Learn to analyze document content [**analyzer templates**](../quickstart/use-ai-foundry.md).
+ * Learn to analyze document content with [**analyzer templates**](../quickstart/use-ai-foundry.md).
* Review code samples: [**visual document search**](https://github.com/Azure-Samples/azure-ai-search-with-content-understanding-python/blob/main/notebooks/search_with_visual_document.ipynb).
* Review code sample: [**analyzer templates**](https://github.com/Azure-Samples/azure-ai-content-understanding-python/tree/main/analyzer_templates).

## Complete JSON example

- The following example shows the complete JSON response structure from analyzing a document. This represents the full output from Content Understanding when processing a PDF document with multiple element types:
+ The following example shows the complete JSON response structure from analyzing a document. This JSON represents the full output from Content Understanding when processing a PDF document with multiple element types:

:::image type="content" source="../media/document/demo-pdf-screenshot.png" alt-text="Screenshot of the demo PDF document showing example content including checkboxes, barcodes, formulas, images, and tables.":::