articles/ai-services/content-understanding/document/elements.md
Lines changed: 11 additions & 21 deletions
@@ -51,14 +51,14 @@ Content Understanding generates richly formatted markdown that preserves the ori
#### Words
-A `word` is a content element composed of a sequence of characters. Content Understanding uses word boundaries defined by [Unicode Standard Annex #29](https://www.unicode.org/reports/tr29/#Word_Boundaries). For Latin languages, words may be split from punctuation even without intervening spaces. In some language, such as Chinese, supplemental word dictionaries are used to enable word breaking at semantic boundaries. For more information, *see*[Boundary Analysis](https://unicode-org.github.io/icu/userguide/boundaryanalysis/).
+A `word` is a content element composed of a sequence of characters. Content Understanding uses word boundaries defined by [Unicode Standard Annex #29](https://www.unicode.org/reports/tr29/#Word_Boundaries). For Latin languages, words might be split from punctuation even without intervening spaces. In some languages, such as Chinese, supplemental word dictionaries are used to enable word breaking at semantic boundaries. For more information, *see* [Boundary Analysis](https://unicode-org.github.io/icu/userguide/boundaryanalysis/).
:::image type="content" source="../media/document/word-boundaries.png" alt-text="Screenshot of detected words.":::
#### Selection marks
-A `selection mark` is a content element that represents a visual glyph indicating the state of a selection. They may be represented as check boxes, check marks, radio buttons, etc. The state of a selection mark can be selected or unselected, with different visual representation to indicate the state. They're encoded as words in the document analysis result using `☒` (selected) and `☐` (unselected).
+A `selection mark` is a content element that represents a visual glyph indicating the state of a selection. Selection marks might appear in the document as check boxes, check marks, radio buttons, and so on. The state of a selection mark can be selected or unselected, with a different visual representation for each state. Selection marks are encoded as words in the document analysis result using `☒` (selected) and `☐` (unselected).
Content Understanding detects check marks inside table cells as selection marks in the selected state. However, it doesn't detect empty table cells as selection marks in the unselected state.
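Because selection marks surface as the words `☒` and `☐`, a consumer can tally them straight from the returned text. The following is a minimal sketch under that assumption; the function name `count_selection_marks` and the sample string are hypothetical and not part of any Content Understanding SDK.

```python
from collections import Counter

# The two glyphs the text above says are used for selection marks.
SELECTED = "\u2612"    # ☒
UNSELECTED = "\u2610"  # ☐


def count_selection_marks(text: str) -> dict[str, int]:
    """Count selected and unselected marks in a block of analyzed text."""
    counts = Counter(ch for ch in text if ch in (SELECTED, UNSELECTED))
    return {"selected": counts[SELECTED], "unselected": counts[UNSELECTED]}


if __name__ == "__main__":
    # Hypothetical excerpt of analysis output, for illustration only.
    sample = "☒ Email me updates\n☐ Call me\n☒ Send SMS"
    print(count_selection_marks(sample))  # {'selected': 2, 'unselected': 1}
```

Since empty table cells aren't reported as unselected marks, the unselected count from a table might understate the number of empty check boxes.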
@@ -85,7 +85,7 @@ A `barcode` is a content element that describes both linear (ex. UPC, EAN) and 2
#### Formulas
-A `formula` is a content element representing mathematical expressions in the document. It may be an `inline` formula embedded with other text, or an `display` formula that takes up an entire line. Multiline formulas are represented as multiple `display` formula elements grouped into `paragraphs` to preserve mathematical relationships.
+A `formula` is a content element representing mathematical expressions in the document. It might be an `inline` formula embedded with other text, or a `display` formula that takes up an entire line. Multiline formulas are represented as multiple `display` formula elements grouped into `paragraphs` to preserve mathematical relationships.
#### Images
@@ -97,22 +97,22 @@ Document layout elements are visual and structural components, such as pages, ta
#### Pages
-A `page` is a grouping of content that typically corresponds to one side of a sheet of paper. A rendered page is characterized via `width` and `height` in the specified `unit`. In general, images use pixel while PDFs use inch. The `angle` property describes the overall text angle in degrees for pages that may be rotated.
+A `page` is a grouping of content that typically corresponds to one side of a sheet of paper. A rendered page is characterized via `width` and `height` in the specified `unit`. In general, images use pixels, while PDFs use inches. The `angle` property describes the overall text angle in degrees for pages that might be rotated.
> [!NOTE]
> For spreadsheets like Excel, each sheet is mapped to a page. For presentations, like PowerPoint, each slide is mapped to a page. For file formats like HTML or Word documents, which lack a native page concept without rendering, the entire main content is treated as a single page.
#### Paragraphs
-A `paragraph` is an ordered sequence of lines that form a logical unit. Typically, the lines share common alignment and spacing between lines. Paragraphs are often delimited via indentation, added spacing, or bullets/numbering. Some paragraphs may have special functional `role` in the document. Currently supported roles include page header, page footer, page number, title, section heading, footnote, and formula block.
+A `paragraph` is an ordered sequence of lines that form a logical unit. Typically, the lines share common alignment and spacing between lines. Paragraphs are often delimited via indentation, added spacing, or bullets/numbering. Some paragraphs have a special functional `role` in the document. Currently supported roles include page header, page footer, page number, title, section heading, footnote, and formula block.
#### Lines
A `line` is an ordered sequence of consecutive content elements, often separated by visual spaces. Content elements in the same horizontal plane (row) but separated by more than a single visual space are most often split into multiple lines. While this feature sometimes splits semantically contiguous content into separate lines, it enables the representation of textual content split into multiple columns or cells. Lines in vertical writing are detected in the vertical direction.
#### Tables
-A `table` organizes content into a group of cells in a grid layout. The rows and columns may be visually separated by grid lines, color banding, or greater spacing. The position of a table cell is specified via its row and column indices. A cell can span across multiple rows and columns.
+A `table` organizes content into a group of cells in a grid layout. The rows and columns might be visually separated by grid lines, color banding, or greater spacing. The position of a table cell is specified via its row and column indices. A cell can span across multiple rows and columns.
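To make the row/column index and span model concrete, the sketch below expands a list of cells into a dense grid, repeating a spanning cell's text in every position it covers. The `Cell` class and its field names (`row`, `col`, `row_span`, `col_span`) are hypothetical stand-ins for illustration; they aren't taken from the service's response schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cell:
    """Hypothetical table cell: zero-based position plus optional spans."""
    row: int
    col: int
    text: str
    row_span: int = 1
    col_span: int = 1


def to_grid(cells: list[Cell], row_count: int, col_count: int) -> list[list[Optional[str]]]:
    """Place each cell's text into every grid position the cell covers."""
    grid: list[list[Optional[str]]] = [[None] * col_count for _ in range(row_count)]
    for cell in cells:
        for r in range(cell.row, cell.row + cell.row_span):
            for c in range(cell.col, cell.col + cell.col_span):
                grid[r][c] = cell.text
    return grid


if __name__ == "__main__":
    cells = [
        Cell(0, 0, "Name", col_span=2),  # header spanning two columns
        Cell(1, 0, "First"),
        Cell(1, 1, "Last"),
    ]
    for row in to_grid(cells, row_count=2, col_count=2):
        print(row)
```

Positions that no cell covers stay `None`, which keeps gaps visible instead of silently shifting content.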
Based on its position and styling, a cell can be classified as general content, row header, column header, stub head, or description:
@@ -128,7 +128,7 @@ Based on its position and styling, a cell can be classified as general content,
A table caption specifies content that explains the table. A table can further have a set of footnotes. Unlike a description cell, a caption typically lies outside the grid layout. Table footnotes annotate content inside the table, often marked with footnote symbols. They're often found below the table grid.
-A table may span across consecutive pages of a document. In this situation, table continuations in subsequent pages generally maintain the same column count, width, and styling. They often repeat the column headers. Other than page headers, footers, and page numbers, there's generally no intervening content between the initial table and its continuations.
+A table might span across consecutive pages of a document. In this situation, table continuations in subsequent pages generally maintain the same column count, width, and styling. They often repeat the column headers. Other than page headers, footers, and page numbers, there's generally no intervening content between the initial table and its continuations.
> [!NOTE]
> The span for tables covers only the core content and excludes the associated caption and footnotes.
@@ -137,7 +137,7 @@ A table may span across consecutive pages of a document. In this situation, tabl
#### Sections
-A `section` is a logical grouping of related content elements that form a hierarchical structure within the document. It often starts with a section heading as the first paragraph. A section may contain subsections, creating a nested document structure that preserves semantic relationships.
+A `section` is a logical grouping of related content elements that form a hierarchical structure within the document. It often starts with a section heading as the first paragraph. A section might contain subsections, creating a nested document structure that preserves semantic relationships.
### Element properties
@@ -149,26 +149,16 @@ The `span` property specifies the logical position of the element in the documen
#### Source
-The `source` property describes the visual position of the element in the file using an encoded string. For documents, the source string may be in one of the following formats:
+The `source` property describes the visual position of the element in the file using an encoded string. For documents, the source string can be in one of the following formats:
-Page numbers are `1-indexed`. The bounding polygon describes a sequence of points, clockwise from the left relative to the natural orientation of the element. For quadrilaterals, the points represent the top-left, top-right, bottom-right, and bottom-left corners. Each point represents the **x**, **y** coordinate in the length unit specified by the `unit` property. In general, the unit of measure for images is pixels while PDFs use inches.
+Page numbers are 1-indexed. The bounding polygon describes a sequence of points, clockwise from the left relative to the natural orientation of the element. For quadrilaterals, the points represent the top-left, top-right, bottom-right, and bottom-left corners. Each point represents the **x**, **y** coordinate in the length unit specified by the `unit` property. In general, the unit of measure for images is pixels while PDFs use inches.
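As an illustration of the point ordering described above, the sketch below takes eight already-parsed coordinates for a quadrilateral (top-left, top-right, bottom-right, bottom-left) and derives an axis-aligned bounding box. It deliberately doesn't parse the encoded `source` string, because its formats aren't shown in this diff; the helper names `quad_points` and `bounding_box` are hypothetical.

```python
Point = tuple[float, float]


def quad_points(coords: list[float]) -> list[Point]:
    """Group a flat [x1, y1, ..., x4, y4] list into four (x, y) points."""
    if len(coords) != 8:
        raise ValueError("expected 8 numbers for a 4-point quadrilateral")
    return [(coords[i], coords[i + 1]) for i in range(0, 8, 2)]


def bounding_box(points: list[Point]) -> tuple[float, float, float, float]:
    """Return (left, top, width, height) of the box enclosing the polygon."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    left, top = min(xs), min(ys)
    return (left, top, max(xs) - left, max(ys) - top)


if __name__ == "__main__":
    # Hypothetical slightly rotated quadrilateral, in the page's `unit`.
    quad = quad_points([1.0, 1.0, 3.0, 1.5, 2.5, 3.0, 0.5, 2.5])
    print(bounding_box(quad))  # (0.5, 1.0, 2.5, 2.0)
```

The result is in the same unit as the input, so a box computed for a PDF page is in inches and one computed for an image is in pixels.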
:::image type="content" source="../media/document/bounding-regions.png" alt-text="Screenshot of detected bounding regions.":::
> [!NOTE]
-> Currently, Content Understanding only returns `4-point` quadrilaterals as bounding polygons. Future versions may return different number of points to describe more complex shapes, such as curved lines or nonrectangular images. Currently, source is only returned for elements from rendered files (pdf/image).
-
-## Supported content and layout elements
-
-Different file formats support different subsets of content and layout elements. The following table lists the currently supported elements for each file type.
+> Currently, Content Understanding only returns 4-point quadrilaterals as bounding polygons. Future versions might return a different number of points to describe more complex shapes, such as curved lines or nonrectangular images. Currently, source is only returned for elements from rendered files (PDF/image).
articles/ai-services/content-understanding/language-region-support.md
Lines changed: 4 additions & 0 deletions
@@ -27,6 +27,10 @@ To use Azure AI Content Understanding, create your Azure AI Service resource in
† Australia East doesn't support data zone as a processing location.
+> [!NOTE]
+>
+> [Pro mode](concepts/standard-pro-modes.md) currently supports only data zone and global as processing locations.
+
## Language support
Azure AI Content Understanding enables you to process data in multiple languages simultaneously. Our language support capabilities enable users to communicate with your applications in natural ways and empower global outreach.