Azure AI Content Understanding converts unstructured documents into [GitHub Flavored Markdown](https://github.github.com/gfm), while maintaining content and layout for accurate downstream use. This document describes how each content and layout element is represented in markdown.

## Words and selection marks

Recognized words and detected selection marks are represented in markdown as plain text. Content may be escaped to avoid ambiguity with markdown formatting syntax.

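
Because content may be escaped, downstream code that wants the literal text may need to undo the escaping. The following is a minimal sketch, assuming the escapes are ordinary GitHub Flavored Markdown backslash escapes in front of ASCII punctuation; the exact set of characters the service escapes is not stated here, and `unescape_markdown` is a hypothetical helper name. It makes no attempt to skip code spans or fenced blocks.

```python
import re

# GFM allows any ASCII punctuation character to be backslash-escaped.
# The character class covers the four ASCII punctuation ranges.
_ESCAPED_PUNCT = re.compile(r"\\([!-/:-@\[-`{-~])")


def unescape_markdown(text: str) -> str:
    """Drop the backslash in front of escaped ASCII punctuation."""
    return _ESCAPED_PUNCT.sub(r"\1", text)


if __name__ == "__main__":
    print(unescape_markdown(r"1\. Total \*due\* \# 42"))
```

A backslash followed by a letter or digit is left untouched, since that is not an escape in GFM.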
## Barcodes

Barcodes are represented as markdown images with alt text and title: ``.

| Multi-line |`$$( x + 2 ) ^ 2 = x ^ 2 + 4 x + 4$$`<br/>`$$= x ( x + 4 ) + 4$$`|$$( x + 2 ) ^ 2 = x ^ 2 + 4 x + 4$$$$= x ( x + 4 ) + 4$$|

## Images

Detected images, including figures and charts, are currently represented using HTML `<figure>` elements in markdown that wrap the detected text in the image. Any caption is represented via a `<figcaption>` element. Any associated footnotes appear as text immediately after the figure.

```md
<figure>
<figcaption>Figure 2: Example</figcaption>

Values
300
200
100
0

Jan Feb Mar Apr May Jun Months

</figure>

This is a footnote.
```
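
To show how the `<figure>` representation above might be consumed, here is a small sketch that pulls each figure's caption and wrapped text out of the markdown with regular expressions. It assumes figures are not nested and that the tags carry no attributes, as in the example; `extract_figures` is a hypothetical helper, not part of any SDK.

```python
import re

FIGURE_RE = re.compile(r"<figure>(.*?)</figure>", re.DOTALL)
CAPTION_RE = re.compile(r"<figcaption>(.*?)</figcaption>", re.DOTALL)


def extract_figures(markdown: str) -> list[dict]:
    """Return one dict per <figure>, with its caption and non-empty text lines."""
    figures = []
    for match in FIGURE_RE.finditer(markdown):
        body = match.group(1)
        caption = CAPTION_RE.search(body)
        # Remove the caption element so only the detected text remains.
        text = CAPTION_RE.sub("", body)
        lines = [line.strip() for line in text.splitlines() if line.strip()]
        figures.append({
            "caption": caption.group(1).strip() if caption else None,
            "text": lines,
        })
    return figures
```

Footnotes are not captured by this sketch because they appear after the closing `</figure>` tag as ordinary text.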
## Lines and paragraph

Paragraphs are represented in markdown as blocks of text separated by blank lines.
When lines are available, each document line maps to a separate line in the markdown.

## Sections

Paragraphs with the title or section heading role are converted into markdown headings. The title, if any, is assigned heading level 1. The heading levels of all other sections are assigned to preserve the detected hierarchical structure.

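
Since heading levels carry the detected hierarchy, a consumer can rebuild a document outline from them. The sketch below assumes headings are emitted as ATX headings (`#` through `######`) and skips anything inside fenced code blocks; `outline` is a hypothetical helper name.

```python
import re

HEADING_RE = re.compile(r"^(#{1,6})\s+(.+?)\s*$")


def outline(markdown: str) -> list[tuple[int, str]]:
    """Return (level, heading text) pairs in document order."""
    headings = []
    in_fence = False
    for line in markdown.splitlines():
        # Toggle on fence lines so '#' lines inside code are ignored.
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        match = HEADING_RE.match(line)
        if match:
            headings.append((len(match.group(1)), match.group(2)))
    return headings
```

The level in each pair can then be used to nest sections, for example when chunking a document by section.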
## Tables

Tables are currently represented in markdown using HTML table markup (`<table>`, `<tr>`, `<th>`, `<td>`) to enable support for merged cells via `rowspan` and `colspan` attributes and rich headers via `<th>`. Any caption is represented via a `<caption>` element. Any associated footnotes appear as text immediately after the table.

:::row:::
:::column:::

```md
<table>
<caption>Table 1. Example</caption>
@@ -87,43 +87,43 @@ Tables are currently represented in markdown using HTML table markup (`<table>`,
</table>
This is a footnote.
```

:::column-end:::
:::column:::

```md
<table>
<caption>Table 1. Example</caption>
<tr><th>Header A</th><th>Header B</th></tr>
<tr><td>Cell 1A</td><td>Cell 1B</td></tr>
<tr><td>Cell 2A</td><td>Cell 2B</td></tr>
</table>
This is a footnote.
```
:::column-end:::
:::row-end:::
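
Because tables arrive as HTML markup rather than pipe tables, they can be read with an ordinary HTML parser. The following is a rough sketch using Python's standard `html.parser`; it assumes a single, non-nested table per input, and the `TableParser` class and its cell dictionary layout are illustrative choices rather than anything defined by the service.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the caption and the cells of one HTML table."""

    def __init__(self):
        super().__init__()
        self.caption = None
        self.rows = []
        self._row = None
        self._cell = None
        self._in_caption = False

    def handle_starttag(self, tag, attrs):
        if tag == "caption":
            self._in_caption = True
            self.caption = ""
        elif tag == "tr":
            self._row = []
        elif tag in ("th", "td"):
            attributes = dict(attrs)
            self._cell = {
                "header": tag == "th",
                "text": "",
                # Merged cells are signalled by rowspan/colspan; default to 1.
                "rowspan": int(attributes.get("rowspan") or 1),
                "colspan": int(attributes.get("colspan") or 1),
            }

    def handle_data(self, data):
        if self._in_caption:
            self.caption += data
        elif self._cell is not None:
            self._cell["text"] += data

    def handle_endtag(self, tag):
        if tag == "caption":
            self._in_caption = False
        elif tag in ("th", "td") and self._cell is not None:
            if self._row is not None:
                self._row.append(self._cell)
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_table(html: str) -> TableParser:
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser
```

Text outside cells and the caption, such as a footnote after `</table>`, is ignored by this sketch and would need separate handling.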
## Page metadata

Markdown doesn't natively encode page metadata, such as page numbers, headers, footers, and breaks.
Since this information may be useful for downstream applications, we encode such metadata as HTML comments.

| Metadata | Markdown |
| --- | --- |
| Page number |`<!-- PageNumber="1" -->`|
| Page header |`<!-- PageHeader="Header" -->`|
| Page footer |`<!-- PageFooter="Footer" -->`|
| Page break |`<!-- PageBreak -->`|

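
As an illustration of how these comments could be used downstream, the sketch below splits markdown into pages at each page break comment and collects the metadata comments found on each page. It assumes the comment shapes shown in the table, that the footer comment uses the key `PageFooter`, and that values contain no embedded double quotes; `split_pages` is a hypothetical helper.

```python
import re

PAGE_BREAK_RE = re.compile(r"<!--\s*PageBreak\s*-->")
META_RE = re.compile(r'<!--\s*(PageNumber|PageHeader|PageFooter)="(.*?)"\s*-->')


def split_pages(markdown: str) -> list[dict]:
    """Split on page breaks and separate metadata comments from content."""
    pages = []
    for chunk in PAGE_BREAK_RE.split(markdown):
        # If a key repeats within one page, the last occurrence wins.
        metadata = {key: value for key, value in META_RE.findall(chunk)}
        content = META_RE.sub("", chunk).strip()
        pages.append({"metadata": metadata, "content": content})
    return pages
```

Keeping the metadata separate from the content makes it straightforward to drop running headers and footers before indexing while still recording which page each passage came from.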
## Conclusion

Content Understanding's Markdown elements represent the structure and content of analyzed documents. Understanding how each element is encoded helps you build document processing workflows and content extraction applications on top of the output.

## Next steps

* Try processing your document content using Content Understanding in [Azure AI Foundry](https://aka.ms/cu-landing).
* Learn to analyze document content using [**analyzer templates**](../quickstart/use-ai-foundry.md).