Commit 2a4647d

add document updates
1 parent 8c0541d commit 2a4647d

1 file changed

+46
-46
lines changed
articles/ai-services/content-understanding/document/markdown.md

Lines changed: 46 additions & 46 deletions
@@ -8,76 +8,76 @@ ms.service: azure-ai-content-understanding
88
ms.topic: conceptual
99
ms.date: 05/19/2025
1010
ms.author: paulhsu
11-
11+
1212
---
13-
13+
1414
# Document analysis: Markdown representation
15-
16-
Azure AI Content Understanding's document analysis capabilities help you transform unstructured document data into [GitHub Flavored Markdown](https://github.github.com/gfm), preserving the original content and layout for higher fidelity downstream applications and workflows. This document describes how each content and layout element is represented in markdown.
17-
15+
16+
Azure AI Content Understanding converts unstructured documents into [GitHub Flavored Markdown](https://github.github.com/gfm) while preserving content and layout for accurate downstream use. This article describes how each content and layout element is represented in markdown.
17+
1818
## Words and selection marks
19-
20-
Recognized words and detected selection marks are represented in markdown as plain text. Content may be escaped to avoid ambiguity with markdown formatting syntax.
21-
19+
20+
Recognized words and detected selection marks are represented in markdown as plain text. Content may be escaped to avoid ambiguity with markdown formatting syntax.
21+
2222
## Barcodes
23-
23+
2424
Barcodes are represented as markdown images with alt text and title: `![alt text](url "title")`.
25-
25+
2626
| Content Type | Markdown Pattern | Example |
2727
| --- | --- | --- |
2828
| Barcode | `![{barcode.kind}]({barcode.path} "{barcode.value}")` | `![QRCode](barcodes/1.2 "https://www.microsoft.com")` |
29-
29+
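As a rough illustration of the pattern in the table above, the following Python sketch pulls barcode entries out of a markdown string. The function name `find_barcodes` and the regular expression are assumptions made for this example, not part of the service. The pattern matches any markdown image that carries a title, so it can't tell barcodes apart from other titled images, and it doesn't handle escaped characters.

```python
import re

# Hypothetical helper: matches ![{kind}]({path} "{value}") as documented above.
# Note: this matches ANY markdown image with a title, not only barcodes.
BARCODE_RE = re.compile(
    r'!\[(?P<kind>[^\]]*)\]\((?P<path>\S+)\s+"(?P<value>[^"]*)"\)'
)


def find_barcodes(markdown: str) -> list[dict[str, str]]:
    """Return one dict (kind, path, value) per image-with-title found."""
    return [match.groupdict() for match in BARCODE_RE.finditer(markdown)]


sample = 'Scan: ![QRCode](barcodes/1.2 "https://www.microsoft.com")'
print(find_barcodes(sample))
```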
3030
## Formulas
31-
31+
3232
Mathematical formulas are encoded using LaTeX in Markdown:
33-
33+
3434
* Inline formulas are enclosed in single dollar signs (`$...$`) to maintain text flow.
3535
* Display formulas use double dollar signs (`$$...$$`) for standalone display.
3636
* Multi-line formulas are represented as consecutive display formulas without intervening empty lines, preserving mathematical relationships.
37-
37+
3838
| Formula Kind | Markdown | Visualization |
3939
| --- | --- | --- |
4040
| Inline | `$\sqrt { -1 } $ is $i$` | $\sqrt { -1 } $ is $i$ |
4141
| Display | `$$a^2 + b^2 = c^2$$` | $$a^2 + b^2 = c^2$$ |
4242
| Multi-line | `$$( x + 2 ) ^ 2 = x ^ 2 + 4 x + 4$$`<br/>`$$= x ( x + 4 ) + 4$$` | $$( x + 2 ) ^ 2 = x ^ 2 + 4 x + 4$$ $$= x ( x + 4 ) + 4$$ |
43-
43+
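The delimiter rules above can be sketched in Python as follows. This is a minimal example that assumes formulas never contain a literal `$` and that the text contains no escaped dollar signs (for example, currency amounts). The name `extract_formulas` is hypothetical.

```python
import re

# Display formulas: $$...$$ (checked first so they aren't mistaken for inline).
DISPLAY_RE = re.compile(r"\$\$(.+?)\$\$", re.DOTALL)
# Inline formulas: a single $ on each side, not adjacent to another $.
INLINE_RE = re.compile(r"(?<!\$)\$(?!\$)(.+?)(?<!\$)\$(?!\$)")


def extract_formulas(markdown: str) -> dict[str, list[str]]:
    """Split LaTeX formulas in markdown into inline and display lists."""
    display = [body.strip() for body in DISPLAY_RE.findall(markdown)]
    # Remove display formulas before scanning for inline ones.
    remainder = DISPLAY_RE.sub(" ", markdown)
    inline = [body.strip() for body in INLINE_RE.findall(remainder)]
    return {"inline": inline, "display": display}


sample = r"$\sqrt { -1 } $ is $i$" + "\n" + "$$a^2 + b^2 = c^2$$"
print(extract_formulas(sample))
```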
4444
## Images
45-
46-
Detected images, including figures and charts, are currently represented using HTML `<figure>` elements in markdown that wrap the detected text in the image. Any caption is represented via an `<figcaption>` elements. Any associated footnotes appear as text immediately after the figure.
47-
45+
46+
Detected images, including figures and charts, are currently represented using HTML `<figure>` elements in markdown that wrap the detected text in the image. Any caption is represented via a `<figcaption>` element. Any associated footnotes appear as text immediately after the figure.
47+
4848
``` md
4949
<figure>
5050
<figcaption>Figure 2: Example</figcaption>
51-
51+
5252
Values
5353
300
5454
200
5555
100
5656
0
57-
57+
5858
Jan Feb Mar Apr May Jun Months
59-
59+
6060
</figure>
61-
61+
6262
This is a footnote.
6363
```
64-
64+
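To show how the `<figure>` structure above might be consumed, here is a small Python sketch that separates each figure's caption from its detected text. It assumes figures aren't nested and that the tags carry no attributes. The name `extract_figures` is hypothetical.

```python
import re

FIGURE_RE = re.compile(r"<figure>(.*?)</figure>", re.DOTALL)
CAPTION_RE = re.compile(r"<figcaption>(.*?)</figcaption>", re.DOTALL)


def extract_figures(markdown: str) -> list[dict]:
    """Return the caption and the non-empty text lines of each figure."""
    figures = []
    for body in FIGURE_RE.findall(markdown):
        caption_match = CAPTION_RE.search(body)
        caption = caption_match.group(1).strip() if caption_match else None
        # Everything other than the caption is text detected inside the image.
        text = CAPTION_RE.sub("", body)
        lines = [line.strip() for line in text.splitlines() if line.strip()]
        figures.append({"caption": caption, "lines": lines})
    return figures


sample = (
    "<figure>\n<figcaption>Figure 2: Example</figcaption>\n\n"
    "Values\n300\n\n</figure>\n\nThis is a footnote."
)
print(extract_figures(sample))
```

The footnote is left out of the result because it sits outside the `<figure>` element.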
6565
## Lines and paragraphs
66-
66+
6767
Paragraphs are represented in markdown as blocks of text separated by blank lines.
6868
When lines are available, each document line maps to a separate line in the markdown.
69-
69+
7070
## Sections
71-
72-
Paragraphs with title or section heading role are converted into markdown headings. Title, if any, is assigned heading level 1. The heading level of all other sections are assigned to preserve the detected hierarchical structure.
73-
71+
72+
Paragraphs with a title or section heading role are converted into markdown headings. The title, if any, is assigned heading level 1. The heading levels of all other sections are assigned to preserve the detected hierarchical structure.
73+
7474
## Tables
75-
76-
Tables are currently represented in markdown using HTML table markup (`<table>`, `<tr>`, `<th>`, `<td>`) to enable support for merged cells via `rowspan` and `colspan` attributes and rich headers via `<th>`. Any caption is represented via an `<caption>` element. Any associated footnotes appear as text immediately after the table.
77-
75+
76+
Tables are currently represented in markdown using HTML table markup (`<table>`, `<tr>`, `<th>`, `<td>`) to enable support for merged cells via `rowspan` and `colspan` attributes and rich headers via `<th>`. Any caption is represented via a `<caption>` element. Any associated footnotes appear as text immediately after the table.
77+
7878
:::row:::
7979
:::column:::
80-
80+
8181
``` md
8282
<table>
8383
<caption>Table 1. Example</caption>
@@ -87,43 +87,43 @@ Tables are currently represented in markdown using HTML table markup (`<table>`,
8787
</table>
8888
This is a footnote.
8989
```
90-
90+
9191
:::column-end:::
9292
:::column:::
93-
93+
94+
```md
9495
<table>
9596
<caption>Table 1. Example</caption>
9697
<tr><th>Header A</th><th>Header B</th></tr>
9798
<tr><td>Cell 1A</td><td>Cell 1B</td></tr>
9899
<tr><td>Cell 2A</td><td>Cell 2B</td></tr>
99100
</table>
100101
This is a footnote.
101-
102+
```
102103
:::column-end:::
103104
:::row-end:::
104-
105+
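Because tables arrive as HTML markup, a standard HTML parser can read them back. The sketch below uses Python's built-in `html.parser` module to collect the caption and cell text. The class name `TableParser` is hypothetical, and the sketch deliberately ignores `rowspan` and `colspan`, so merged cells aren't expanded.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collects the caption and rows of cell text from one HTML table."""

    def __init__(self):
        super().__init__()
        self.caption = None
        self.rows = []
        self._current_row = None
        self._buffer = None  # text of the currently open cell or caption

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._current_row = []
        elif tag in ("th", "td", "caption"):
            self._buffer = []

    def handle_data(self, data):
        # Text outside a cell or caption (such as a footnote) is ignored.
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td"):
            if self._current_row is not None and self._buffer is not None:
                self._current_row.append("".join(self._buffer).strip())
            self._buffer = None
        elif tag == "caption":
            if self._buffer is not None:
                self.caption = "".join(self._buffer).strip()
            self._buffer = None
        elif tag == "tr" and self._current_row is not None:
            self.rows.append(self._current_row)
            self._current_row = None


sample = """<table>
<caption>Table 1. Example</caption>
<tr><th>Header A</th><th>Header B</th></tr>
<tr><td>Cell 1A</td><td>Cell 1B</td></tr>
<tr><td>Cell 2A</td><td>Cell 2B</td></tr>
</table>
This is a footnote."""

parser = TableParser()
parser.feed(sample)
print(parser.caption, parser.rows)
```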
105106
## Page metadata
106-
107-
Markdown does not natively encode page metadata, such as page numbers, headers, footers, and breaks.
107+
108+
Markdown doesn't natively encode page metadata, such as page numbers, headers, footers, and breaks.
108109
Because this information can be useful for downstream applications, such metadata is encoded as HTML comments.
109-
110+
110111
| Metadata | Markdown |
111112
| --- | --- |
112113
| Page number | `<!-- PageNumber="1" -->` |
113114
| Page header | `<!-- PageHeader="Header" -->` |
114115
| Page footer | `<!-- PageFooter="Footer" -->` |
115116
| Page break | `<!-- PageBreak -->` |
116-
117+
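One possible way to use these comments downstream is to split the markdown into pages and lift the metadata out of the text. The Python sketch below assumes the page break comment appears exactly as `<!-- PageBreak -->`, and it assumes the footer key is `PageFooter`, by analogy with `PageHeader`. The name `split_pages` is hypothetical.

```python
import re

# Assumed keys: PageNumber and PageHeader appear in the table above;
# PageFooter is an assumption made by analogy with PageHeader.
METADATA_RE = re.compile(
    r'<!--\s*(PageNumber|PageHeader|PageFooter)="([^"]*)"\s*-->'
)
PAGE_BREAK = "<!-- PageBreak -->"


def split_pages(markdown: str) -> list[dict]:
    """Split markdown on page breaks and separate metadata from text."""
    pages = []
    for chunk in markdown.split(PAGE_BREAK):
        metadata = dict(METADATA_RE.findall(chunk))
        text = METADATA_RE.sub("", chunk).strip()
        pages.append({"metadata": metadata, "text": text})
    return pages


sample = (
    '<!-- PageHeader="Report" -->\n'
    "First page text.\n"
    '<!-- PageNumber="1" -->\n'
    "<!-- PageBreak -->\n"
    "Second page text.\n"
    '<!-- PageNumber="2" -->'
)
for page in split_pages(sample):
    print(page)
```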
117118
## Conclusion
118-
119+
119120
Content Understanding's Markdown elements provide a powerful way to represent the structure and content of analyzed documents. By understanding and properly utilizing these Markdown elements, you can enhance your document processing workflows and build more sophisticated content extraction applications.
120-
121+
121122
## Next steps
122-
123+
123124
* Try processing your document content using Content Understanding in [Azure AI Foundry](https://aka.ms/cu-landing).
124125
* Learn to analyze document content using [**analyzer templates**](../quickstart/use-ai-foundry.md).
125126
* Review code sample: [**visual document search**](https://github.com/Azure-Samples/azure-ai-search-with-content-understanding-python/blob/main/notebooks/search_with_visual_document.ipynb).
126127
* Review code sample: [**analyzer templates**](https://github.com/Azure-Samples/azure-ai-content-understanding-python/tree/main/analyzer_templates).
127-
128-
129-
128+
129+
