
Commit d02c72f

authored
Merge pull request #210221 from sanjeev3/main
Computer Vision and Form Recognizer OCR updates
2 parents 00fd435 + 14224a4 commit d02c72f


6 files changed, with 175 additions and 274 deletions.

articles/applied-ai-services/form-recognizer/concept-layout.md

Lines changed: 79 additions & 141 deletions
@@ -98,36 +98,91 @@ Try extracting data from forms and documents using the Form Recognizer Studio. Y
9898

9999
The layout model extracts text, selection marks, tables, paragraphs, and paragraph types (`roles`) from your documents.
100100

101-
### Text lines and words
101+
### Paragraphs <sup>🆕</sup>
102102

103-
Layout API extracts print and handwritten style text as `lines` and `words`. The model outputs bounding `polygon` coordinates and `confidence` for the extracted words. The `styles` collection includes any handwritten style for lines, if detected, along with the spans pointing to the associated text. This feature applies to [supported handwritten languages](language-support.md).
103+
The Layout model extracts all identified blocks of text in the `paragraphs` collection as a top-level object under `analyzeResult`. Each entry in this collection represents a text block and includes the extracted text as `content` and the bounding `polygon` coordinates (in `boundingRegions`). The `spans` information points to the text fragment within the top-level `content` property that contains the full text from the document.
104+
105+
```json
106+
"paragraphs": [
107+
{
108+
"spans": [],
109+
"boundingRegions": [],
110+
"content": "While healthcare is still in the early stages of its Al journey, we are seeing pharmaceutical and other life sciences organizations making major investments in Al and related technologies.\" TOM LAWRY | National Director for Al, Health and Life Sciences | Microsoft"
111+
}
112+
]
113+
```
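Because each paragraph's `spans` point into the top-level `content` string, the paragraph text can be recovered by slicing. The following Python snippet is a minimal sketch of that idea; `paragraph_text` and the `analyze_result` dictionary are hypothetical, hand-made stand-ins for a parsed response, not real service output.

```python
# Minimal sketch: rebuild each paragraph's text from the top-level `content`
# string using the paragraph's `spans` (offset/length pairs).
# `analyze_result` is a hypothetical, hand-made fragment, not real service output.


def paragraph_text(content: str, paragraph: dict) -> str:
    """Concatenate the slices of `content` referenced by a paragraph's spans."""
    return "".join(
        content[span["offset"]:span["offset"] + span["length"]]
        for span in paragraph["spans"]
    )


analyze_result = {
    "content": "NEWS TODAY\nMirjam Nilsson",
    "paragraphs": [
        {"spans": [{"offset": 0, "length": 10}], "content": "NEWS TODAY"},
        {"spans": [{"offset": 11, "length": 14}], "content": "Mirjam Nilsson"},
    ],
}

for p in analyze_result["paragraphs"]:
    text = paragraph_text(analyze_result["content"], p)
    # The span-derived text should match the paragraph's own `content` field.
    print(repr(text), text == p["content"])
```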
114+
### Paragraph roles <sup>🆕</sup>
115+
116+
The Layout model may flag certain paragraphs with a specialized type, or `role`, predicted by the model. Roles are most useful with unstructured documents, where they help convey the layout of the extracted content for richer semantic analysis. The following paragraph roles are supported:
117+
118+
| **Predicted role** | **Description** |
119+
| --- | --- |
120+
| `title` | The main heading(s) on the page |
121+
| `sectionHeading` | One or more subheading(s) on the page |
122+
| `footnote` | Text near the bottom of the page |
123+
| `pageHeader` | Text near the top edge of the page |
124+
| `pageFooter` | Text near the bottom edge of the page |
125+
| `pageNumber` | Page number |
104126

105127
```json
106128
{
107-
"words": [
108-
{
109-
"content": "CONTOSO",
110-
"polygon": [
111-
76,
112-
30,
113-
118,
114-
32,
115-
118,
116-
43,
117-
76,
118-
43
119-
],
120-
"confidence": 1,
121-
"span": {
122-
"offset": 0,
123-
"length": 7
124-
}
125-
}
129+
"paragraphs": [
130+
{
131+
"spans": [],
132+
"boundingRegions": [],
133+
"role": "title",
134+
"content": "NEWS TODAY"
135+
},
136+
{
137+
"spans": [],
138+
"boundingRegions": [],
139+
"role": "sectionHeading",
140+
"content": "Mirjam Nilsson"
141+
}
126142
]
127143
}
128144

129145
```
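Roles lend themselves to simple post-processing, for example separating headings from body text. A minimal Python sketch, assuming the paragraphs are already parsed into dictionaries; the `group_by_role` helper and the `body` fallback label are hypothetical and not part of the service response.

```python
from collections import defaultdict

# Minimal sketch: bucket paragraphs by their predicted `role`.
# Paragraphs without a `role` are filed under "body", a label chosen here for
# illustration only; it is not a role the service returns.


def group_by_role(paragraphs: list[dict]) -> dict[str, list[str]]:
    """Map each role to the `content` of the paragraphs that carry it."""
    groups: dict[str, list[str]] = defaultdict(list)
    for p in paragraphs:
        groups[p.get("role", "body")].append(p["content"])
    return dict(groups)


# Hypothetical, shortened paragraphs modeled on the JSON example above.
paragraphs = [
    {"role": "title", "content": "NEWS TODAY"},
    {"role": "sectionHeading", "content": "Mirjam Nilsson"},
    {"content": "Plain paragraph text."},
]

print(group_by_role(paragraphs))
```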
146+
### Pages
147+
148+
The `pages` collection is the first object in the service response.
130149

150+
```json
151+
"pages": [
152+
{
153+
"pageNumber": 1,
154+
"angle": 0,
155+
"width": 915,
156+
"height": 1190,
157+
"unit": "pixel",
158+
"words": [],
159+
"lines": [],
160+
"spans": [],
161+
"kind": "document"
162+
}
163+
]
164+
```
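Page entries carry `width`, `height`, and `unit`, so coordinates from other collections can be scaled relative to the page. The sketch below assumes polygons are flat `[x1, y1, x2, y2, ...]` lists, as in the removed `CONTOSO` word example; `normalize_polygon` is a hypothetical helper, and pairing that polygon with this page's size is purely illustrative.

```python
# Minimal sketch: scale a flat [x1, y1, x2, y2, ...] polygon to 0..1 using the
# page's `width` and `height`, so downstream code doesn't depend on `unit`.
# Assumes polygons are flat coordinate lists, as in the earlier word example.


def normalize_polygon(polygon: list[float], page: dict) -> list[tuple[float, float]]:
    """Pair up x/y values and divide them by the page dimensions."""
    xs = polygon[0::2]
    ys = polygon[1::2]
    return [(x / page["width"], y / page["height"]) for x, y in zip(xs, ys)]


page = {"pageNumber": 1, "angle": 0, "width": 915, "height": 1190, "unit": "pixel"}

# Illustrative pairing: the "CONTOSO" polygon from the removed word example,
# measured against this page's dimensions.
for x, y in normalize_polygon([76, 30, 118, 32, 118, 43, 76, 43], page):
    print(f"({x:.3f}, {y:.3f})")
```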
165+
### Text lines and words
166+
167+
The layout model extracts print and handwritten style text as `lines` and `words`. The model outputs bounding `polygon` coordinates and `confidence` for the extracted words. The `styles` collection includes any handwritten style for lines, if detected, along with the spans pointing to the associated text. This feature applies to [supported handwritten languages](language-support.md).
168+
169+
```json
170+
"words": [
171+
{
172+
"content": "While",
173+
"polygon": [],
174+
"confidence": 0.997,
175+
"span": {}
176+
}
177+
],
178+
"lines": [
179+
{
180+
"content": "While healthcare is still in the early stages of its Al journey, we",
181+
"polygon": [],
182+
"spans": []
183+
}
184+
]
185+
```
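Because every word carries a `confidence` score, a common follow-up is flagging words that may need review. A minimal Python sketch; the `low_confidence_words` helper, the 0.8 threshold, and the second word's score are assumptions for illustration.

```python
# Minimal sketch: collect words whose `confidence` is below a chosen threshold.
# The 0.8 default is an arbitrary illustrative value, not a service recommendation.


def low_confidence_words(words: list[dict], threshold: float = 0.8) -> list[str]:
    """Return the `content` of each word scored below `threshold`."""
    return [w["content"] for w in words if w["confidence"] < threshold]


words = [
    {"content": "While", "confidence": 0.997},
    {"content": "healthcare", "confidence": 0.62},  # hypothetical low score
]
print(low_confidence_words(words))  # ['healthcare']
```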
131186
### Selection marks
132187

133188
Layout API also extracts selection marks from documents. Extracted selection marks appear within the `pages` collection for each page. They include the bounding `polygon`, `confidence`, and selection `state` (`selected` or `unselected`). Any associated text, if extracted, is also included as the starting index (`offset`) and `length` that reference the top-level `content` property containing the full text from the document.
@@ -137,16 +192,7 @@ Layout API also extracts selection marks from documents. Extracted selection mar
137192
"selectionMarks": [
138193
{
139194
"state": "unselected",
140-
"polygon": [
141-
217,
142-
862,
143-
254,
144-
862,
145-
254,
146-
899,
147-
217,
148-
899
149-
],
195+
"polygon": [],
150196
"confidence": 0.995,
151197
"span": {
152198
"offset": 1421,
@@ -155,10 +201,7 @@ Layout API also extracts selection marks from documents. Extracted selection mar
155201
}
156202
]
157203
}
158-
159-
160204
```
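A selection mark's `span` can be resolved against the top-level `content` the same way as other spans. A minimal Python sketch; `selected_mark_text`, the content string, and the offsets are hypothetical.

```python
# Minimal sketch: list the text referenced by each selected mark by slicing
# the top-level `content` with the mark's `span`. The content string and
# offsets below are hypothetical.


def selected_mark_text(content: str, pages: list[dict]) -> list[str]:
    """Return the text referenced by every selection mark whose state is "selected"."""
    results = []
    for page in pages:
        # Defensive: treat a missing `selectionMarks` key as an empty list.
        for mark in page.get("selectionMarks", []):
            if mark["state"] == "selected":
                start = mark["span"]["offset"]
                results.append(content[start:start + mark["span"]["length"]])
    return results


content = "Yes No"
pages = [{
    "pageNumber": 1,
    "selectionMarks": [
        {"state": "selected", "confidence": 0.995, "span": {"offset": 0, "length": 3}},
        {"state": "unselected", "confidence": 0.98, "span": {"offset": 4, "length": 2}},
    ],
}]
print(selected_mark_text(content, pages))  # ['Yes']
```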
161-
162205
### Tables and table headers
163206

164207
Layout API extracts tables into the `tables` collection of the JSON output. Documents can be scanned, photographed, or digitized, and the API also works with rotated tables. Extracted table information includes the number of rows and columns, plus the row span and column span of each cell. Each table cell includes its row and column index, its bounding `polygon` coordinates, and whether it's recognized as a `columnHeader`. For the cell text, the model outputs `spans` information containing the starting index (`offset`) and the `length` within the top-level `content` property that contains the full text from the document.
@@ -176,120 +219,15 @@ Layout API extracts tables in the `pageResults` section of the JSON output. Docu
176219
"columnIndex": 0,
177220
"columnSpan": 4,
178221
"content": "(In millions, except earnings per share)",
179-
"boundingRegions": [
180-
{
181-
"pageNumber": 1,
182-
"polygon": [
183-
36,
184-
184,
185-
843,
186-
183,
187-
843,
188-
209,
189-
36,
190-
207
191-
]
192-
}
193-
],
194-
"spans": [
195-
{
196-
"offset": 511,
197-
"length": 40
198-
}
199-
]
222+
"boundingRegions": [],
223+
"spans": []
200224
},
201225
]
202226
}
203-
.
204-
.
205-
.
206227
]
207228
}
208229

209230
```
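Because each cell records its row and column index plus any span, the cells can be laid back out as a grid. The following Python sketch assumes the table object exposes `rowCount`, `columnCount`, and `cells`, and that a missing `rowSpan` or `columnSpan` means 1; `table_to_grid` and the second row of data are hypothetical.

```python
# Minimal sketch: expand a table's cells into a rowCount x columnCount grid,
# copying a spanning cell's text into every position it covers.
# Assumes each table exposes `rowCount`, `columnCount`, and `cells`, and that
# `rowSpan`/`columnSpan` default to 1 when absent.


def table_to_grid(table: dict) -> list[list[str]]:
    grid = [["" for _ in range(table["columnCount"])] for _ in range(table["rowCount"])]
    for cell in table["cells"]:
        row_end = cell["rowIndex"] + cell.get("rowSpan", 1)
        col_end = cell["columnIndex"] + cell.get("columnSpan", 1)
        for r in range(cell["rowIndex"], row_end):
            for c in range(cell["columnIndex"], col_end):
                grid[r][c] = cell["content"]
    return grid


# Hypothetical two-row table; the first cell mirrors the spanning cell above,
# the second row is made-up placeholder text.
table = {
    "rowCount": 2,
    "columnCount": 4,
    "cells": [
        {"rowIndex": 0, "columnIndex": 0, "columnSpan": 4,
         "content": "(In millions, except earnings per share)"},
        {"rowIndex": 1, "columnIndex": 0, "content": "A"},
        {"rowIndex": 1, "columnIndex": 1, "content": "B"},
        {"rowIndex": 1, "columnIndex": 2, "content": "C"},
        {"rowIndex": 1, "columnIndex": 3, "content": "D"},
    ],
}

for row in table_to_grid(table):
    print(row)
```

Copying a spanning cell's text into every slot it covers keeps the grid rectangular at the cost of duplicated text; storing it only in the top-left slot is an equally reasonable choice.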
210-
211-
### Paragraphs
212-
213-
The Layout model extracts all identified blocks of text in the `paragraphs` collection as a top level object under `analyzeResults`. Each entry in this collection represents a text block and includes the extracted text as`content`and the bounding `polygon` coordinates. The `span` information points to the text fragment within the top level `content` property that contains the full text from the document.
214-
215-
```json
216-
{
217-
"paragraphs": [
218-
{
219-
"spans": [
220-
{
221-
"offset": 0,
222-
"length": 21
223-
}
224-
],
225-
"boundingRegions": [
226-
{
227-
"pageNumber": 1,
228-
"polygon": [
229-
75,
230-
30,
231-
118,
232-
31,
233-
117,
234-
68,
235-
74,
236-
67
237-
]
238-
}
239-
],
240-
"content": "Tuesday, Sep 20, YYYY"
241-
}
242-
]
243-
}
244-
245-
```
246-
247-
### Paragraph roles
248-
249-
The Layout model may flag certain paragraphs with their specialized type or `role` as predicted by the model. They're best used with unstructured documents to help understand the layout of the extracted content for a richer semantic analysis. The following paragraph roles are supported:
250-
251-
| **Predicted role** | **Description** |
252-
| --- | --- |
253-
| `title` | The main heading(s) in the page |
254-
| `sectionHeading` | One or more subheading(s) on the page |
255-
| `footnote` | Text near the bottom of the page |
256-
| `pageHeader` | Text near the top edge of the page |
257-
| `pageFooter` | Text near the bottom edge of the page |
258-
| `pageNumber` | Page number |
259-
260-
```json
261-
{
262-
"paragraphs": [
263-
{
264-
"spans": [
265-
{
266-
"offset": 22,
267-
"length": 10
268-
}
269-
],
270-
"boundingRegions": [
271-
{
272-
"pageNumber": 1,
273-
"polygon": [
274-
139,
275-
10,
276-
605,
277-
8,
278-
605,
279-
56,
280-
139,
281-
58
282-
]
283-
}
284-
],
285-
"role": "title",
286-
"content": "NEWS TODAY"
287-
}
288-
]
289-
}
290-
291-
```
292-
293231
### Select page numbers or ranges for text extraction
294232

295233
For large multi-page documents, use the `pages` query parameter to indicate specific page numbers or page ranges for text extraction.
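When the wanted pages are known as a list, they can be collapsed into a compact value for the `pages` parameter. This Python sketch assumes the parameter accepts comma-separated page numbers and hyphenated ranges such as `1-3,5`; check the API reference for the exact accepted format. `pages_param` is a hypothetical helper.

```python
# Minimal sketch: collapse page numbers into a "1-3,5,7-8" style string.
# The output format is an assumption about what the `pages` parameter accepts.


def pages_param(page_numbers: list[int]) -> str:
    """Sort and de-duplicate page numbers, then merge consecutive runs into ranges."""
    pages = sorted(set(page_numbers))
    parts: list[str] = []
    i = 0
    while i < len(pages):
        j = i
        # Extend j while the next page number is consecutive.
        while j + 1 < len(pages) and pages[j + 1] == pages[j] + 1:
            j += 1
        parts.append(str(pages[i]) if i == j else f"{pages[i]}-{pages[j]}")
        i = j + 1
    return ",".join(parts)


print(pages_param([1, 2, 3, 5, 7, 8]))  # 1-3,5,7-8
```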
