Commit 8fb0591

committed
address pre-review feedback
1 parent 10726e6 commit 8fb0591

13 files changed

+87
-80
lines changed

articles/ai-services/document-intelligence/concept-accuracy-confidence.md

Lines changed: 0 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -71,44 +71,30 @@ With the addition of table, row and cell confidence with the ```2024-02-29-previ
7171

7272
**A:** Yes. The different levels of table confidence (cell, row, and table) are meant to capture the correctness of a prediction at that specific level. A correctly predicted cell that belongs to a row with other possible misses would have high cell confidence, but the row's confidence should be low. Similarly, a correct row in a table with challenges with other rows would have high row confidence whereas the table's overall confidence would be low.
7373

74-
---
75-
7674
**Q:** What is the expected confidence score when cells are merged? Since a merge results in the number of columns identified to change, how are scores affected?<br>
7775

7876
**A:** Regardless of the type of table, the expectation for merged cells is that they should have lower confidence values. Furthermore, the cell that is missing (because it was merged with an adjacent cell) should have a `NULL` value with lower confidence as well. How much lower these values might be depends on the training dataset, but the general trend of both merged and missing cells having lower scores should hold.
7977

80-
---
81-
8278
**Q:** What is the confidence score when a value is optional? Should you expect a cell with a ``NULL`` value and high confidence score if the value is missing?<br>
8379

8480
**A:** If your training dataset is representative of the optionality of cells, it helps the model know how often a value tends to appear in the training set, and thus what to expect during inference. This feature is used when computing the confidence of either a prediction or of making no prediction at all (`NULL`). You should expect an empty field with high confidence for missing values that are mostly empty in the training set too.
8581

86-
---
87-
8882
**Q:** How are confidence scores affected if a field is optional and not present or missed? Is the expectation that the confidence score answers that question?<br>
8983

9084
**A:** When a value is missing from a row, the cell has a `NULL` value and confidence assigned. A high confidence score here should mean that the model prediction (of there not being a value) is more likely to be correct. In contrast, a low score should signal more uncertainty from the model (and thus the possibility of an error, like the value being missed).
9185

92-
---
93-
9486
**Q:** What should be the expectation for cell confidence and row confidence when extracting a multi-page table with a row split across pages?<br>
9587

9688
**A:** Expect the cell confidence to be high and row confidence to be potentially lower than rows that aren't split. The proportion of split rows in the training data set can affect the confidence score. In general, a split row looks different than the other rows in the table (thus, the model is less certain that it's correct).
9789

98-
---
99-
10090
**Q:** For cross-page tables with rows that cleanly end and start at the page boundaries, is it correct to assume that confidence scores are consistent across pages?
10191

10292
**A:** Yes. Since rows look similar in shape and contents, regardless of where they are in the document (or in which page), their respective confidence scores should be consistent.
10393

104-
---
105-
10694
**Q:** What is the best way to utilize the new confidence scores?<br>
10795

10896
**A:** Look at all levels of table confidence in a top-to-bottom approach: begin by checking a table's confidence as a whole, then drill down to the row level and look at individual rows, and finally look at cell-level confidences. Depending on the type of table, there are a couple of things of note:
10997

110-
---
111-
11298
For **fixed tables**, cell-level confidence already captures quite a bit of information on the correctness of things. This means that simply going over each cell and looking at its confidence can be enough to help determine the quality of the prediction.
11399
For **dynamic tables**, the levels are meant to build on top of each other, so the top-to-bottom approach is more important.
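
The top-to-bottom review described in this answer can be sketched roughly as follows. This is only an illustration: the nested dictionary shape (`confidence`, `rows`, `cells`), the function name `review_table`, and the `0.8` threshold are all hypothetical and are not the service's response schema.

```python
from typing import Any, Optional

# One finding: (level, row index, cell index, confidence)
Finding = tuple[str, Optional[int], Optional[int], float]


def review_table(table: dict[str, Any], threshold: float = 0.8) -> list[Finding]:
    """Walk a table top-down and collect every level below the threshold.

    The input shape is a hypothetical simplification, not the API schema:
    {"confidence": float, "rows": [{"confidence": float,
                                    "cells": [{"confidence": float}]}]}
    """
    findings: list[Finding] = []

    # Step 1: check the table as a whole; stop early if it looks fine.
    if table["confidence"] >= threshold:
        return findings
    findings.append(("table", None, None, table["confidence"]))

    # Step 2: drill down to the rows that are below the threshold.
    for r, row in enumerate(table["rows"]):
        if row["confidence"] >= threshold:
            continue
        findings.append(("row", r, None, row["confidence"]))

        # Step 3: inside a weak row, look at the individual cells.
        for c, cell in enumerate(row["cells"]):
            if cell["confidence"] < threshold:
                findings.append(("cell", r, c, cell["confidence"]))

    return findings
```

For a fixed table, a simpler loop over cells alone may be enough, since the text notes that cell-level confidence already carries most of the signal there.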
114100

articles/ai-services/document-intelligence/concept-add-on-capabilities.md

Lines changed: 50 additions & 31 deletions
@@ -54,7 +54,7 @@ Document Intelligence supports more sophisticated and modular analysis capabilit
5454
>
5555
> Not all add-on capabilities are supported by all models. For more information, *see* [model data extraction](concept-model-overview.md#analysis-features).
5656
57-
The following add-on capability isavailable for`2024-02-29-preview`, `2024-02-29-preview`, and later releases:
57+
The following add-on capabilities are available for `2024-02-29-preview` and later releases:
5858

5959
* [`keyValuePairs`](#key-value-pairs)
6060

@@ -81,20 +81,24 @@ Add-On* - Query fields are priced differently than the other add-on features. Se
8181

8282
## High resolution extraction
8383

84-
The task of recognizing small text from large-size documents, like engineering drawings, is a challenge. Often the text is mixed with other graphical elements and has varying fonts, sizes and orientations. Moreover, the text can be broken into separate parts or connected with other symbols. Document Intelligence now supports extracting content from these types of documents with the `ocr.highResolution` capability. You get improved quality of content extraction from A1/A2/A3 documents by enabling this add-on capability.
84+
The task of recognizing small text from large-size documents, like engineering drawings, is a challenge. Often the text is mixed with other graphical elements and has varying fonts, sizes, and orientations. Moreover, the text can be broken into separate parts or connected with other symbols. Document Intelligence now supports extracting content from these types of documents with the `ocr.highResolution` capability. You get improved quality of content extraction from A1/A2/A3 documents by enabling this add-on capability.
8585

8686
### REST API
8787

8888
::: moniker range="doc-intel-4.0.0"
89-
```REST
90-
https://{your resource}.cognitiveservices.azure.com/documentintelligence/documentModels/prebuilt-layout:analyze?api-version=2024-02-29-preview&features=ocrHighResolution
89+
90+
```bash
91+
{your-resource-endpoint}.cognitiveservices.azure.com/documentintelligence/documentModels/prebuilt-layout:analyze?api-version=2024-02-29-preview&features=ocrHighResolution
9192
```
93+
9294
:::moniker-end
9395

9496
:::moniker range="doc-intel-3.1.0"
95-
```REST
96-
https://{your resource}.cognitiveservices.azure.com/formrecognizer/documentModels/prebuilt-layout:analyze?api-version=2023-07-31&features=ocrHighResolution
97+
98+
```bash
99+
{your-resource-endpoint}.cognitiveservices.azure.com/formrecognizer/documentModels/prebuilt-layout:analyze?api-version=2023-07-31&features=ocrHighResolution
97100
```
101+
98102
:::moniker-end
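
As a rough illustration of how the two request URLs above differ only in service path and API version, the following Python sketch assembles them from parts. The helper name `build_analyze_url`, the example host, and the use of a comma to join several features are assumptions for illustration; only the path segments, `api-version` values, and `features=ocrHighResolution` come from the examples above.

```python
from urllib.parse import urlencode


def build_analyze_url(
    endpoint: str,
    features: list[str],
    api_version: str = "2024-02-29-preview",
    service_path: str = "documentintelligence",
    model_id: str = "prebuilt-layout",
) -> str:
    """Assemble an analyze URL with add-on features (hypothetical helper)."""
    # Assumption: multiple features are passed as one comma-separated value,
    # so the comma is kept unescaped in the query string.
    query = urlencode(
        {"api-version": api_version, "features": ",".join(features)},
        safe=",",
    )
    base = endpoint.rstrip("/")
    return f"{base}/{service_path}/documentModels/{model_id}:analyze?{query}"


# v4.0 preview form, with a placeholder host
v4_url = build_analyze_url(
    "https://example.cognitiveservices.azure.com", ["ocrHighResolution"]
)

# v3.1 form: different service path and API version
v3_url = build_analyze_url(
    "https://example.cognitiveservices.azure.com",
    ["ocrHighResolution"],
    api_version="2023-07-31",
    service_path="formrecognizer",
)
```

The same helper would apply to the other add-on features in this file by changing the `features` list.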
99103

100104
## Formula extraction
@@ -132,15 +136,19 @@ The `ocr.formula` capability extracts all identified formulas, such as mathemati
132136
### REST API
133137

134138
::: moniker range="doc-intel-4.0.0"
135-
```REST
136-
https://{your resource}.cognitiveservices.azure.com/documentintelligence/documentModels/prebuilt-layout:analyze?api-version=2024-02-29-preview&features=formulas
139+
140+
```bash
141+
{your-resource-endpoint}.cognitiveservices.azure.com/documentintelligence/documentModels/prebuilt-layout:analyze?api-version=2024-02-29-preview&features=formulas
137142
```
143+
138144
:::moniker-end
139145

140146
:::moniker range="doc-intel-3.1.0"
141-
```REST
142-
https://{your resource}.cognitiveservices.azure.com/formrecognizer/documentModels/prebuilt-layout:analyze?api-version=2023-07-31&features=formulas
147+
148+
```bash
149+
{your-resource-endpoint}.cognitiveservices.azure.com/formrecognizer/documentModels/prebuilt-layout:analyze?api-version=2023-07-31&features=formulas
143150
```
151+
144152
:::moniker-end
145153

146154
## Font property extraction
@@ -186,15 +194,19 @@ The `ocr.font` capability extracts all font properties of text extracted in the
186194
### REST API
187195

188196
::: moniker range="doc-intel-4.0.0"
189-
```REST
190-
https://{your resource}.cognitiveservices.azure.com/documentintelligence/documentModels/prebuilt-layout:analyze?api-version=2024-02-29-preview&features=styleFont
197+
198+
```bash
199+
{your-resource-endpoint}.cognitiveservices.azure.com/documentintelligence/documentModels/prebuilt-layout:analyze?api-version=2024-02-29-preview&features=styleFont
191200
```
201+
192202
:::moniker-end
193203

194204
:::moniker range="doc-intel-3.1.0"
195-
```REST
196-
https://{your resource}.cognitiveservices.azure.com/formrecognizer/documentModels/prebuilt-layout:analyze?api-version=2023-07-31&features=styleFont
205+
206+
```bash
207+
{your-resource-endpoint}.cognitiveservices.azure.com/formrecognizer/documentModels/prebuilt-layout:analyze?api-version=2023-07-31&features=styleFont
197208
```
209+
198210
:::moniker-end
199211

200212
## Barcode property extraction
@@ -222,15 +234,19 @@ The `ocr.barcode` capability extracts all identified barcodes in the `barcodes`
222234
### REST API
223235

224236
::: moniker range="doc-intel-4.0.0"
225-
```REST
226-
https://{your resource}.cognitiveservices.azure.com/documentintelligence/documentModels/prebuilt-layout:analyze?api-version=2024-02-29-preview&features=barcodes
237+
238+
```bash
239+
{your-resource-endpoint}.cognitiveservices.azure.com/documentintelligence/documentModels/prebuilt-layout:analyze?api-version=2024-02-29-preview&features=barcodes
227240
```
241+
228242
:::moniker-end
229243

230244
:::moniker range="doc-intel-3.1.0"
231-
```REST
232-
https://{your resource}.cognitiveservices.azure.com/formrecognizer/documentModels/prebuilt-layout:analyze?api-version=2023-07-31&features=barcodes
245+
246+
```bash
247+
{your-resource-endpoint}.cognitiveservices.azure.com/formrecognizer/documentModels/prebuilt-layout:analyze?api-version=2023-07-31&features=barcodes
233248
```
249+
234250
:::moniker-end
235251

236252
## Language detection
@@ -255,15 +271,19 @@ Adding the `languages` feature to the `analyzeResult` request predicts the detec
255271
### REST API
256272

257273
::: moniker range="doc-intel-4.0.0"
258-
```REST
259-
https://{your resource}.cognitiveservices.azure.com/documentintelligence/documentModels/prebuilt-layout:analyze?api-version=2024-02-29-preview&features=languages
274+
275+
```bash
276+
{your-resource-endpoint}.cognitiveservices.azure.com/documentintelligence/documentModels/prebuilt-layout:analyze?api-version=2024-02-29-preview&features=languages
260277
```
278+
261279
:::moniker-end
262280

263281
:::moniker range="doc-intel-3.1.0"
264-
```REST
265-
https://{your resource}.cognitiveservices.azure.com/formrecognizer/documentModels/prebuilt-layout:analyze?api-version=2023-07-31&features=languages
282+
283+
```bash
284+
{your-resource-endpoint}.cognitiveservices.azure.com/formrecognizer/documentModels/prebuilt-layout:analyze?api-version=2023-07-31&features=languages
266285
```
286+
267287
:::moniker-end
268288

269289
:::moniker range="doc-intel-4.0.0"
@@ -278,8 +298,8 @@ Keys can also exist in isolation when the model detects that a key exists, with
278298

279299
### REST API
280300

281-
```REST
282-
https://{your resource}.cognitiveservices.azure.com/documentintelligence/documentModels/prebuilt-layout:analyze?api-version=2024-02-29-preview&features=keyValuePairs
301+
```bash
302+
{your-resource-endpoint}.cognitiveservices.azure.com/documentintelligence/documentModels/prebuilt-layout:analyze?api-version=2024-02-29-preview&features=keyValuePairs
283303
```
284304

285305
## Query Fields
@@ -298,7 +318,7 @@ Query fields are an add-on capability to extend the schema extracted from any pr
298318

299319
> [!NOTE]
300320
>
301-
> Document Intelligence Studio query field extraction is currently available with the Layout and Prebuilt models `2024-02-29-preview` `2023-10-31-preview` API and later releases except for the ```us.tax.*``` models (W2, 1098s and 1099s models).
321+
> Document Intelligence Studio query field extraction is currently available with the Layout and Prebuilt models `2024-02-29-preview`, `2023-10-31-preview` API, and later releases, except for the `US tax` models (W2, 1098s, and 1099s models).
302322
303323
### Query field extraction
304324

@@ -308,7 +328,7 @@ For query field extraction, specify the fields you want to extract and Document
308328

309329
:::image type="content" source="media/studio/query-fields.png" alt-text="Screenshot of the query fields button in Document Intelligence Studio.":::
310330

311-
* You can pass a list of field labels like `Party1`, `Party2`, `TermsOfUse`, `PaymentTerms`, `PaymentDate`, and `TermEndDate`" as part of the `analyze document` request.
331+
* You can pass a list of field labels like `Party1`, `Party2`, `TermsOfUse`, `PaymentTerms`, `PaymentDate`, and `TermEndDate` as part of the `analyze document` request.
312332

313333
:::image type="content" source="media/studio/query-field-select.png" alt-text="Screenshot of query fields selection window in Document Intelligence Studio.":::
314334

@@ -318,8 +338,8 @@ For query field extraction, specify the fields you want to extract and Document
318338

319339
### REST API
320340

321-
```REST
322-
https://{your resource}.cognitiveservices.azure.com/documentintelligence/documentModels/prebuilt-layout:analyze?api-version=2024-02-29-preview&features=queryFields&queryFields=TERMS
341+
```bash
342+
{your-resource-endpoint}.cognitiveservices.azure.com/documentintelligence/documentModels/prebuilt-layout:analyze?api-version=2024-02-29-preview&features=queryFields&queryFields=TERMS
323343
```
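
A minimal sketch of building the query string for a query fields request from a list of field labels follows. The function name `query_fields_params` is hypothetical, and joining several labels with commas is an assumption; the single-label form `queryFields=TERMS` is the only one shown in the request above.

```python
from urllib.parse import urlencode


def query_fields_params(
    field_labels: list[str], api_version: str = "2024-02-29-preview"
) -> str:
    """Build the query string for a query fields request (hypothetical helper)."""
    if not field_labels:
        raise ValueError("at least one field label is required")
    return urlencode(
        {
            "api-version": api_version,
            "features": "queryFields",
            # Assumption: several labels are sent as one comma-separated value.
            "queryFields": ",".join(field_labels),
        },
        safe=",",
    )


single = query_fields_params(["TERMS"])
several = query_fields_params(["Party1", "Party2", "PaymentDate"])
```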
324344

325345
:::moniker-end
@@ -328,9 +348,8 @@ https://{your resource}.cognitiveservices.azure.com/documentintelligence/documen
328348

329349
> [!div class="nextstepaction"]
330350
> Learn more:
331-
> [**Read model**](concept-read.md) [**Layout model**](concept-layout.md).
351+
> [**Read model**](concept-read.md) [**Layout model**](concept-layout.md)
332352
333353
> [!div class="nextstepaction"]
334354
> SDK samples:
335-
> [**python**](/python/api/overview/azure/ai-documentintelligence-readme).
336-
355+
> [**python**](/python/api/overview/azure/ai-documentintelligence-readme)

articles/ai-services/document-intelligence/concept-credit-card.md

Lines changed: 5 additions & 5 deletions
@@ -1,5 +1,5 @@
11
---
2-
title: Document Intelligence Credit/Debit Card model
2+
title: Document Intelligence credit debit card model
33
titleSuffix: Azure AI services
44
description: Use Document Intelligence credit/debit card model to extract key fields from credit and debit cards.
55
author: laujan
@@ -17,13 +17,13 @@ monikerRange: '>=doc-intel-4.0.0'
1717
<!-- markdownlint-disable MD049 -->
1818
<!-- markdownlint-disable MD001 -->
1919

20-
# Document Intelligence Credit/Debit Card model
20+
# Document Intelligence credit card model
2121

2222
**This content applies to:** ![checkmark](media/yes-icon.png) **v4.0 (preview)** ![checkmark](media/yes-icon.png)
2323

2424
The Document Intelligence credit/debit card model uses powerful Optical Character Recognition (OCR) capabilities to analyze and extract key fields from credit and debit cards. Credit cards and debit cards can be of various formats and quality including phone-captured images, scanned documents, and digital PDFs. The API analyzes document text; extracts key information such as Card Number, Issuing Bank, and Expiration Date; and returns a structured JSON data representation. The model currently supports English-language document formats.
2525

26-
## Automated Credit/Debit Card processing
26+
## Automated card processing
2727

2828
Automated Credit/Debit card processing is the process of extracting key fields from bank cards. Historically, the bank card analysis process is achieved manually and, hence, very time consuming. Accurate extraction of key data from bank cards is typically the first and one of the most critical steps in the contract automation process.
2929

@@ -42,7 +42,7 @@ Document Intelligence v4.0 (2024-02-29-preview) supports the following tools, ap
4242

4343
[!INCLUDE [input requirements](./includes/input-requirements.md)]
4444

45-
## Try Credit/Debit card document data extraction
45+
## Try credit card data extraction
4646

4747
To see how data extraction works for the Credit/Debit card service, you need the following resources:
4848

@@ -79,7 +79,7 @@ The following are the fields extracted from a contract in the JSON output respon
7979
| IssuingBank | String | The name of the bank that issued the card| Woodgrove Bank |
8080
| PaymentNetwork | String |The payment network that processes the card transaction| VISA |
8181
| CardHolderName | String |The name of the person who owns the card| JOHN SMITH |
82-
| CardHolderCompanyName | The name of the company that the card is associated with | CONTOSO SOFTWARE |
82+
| CardHolderCompanyName | String| The name of the company that the card is associated with | Contoso, Ltd. |
8383
| ValidDate | Date | Valid from date | 01/16 |
8484
| ExpirationDate | Date | Expiration date| 01/19 |
8585
| CardVerificationValue | String | Card verification value (CVV) | 764 |

articles/ai-services/document-intelligence/concept-custom-lifecycle.md

Lines changed: 1 addition & 1 deletion
@@ -26,7 +26,7 @@ With the v3.1 (GA) and later APIs, custom models introduce a expirationDateTime
2626

2727
With the v3.1 API, custom models introduce a new model expiration property. The model expiration is set to two years from the date the model is built for all requests that use a GA API to build a model. To continue to use the model past the expiration date, you need to train the model with a current GA API version. The API version can be the one that the model was originally trained with or a later API version. The following figure illustrates the options when you need to retrain an expiring or expired model.
2828

29-
:::image type="content" source="media/model-lifecycle.png" alt-text="Screenshot showing how to chose an API version to re-train a model.":::
29+
:::image type="content" source="media/model-lifecycle.png" alt-text="Screenshot showing how to choose an API version and retrain a model.":::
3030

3131
## Models trained with preview API version
3232

articles/ai-services/document-intelligence/concept-general-document.md

Lines changed: 1 addition & 1 deletion
@@ -19,7 +19,7 @@ ms.author: lajanuar
1919
:::moniker range="doc-intel-4.0.0"
2020

2121
> [!IMPORTANT]
22-
> Starting with `Document Intelligence versions **2024-02-29-preview, 2023-10-31-preview** and going forward, the general document model (prebuilt-document) is deprecated. To extract key-value pairs, selection marks, text, tables, and structure from documents, use the following models:
22+
> Starting with Document Intelligence versions **2024-02-29-preview, 2023-10-31-preview** and going forward, the general document model (prebuilt-document) is deprecated. To extract key-value pairs, selection marks, text, tables, and structure from documents, use the following models:
2323
2424
| Feature | version| Model ID |
2525
|---------- |---------|--------|

articles/ai-services/document-intelligence/concept-incremental-classifier.md

Lines changed: 2 additions & 2 deletions
@@ -1,5 +1,5 @@
11
---
2-
title: Document Intelligence support for Incremental Classifier Training.
2+
title: Document Intelligence support for incremental classifier training
33
titleSuffix: Azure AI services
44
description: Incrementally train custom classifiers by adding new samples to existing classes or adding new classes.
55
author: laujan
@@ -74,7 +74,7 @@ The incremental classifier build request is similar to the [classify document bu
7474
}
7575
```
7676

77-
#### POST Response
77+
#### POST response
7878

7979
All Document Intelligence APIs are asynchronous; polling the returned operation location provides a status on the build operation. Classifiers are fast to train and your classifier can be ready to use in a minute or two.
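
The polling pattern mentioned here can be sketched generically as follows. The status strings (`notStarted`, `running`, `succeeded`, `failed`), the function name `poll_until_done`, and the interval and timeout defaults are assumptions for illustration; `get_status` stands in for whatever call reads the status from the returned operation location.

```python
import time
from typing import Callable

# Assumed terminal states; the real service may report others.
TERMINAL_STATES = {"succeeded", "failed"}


def poll_until_done(
    get_status: Callable[[], str],
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call get_status repeatedly until it returns a terminal state."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"operation still '{status}' after {timeout_s}s")
        sleep(interval_s)  # wait before asking again


# Usage with a stand-in status source instead of a real HTTP call
_fake_statuses = iter(["notStarted", "running", "succeeded"])
final_status = poll_until_done(lambda: next(_fake_statuses), sleep=lambda _s: None)
```

Passing `sleep` as a parameter keeps the loop testable without real delays.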
8080

articles/ai-services/document-intelligence/concept-invoice.md

Lines changed: 1 addition & 2 deletions
@@ -10,6 +10,7 @@ ms.custom:
1010
ms.topic: conceptual
1111
ms.date: 02/29/2024
1212
ms.author: lajanuar
13+
ms.custom: references_regions
1314
---
1415

1516
<!-- markdownlint-disable MD033 -->
@@ -244,8 +245,6 @@ Following are the line items extracted from an invoice in the JSON output respon
244245

245246
The invoice key-value pairs and line items extracted are in the `documentResults` section of the JSON output.
246247

247-
:::moniker-end
248-
249248
:::moniker range="<=doc-intel-3.1.0"
250249

251250
### Key-value pairs

articles/ai-services/document-intelligence/concept-marriage-certificate.md

Lines changed: 1 addition & 1 deletion
@@ -23,7 +23,7 @@ monikerRange: '>=doc-intel-4.0.0'
2323

2424
The Document Intelligence Marriage Certificate model uses powerful Optical Character Recognition (OCR) capabilities to analyze and extract key fields from Marriage Certificates. Marriage certificates can be of various formats and quality including phone-captured images, scanned documents, and digital PDFs. The API analyzes document text; extracts key information such as Spouse names, Issue date, and marriage place; and returns a structured JSON data representation. The model currently supports English-language document formats.
2525

26-
## Automated Marriage Certificate processing
26+
## Automated marriage certificate processing
2727

2828
Automated marriage certificate processing is the process of extracting key fields from marriage certificates. Historically, marriage certificate analysis was done manually and was therefore very time consuming. Accurate extraction of key data from marriage certificates is typically the first and one of the most critical steps in the marriage certificate automation process.
2929
