Commit 22d73b8

committed
edit for pub
1 parent 0137e6c commit 22d73b8

14 files changed, +119 -122 lines changed

articles/ai-services/.openpublishing.redirection.ai-services.json

Lines changed: 0 additions & 2 deletions
@@ -10,7 +10,5 @@
 "redirect_url": "/articles/ai-services/document-intelligence/studio-overview",
 "redirect_document_id": false
 }
-
-
 ]
 }

articles/ai-services/document-intelligence/concept-accuracy-confidence.md

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ ms.service: azure-ai-document-intelligence
 ms.custom:
 - ignite-2023
 ms.topic: conceptual
-ms.date: 04/16/2023
+ms.date: 07/09/2024
 ms.author: lajanuar
 ---
 

articles/ai-services/document-intelligence/concept-composed-models.md

Lines changed: 2 additions & 2 deletions
@@ -8,7 +8,7 @@ ms.service: azure-ai-document-intelligence
 ms.custom:
 - ignite-2023
 ms.topic: conceptual
-ms.date: 05/23/2024
+ms.date: 07/09/2024
 ms.author: lajanuar
 ---
 

@@ -57,7 +57,7 @@ With the introduction of [**custom classification models**](./concept-custom-cla
 
 * With the model compose operation, you can assign up to 200 models to a single model ID. If the number of models that I want to compose exceeds the upper limit of a composed model, you can use one of these alternatives:
 
-* Classify the documents before calling the custom model. You can use the [read model](concept-read.md) and build a classification based on the extracted text from the documents and certain phrases by using sources like code, regular expressions, or search.
+* Classify the documents before calling the custom model. You can use the [Read model](concept-read.md) and build a classification based on the extracted text from the documents and certain phrases by using sources like code, regular expressions, or search.
 
 * If you want to extract the same fields from various structured, semi-structured, and unstructured documents, consider using the deep-learning [custom neural model](concept-custom-neural.md). Learn more about the [differences between the custom template model and the custom neural model](concept-custom.md#compare-model-features).
 

articles/ai-services/document-intelligence/concept-custom-classifier.md

Lines changed: 3 additions & 3 deletions
@@ -6,7 +6,7 @@ author: vkurpad
 manager: nitinme
 ms.service: azure-ai-document-intelligence
 ms.topic: conceptual
-ms.date: 06/26/2024
+ms.date: 07/09/2024
 ms.author: lajanuar
 ms.custom:
 - references_regions
@@ -98,7 +98,7 @@ Classification models can now be trained on documents of different languages. Se
 
 Supported file formats:
 
-|Model | PDF |Image:<br>jpeg/jpg, png, bmp, tiff, heif| Microsoft Office:<br> Word (docx), Excel (xlxs), PowerPoint (pptx)|
+|Model | PDF |Image:<br>`jpeg/jpg`, `png`, `bmp`, `tiff`, `heif`| Microsoft Office:<br> Word (docx), Excel (xlxs), PowerPoint (pptx)|
 |--------|:----:|:-----:|:---------------:|
 |Read ||||
 |Layout ||| ✔ (2024-02-29-preview, 2023-10-31-preview, and later) |
@@ -130,7 +130,7 @@ Supported file formats:
 When you have more than one document in a file, the classifier can identify the different document types contained within the input file. The classifier response contains the page ranges for each of the identified document types contained within a file. This response can include multiple instances of the same document type.
 
 ::: moniker range=">=doc-intel-4.0.0"
-The analyze operation now includes a `splitMode` property that gives you granular control over the splitting behavior.
+The `analyze` operation now includes a `splitMode` property that gives you granular control over the splitting behavior.
 
 * To treat the entire input file as a single document for classification set the splitMode to `none`. When you do so, the service returns just one class for the entire input file.
 * To classify each page of the input file, set the splitMode to `perPage`. The service attempts to classify each page as an individual document.
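
As a rough illustration of the two `splitMode` values visible in the hunk above, the following sketch models which page ranges would be treated as separate documents for classification. The function name `classification_units` and the 1-based inclusive page ranges are assumptions made for illustration only. This is not the service's implementation or a client-library call, and any `splitMode` value not shown in the hunk is deliberately left out.

```python
def classification_units(page_count: int, split_mode: str) -> list[tuple[int, int]]:
    """Return hypothetical (first_page, last_page) ranges, one per classified document."""
    if page_count < 1:
        raise ValueError("page_count must be at least 1")
    if split_mode == "none":
        # The whole input file is treated as a single document: one class overall.
        return [(1, page_count)]
    if split_mode == "perPage":
        # Each page is classified as its own document.
        return [(page, page) for page in range(1, page_count + 1)]
    # Other modes are outside the text shown here, so the sketch refuses to guess.
    raise ValueError(f"mode not covered by this sketch: {split_mode!r}")
```

For a three-page file, `none` yields a single range covering all pages, while `perPage` yields one range per page.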

articles/ai-services/document-intelligence/concept-custom.md

Lines changed: 11 additions & 11 deletions
@@ -8,7 +8,7 @@ ms.service: azure-ai-document-intelligence
 ms.custom:
 - ignite-2023
 ms.topic: conceptual
-ms.date: 05/23/2024
+ms.date: 07/09/2024
 ms.author: lajanuar
 monikerRange: '<=doc-intel-4.0.0'
 ---
@@ -35,7 +35,7 @@ monikerRange: '<=doc-intel-4.0.0'
 
 Document Intelligence uses advanced machine learning technology to identify documents, detect and extract information from forms and documents, and return the extracted data in a structured JSON output. With Document Intelligence, you can use document analysis models, pre-built/pre-trained, or your trained standalone custom models.
 
-Custom models now include [custom classification models](./concept-custom-classifier.md) for scenarios where you need to identify the document type before invoking the extraction model. Classifier models are available starting with the ```2023-07-31 (GA)``` API. A classification model can be paired with a custom extraction model to analyze and extract fields from forms and documents specific to your business to create a document processing solution. Standalone custom extraction models can be combined to create [composed models](concept-composed-models.md).
+Custom models now include [custom classification models](./concept-custom-classifier.md) for scenarios where you need to identify the document type before invoking the extraction model. Classifier models are available starting with the ```2023-07-31 (GA)``` API. A classification model can be paired with a custom extraction model to analyze and extract fields from forms and documents specific to your business. Standalone custom extraction models can be combined to create [composed models](concept-composed-models.md).
 
 ::: moniker range=">=doc-intel-3.0.0"
 
@@ -51,7 +51,7 @@ To create a custom extraction model, label a dataset of documents with the value
 
 > [!IMPORTANT]
 >
-> Starting with version 4.0 2024-02-29-preview API, custom neural models now support **overlapping fields** and **table, row and cell level confidence**.
+> Starting with version 4.0 (2024-02-29-preview) API, custom neural models now support **overlapping fields** and **table, row and cell level confidence**.
 >
 
 The custom neural (custom document) model uses deep learning models and base model trained on a large collection of documents. This model is then fine-tuned or adapted to your data when you train the model with a labeled dataset. Custom neural models support structured, semi-structured, and unstructured documents to extract fields. Custom neural models currently support English-language documents. When you're choosing between the two model types, start with a neural model to determine if it meets your functional needs. See [neural models](concept-custom-neural.md) to learn more about custom document models.
@@ -76,7 +76,7 @@ If the language of your documents and extraction scenarios supports custom neura
 
 * Supported file formats:
 
-|Model | PDF |Image: </br>jpeg/jpg, png, bmp, tiff, heif | Microsoft Office: </br> Word (docx), Excel (xlsx), PowerPoint (pptx)|
+|Model | PDF |Image: </br>`jpeg/jpg`, `png`, `bmp`, `tiff`, `heif` | Microsoft Office: </br> Word (docx), Excel (xlsx), PowerPoint (pptx)|
 |--------|:----:|:-----:|:---------------:
 |Read ||||
 |Layout ||| ✔ (2024-02-29-preview, 2023-10-31-preview, and later) |
@@ -105,29 +105,29 @@ If the language of your documents and extraction scenarios supports custom neura
 
 ### Optimal training data
 
-Training input data is the foundation of any machine learning model. It determines the quality, accuracy, and performance of the model. Therefore, it is crucial to create the best training input data possible for your Document Intelligence project. When you use the Document Intelligence custom model, you provide your own training data. Here are a few tips to help train your models effectively:
+Training input data is the foundation of any machine learning model. It determines the quality, accuracy, and performance of the model. Therefore, it's crucial to create the best training input data possible for your Document Intelligence project. When you use the Document Intelligence custom model, you provide your own training data. Here are a few tips to help train your models effectively:
 
-* Use text*based instead of image*based PDFs when possible. One way to identify an image*based PDF is to try selecting specific text in the document. If you can select only the entire image of the text, the document is image based, not text based.
+* Use text-based instead of image-based PDFs when possible. One way to identify an image*based PDF is to try selecting specific text in the document. If you can select only the entire image of the text, the document is image based, not text based.
 
 * Organize your training documents by using a subfolder for each format (JPEG/JPG, PNG, BMP, PDF, or TIFF).
 
 * Use forms that have all of the available fields completed.
 
 * Use forms with differing values in each field.
 
-* If your images are low quality, use a larger dataset (more than five training documents).
+* Use a larger dataset (more than five training documents) if your images are low quality.
 
 * Determine if you need to use a single model or multiple models composed into a single model.
 
-* Model accuracy can decrease when you have different formats analyzed with a single model. Plan on segmenting your dataset into folders, where each folder is a unique template. Train one model per folder, and compose the resulting models into a single endpoint.
+* Consider segmenting your dataset into folders, where each folder is a unique template. Train one model per folder, and compose the resulting models into a single endpoint. Model accuracy can decrease when you have different formats analyzed with a single model.
 
-* Custom forms rely on a consistent visual template. If your form has variations with formats and page breaks, consider segmenting your dataset to train multiple models.
+* Consider segmenting your dataset to train multiple models if your form has variations with formats and page breaks. Custom forms rely on a consistent visual template.
 
 * Ensure that you have a balanced dataset by accounting for formats, document types, and structure.
 
 ### Build mode
 
-The build custom model operation adds support for the *template* and *neural* custom models. Previous versions of the REST API and client libraries only supported a single build mode that is now known as the *template* mode.
+The `build custom model` operation adds support for the *template* and *neural* custom models. Previous versions of the REST API and client libraries only supported a single build mode that is now known as the *template* mode.
 
 * Template models only accept documents that have the same basic page structure—a uniform visual appearance—or the same relative positioning of elements within the document.
 
@@ -173,7 +173,7 @@ Document Intelligence v3.1 and later models support the following tools, applica
 
 ## Custom model life cycle
 
-The life cycle of a custom model depends on the API version that is used to train it. If the API version is a general availability (GA) version, the custom model will have the same life cycle as that version. The custom model will not be available for inference when the API version is deprecated. If the API version is a preview version, the custom model will have the same life cycle as the preview version of the API.
+The life cycle of a custom model depends on the API version that is used to train it. If the API version is a general availability (GA) version, the custom model has the same life cycle as that version. The custom model isn't available for inference when the API version is deprecated. If the API version is a preview version, the custom model has the same life cycle as the preview version of the API.
 
 :::moniker-end
 
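
The training-data tip in this file about keeping a subfolder for each format (JPEG/JPG, PNG, BMP, PDF, or TIFF) could be planned with a small helper like the one below. This is only a sketch under stated assumptions: the folder names, the `plan_subfolders` helper, and the extra `.tif` extension are illustrative choices, not anything Document Intelligence requires. The helper only computes a plan and doesn't move any files.

```python
from collections import defaultdict
from pathlib import PurePath

# Extension -> subfolder name. Folder names and the ".tif" alias are assumptions.
FORMAT_FOLDERS = {
    ".jpeg": "jpeg",
    ".jpg": "jpeg",
    ".png": "png",
    ".bmp": "bmp",
    ".pdf": "pdf",
    ".tiff": "tiff",
    ".tif": "tiff",
}


def plan_subfolders(filenames):
    """Group training file names by the per-format subfolder they would go into."""
    plan = defaultdict(list)
    for name in filenames:
        folder = FORMAT_FOLDERS.get(PurePath(name).suffix.lower())
        if folder is None:
            continue  # Not one of the listed training formats; leave it out of the plan.
        plan[folder].append(name)
    return {folder: sorted(files) for folder, files in sorted(plan.items())}
```

Calling `plan_subfolders(["a.JPG", "b.jpeg", "c.pdf", "notes.txt"])` groups the two JPEG files under one folder, the PDF under another, and skips the text file.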

articles/ai-services/document-intelligence/concept-layout.md

Lines changed: 4 additions & 4 deletions
@@ -8,7 +8,7 @@ ms.service: azure-ai-document-intelligence
 ms.custom:
 - ignite-2023
 ms.topic: conceptual
-ms.date: 05/23/2024
+ms.date: 07/09/2024
 ms.author: lajanuar
 ---
 
@@ -311,7 +311,7 @@ The Layout model extracts all identified blocks of text in the `paragraphs` coll
 
 ### Paragraph roles
 
-The new machine-learning based page object detection extracts logical roles like titles, section headings, page headers, page footers, and more. The Document Intelligence Layout model assigns certain text blocks in the `paragraphs` collection with their specialized role or type predicted by the model. They're best used with unstructured documents to help understand the layout of the extracted content for a richer semantic analysis. The following paragraph roles are supported:
+The new machine-learning based page object detection extracts logical roles like titles, section headings, page headers, page footers, and more. The Document Intelligence Layout model assigns certain text blocks in the `paragraphs` collection with their specialized role or type predicted by the model. It's best to use paragraph roles with unstructured documents to help understand the layout of the extracted content for a richer semantic analysis. The following paragraph roles are supported:
 
 | **Predicted role** | **Description** | **Supported file types** |
 | --- | --- | --- |
@@ -577,11 +577,11 @@ Here are a few factors to consider when using the Document Intelligence bale ext
 
 * Is the data that you want to extract presented as a table, and is the table structure meaningful?
 
-* If the data isn't in a table format, can the data fit in a two-dimensional grid?
+* Can the data fit in a two-dimensional grid if the data isn't in a table format?
 
 * Do your tables span multiple pages? If so, to avoid having to label all the pages, split the PDF into pages before sending it to Document Intelligence. After the analysis, post-process the pages to a single table.
 
-* If you're creating custom models, refer to [Labeling as tables](quickstarts/try-document-intelligence-studio.md#labeling-as-tables). Dynamic tables have a variable number of rows for each column. Fixed tables have a constant number of rows for each column.
+* Refer to [Labeling as tables](quickstarts/try-document-intelligence-studio.md#labeling-as-tables) if you're creating custom models. Dynamic tables have a variable number of rows for each column. Fixed tables have a constant number of rows for each column.
 
 > [!NOTE]
 > Table is not supported if the input file is XLSX.
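
The multi-page table tip above ends with post-processing the per-page results into a single table. One possible shape for that step is sketched below. It assumes each page's table has already been converted into a list of rows (lists of cell strings) in page order, and that continuation pages may repeat the header row. The `merge_page_tables` helper and this row format are hypothetical, not part of the Document Intelligence response schema.

```python
def merge_page_tables(page_tables):
    """Concatenate per-page tables into one, keeping the header row only once."""
    merged = []
    header = None
    for rows in page_tables:
        if not rows:
            continue  # A page without table rows contributes nothing.
        if header is None:
            # The first non-empty page supplies the header row (an assumption).
            header = rows[0]
            merged.extend(rows)
        elif rows[0] == header:
            # Continuation page that repeats the header: skip the repeated row.
            merged.extend(rows[1:])
        else:
            # Continuation page without a header: every row is data.
            merged.extend(rows)
    return merged
```

A data row that happens to be identical to the header at the top of a continuation page would be dropped by this sketch, so real documents may need a stricter check.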
