Commit d1bed3b

Merge pull request #283509 from laujan/jp-updates-preview-release
JP updates preview release
2 parents f49aceb + d7a1643 commit d1bed3b

28 files changed

+304
-224
lines changed

articles/ai-services/document-intelligence/concept-accuracy-confidence.md

Lines changed: 2 additions & 2 deletions
@@ -8,7 +8,7 @@ ms.service: azure-ai-document-intelligence
88
ms.custom:
99
- ignite-2023
1010
ms.topic: conceptual
11-
ms.date: 07/11/2024
11+
ms.date: 08/07/2024
1212
ms.author: lajanuar
1313
---
1414

@@ -134,4 +134,4 @@ Variances in the visual structure of your documents affect the accuracy of your
134134
## Next step
135135

136136
> [!div class="nextstepaction"]
137-
> [Learn to create custom models](quickstarts/try-document-intelligence-studio.md#custom-models)
137+
> [Learn more about custom models](concept-custom.md)

articles/ai-services/document-intelligence/concept-custom-classifier.md

Lines changed: 91 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -6,7 +6,7 @@ author: vkurpad
66
manager: nitinme
77
ms.service: azure-ai-document-intelligence
88
ms.topic: conceptual
9-
ms.date: 07/09/2024
9+
ms.date: 08/07/2024
1010
ms.author: lajanuar
1111
ms.custom:
1212
- references_regions
@@ -31,7 +31,7 @@ monikerRange: '>=doc-intel-3.1.0'
3131

3232
> [!IMPORTANT]
3333
>
34-
> * The `2024-02-29-preview` API, custom classification model won't split documents by default during the analyzing process.
34+
> * With the `2024-07-31-preview` API, the custom classification model doesn't split documents by default during analysis.
3535
> * You need to explicitly set the ``splitMode`` property to auto to preserve the behavior from previous releases. The default for `splitMode` is `none`.
3636
> * If your input file contains multiple documents, you need to enable splitting by setting the ``splitMode`` to ``auto``.
3737
@@ -59,7 +59,7 @@ Custom classification models can analyze a single- or multi-file documents to id
5959

6060
✔️ The maximum allowed number of classes is `500`. The maximum allowed number of document samples per class is `100`.
6161

62-
The model classifies each page of the input document to one of the classes in the labeled dataset. To set the threshold for your application, use the confidence score from the response.
62+
Unless you specify otherwise, the model classifies each page of the input document to one of the classes in the labeled dataset. You can also specify which page numbers of the input document to analyze. To set the threshold for your application, use the confidence score from the response.
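
As a rough illustration of thresholding on the confidence score, the following Python sketch filters a classifier result. It assumes the response carries an `analyzeResult.documents` array whose entries have `docType` and `confidence` fields. The sample payload, the class names, the `0.8` threshold, and the helper name `accepted_documents` are all hypothetical.

```python
from typing import Any


def accepted_documents(result: dict[str, Any], threshold: float) -> list[dict[str, Any]]:
    """Return the classified documents whose confidence meets the threshold."""
    # Assumed response shape: {"analyzeResult": {"documents": [...]}}.
    documents = result.get("analyzeResult", {}).get("documents", [])
    return [doc for doc in documents if doc.get("confidence", 0.0) >= threshold]


# Hypothetical response fragment, trimmed to the two fields used here.
sample_result = {
    "analyzeResult": {
        "documents": [
            {"docType": "car-maint", "confidence": 0.97},
            {"docType": "cc-auth", "confidence": 0.42},
        ]
    }
}

confident = accepted_documents(sample_result, threshold=0.8)
print([doc["docType"] for doc in confident])
```

Documents that fall below the threshold can then be routed to manual review instead of being discarded. The right threshold depends on your application.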
6363
### Incremental training
6464

6565
With custom models, you need to maintain access to the training dataset to update your classifier with new samples for an existing class, or add new classes. Classifier models now support incremental training where you can reference an existing classifier and append new samples for an existing class or add new classes with samples. Incremental training enables scenarios where data retention is a challenge and the classifier needs to be updated to align with changing business needs. Incremental training is supported with models trained with API version `2024-02-29-preview` and later.
@@ -146,7 +146,7 @@ The classifier attempts to assign each document to one of the classes, if you ex
146146

147147
## Training a model
148148

149-
Custom classification models are supported by **v4.0:2024-02-29-preview** and **v3.1:2023-07-31 (GA)** APIs. [Document Intelligence Studio](https://formrecognizer.appliedai.azure.com/studio) provides a no-code user interface to interactively train a custom classifier. Follow the [how to guide](how-to-guides/build-a-custom-classifier.md) to get started.
149+
Custom classification models are supported by **v4.0: 2024-02-29-preview, 2024-07-31-preview** and **v3.1: 2023-07-31 (GA)** APIs. [Document Intelligence Studio](https://formrecognizer.appliedai.azure.com/studio) provides a no-code user interface to interactively train a custom classifier. Follow the [how to guide](how-to-guides/build-a-custom-classifier.md) to get started.
150150

151151
When using the REST API, if you organize your documents by folders, you can use the `azureBlobSource` property of the request to train a classification model.
152152

@@ -250,7 +250,7 @@ Alternatively, if you have a flat list of files or only plan to use a few select
250250
```
251251

252252
As an example, the file list `car-maint.jsonl` contains the following files.
253-
253+
254254
```json
255255
{"file":"classifier/car-maint/Commercial Motor Vehicle - Adatum.pdf"}
256256
{"file":"classifier/car-maint/Commercial Motor Vehicle - Fincher.pdf"}
@@ -259,6 +259,90 @@ As an example, the file list `car-maint.jsonl` contains the following files.
259259
{"file":"classifier/car-maint/Commercial Motor Vehicle - Trey.pdf"}
260260
```
261261

262+
::: moniker range=">=doc-intel-4.0.0"
263+
## Overwriting a model
264+
265+
> [!NOTE]
266+
> Starting with the `2024-07-31-preview` API, custom classification models support overwriting a model in-place.
267+
268+
You can now update a custom classification model in place. Overwriting a model directly means you lose the ability to compare model quality before deciding to replace the existing model. Overwriting is allowed only when the `allowOverwrite` property is explicitly set to `true` in the request body. The original model can't be recovered after it's overwritten.
269+
270+
```json
271+
272+
273+
{
274+
"classifierId": "existingClassifierName",
275+
"allowOverwrite": true, // Default=false
276+
...
277+
}
278+
279+
```
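
The snippet above uses a `//` comment and `...` for readability, which strict JSON parsers reject. A minimal Python sketch of producing the same two properties as valid JSON follows. The helper name `overwrite_fields` is hypothetical, and the remaining request properties elided above are deliberately left out.

```python
import json


def overwrite_fields(classifier_id: str, allow_overwrite: bool = False) -> dict:
    """Build only the properties shown above; allowOverwrite defaults to False."""
    return {"classifierId": classifier_id, "allowOverwrite": allow_overwrite}


# Opt in to overwriting explicitly; the original model can't be recovered afterward.
fields = overwrite_fields("existingClassifierName", allow_overwrite=True)
print(json.dumps(fields, indent=2))
```

Keeping `allow_overwrite` as an explicit keyword argument makes the destructive choice visible at the call site.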
280+
281+
## Copy a model
282+
283+
> [!NOTE]
284+
> Starting with the `2024-07-31-preview` API, custom classification models support copying a model to and from any of the following regions:
285+
> * **East US**
286+
> * **West US2**
287+
> * **West Europe**
288+
>
289+
> Use the [**REST API**](/rest/api/aiservices/operation-groups?view=rest-aiservices-2024-07-31-preview&preserve-view=true) or [**Document Intelligence Studio**](https://documentintelligence.ai.azure.com/studio/document-classifier/projects) to copy a model to another region.
290+
291+
### Generate Copy authorization request
292+
293+
The following HTTP request gets copy authorization from your target resource. You need to enter the endpoint and key of your target resource as headers.
294+
295+
```http
296+
POST https://myendpoint.cognitiveservices.azure.com/documentintelligence/documentClassifiers:authorizeCopy?api-version=2024-07-31-preview
297+
Ocp-Apim-Subscription-Key: {<your-key>}
298+
```
299+
300+
Request body
301+
302+
```json
303+
{
304+
"classifierId": "targetClassifier",
305+
"description": "Target classifier description"
306+
}
307+
```
308+
309+
You receive a `200` response code with a response body that contains the JSON payload required to initiate the copy.
310+
311+
```json
312+
{
313+
"targetResourceId": "/subscriptions/targetSub/resourceGroups/targetRG/providers/Microsoft.CognitiveServices/accounts/targetService",
314+
"targetResourceRegion": "targetResourceRegion",
315+
"targetClassifierId": "targetClassifier",
316+
"targetClassifierLocation": "https://targetEndpoint.cognitiveservices.azure.com/documentintelligence/documentClassifiers/targetClassifier",
317+
"accessToken": "accessToken",
318+
"expirationDateTime": "timestamp"
319+
}
320+
```
321+
322+
### Start Copy operation
323+
324+
The following HTTP request starts the copy operation on the source resource. You need to enter the endpoint and key of your source resource as the URL and header. Notice that the request URL contains the classifier ID of the source classifier you want to copy.
325+
326+
```http
327+
POST {endpoint}/documentintelligence/documentClassifiers/{classifierId}:copyTo?api-version=2024-07-31-preview
328+
Ocp-Apim-Subscription-Key: {<your-key>}
329+
```
330+
331+
The body of your request is the response from the previous step.
332+
333+
```json
334+
{
335+
"targetResourceId": "/subscriptions/targetSub/resourceGroups/targetRG/providers/Microsoft.CognitiveServices/accounts/targetService",
336+
"targetResourceRegion": "targetResourceRegion",
337+
"targetClassifierId": "targetClassifier",
338+
"targetClassifierLocation": "https://targetEndpoint.cognitiveservices.azure.com/documentintelligence/documentClassifiers/targetClassifier",
339+
"accessToken": "accessToken",
340+
"expirationDateTime": "timestamp"
341+
}
342+
```
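
The two steps above can be sketched in Python with the standard library alone. This is a minimal sketch that only constructs the requests and doesn't send them. The function names, the `example.com` endpoints, and the placeholder keys are hypothetical; the URL paths, header name, and body fields are taken from the requests shown above.

```python
import json
from urllib.request import Request

API_VERSION = "2024-07-31-preview"


def _post(url: str, key: str, body: dict) -> Request:
    """Create a JSON POST request carrying the resource key header."""
    headers = {"Ocp-Apim-Subscription-Key": key, "Content-Type": "application/json"}
    return Request(url, data=json.dumps(body).encode("utf-8"), headers=headers, method="POST")


def authorize_copy_request(target_endpoint: str, target_key: str,
                           classifier_id: str, description: str) -> Request:
    """Step 1: ask the target resource for a copy authorization."""
    url = (f"{target_endpoint.rstrip('/')}/documentintelligence/"
           f"documentClassifiers:authorizeCopy?api-version={API_VERSION}")
    return _post(url, target_key, {"classifierId": classifier_id, "description": description})


def copy_to_request(source_endpoint: str, source_key: str,
                    source_classifier_id: str, authorization: dict) -> Request:
    """Step 2: start the copy on the source resource, using the step 1 response as the body."""
    url = (f"{source_endpoint.rstrip('/')}/documentintelligence/"
           f"documentClassifiers/{source_classifier_id}:copyTo?api-version={API_VERSION}")
    return _post(url, source_key, authorization)


# Hypothetical endpoints and keys, for illustration only.
auth_req = authorize_copy_request("https://target.example.com", "<target-key>",
                                  "targetClassifier", "Target classifier description")
copy_req = copy_to_request("https://source.example.com", "<source-key>",
                           "sourceClassifier", {"targetClassifierId": "targetClassifier"})
print(auth_req.get_method(), auth_req.full_url)
print(copy_req.get_method(), copy_req.full_url)
```

To send a request, pass it to `urllib.request.urlopen` and parse the JSON response. In practice, the `authorization` argument would be the full response body from the first call rather than the trimmed dictionary used here.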
343+
344+
:::moniker-end
345+
262346
## Model response
263347

264348
Analyze an input file with the document classification model.
@@ -269,6 +353,8 @@ Analyze an input file with the document classification model.
269353
https://{endpoint}/documentintelligence/documentClassifiers/{classifier}:analyze?api-version=2024-02-29-preview
270354
```
271355

356+
Starting with the `2024-07-31-preview` API, you can specify pages to analyze from the input document using the `pages` query parameter in the request.
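
A minimal Python sketch of adding the `pages` query parameter to the analyze URL follows. The helper name `analyze_url`, the `example.com` endpoint, and the `1-3,5` page-range string are assumptions made for illustration; check the REST reference for the accepted `pages` syntax.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit


def analyze_url(endpoint: str, classifier_id: str, pages: Optional[str] = None,
                api_version: str = "2024-07-31-preview") -> str:
    """Build the classifier analyze URL, adding `pages` only when it is given."""
    query = {"api-version": api_version}
    if pages is not None:
        query["pages"] = pages  # assumed format, for example "1-3,5"
    path = f"/documentintelligence/documentClassifiers/{classifier_id}:analyze"
    return f"{endpoint.rstrip('/')}{path}?{urlencode(query)}"


# Hypothetical endpoint and classifier ID.
url = analyze_url("https://myendpoint.example.com", "myClassifier", pages="1-3,5")
print(url)
print(parse_qs(urlsplit(url).query)["pages"])
```

`urlencode` percent-encodes the comma in the page list, so the value survives the round trip through the query string.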
357+
272358
:::moniker-end
273359

274360
:::moniker range="doc-intel-3.1.0"

articles/ai-services/document-intelligence/concept-layout.md

Lines changed: 2 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -5,8 +5,6 @@ description: Extract text, tables, selections, titles, section headings, page he
55
author: laujan
66
manager: nitinme
77
ms.service: azure-ai-document-intelligence
8-
ms.custom:
9-
- ignite-2023
108
ms.topic: conceptual
119
ms.date: 08/07/2024
1210
ms.author: lajanuar
@@ -602,7 +600,7 @@ Here are a few factors to consider when using the Document Intelligence bale ext
602600

603601
* Do your tables span multiple pages? If so, to avoid having to label all the pages, split the PDF into pages before sending it to Document Intelligence. After the analysis, post-process the pages to a single table.
604602

605-
* Refer to [Labeling as tables](quickstarts/try-document-intelligence-studio.md#labeling-as-tables) if you're creating custom models. Dynamic tables have a variable number of rows for each column. Fixed tables have a constant number of rows for each column.
603+
* Refer to [Tabular fields](concept-custom-label.md#tabular-fields) if you're creating custom models. Dynamic tables have a variable number of rows for each column. Fixed tables have a constant number of rows for each column.
606604

607605
> [!NOTE]
608606
>
@@ -840,7 +838,7 @@ Learn how to accelerate your business processes by automating text extraction wi
840838
Figures (charts, images) in documents play a crucial role in complementing and enhancing the textual content, providing visual representations that aid in the understanding of complex information. The figures object detected by the Layout model has key properties like `boundingRegions` (the spatial locations of the figure on the document pages, including the page number and the polygon coordinates that outline the figure's boundary), `spans` (details the text spans related to the figure, specifying their offsets and lengths within the document's text. This connection helps in associating the figure with its relevant textual context), `elements` (the identifiers for text elements or paragraphs within the document that are related to or describe the figure) and `caption` if there's any.
841839

842840
When *output=figures* is specified during the initial analyze operation, the service generates cropped images for all detected figures that can be accessed via `/analyzeResults/{resultId}/figures/{figureId}`.
843-
`FigureId` will be included in each figure object, following an undocumented convention of `{pageNumber}.{figureIndex}` where `figureIndex` resets to one per page.
841+
`FigureId` is included in each figure object, following an undocumented convention of `{pageNumber}.{figureIndex}` where `figureIndex` resets to one per page.
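
Because the `{pageNumber}.{figureIndex}` convention is described as undocumented, any code that relies on it should fail loudly if the format changes. A minimal Python sketch, with the hypothetical helper name `parse_figure_id`:

```python
def parse_figure_id(figure_id: str) -> tuple[int, int]:
    """Split a figure ID such as "2.1" into (page_number, figure_index)."""
    page_text, _, index_text = figure_id.partition(".")
    # Reject anything that is not exactly two non-negative integers joined by a dot.
    if not (page_text.isdigit() and index_text.isdigit()):
        raise ValueError(f"unexpected figure ID format: {figure_id!r}")
    return int(page_text), int(index_text)


page_number, figure_index = parse_figure_id("2.1")
print(page_number, figure_index)
```

Here `"2.1"` is read as the first figure on page 2, since `figureIndex` resets to one on each page.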
844842

845843
> [!NOTE]
846844
> Starting with *2024-07-31-preview*, the bounding regions for figures and tables cover only the core content and exclude associated caption and footnotes.

articles/ai-services/document-intelligence/concept-model-overview.md

Lines changed: 2 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -5,10 +5,8 @@ description: Document processing models for OCR, document layout, invoices, iden
55
author: laujan
66
manager: nitinme
77
ms.service: azure-ai-document-intelligence
8-
ms.custom:
9-
- ignite-2023
108
ms.topic: conceptual
11-
ms.date: 07/09/2024
9+
ms.date: 08/07/2024
1210
ms.author: lajanuar
1311
---
1412

@@ -327,7 +325,7 @@ Custom models can be broadly classified into two types. Custom classification mo
327325

328326
Custom document models analyze and extract data from forms and documents specific to your business. They recognize form fields within your distinct content and extract key-value pairs and table data. You only need one example of the form type to get started.
329327

330-
Version v3.0 and later custom models support signature detection in custom template (form) and cross-page tables in both template and neural models. [Signature detection](quickstarts/try-document-intelligence-studio.md#signature-detection) looks for the presence of a signature, not the identity of the person who signs the document. If the model returns **unsigned** for signature detection, the model didn't find a signature in the defined field.
328+
Version v3.0 and later custom models support signature detection in custom template (form) and cross-page tables in both template and neural models. [Signature detection](concept-custom-template.md#model-capabilities) looks for the presence of a signature, not the identity of the person who signs the document. If the model returns **unsigned** for signature detection, the model didn't find a signature in the defined field.
331329

332330
***Sample custom template processed using [Document Intelligence Studio](https://formrecognizer.appliedai.azure.com/studio/customform/projects)***:
333331

articles/ai-services/document-intelligence/concept-read.md

Lines changed: 1 addition & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -152,9 +152,6 @@ GET /documentModels/prebuilt-read/analyzeResults/{resultId}/pdf
152152
Content-Type: application/pdf
153153
```
154154
155-
## Pricing
156-
157-
158155
### Pages
159156
160157
The pages collection is a list of pages within the document. Each page is represented sequentially within the document and includes the orientation angle indicating if the page is rotated and the width and height (dimensions in pixels). The page units in the model output are computed as shown:
@@ -185,6 +182,7 @@ The pages collection is a list of pages within the document. Each page is repres
185182
}
186183
]
187184
```
185+
188186
::: moniker-end
189187
190188
::: moniker range="doc-intel-3.1.0"
