> * The `2024-02-29-preview` API, custom classification model won't split documents by default during the analyzing process.
34
+
> * The `2024-07-31-preview` API, custom classification model won't split documents by default during the analyzing process.
35
35
> * You need to explicitly set the `splitMode` property to `auto` to preserve the behavior from previous releases. The default for `splitMode` is `none`.
36
36
> * If your input file contains multiple documents, you need to enable splitting by setting the ``splitMode`` to ``auto``.
37
37
@@ -59,7 +59,7 @@ Custom classification models can analyze a single- or multi-file documents to id
59
59
60
60
✔️ The maximum allowed number of classes is `500`. The maximum allowed number of document samples per class is `100`.
61
61
62
-
The model classifies each page of the input documentto one of the classes in the labeled dataset. To set the threshold for your application, use the confidence score from the response.
62
+
Unless you specify otherwise, the model classifies each page of the input document into one of the classes in the labeled dataset. You can also specify which page numbers of the input document to analyze. To set the threshold for your application, use the confidence score from the response.
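
As an illustration of applying a confidence threshold, here is a minimal Python sketch. It assumes each classified document in the response is a dictionary with `docType` and `confidence` keys; those field names, the sample values, and the `filter_by_confidence` helper are assumptions made for this example, not taken from the text above.

```python
from typing import Any


def filter_by_confidence(
    documents: list[dict[str, Any]], threshold: float
) -> list[dict[str, Any]]:
    """Keep only the classified documents whose confidence meets the threshold."""
    return [doc for doc in documents if doc.get("confidence", 0.0) >= threshold]


# Hypothetical classification results (field names and values are assumed).
sample = [
    {"docType": "car-maint", "confidence": 0.93},
    {"docType": "other", "confidence": 0.41},
]

accepted = filter_by_confidence(sample, threshold=0.8)
```

The right threshold depends on the application; documents that fall below it could be routed to manual review instead of being dropped.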
63
63
### Incremental training
64
64
65
65
With custom models, you need to maintain access to the training dataset to update your classifier with new samples for an existing class, or add new classes. Classifier models now support incremental training where you can reference an existing classifier and append new samples for an existing class or add new classes with samples. Incremental training enables scenarios where data retention is a challenge and the classifier needs to be updated to align with changing business needs. Incremental training is supported with models trained with API version `2024-02-29-preview` and later.
@@ -146,7 +146,7 @@ The classifier attempts to assign each document to one of the classes, if you ex
146
146
147
147
## Training a model
148
148
149
-
Custom classification models are supported by **v4.0:2024-02-29-preview** and **v3.1:2023-07-31 (GA)** APIs. [Document Intelligence Studio](https://formrecognizer.appliedai.azure.com/studio) provides a no-code user interface to interactively train a custom classifier. Follow the [how to guide](how-to-guides/build-a-custom-classifier.md) to get started.
149
+
Custom classification models are supported by **v4.0:2024-02-29-preview, 2024-07-31-preview** and **v3.1:2023-07-31 (GA)** APIs. [Document Intelligence Studio](https://formrecognizer.appliedai.azure.com/studio) provides a no-code user interface to interactively train a custom classifier. Follow the [how to guide](how-to-guides/build-a-custom-classifier.md) to get started.
150
150
151
151
When using the REST API, if you organize your documents by folders, you can use the `azureBlobSource` property of the request to train a classification model.
152
152
@@ -250,7 +250,7 @@ Alternatively, if you have a flat list of files or only plan to use a few select
250
250
```
251
251
252
252
As an example, the file list `car-maint.jsonl` contains the following files.
253
-
253
+
254
254
```json
255
255
{"file":"classifier/car-maint/Commercial Motor Vehicle - Adatum.pdf"}
256
256
{"file":"classifier/car-maint/Commercial Motor Vehicle - Fincher.pdf"}
@@ -259,6 +259,90 @@ As an example, the file list `car-maint.jsonl` contains the following files.
259
259
{"file":"classifier/car-maint/Commercial Motor Vehicle - Trey.pdf"}
260
260
```
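
A file list like the one above can be generated rather than written by hand. The following Python sketch assumes only what the example shows: one JSON object per line with a single `file` key. The `to_file_list` helper name is hypothetical.

```python
import json


def to_file_list(paths: list[str]) -> str:
    """Render one {"file": ...} JSON object per line, in JSON Lines style."""
    # Compact separators reproduce the spacing used in the example above.
    return "\n".join(
        json.dumps({"file": path}, separators=(",", ":")) for path in paths
    )


file_list = to_file_list(
    [
        "classifier/car-maint/Commercial Motor Vehicle - Adatum.pdf",
        "classifier/car-maint/Commercial Motor Vehicle - Fincher.pdf",
    ]
)
```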
261
261
262
+
::: moniker range=">=doc-intel-4.0.0"
263
+
## Overwriting a model
264
+
265
+
> [!NOTE]
266
+
> Starting with the `2024-07-31-preview` API, custom classification models support overwriting a model in-place.
267
+
268
+
You can now update a custom classification model in place. Overwriting a model directly removes the ability to compare model quality before you decide to replace the existing model. Overwriting is allowed only when the `allowOverwrite` property is explicitly specified in the request body. Once a model is overwritten, the original model can't be recovered.
269
+
270
+
```json
271
+
272
+
273
+
{
274
+
"classifierId": "existingClassifierName",
275
+
"allowOverwrite": true, // Default=false
276
+
...
277
+
}
278
+
279
+
```
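
The request fragment above can be assembled programmatically. This Python sketch builds only the two properties shown; the fields elided by `...` in the example are deliberately left out, and the `build_overwrite_fragment` helper is a hypothetical name used for illustration.

```python
import json


def build_overwrite_fragment(classifier_id: str, allow_overwrite: bool = False) -> dict:
    """Return the two properties shown above; the remaining build fields are not included."""
    # allow_overwrite defaults to False, matching the default noted in the example.
    return {"classifierId": classifier_id, "allowOverwrite": allow_overwrite}


fragment = build_overwrite_fragment("existingClassifierName", allow_overwrite=True)
payload = json.dumps(fragment)
```

Keeping the default at `False` means a caller has to opt in explicitly before an existing model can be replaced.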
280
+
281
+
## Copy a model
282
+
283
+
> [!NOTE]
284
+
> Starting with the `2024-07-31-preview` API, custom classification models support copying a model to and from any of the following regions:
285
+
> * **East US**
286
+
> * **West US2**
287
+
> * **West Europe**
288
+
>
289
+
> Use the [**REST API**](/rest/api/aiservices/operation-groups?view=rest-aiservices-2024-07-31-preview&preserve-view=true) or [**Document Intelligence Studio**](https://documentintelligence.ai.azure.com/studio/document-classifier/projects) to copy a model to another region.
290
+
291
+
### Generate copy authorization request
292
+
293
+
The following HTTP request gets copy authorization from your target resource. Enter the endpoint of your target resource in the request URL and its key in the header.
294
+
295
+
```http
296
+
POST https://myendpoint.cognitiveservices.azure.com/documentintelligence/documentClassifiers:authorizeCopy?api-version=2024-07-31-preview
297
+
Ocp-Apim-Subscription-Key: {<your-key>}
298
+
```
299
+
300
+
Request body
301
+
302
+
```json
303
+
{
304
+
"classifierId": "targetClassifier",
305
+
"description": "Target classifier description"
306
+
}
307
+
```
308
+
309
+
You receive a `200` response code with a response body that contains the JSON payload required to initiate the copy.
The following HTTP request starts the copy operation on the source resource. Enter the endpoint of your source resource in the request URL and its key in the header. Notice that the request URL contains the classifier ID of the source classifier you want to copy.
325
+
326
+
```http
327
+
POST {endpoint}/documentintelligence/documentClassifiers/{classifierId}:copyTo?api-version=2024-07-31-preview
328
+
Ocp-Apim-Subscription-Key: {<your-key>}
329
+
```
330
+
331
+
The body of your request is the response from the previous step.
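
The two request URLs in this flow differ only in the resource they target and the operation suffix. The Python sketch below composes both from the URL patterns shown above; the helper names are hypothetical, and sending the requests (with the `Ocp-Apim-Subscription-Key` header of the respective resource) is left out.

```python
from urllib.parse import quote

API_VERSION = "2024-07-31-preview"
BASE_PATH = "documentintelligence/documentClassifiers"


def authorize_copy_url(target_endpoint: str) -> str:
    """URL for the authorization request, sent to the target resource."""
    root = target_endpoint.rstrip("/")
    return f"{root}/{BASE_PATH}:authorizeCopy?api-version={API_VERSION}"


def copy_to_url(source_endpoint: str, classifier_id: str) -> str:
    """URL for the copy request, sent to the source resource."""
    root = source_endpoint.rstrip("/")
    safe_id = quote(classifier_id, safe="")  # escape characters unsafe in a path
    return f"{root}/{BASE_PATH}/{safe_id}:copyTo?api-version={API_VERSION}"


auth_url = authorize_copy_url("https://myendpoint.cognitiveservices.azure.com")
copy_url = copy_to_url("https://myendpoint.cognitiveservices.azure.com", "sourceClassifier")
```

In practice the two endpoints would belong to different resources; the same host is reused here only to keep the example short, and `sourceClassifier` is a placeholder ID.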
@@ -602,7 +600,7 @@ Here are a few factors to consider when using the Document Intelligence bale ext
602
600
603
601
* Do your tables span multiple pages? If so, to avoid having to label all the pages, split the PDF into pages before sending it to Document Intelligence. After the analysis, post-process the pages to a single table.
604
602
605
-
* Refer to [Labeling as tables](quickstarts/try-document-intelligence-studio.md#labeling-as-tables) if you're creating custom models. Dynamic tables have a variable number of rows for each column. Fixed tables have a constant number of rows for each column.
603
+
* Refer to [Tabular fields](concept-custom-label.md#tabular-fields) if you're creating custom models. Dynamic tables have a variable number of rows for each column. Fixed tables have a constant number of rows for each column.
606
604
607
605
> [!NOTE]
608
606
>
@@ -840,7 +838,7 @@ Learn how to accelerate your business processes by automating text extraction wi
840
838
Figures (charts, images) in documents complement and enhance the textual content by providing visual representations that aid in understanding complex information. The figures object detected by the Layout model has the following key properties: `boundingRegions` (the spatial locations of the figure on the document pages, including the page number and the polygon coordinates that outline the figure's boundary), `spans` (the text spans related to the figure, given as offsets and lengths within the document's text, which associate the figure with its relevant textual context), `elements` (the identifiers for text elements or paragraphs within the document that are related to or describe the figure), and `caption`, if the figure has one.
841
839
842
840
When *output=figures* is specified during the initial analyze operation, the service generates cropped images for all detected figures that can be accessed via `/analyzeResults/{resultId}/figures/{figureId}`.
843
-
`FigureId`will be included in each figure object, following an undocumented convention of `{pageNumber}.{figureIndex}` where `figureIndex` resets to one per page.
841
+
`FigureId` is included in each figure object, following an undocumented convention of `{pageNumber}.{figureIndex}` where `figureIndex` resets to one per page.
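
As a rough sketch of working with this identifier, the Python below splits a figure ID into its page number and per-page index and builds the relative path of the cropped image. Because the convention is described as undocumented, treat the parsing as an assumption that may break; both helper names are hypothetical.

```python
def parse_figure_id(figure_id: str) -> tuple[int, int]:
    """Split '{pageNumber}.{figureIndex}' into (page_number, figure_index)."""
    page_text, index_text = figure_id.split(".")
    page_number, figure_index = int(page_text), int(index_text)
    if figure_index < 1:
        # The index is described as resetting to one on each page.
        raise ValueError(f"unexpected figure index in {figure_id!r}")
    return page_number, figure_index


def figure_path(result_id: str, figure_id: str) -> str:
    """Relative path of the cropped figure image for an analyze result."""
    return f"/analyzeResults/{result_id}/figures/{figure_id}"


page, index = parse_figure_id("2.1")
path = figure_path("my-result-id", "2.1")  # "my-result-id" is a placeholder
```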
844
842
845
843
> [!NOTE]
846
844
> Starting with *2024-07-31-preview*, the bounding regions for figures and tables cover only the core content and exclude associated caption and footnotes.
@@ -327,7 +325,7 @@ Custom models can be broadly classified into two types. Custom classification mo
327
325
328
326
Custom document models analyze and extract data from forms and documents specific to your business. They recognize form fields within your distinct content and extract key-value pairs and table data. You only need one example of the form type to get started.
329
327
330
-
Version v3.0 and later custom models support signature detection in custom template (form) and cross-page tables in both template and neural models. [Signature detection](quickstarts/try-document-intelligence-studio.md#signature-detection) looks for the presence of a signature, not the identity of the person who signs the document. If the model returns **unsigned** for signature detection, the model didn't find a signature in the defined field.
328
+
Version v3.0 and later custom models support signature detection in custom template (form) and cross-page tables in both template and neural models. [Signature detection](concept-custom-template.md#model-capabilities) looks for the presence of a signature, not the identity of the person who signs the document. If the model returns **unsigned** for signature detection, the model didn't find a signature in the defined field.
331
329
332
330
***Sample custom template processed using [Document Intelligence Studio](https://formrecognizer.appliedai.azure.com/studio/customform/projects)***:
articles/ai-services/document-intelligence/concept-read.md (1 addition, 3 deletions)
@@ -152,9 +152,6 @@ GET /documentModels/prebuilt-read/analyzeResults/{resultId}/pdf
152
152
Content-Type: application/pdf
153
153
```
154
154
155
-
## Pricing
156
-
157
-
158
155
### Pages
159
156
160
157
The pages collection is a list of pages within the document. Each page is represented sequentially within the document and includes the orientation angle indicating if the page is rotated and the width and height (dimensions in pixels). The page units in the model output are computed as shown:
@@ -185,6 +182,7 @@ The pages collection is a list of pages within the document. Each page is repres