author: laujan
ms.service: azure-ai-document-intelligence
ms.topic: conceptual
ms.date: 11/19/2024
ms.author: lajanuar
monikerRange: '>=doc-intel-4.0.0'
---

# Document Intelligence batch analysis

The batch analysis API allows you to bulk process multiple documents using one asynchronous request. Rather than submitting documents individually and tracking multiple request IDs, you can analyze a collection of documents, such as invoices, a series of loan documents, or a group of custom documents, simultaneously. The batch API supports reading the documents from Azure Blob Storage and writing the results to Blob Storage.

* To utilize batch analysis, you need an Azure Blob storage account with specific containers for both your source documents and the processed outputs.
* Upon completion, the batch operation result lists all of the individual documents processed with their status, such as `succeeded`, `skipped`, or `failed`.
* The Batch API preview version is available via pay-as-you-go pricing.

## Batch analysis guidance

* The maximum number of documents processed per single batch analyze request (including skipped documents) is 10,000.
* Operation results are retained for 24 hours after completion. The documents and results are in the storage account provided, but operation status is no longer available 24 hours after completion.
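
While the operation is still available, you can poll the URL returned in the batch request's `Operation-Location` response header to track progress and retrieve the outcome for each document. The following is only an illustrative sketch of a completed operation's status payload; the property names are assumptions based on recent API versions, so confirm them against the current API reference:

```JSON
{
  "status": "succeeded",
  "percentCompleted": 100,
  "result": {
    "succeededCount": 1,
    "failedCount": 0,
    "skippedCount": 0,
    "details": [
      {
        "sourceUrl": "https://myStorageAccount.blob.core.windows.net/sourceContainer/invoice1.pdf",
        "resultUrl": "https://myStorageAccount.blob.core.windows.net/resultContainer/invoice1.pdf.json",
        "status": "succeeded"
      }
    ]
  }
}
```
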
Ready to get started?

To learn more, *see* [**Create SAS tokens**](../authentication/create-sas-tokens.md).

* Specify the Azure Blob Storage container URL for your source document set within the `azureBlobSource` or `azureBlobFileListSource` objects.

### Specify the input files

The batch API supports two options for specifying the files to be processed. If you need all files in a container or folder processed, and the number of files is under the 10,000 limit for a single batch request, use the `azureBlobSource` object.

If you have specific files in the container or folder to process, or if the number of files exceeds the maximum limit for a single batch, use the `azureBlobFileListSource` object. Split the dataset into multiple batches, and add a file listing the documents to be processed, in JSONL format, to the root folder of the container. An example of the file list format follows:

```JSON
{"file": "Adatum Corporation.pdf"}
{"file": "Best For You Organics Company.pdf"}
```
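
As a sketch, if the list above were saved as `fileList.jsonl` in the root of the source container (the file name here is hypothetical), the input source in the batch request could reference it along these lines (verify the property names against the current API reference):

```JSON
{
  "azureBlobFileListSource": {
    "containerUrl": "{your-source-container-SAS-URL}",
    "fileList": "fileList.jsonl"
  }
}
```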

### Specify the results location

Specify the Azure Blob Storage container URL for your batch analysis results using `resultContainerUrl`. To avoid accidental overwriting, we recommend using separate containers for source and processed documents.

Set the `overwriteExisting` boolean property to `false` if you don't want existing results with the same file names overwritten. This setting doesn't affect billing; it only prevents results from being overwritten after the input file is processed.

Set the `resultPrefix` property to namespace the results from this run of the batch API.

* If you plan to use the same container for both input and output, set `resultContainerUrl` and `resultPrefix` to match your input `azureBlobSource`.
* When using the same container, you can include the `overwriteExisting` field to decide whether to overwrite any files with the analysis result files.
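
Putting these settings together, the output-related portion of a batch request body might look like the following sketch, where the `resultPrefix` value is a hypothetical example:

```JSON
{
  "resultContainerUrl": "{your-result-container-SAS-URL}",
  "resultPrefix": "batchRun1/",
  "overwriteExisting": false
}
```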

## Build and run the POST request

Before you run the POST request, replace `{your-source-container-SAS-URL}` and `{your-result-container-SAS-URL}` with the values from your Azure Blob Storage container instances.

The following sample shows how to add the `azureBlobSource` property to the request:

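This is a minimal sketch of the request body only; it assumes a `POST` to an endpoint of the form `{endpoint}/documentintelligence/documentModels/{modelId}:analyzeBatch?api-version={api-version}` (confirm the exact path and version against the current API reference), and the `prefix` and `resultPrefix` values are hypothetical:

```JSON
{
  "azureBlobSource": {
    "containerUrl": "{your-source-container-SAS-URL}",
    "prefix": "invoices/"
  },
  "resultContainerUrl": "{your-result-container-SAS-URL}",
  "resultPrefix": "invoicesResult/",
  "overwriteExisting": true
}
```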

**Specify only one of `azureBlobSource` or `azureBlobFileListSource`.**