
Commit eeec38b

Merge pull request #1456 from laujan/vinod-batch-analysis-updates
Vinod batch analysis updates
2 parents 660115d + 460ee43 · commit eeec38b


articles/ai-services/document-intelligence/prebuilt/batch-analysis.md

Lines changed: 47 additions & 32 deletions
@@ -6,35 +6,22 @@ author: laujan
ms.service: azure-ai-document-intelligence
ms.topic: conceptual
ms.date: 11/19/2024
-ms.author: ginle
+ms.author: lajanuar
monikerRange: '>=doc-intel-4.0.0'
---

# Document Intelligence batch analysis

-The batch analysis API allows you to bulk process multiple documents using one asynchronous request. Rather than having to submit documents individually and track multiple request IDs, you can analyze a collection of invoices, a series of loan documents, or a group of custom model training documents simultaneously.
+The batch analysis API allows you to bulk process multiple documents using one asynchronous request. Rather than submitting documents individually and tracking multiple request IDs, you can analyze a collection of documents, such as invoices, a series of loan documents, or a group of custom documents, simultaneously. The batch API supports reading the documents from Azure Blob Storage and writing the results back to Blob Storage.

* To use batch analysis, you need an Azure Blob Storage account with containers for both your source documents and the processed outputs.
* Upon completion, the batch operation result lists all of the individual documents processed with their status, such as `succeeded`, `skipped`, or `failed`.
-* The Batch API version is available via pay-as-you-go pricing.
-
-The following models support batch analysis:
-
-* [**Read**](../prebuilt/read.md). Extract text lines, words, detected languages, and handwritten style from forms and documents.
-
-* [**Layout**](../prebuilt/layout.md). Extract text, tables, selection marks, and structure information from forms and documents.
-
-* [**Custom Template**](../train/custom-template.md). Train models to extract key-value pairs, selection marks, tables, signature fields, and regions from structured forms.
-
-* [**Custom Neural**](../train/custom-neural.md). Train models to extract specified data fields from structured, semi-structured, and unstructured documents.
-
+* The Batch API preview version is available via pay-as-you-go pricing.

## Batch analysis guidance

* The maximum number of documents processed per single batch analyze request (including skipped documents) is 10,000.

-* The `azureBlobFileListSource` parameter can be used to break larger requests into smaller ones.
-
* Operation results are retained for 24 hours after completion. The documents and results are in the storage account provided, but operation status is no longer available 24 hours after completion.

Ready to get started?
@@ -83,35 +70,63 @@ To learn more, *see* [**Create SAS tokens**](../authentication/create-sas-tokens

* Specify the Azure Blob Storage container URL for your source document set within the `azureBlobSource` or `azureBlobFileListSource` objects.

-* Specify the Azure Blob Storage container URL for your batch analysis results using `resultContainerUrl`. To avoid accidental overwriting, we recommend using separate containers for source and processed documents.
+### Specify the input files
+
+The batch API supports two options for specifying the files to be processed. If you need all the files in a container or folder processed, and the number of files is under the 10,000 limit for a single batch request, use the `azureBlobSource` object.
+
+If you want to process specific files in the container or folder, or if the number of files exceeds the limit for a single batch, use the `azureBlobFileListSource` object. Split the dataset into multiple batches and add a file listing the files to be processed, in JSONL format, to the root folder of the container. An example of the file list format:
+
+```json
+{"file": "Adatum Corporation.pdf"}
+{"file": "Best For You Organics Company.pdf"}
+```
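A file list like this is easy to script. Here's a minimal sketch, assuming the PDFs sit in a local working folder and using illustrative file names; the split step is only needed when the list exceeds the per-batch limit:

```bash
# Build a JSONL file list: one {"file": ...} object per line, with paths
# relative to the container root (file names here are illustrative).
for f in *.pdf; do
  printf '{"file": "%s"}\n' "$f"
done > myFileList.jsonl

# If the list exceeds the 10,000-document limit for a single batch request,
# split it into chunks and submit one batch per chunk.
split -l 10000 myFileList.jsonl myFileList.part.
```

Upload the resulting list (or lists) to the root folder of the source container before submitting the batch request.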
+### Specify the results location
+
+Specify the Azure Blob Storage container URL for your batch analysis results using `resultContainerUrl`. To avoid accidental overwriting, we recommend using separate containers for source and processed documents.
+
+Set the `overwriteExisting` boolean property to `false` if you don't want any existing results with the same file names overwritten. This setting doesn't affect billing; it only prevents results from being overwritten after the input file is processed.
+
+Set the `resultPrefix` property to namespace the results from this run of the batch API.

-* If you use the same container, set `resultContainerUrl` and `resultPrefix` to match your input `azureBlobSource`.
+* If you plan to use the same container for both input and output, set `resultContainerUrl` and `resultPrefix` to match your input `azureBlobSource`.
* When using the same container, you can include the `overwriteExisting` field to decide whether to overwrite any files with the analysis result files.
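To verify the namespacing after a run, you can list what was written under the prefix. A quick sketch using the Azure CLI, where the storage account, container, prefix, and SAS token are illustrative placeholders mirroring the samples below:

```bash
# List the result blobs written under the resultPrefix namespace
# (storage account, container, and prefix names are illustrative).
az storage blob list \
  --account-name myStorageAccount \
  --container-name myOutputContainer \
  --prefix "layoutresult/" \
  --sas-token "$SAS_TOKEN" \
  --query "[].name" \
  --output tsv
```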

## Build and run the POST request

Before you run the POST request, replace {your-source-container-SAS-URL} and {your-result-container-SAS-URL} with the values from your Azure Blob Storage container instances.

+The following sample shows how to add the `azureBlobSource` property to the request:
+
**Specify only one of `azureBlobSource` or `azureBlobFileListSource`.**

```bash
POST /documentModels/{modelId}:analyzeBatch

-[
-    {
-        "azureBlobSource": {
-            "containerUrl": "{your-source-container-SAS-URL}",
-            "prefix": "trainingDocs/"
-        },
-        "azureBlobFileListSource": {
-            "containerUrl": "{your-source-container-SAS-URL}",
+{
+    "azureBlobSource": {
+        "containerUrl": "https://myStorageAccount.blob.core.windows.net/myContainer?mySasToken",
+        "prefix": "trainingDocs/"
+    },
+    "resultContainerUrl": "https://myStorageAccount.blob.core.windows.net/myOutputContainer?mySasToken",
+    "resultPrefix": "layoutresult/",
+    "overwriteExisting": true
+}
+
+```
+The following sample shows how to add the `azureBlobFileListSource` property to the request:
+
+```bash
+POST /documentModels/{modelId}:analyzeBatch
+
+{
+    "azureBlobFileListSource": {
+        "containerUrl": "https://myStorageAccount.blob.core.windows.net/myContainer?mySasToken",
        "fileList": "myFileList.jsonl"
    },
-    "resultContainerUrl": "{your-result-container-SAS-URL}",
-    "resultPrefix": "trainingDocsResult/",
-    "overwriteExisting": false
-    }
-]
+    "resultContainerUrl": "https://myStorageAccount.blob.core.windows.net/myOutputContainer?mySasToken",
+    "resultPrefix": "customresult/",
+    "overwriteExisting": true
+}
```
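Either request body is submitted with the same POST. A minimal curl sketch; the full endpoint path, API version, and key header are assumptions to verify against the current Document Intelligence reference, and `batchRequest.json` is a hypothetical local file containing one of the bodies above:

```bash
# Submit the batch analysis request. The endpoint shape and api-version are
# assumptions; substitute your own resource name and key.
curl -i -X POST \
  "https://{your-resource}.cognitiveservices.azure.com/documentintelligence/documentModels/{modelId}:analyzeBatch?api-version=2024-07-31-preview" \
  -H "Content-Type: application/json" \
  -H "Ocp-Apim-Subscription-Key: {your-key}" \
  --data @batchRequest.json
```

The `-i` flag surfaces the response headers, where the URL for tracking the asynchronous operation (typically `Operation-Location`) is returned.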

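Because the operation is asynchronous and its status is only retained for 24 hours after completion, poll the operation URL until analysis reaches a terminal state. A sketch that assumes the POST response carried an `Operation-Location` header and that the status field uses values such as `notStarted`, `running`, and `succeeded`; both assumptions should be checked against the API reference:

```bash
# Poll the operation URL from the POST response's Operation-Location header
# until the status leaves the in-progress states (status names are assumptions).
OPERATION_URL="{operation-location-from-post-response}"
while :; do
  BODY=$(curl -s -H "Ocp-Apim-Subscription-Key: {your-key}" "$OPERATION_URL")
  STATUS=$(printf '%s' "$BODY" | grep -oE '"status" *: *"[A-Za-z]+"' | head -n 1)
  echo "$STATUS"
  case "$STATUS" in
    *notStarted*|*running*) sleep 10 ;;
    *) break ;;
  esac
done
```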
@@ -235,4 +250,4 @@ The batch analysis results help you identify which files are successfully analyzed

## Next steps

-[View code samples on GitHub.](https://github.com/Azure-Samples/document-intelligence-code-samples/tree/main/Python(v4.0)/Prebuilt_model)
+[View code samples on GitHub.](https://github.com/Azure-Samples/document-intelligence-code-samples/tree/main/Python(v4.0)/Prebuilt_model)
