
Commit a37ca95

Merge pull request #99493 from PatrickFarley/formre-updates

[cog serv] Formre updates

2 parents 9460c2b + 950f307, commit a37ca95

File tree: 8 files changed, +92 / -70 lines changed


articles/cognitive-services/form-recognizer/includes/python-custom-analyze.md

Lines changed: 19 additions & 17 deletions
@@ -9,11 +9,11 @@ ms.author: pafarley
 
 ## Analyze forms for key-value pairs and tables
 
-Next, you'll use your newly trained model to analyze a document and extract key-value pairs and tables from it. Call the **Analyze Form** API by running the following code in a new Python script. Before you run the script, make these changes:
+Next, you'll use your newly trained model to analyze a document and extract key-value pairs and tables from it. Call the **[Analyze Form](https://westus2.dev.cognitive.microsoft.com/docs/services/form-recognizer-api-v2-preview/operations/AnalyzeWithCustomForm)** API by running the following code in a new Python script. Before you run the script, make these changes:
 
-1. Replace `<path to your form>` with the file path of your form (for example, C:\temp\file.pdf). This can also be the URL of a remote file. For this quickstart, you can use the files under the **Test** folder of the [sample data set](https://go.microsoft.com/fwlink/?linkid=2090451).
-1. Replace `<modelID>` with the model ID you received in the previous section.
-1. Replace `<Endpoint>` with the endpoint that you obtained with your Form Recognizer subscription key. You can find it on your Form Recognizer resource **Overview** tab.
+1. Replace `<file path>` with the file path of your form (for example, C:\temp\file.pdf). This can also be the URL of a remote file. For this quickstart, you can use the files under the **Test** folder of the [sample data set](https://go.microsoft.com/fwlink/?linkid=2090451).
+1. Replace `<model_id>` with the model ID you received in the previous section.
+1. Replace `<endpoint>` with the endpoint that you obtained with your Form Recognizer subscription key. You can find it on your Form Recognizer resource **Overview** tab.
 1. Replace `<file type>` with the file type. Supported types: `application/pdf`, `image/jpeg`, `image/png`, `image/tiff`.
 1. Replace `<subscription key>` with your subscription key.

@@ -24,12 +24,11 @@ Next, you'll use your newly trained model to analyze a document and extract key-
 from requests import get, post
 
 # Endpoint URL
-endpoint = r"<Endpoint>"
-apim_key = "<Subscription Key>"
-model_id = "<modelID>"
+endpoint = r"<endpoint>"
+apim_key = "<subscription key>"
+model_id = "<model_id>"
 post_url = endpoint + "/formrecognizer/v2.0-preview/custom/models/%s/analyze" % model_id
-source = r"<path or url to your form>"
-prefix = "<prefix string>"
+source = r"<file path>"
 params = {
     "includeTextDetails": True
 }
@@ -45,7 +44,7 @@ Next, you'll use your newly trained model to analyze a document and extract key-
 try:
     resp = post(url = post_url, data = data_bytes, headers = headers, params = params)
     if resp.status_code != 202:
-        print("POST analyze failed:\n%s" % resp.text)
+        print("POST analyze failed:\n%s" % json.dumps(resp.json()))
         quit()
     print("POST analyze succeeded:\n%s" % resp.headers)
     get_url = resp.headers["operation-location"]
@@ -65,28 +64,31 @@ When you call the **Analyze Form** API, you'll receive a `201 (Success)` respons
 Add the following code to the bottom of your Python script. This uses the ID value from the previous call in a new API call to retrieve the analysis results. The **Analyze Form** operation is asynchronous, so this script calls the API at regular intervals until the results are available. We recommend an interval of one second or more.
 
 ```python
-n_tries = 10
+n_tries = 15
 n_try = 0
-wait_sec = 6
+wait_sec = 5
+max_wait_sec = 60
 while n_try < n_tries:
     try:
         resp = get(url = get_url, headers = {"Ocp-Apim-Subscription-Key": apim_key})
-        resp_json = json.loads(resp.text)
+        resp_json = resp.json()
         if resp.status_code != 200:
-            print("GET analyze results failed:\n%s" % resp_json)
+            print("GET analyze results failed:\n%s" % json.dumps(resp_json))
             quit()
         status = resp_json["status"]
         if status == "succeeded":
-            print("Analysis succeeded:\n%s" % resp_json)
+            print("Analysis succeeded:\n%s" % json.dumps(resp_json))
             quit()
         if status == "failed":
-            print("Analysis failed:\n%s" % resp_json)
+            print("Analysis failed:\n%s" % json.dumps(resp_json))
             quit()
         # Analysis still running. Wait and retry.
         time.sleep(wait_sec)
-        n_try += 1
+        n_try += 1
+        wait_sec = min(2*wait_sec, max_wait_sec)
     except Exception as e:
         msg = "GET analyze results failed:\n%s" % str(e)
         print(msg)
         quit()
+print("Analyze operation did not complete within the allocated time.")
 ```
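
The retry loop above now backs off: the wait between polls doubles on every attempt and is capped at `max_wait_sec`. A minimal standalone sketch of that pattern, assuming `get_url` and `apim_key` are set as in the script; the helper name and the exceptions it raises are illustrative, not part of the quickstart:

```python
import json
import time

from requests import get

def poll_analyze_result(get_url, apim_key, n_tries=15, wait_sec=5, max_wait_sec=60):
    # Poll the Get Analyze Form Result URL until the operation reaches a terminal status,
    # doubling the wait between attempts up to max_wait_sec.
    for _ in range(n_tries):
        resp = get(url=get_url, headers={"Ocp-Apim-Subscription-Key": apim_key})
        resp_json = resp.json()
        if resp.status_code != 200:
            raise RuntimeError("GET analyze results failed:\n%s" % json.dumps(resp_json))
        if resp_json["status"] in ("succeeded", "failed"):
            return resp_json
        # Analysis still running. Wait and retry with a capped exponential backoff.
        time.sleep(wait_sec)
        wait_sec = min(2 * wait_sec, max_wait_sec)
    raise TimeoutError("Analyze operation did not complete within the allocated time.")
```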

articles/cognitive-services/form-recognizer/quickstarts/curl-receipts.md

Lines changed: 2 additions & 2 deletions
@@ -35,7 +35,7 @@ To complete this quickstart, you must have:
 
 ## Analyze a receipt
 
-To start analyzing a receipt, you call the **Analyze Receipt** API using the cURL command below. Before you run the command, make these changes:
+To start analyzing a receipt, you call the **[Analyze Receipt](https://westus2.dev.cognitive.microsoft.com/docs/services/form-recognizer-api-v2-preview/operations/AnalyzeReceiptAsync)** API using the cURL command below. Before you run the command, make these changes:
 
 1. Replace `<Endpoint>` with the endpoint that you obtained with your Form Recognizer subscription.
 1. Replace `<your receipt URL>` with the URL address of a receipt image.
@@ -53,7 +53,7 @@ https://cognitiveservice/formrecognizer/v2.0-preview/prebuilt/receipt/operations
 
 ## Get the receipt results
 
-After you've called the **Analyze Receipt** API, you call the **Get Receipt Result** API to get the status of the operation and the extracted data. Before you run the command, make these changes:
+After you've called the **Analyze Receipt** API, you call the **[Get Analyze Receipt Result](https://westus2.dev.cognitive.microsoft.com/docs/services/form-recognizer-api-v2-preview/operations/GetAnalyzeReceiptResult)** API to get the status of the operation and the extracted data. Before you run the command, make these changes:
 
 1. Replace `<Endpoint>` with the endpoint that you obtained with your Form Recognizer subscription key. You can find it on your Form Recognizer resource **Overview** tab.
 1. Replace `<operationId>` with the operation ID from the previous step.
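
This quickstart uses cURL, but the same two calls map directly onto Python's `requests`. A rough sketch, assuming the v2.0-preview prebuilt receipt route and the same `<Endpoint>`, `<subscription key>`, and `<your receipt URL>` placeholders described above:

```python
import time

from requests import get, post

endpoint = "<Endpoint>"            # e.g. https://<resource-name>.cognitiveservices.azure.com
apim_key = "<subscription key>"
receipt_url = "<your receipt URL>"

# Analyze Receipt: submit the receipt by URL and capture the operation location.
resp = post(
    url=endpoint + "/formrecognizer/v2.0-preview/prebuilt/receipt/analyze",
    json={"source": receipt_url},
    headers={"Ocp-Apim-Subscription-Key": apim_key},
)
resp.raise_for_status()
get_url = resp.headers["Operation-Location"]

# Get Analyze Receipt Result: poll until the operation succeeds or fails.
while True:
    result = get(url=get_url, headers={"Ocp-Apim-Subscription-Key": apim_key}).json()
    if result.get("status") in ("succeeded", "failed"):
        print(result)
        break
    time.sleep(1)
```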

articles/cognitive-services/form-recognizer/quickstarts/curl-train-extract.md

Lines changed: 3 additions & 3 deletions
@@ -40,7 +40,7 @@ First, you'll need a set of training data in an Azure Storage blob. You should h
 > [!NOTE]
 > You can use the labeled data feature to manually label some or all of your training data beforehand. This is a more complex process but results in a better trained model. See the [Train with labels](../overview.md#train-with-labels) section of the overview to learn more about this feature.
 
-To train a Form Recognizer model with the documents in your Azure blob container, call the **Train Custom Model** API by running the following cURL command. Before you run the command, make these changes:
+To train a Form Recognizer model with the documents in your Azure blob container, call the **[Train Custom Model](https://westus2.dev.cognitive.microsoft.com/docs/services/form-recognizer-api-v2-preview/operations/TrainCustomModelAsync)** API by running the following cURL command. Before you run the command, make these changes:
 
 1. Replace `<Endpoint>` with the endpoint that you obtained with your Form Recognizer subscription.
 1. Replace `<subscription key>` with the subscription key you copied from the previous step.
@@ -54,7 +54,7 @@ You'll receive a `201 (Success)` response with a **Location** header. The value
 
 ## Get training results
 
-After you've started the train operation, you use a new operation, **Get Custom Model** to check the training status. Pass the model ID into this API call to check the training status:
+After you've started the train operation, you use a new operation, **[Get Custom Model](https://westus2.dev.cognitive.microsoft.com/docs/services/form-recognizer-api-v2-preview/operations/GetCustomModel)**, to check the training status. Pass the model ID into this API call to check the training status:
 
 1. Replace `<Endpoint>` with the endpoint that you obtained with your Form Recognizer subscription key.
 1. Replace `<subscription key>` with your subscription key.
@@ -136,7 +136,7 @@ The `"modelId"` field contains the ID of the model you're training. You'll need
 
 ## Analyze forms for key-value pairs and tables
 
-Next, you'll use your newly trained model to analyze a document and extract key-value pairs and tables from it. Call the **Analyze Form** API by running the following cURL command. Before you run the command, make these changes:
+Next, you'll use your newly trained model to analyze a document and extract key-value pairs and tables from it. Call the **[Analyze Form](https://westus2.dev.cognitive.microsoft.com/docs/services/form-recognizer-api-v2-preview/operations/AnalyzeWithCustomForm)** API by running the following cURL command. Before you run the command, make these changes:
 
 1. Replace `<Endpoint>` with the endpoint that you obtained from your Form Recognizer subscription key. You can find it on your Form Recognizer resource **Overview** tab.
 1. Replace `<model ID>` with the model ID that you received in the previous section.
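
For readers who prefer scripting over cURL, the train-then-poll sequence in this article looks roughly like the following in Python `requests`. The route is the v2.0-preview custom models endpoint shown elsewhere in this commit; the request body sketches the Train Custom Model schema and should be checked against the linked API reference, and the placeholders mirror the ones above:

```python
import time

from requests import get, post

endpoint = "<Endpoint>"
apim_key = "<subscription key>"
headers = {"Content-Type": "application/json", "Ocp-Apim-Subscription-Key": apim_key}

# Train Custom Model: point the service at the blob container that holds the training forms.
body = {
    "source": "<SAS URL>",
    "sourceFilter": {"prefix": "<Blob folder name>", "includeSubFolders": False},
}
resp = post(url=endpoint + "/formrecognizer/v2.0-preview/custom/models", json=body, headers=headers)
resp.raise_for_status()
get_url = resp.headers["Location"]   # Get Custom Model URL; it includes the new model ID

# Get Custom Model: poll until training reaches a terminal status.
while True:
    model = get(url=get_url, headers=headers).json()
    status = model["modelInfo"]["status"]
    if status in ("ready", "invalid"):
        print(status, model["modelInfo"]["modelId"])
        break
    time.sleep(1)
```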

articles/cognitive-services/form-recognizer/quickstarts/python-labeled-data.md

Lines changed: 33 additions & 22 deletions
@@ -55,8 +55,8 @@ All of these files should occupy the same sub-folder and be in the following for
 
 You need OCR result files in order for the service to consider the corresponding input files for labeled training. To obtain OCR results for a given source form, follow the steps below:
 
-1. Call the **/formrecognizer/v2.0-preview/layout/analyze** API on the read Layout container with the input file as part of the request body. Save the ID found in the response's **Operation-Location** header.
-1. Call the **/formrecognizer/v2.0-preview/layout/analyzeResults/{id}** API, using operation ID from the previous step.
+1. Call the **[Analyze Layout](https://westus2.dev.cognitive.microsoft.com/docs/services/form-recognizer-api-v2-preview/operations/AnalyzeLayoutAsync)** API on the read Layout container with the input file as part of the request body. Save the ID found in the response's **Operation-Location** header.
+1. Call the **[Get Analyze Layout Result](https://westus2.dev.cognitive.microsoft.com/docs/services/form-recognizer-api-v2-preview/operations/GetAnalyzeLayoutResult)** API, using the operation ID from the previous step.
 1. Get the response and write the contents to a file. For each source form, the corresponding OCR file should have the original file name appended with `.ocr.json`. The OCR JSON output should have the following format. See the [sample OCR file](https://github.com/Azure-Samples/cognitive-services-REST-api-samples/blob/master/curl/form-recognizer/Invoice_1.pdf.ocr.json) for a full example.
 
 ```json
@@ -187,11 +187,11 @@ For each source form, the corresponding label file should have the original file
 
 ## Train a model using labeled data
 
-To train a model with labeled data, call the **Train Custom Model** API by running the following python code. Before you run the code, make these changes:
+To train a model with labeled data, call the **[Train Custom Model](https://westus2.dev.cognitive.microsoft.com/docs/services/form-recognizer-api-v2-preview/operations/TrainCustomModelAsync)** API by running the following Python code. Before you run the code, make these changes:
 
 1. Replace `<Endpoint>` with the endpoint URL for your Form Recognizer resource.
 1. Replace `<SAS URL>` with the Azure Blob storage container's shared access signature (SAS) URL. To retrieve the SAS URL, open the Microsoft Azure Storage Explorer, right-click your container, and select **Get shared access signature**. Make sure the **Read** and **List** permissions are checked, and click **Create**. Then copy the value in the **URL** section. It should have the form: `https://<storage account>.blob.core.windows.net/<container name>?<SAS value>`.
-1. Replace `<prefix>` with the folder name in your blob container where the input data is located. Or, if your data is at the root, leave this blank and remove the `"prefix"` field from the body of the HTTP request.
+1. Replace `<Blob folder name>` with the folder name in your blob container where the input data is located. Or, if your data is at the root, leave this blank and remove the `"prefix"` field from the body of the HTTP request.
 
 ```python
 ########### Python Form Recognizer Labeled Async Train #############
@@ -203,14 +203,14 @@ from requests import get, post
 endpoint = r"<Endpoint>"
 post_url = endpoint + r"/formrecognizer/v2.0-preview/custom/models"
 source = r"<SAS URL>"
-prefix = "<folder name>"
+prefix = "<Blob folder name>"
 includeSubFolders = False
 useLabelFile = True
 
 headers = {
     # Request headers
     'Content-Type': 'application/json',
-    'Ocp-Apim-Subscription-Key': '<Subscription Key>',
+    'Ocp-Apim-Subscription-Key': '<subscription key>',
 }
 
 body = {
@@ -225,7 +225,7 @@ body = {
 try:
     resp = post(url = post_url, json = body, headers = headers)
     if resp.status_code != 201:
-        print("POST model failed:\n%s" % resp.text)
+        print("POST model failed (%s):\n%s" % (resp.status_code, json.dumps(resp.json())))
         quit()
     print("POST model succeeded:\n%s" % resp.headers)
     get_url = resp.headers["location"]
@@ -236,25 +236,36 @@ except Exception as e:
 
 ## Get training results
 
-After you've started the train operation, you use the returned ID to get the status of the operation. Add the following code to the bottom of your Python script. This extracts the ID value from the training call and passes it to a new API call. The training operation is asynchronous, so this script calls the API at regular intervals until the training status is completed. We recommend an interval of one second or more.
+After you've started the train operation, you use the returned ID to get the status of the operation. Add the following code to the bottom of your Python script. This uses the ID value from the training call in a new API call. The training operation is asynchronous, so this script calls the API at regular intervals until the training status is completed. We recommend an interval of one second or more.
 
 ```python
-operationId = operationURL.split("operations/")[1]
-
-conn = http.client.HTTPSConnection('<Endpoint>')
-while True:
+n_tries = 15
+n_try = 0
+wait_sec = 5
+max_wait_sec = 60
+while n_try < n_tries:
     try:
-        conn.request("GET", f"/formrecognizer/v1.0-preview/custom/models/{operationId}", "", headers)
-        responseString = conn.getresponse().read().decode('utf-8')
-        responseDict = json.loads(responseString)
-        conn.close()
-        print(responseString)
-        if 'status' in responseDict and responseDict['status'] not in ['creating','created']:
-            break
-        time.sleep(1)
+        resp = get(url = get_url, headers = headers)
+        resp_json = resp.json()
+        if resp.status_code != 200:
+            print("GET model failed (%s):\n%s" % (resp.status_code, json.dumps(resp_json)))
+            quit()
+        model_status = resp_json["modelInfo"]["status"]
+        if model_status == "ready":
+            print("Training succeeded:\n%s" % json.dumps(resp_json))
+            quit()
+        if model_status == "invalid":
+            print("Training failed. Model is invalid:\n%s" % json.dumps(resp_json))
+            quit()
+        # Training still running. Wait and retry.
+        time.sleep(wait_sec)
+        n_try += 1
+        wait_sec = min(2*wait_sec, max_wait_sec)
     except Exception as e:
-        print(e)
-        exit()
+        msg = "GET model failed:\n%s" % str(e)
+        print(msg)
+        quit()
+print("Train operation did not complete within the allocated time.")
 ```
 
 When the training process is completed, you'll receive a `201 (Success)` response with JSON content like the following. The response has been shortened for simplicity.
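
The OCR-file prerequisite described near the top of this file's diff can be scripted in the same style. A rough sketch that produces `<form name>.ocr.json` next to one source form; the file path, content type, and polling interval are illustrative:

```python
import time

from requests import get, post

endpoint = "<Endpoint>"
apim_key = "<subscription key>"
form_path = r"<path to your form>"   # e.g. C:\temp\Invoice_1.pdf

# Analyze Layout: send the source form in the request body.
with open(form_path, "rb") as f:
    resp = post(
        url=endpoint + "/formrecognizer/v2.0-preview/layout/analyze",
        data=f.read(),
        headers={"Ocp-Apim-Subscription-Key": apim_key, "Content-Type": "application/pdf"},
    )
resp.raise_for_status()
get_url = resp.headers["Operation-Location"]   # contains the operation ID

# Get Analyze Layout Result: poll, then write the OCR JSON next to the source form.
while True:
    result = get(url=get_url, headers={"Ocp-Apim-Subscription-Key": apim_key})
    status = result.json().get("status")
    if status == "succeeded":
        with open(form_path + ".ocr.json", "w") as out:
            out.write(result.text)
    if status in ("succeeded", "failed"):
        break
    time.sleep(1)
```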

articles/cognitive-services/form-recognizer/quickstarts/python-layout.md

Lines changed: 2 additions & 2 deletions
@@ -31,7 +31,7 @@ To complete this quickstart, you must have:
 
 ## Analyze the form layout
 
-To start analyzing the layout, you call the **Analyze Layout** API using the Python script below. Before you run the script, make these changes:
+To start analyzing the layout, you call the **[Analyze Layout](https://westus2.dev.cognitive.microsoft.com/docs/services/form-recognizer-api-v2-preview/operations/AnalyzeLayoutAsync)** API using the Python script below. Before you run the script, make these changes:
 
 1. Replace `<Endpoint>` with the endpoint that you obtained with your Form Recognizer subscription.
 1. Replace `<path to your form>` with the path to your local form document.
@@ -82,7 +82,7 @@ https://cognitiveservice/formrecognizer/v2.0-preview/layout/operations/54f0b076-
 
 ## Get the layout results
 
-After you've called the **Analyze Layout** API, you call the **Get Analyze Layout Result** API to get the status of the operation and the extracted data. Add the following code to the bottom of your Python script. This uses the operation ID value in a new API call. This script calls the API at regular intervals until the results are available. We recommend an interval of one second or more.
+After you've called the **Analyze Layout** API, you call the **[Get Analyze Layout Result](https://westus2.dev.cognitive.microsoft.com/docs/services/form-recognizer-api-v2-preview/operations/GetAnalyzeLayoutResult)** API to get the status of the operation and the extracted data. Add the following code to the bottom of your Python script. This uses the operation ID value in a new API call. This script calls the API at regular intervals until the results are available. We recommend an interval of one second or more.
 
 ```python
 n_tries = 10

articles/cognitive-services/form-recognizer/quickstarts/python-receipts.md

Lines changed: 2 additions & 2 deletions
@@ -35,7 +35,7 @@ To complete this quickstart, you must have:
 
 ## Analyze a receipt
 
-To start analyzing a receipt, you call the **Analyze Receipt** API using the Python script below. Before you run the script, make these changes:
+To start analyzing a receipt, you call the **[Analyze Receipt](https://westus2.dev.cognitive.microsoft.com/docs/services/form-recognizer-api-v2-preview/operations/AnalyzeReceiptAsync)** API using the Python script below. Before you run the script, make these changes:
 
 1. Replace `<Endpoint>` with the endpoint that you obtained with your Form Recognizer subscription.
 1. Replace `<your receipt URL>` with the URL address of a receipt image.
@@ -91,7 +91,7 @@ https://cognitiveservice/formrecognizer/v2.0-preview/prebuilt/receipt/operations
 
 ## Get the receipt results
 
-After you've called the **Analyze Receipt** API, you call the **Get Receipt Result** API to get the status of the operation and the extracted data. Add the following code to the bottom of your Python script. This uses the operation ID value in a new API call. This script calls the API at regular intervals until the results are available. We recommend an interval of one second or more.
+After you've called the **Analyze Receipt** API, you call the **[Get Analyze Receipt Result](https://westus2.dev.cognitive.microsoft.com/docs/services/form-recognizer-api-v2-preview/operations/GetAnalyzeReceiptResult)** API to get the status of the operation and the extracted data. Add the following code to the bottom of your Python script. This uses the operation ID value in a new API call. This script calls the API at regular intervals until the results are available. We recommend an interval of one second or more.
 
 ```python
 n_tries = 10
