#Customer intent: As the developer of form-processing software, I want to learn what the Form Recognizer service does so I can determine if I should use its features.
13
+
#Customer intent: As a developer of form-processing software, I want to learn what the Form Recognizer service does so I can determine if I should use it.
14
14
---
15
15
16
16
# What is Form Recognizer?
17
17
18
-
Azure Form Recognizer is a cognitive service that uses machine learning technology to identify and extract key-value pairs and table data from form documents. It then outputs structured data that includes the relationships in the original file. You can call your custom Form Recognizer model using a simple REST API in order to reduce complexity and easily integrate it into your workflow or application. You only need five form documents or an empty form of the same type as your input material to get started. You can get results quickly, accurately and tailored to your specific content without the need for heavy manual intervention or extensive data science expertise.
18
+
Azure Form Recognizer is a cognitive service that uses machine learning technology to identify and extract key-value pairs and table data from form documents. It then outputs structured data that includes the relationships in the original file. You can call your custom Form Recognizer model by using a simple REST API to reduce complexity and easily integrate it into your workflow or application. To get started, you just need five form documents or an empty form of the same type as your input material. You quickly get accurate results that are tailored to your specific content without heavy manual intervention or extensive data science expertise.
19
19
20
20
## Request access
21
-
Form Recognizer is available as a limited-access preview. To get access to the preview, please fill out and submit the [Cognitive Services Form Recognizer access request](https://aka.ms/FormRecognizerRequestAccess) form. The form requests information about you, your company, and the user scenario for which you'll use Form Recognizer. If your request is approved by the Azure Cognitive Services team, you'll receive an email with instructions on how to access the service.
21
+
Form Recognizer is available in a limited-access preview. To get access to the preview, fill out and submit the [Form Recognizer access request](https://aka.ms/FormRecognizerRequestAccess) form. The form requests information about you, your company, and the user scenario for which you'll use Form Recognizer. If your request is approved by the Azure Cognitive Services team, you'll receive an email with instructions for accessing the service.
22
22
23
23
## What it does
24
24
25
-
When you submit your input data, the algorithm trains to it, clusters the forms by types, discovers what keys and tables are present, and learns to associate values to keys and entries to tables. Unsupervised learning allows the model to understand the layout and relationships between fields and entries without manual data labeling or intensive coding and maintenance. By contrast, pre-trained machine learning models require standardized data and are less accurate with input material that deviates from traditional formats, like industry-specific forms.
25
+
When you submit your input data, the algorithm trains to it, clusters the forms by types, discovers what keys and tables are present, and learns to associate values to keys and entries to tables. Unsupervised learning allows the model to understand the layout and relationships between fields and entries without manual data labeling or intensive coding and maintenance. By contrast, pre-trained machine learning models require standardized data and are less accurate when used with input material that deviates from traditional formats, like industry-specific forms.
26
26
27
-
Once the model is trained, you can test, retrain, and eventually use it to reliably extract data from more forms according to your needs.
27
+
After you train the model, you can test and retrain it and eventually use it to reliably extract data from more forms according to your needs.
28
28
29
29
## What it includes
30
30
31
-
Form Recognizer is available as a REST API. You can create, train and score a model by invoking the API, and you can optionally run the model in a local Docker container.
31
+
Form Recognizer is available as a REST API. You can create, train, and score a model by invoking the API. If you want, you can run the model in a local Docker container.
32
32
33
33
## Input requirements
34
34
35
-
Form Recognizer works on input documents that meet the following requirements:
35
+
Form Recognizer works on input documents that meet these requirements:
36
36
37
-
* JPG, PNG, or PDF format (text or scanned). Textembedded PDFs are preferable because there is no possibility of error in character extraction and location.
38
-
* File size must be less than 4 megabytes (MB)
39
-
* For images, dimensions must be between 50x50 and 4200x4200 pixels
40
-
* If scanned from paper documents, forms should be high-quality scans
41
-
* Must use the Latin alphabet (English characters)
42
-
* Printed data (not handwritten)
43
-
* Must contain keys and values
37
+
* Format must be JPG, PNG, or PDF (text or scanned). Text-embedded PDFs are best because there's no possibility of error in character extraction and location.
38
+
* File size must be less than 4 megabytes (MB).
39
+
* For images, dimensions must be between 50 x 50 pixels and 4200 x 4200 pixels.
40
+
* If scanned from paper documents, forms should be high-quality scans.
41
+
* Text must use the Latin alphabet (English characters).
42
+
* Data must be printed (not handwritten).
43
+
* Data must contain keys and values.
44
44
* Keys can appear above or to the left of the values, but not below or to the right.
45
45
46
-
Additionally, Form Recognizer does not yet support the following types of input data:
46
+
Form Recognizer doesn't currently support these types of input data:
47
47
48
-
* Complex tables (nested tables, merged headers or cells, and so on)
49
-
* Checkboxes or radio buttons
50
-
* PDF documents longer than 50 pages
48
+
* Complex tables (nested tables, merged headers or cells, and so on).
49
+
* Checkboxes or radio buttons.
50
+
* PDF documents longer than 50 pages.
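The format and file-size requirements listed above can be checked locally before a form is submitted. The following is a minimal sketch, not part of the service or its SDK: the function name `precheck_form` is hypothetical, the `.jpeg` extension is assumed to be an accepted spelling of JPG, and 1 MB is assumed to mean 1,048,576 bytes. It does not check image dimensions, page count, or content rules.

```python
import os

# Limits taken from the input requirements list.
MAX_FILE_BYTES = 4 * 1024 * 1024  # assumption: 1 MB = 1,048,576 bytes
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".pdf"}  # ".jpeg" is an assumption


def precheck_form(file_name, size_in_bytes):
    """Return a list of problems; an empty list means the basic checks passed."""
    problems = []
    extension = os.path.splitext(file_name)[1].lower()
    if extension not in ALLOWED_EXTENSIONS:
        problems.append("format must be JPG, PNG, or PDF")
    if size_in_bytes >= MAX_FILE_BYTES:
        problems.append("file size must be less than 4 MB")
    return problems


# Hypothetical file names and sizes, for illustration only.
print(precheck_form("invoice.pdf", 250_000))
print(precheck_form("invoice.tiff", 5_000_000))
```

A check like this only catches the cheapest failures early. The service remains the authority on whether a document is accepted.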
51
51
52
52
## Where do I start?
53
53
54
54
**Step 1:** Create a Form Recognizer resource in the Azure portal.
55
55
56
56
**Step 2:** Try a quickstart for hands-on experience:
57
-
* [Quickstart: Train a Form Recognizer model and extract form data using REST API with cURL](quickstarts/curl-train-extract.md)
58
-
* [Quickstart: Train a Form Recognizer model and extract form data using REST API with Python](quickstarts/python-train-extract.md)
57
+
* [Quickstart: Train a Form Recognizer model and extract form data by using the REST API with cURL](quickstarts/curl-train-extract.md)
58
+
* [Quickstart: Train a Form Recognizer model and extract form data by using the REST API with Python](quickstarts/python-train-extract.md)
59
59
60
-
We recommend the Free service for learning purposes, but be aware that the number of free pages is limited to 500 pages per month.
60
+
We recommend that you use the Free service when you're learning the technology, but keep in mind that the number of free pages is limited to 500 pages per month.
61
+
62
+
**Step 3:** Review the REST APIs
61
63
62
-
**Step 3:** Review the REST API
63
64
Use the following APIs to train and extract structured data from forms.
64
65
65
66
| REST API | Description |
66
67
|-----|-------------|
67
-
| Train | Train a new model to analyze your forms using 5 forms from the same type or an empty form. |
68
+
| Train | Train a new model to analyze your forms by using five forms from the same type or an empty form. |
68
69
| Analyze |Analyze a single document passed in as a stream to extract key-value pairs and tables from the form with your custom model. |
69
70
70
71
Explore the [REST API reference document](https://aka.ms/form-recognizer/api).
71
72
72
73
## Data privacy and security
73
74
74
-
The service is offered as a [Preview](https://azure.microsoft.com/support/legal/preview-supplemental-terms/) of an Azure Service under the [Online Service Terms](https://www.microsoftvolumelicensing.com/DocumentSearch.aspx?Mode=3&DocumentTypeId=31). As with all the Cognitive Services, developers using the Form Recognizer service should be aware of Microsoft's policies on customer data. See the [Cognitive Services page](https://www.microsoft.com/trustcenter/cloudservices/cognitiveservices) on the Microsoft Trust Center to learn more.
75
+
This service is offered as a [preview](https://azure.microsoft.com/support/legal/preview-supplemental-terms/) of an Azure service under the [Online Service Terms](https://www.microsoftvolumelicensing.com/DocumentSearch.aspx?Mode=3&DocumentTypeId=31). As with all the cognitive services, developers using the Form Recognizer service should be aware of Microsoft policies on customer data. See the [Cognitive Services page](https://www.microsoft.com/trustcenter/cloudservices/cognitiveservices) on the Microsoft Trust Center to learn more.
75
76
76
77
## Next steps
77
78
78
-
Follow a [quickstart](quickstarts/curl-train-extract.md) to get started using the [Form Recognizer APIs](https://aka.ms/form-recognizer/api).
79
+
Complete a [quickstart](quickstarts/curl-train-extract.md) to get started with the [Form Recognizer APIs](https://aka.ms/form-recognizer/api).
File: articles/cognitive-services/form-recognizer/quickstarts/curl-train-extract.md
Lines changed: 25 additions & 24 deletions
@@ -1,7 +1,7 @@
1
1
---
2
-
title: "Quickstart: Train a model and extract form data using cURL - Form Recognizer"
2
+
title: "Quickstart: Train a model and extract form data by using cURL - Form Recognizer"
3
3
titleSuffix: Azure Cognitive Services
4
-
description: In this quickstart, you will use the Form Recognizer REST API with cURL to train a model and extract data from forms.
4
+
description: In this quickstart, you'll use the Form Recognizer REST API with cURL to train a model and extract data from forms.
5
5
author: PatrickFarley
6
6
manager: nitinme
7
7
@@ -13,34 +13,34 @@ ms.author: pafarley
13
13
#Customer intent: As a developer or data scientist familiar with cURL, I want to learn how to use Form Recognizer to extract my form data.
14
14
---
15
15
16
-
# Quickstart: Train a Form Recognizer model and extract form data using REST API with cURL
16
+
# Quickstart: Train a Form Recognizer model and extract form data by using the REST API with cURL
17
17
18
-
In this quickstart, you will use the Form Recognizer's REST API with cURL to train and score forms to extract key-value pairs and tables.
18
+
In this quickstart, you'll use the Azure Form Recognizer REST API with cURL to train and score forms to extract key-value pairs and tables.
19
19
20
20
If you don't have an Azure subscription, create a [free account](https://azure.microsoft.com/free/?WT.mc_id=A261C142F) before you begin.
21
21
22
22
## Prerequisites
23
-
24
-
- You got access to the Form Recognizer limited-access preview. To get access to the preview, please fill out and submit the [Cognitive Services Form Recognizer access request](https://aka.ms/FormRecognizerRequestAccess) form.
25
-
- You must have [cURL](https://curl.haxx.se/windows/).
26
-
- You must have a subscription key for Form Recognizer. Follow the single-service subscription instructions in [Create a Cognitive Services account](https://docs.microsoft.com/azure/cognitive-services/cognitive-services-apis-create-account#single-service-subscription) to subscribe to Form Recognizer and get your key. Do not use the multi-service subscription, as this will not include the Form Recognizer service.
27
-
- You must have a minimum set of five forms of the same type. You can use a [sample dataset](https://go.microsoft.com/fwlink/?linkid=2090451) for this quickstart.
23
+
To complete this quickstart, you must have:
24
+
- Access to the Form Recognizer limited-access preview. To get access to the preview, fill out and submit the [Form Recognizer access request](https://aka.ms/FormRecognizerRequestAccess) form.
25
+
- [cURL](https://curl.haxx.se/windows/) installed.
26
+
- A subscription key for Form Recognizer. Follow the single-service subscription instructions in [Create a Cognitive Services account](https://docs.microsoft.com/azure/cognitive-services/cognitive-services-apis-create-account#single-service-subscription) to subscribe to Form Recognizer and get your key. Don't use a multi-service subscription, because it won't include the Form Recognizer service.
27
+
- A set of at least five forms of the same type. You can use a [sample dataset](https://go.microsoft.com/fwlink/?linkid=2090451) for this quickstart.
28
28
29
29
## Train a Form Recognizer model
30
30
31
-
First, you will need a set of training data. You can use data in an Azure Blob or your own local training data. You should have a minimum of five sample forms (PDF documents and/or images) of the same type/structure as your main input data. Alternatively, you can use a single empty form; the form's filename includes the word "empty."
31
+
First, you'll need a set of training data. You can use data in an Azure blob or your own local training data. You should have a minimum of five sample forms (PDF documents and/or images) of the same type/structure as your main input data. Or you can use a single empty form. The form's file name needs to include the word "empty."
32
32
33
-
To train a Form Recognizer model using the documents in your Azure Blob container, call the **Train** API by executing the cURL command below. Before running the command, make the following changes:
33
+
To train a Form Recognizer model by using the documents in your Azure blob container, call the **Train** API by running the cURL command that follows. Before you run the command, make these changes:
34
34
35
-
* Replace `<Endpoint>` with the endpoint you obtained from your Form Recognizer subscription key. You can find it in your Form Recognizer resource overview tab.
36
-
* Replace `<SAS URL>` with an Azure Blob Storage container shared access signature (SAS) URL where the training data is located.
37
-
* Replace `<subscription key>` with your subscription key.
35
+
1. Replace `<Endpoint>` with the endpoint that you obtained from your Form Recognizer subscription key. You can find it on your Form Recognizer resource **Overview** tab.
36
+
1. Replace `<SAS URL>` with an Azure Blob storage container shared access signature (SAS) URL of the location of the training data.
37
+
1. Replace `<subscription key>` with your subscription key.
You will receive a `200 (Success)` response with the following JSON output:
43
+
You'll receive a `200 (Success)` response with the following JSON output:
44
44
45
45
```json
46
46
{
@@ -81,25 +81,26 @@ You will receive a `200 (Success)` response with the following JSON output:
81
81
}
82
82
```
83
83
84
-
Take note of the `"modelId"` value; you will need it for the following steps.
84
+
Note the `"modelId"` value. You'll need it in the following steps.
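If the training response is saved or captured in a script, the `"modelId"` value can be read programmatically instead of copied by hand. The following is a minimal sketch using only the Python standard library. The response body shown is a hypothetical stand-in that contains only the `modelId` field; the real response has more fields, and the ID value here is a placeholder.

```python
import json

# Hypothetical, trimmed response body for illustration only.
sample_response = '{"modelId": "00000000-0000-0000-0000-000000000000"}'


def get_model_id(response_text):
    """Parse the JSON response text and return the value of its "modelId" field."""
    body = json.loads(response_text)
    return body["modelId"]


model_id = get_model_id(sample_response)
print(model_id)
```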
85
85
86
86
## Extract key-value pairs and tables from forms
87
87
88
-
Next, you will analyze a document and extract key-value pairs and tables from it. Call the **Model - Analyze** API by executing the cURL command below. Before running the command, make the following changes:
88
+
Next, you'll analyze a document and extract key-value pairs and tables from it. Call the **Model - Analyze** API by running the cURL command that follows. Before you run the command, make these changes:
89
+
90
+
1. Replace `<Endpoint>` with the endpoint that you obtained from your Form Recognizer subscription key. You can find it on your Form Recognizer resource **Overview** tab.
91
+
1. Replace `<modelID>` with the model ID that you received in the previous section.
92
+
1. Replace `<path to your form>` with the file path of your form.
93
+
1. Replace `<file type>` with the file type. Supported types: pdf, image/jpeg, image/png.
94
+
1. Replace `<subscription key>` with your subscription key.
89
95
90
-
* Replace `<Endpoint>` with the endpoint you obtained from your Form Recognizer subscription key. You can find it in your Form Recognizer resource **Overview** tab.
91
-
* Replace `<modelID>` with the model ID you received in the previous step of training the model.
92
-
* Replace `<path to your form>` with the file path to your form.
93
-
* Replace `<subscription key>` with your subscription key.
94
-
* Replace `<file type>` with the file type - supported types pdf, image/jpeg, image/png.
95
96
96
97
```bash
97
98
curl -X POST "https://<Endpoint>/formrecognizer/v1.0-preview/custom/models/<modelID>/analyze" -H "Content-Type: multipart/form-data" -F "form=@\"<path to your form>\";type=application/<file type>" -H "Ocp-Apim-Subscription-Key: <subscription key>"
98
99
```
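The substitutions described above can also be assembled in code. The following is a minimal sketch that builds the same URL and subscription-key header as the cURL command; it does not send the request. The function name `build_analyze_request`, the example endpoint, and the key value are hypothetical. The MIME type mapping uses the standard types for the three supported formats, which is an assumption about what the service accepts rather than something this quickstart states.

```python
# Standard MIME types for the supported formats (an assumption, see lead-in).
MIME_TYPES = {"pdf": "application/pdf", "jpeg": "image/jpeg", "png": "image/png"}


def build_analyze_request(endpoint, model_id, subscription_key, file_format):
    """Return the URL, headers, and form-part content type for an Analyze call."""
    url = (
        f"https://{endpoint}/formrecognizer/v1.0-preview"
        f"/custom/models/{model_id}/analyze"
    )
    headers = {"Ocp-Apim-Subscription-Key": subscription_key}
    return url, headers, MIME_TYPES[file_format]


# Placeholder values for illustration only.
url, headers, content_type = build_analyze_request(
    "example.cognitiveservices.azure.com", "my-model-id", "my-key", "pdf"
)
print(url)
print(content_type)
```

The `Content-Type: multipart/form-data` header is left out of `headers` on purpose: an HTTP client that encodes the multipart body normally sets that header itself, including the boundary.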
99
100
100
101
### Examine the response
101
102
102
-
A successful response is returned in JSON and represents the extracted key-value pairs and tables from the form.
103
+
A success response is returned in JSON. It represents the key-value pairs and tables extracted from the form:
103
104
104
105
```bash
105
106
{
@@ -424,7 +425,7 @@ A successful response is returned in JSON and represents the extracted key-value
424
425
425
426
## Next steps
426
427
427
-
In this guide, you used the Form Recognizer REST APIs with cURL to train a model and run it in a sample case. Next, see the reference documentation to explore the Form Recognizer API in more depth.
428
+
In this quickstart, you used the Form Recognizer REST API with cURL to train a model and run it in a sample scenario. Next, see the reference documentation to explore the Form Recognizer API in more depth.
428
429
429
430
> [!div class="nextstepaction"]
430
431
> [REST API reference documentation](https://aka.ms/form-recognizer/api)