Commit 601e182

Merge pull request #76536 from v-albemi/3-form-recognizer-articles
Edit pass: 3 form recognizer articles
2 parents 26e3978 + 9666d8e commit 601e182

3 files changed: +75 -75 lines changed

articles/cognitive-services/form-recognizer/overview.md

Lines changed: 26 additions & 25 deletions
````diff
@@ -10,69 +10,70 @@ ms.subservice: form-recognizer
 ms.topic: overview
 ms.date: 04/08/2019
 ms.author: pafarley
-#Customer intent: As the developer of form-processing software, I want to learn what the Form Recognizer service does so I can determine if I should use its features.
+#Customer intent: As a developer of form-processing software, I want to learn what the Form Recognizer service does so I can determine if I should use it.
 ---

 # What is Form Recognizer?

-Azure Form Recognizer is a cognitive service that uses machine learning technology to identify and extract key-value pairs and table data from form documents. It then outputs structured data that includes the relationships in the original file. You can call your custom Form Recognizer model using a simple REST API in order to reduce complexity and easily integrate it into your workflow or application. You only need five form documents or an empty form of the same type as your input material to get started. You can get results quickly, accurately and tailored to your specific content without the need for heavy manual intervention or extensive data science expertise.
+Azure Form Recognizer is a cognitive service that uses machine learning technology to identify and extract key-value pairs and table data from form documents. It then outputs structured data that includes the relationships in the original file. You can call your custom Form Recognizer model by using a simple REST API to reduce complexity and easily integrate it into your workflow or application. To get started, you just need five form documents or an empty form of the same type as your input material. You quickly get accurate results that are tailored to your specific content without heavy manual intervention or extensive data science expertise.

 ## Request access
-Form Recognizer is available as a limited-access preview. To get access to the preview, please fill out and submit the [Cognitive Services Form Recognizer access request](https://aka.ms/FormRecognizerRequestAccess) form. The form requests information about you, your company, and the user scenario for which you'll use Form Recognizer. If your request is approved by the Azure Cognitive Services team, you'll receive an email with instructions on how to access the service.
+Form Recognizer is available in a limited-access preview. To get access to the preview, fill out and submit the [Form Recognizer access request](https://aka.ms/FormRecognizerRequestAccess) form. The form requests information about you, your company, and the user scenario for which you'll use Form Recognizer. If your request is approved by the Azure Cognitive Services team, you'll receive an email with instructions for accessing the service.

 ## What it does

-When you submit your input data, the algorithm trains to it, clusters the forms by types, discovers what keys and tables are present, and learns to associate values to keys and entries to tables. Unsupervised learning allows the model to understand the layout and relationships between fields and entries without manual data labeling or intensive coding and maintenance. By contrast, pre-trained machine learning models require standardized data and are less accurate with input material that deviates from traditional formats, like industry-specific forms.
+When you submit your input data, the algorithm trains to it, clusters the forms by types, discovers what keys and tables are present, and learns to associate values to keys and entries to tables. Unsupervised learning allows the model to understand the layout and relationships between fields and entries without manual data labeling or intensive coding and maintenance. By contrast, pre-trained machine learning models require standardized data and are less accurate when used with input material that deviates from traditional formats, like industry-specific forms.

-Once the model is trained, you can test, retrain, and eventually use it to reliably extract data from more forms according to your needs.
+After you train the model, you can test and retrain it and eventually use it to reliably extract data from more forms according to your needs.

 ## What it includes

-Form Recognizer is available as a REST API. You can create, train and score a model by invoking the API, and you can optionally run the model in a local Docker container.
+Form Recognizer is available as a REST API. You can create, train, and score a model by invoking the API. If you want, you can run the model in a local Docker container.

 ## Input requirements

-Form Recognizer works on input documents that meet the following requirements:
+Form Recognizer works on input documents that meet these requirements:

-* JPG, PNG, or PDF format (text or scanned). Text embedded PDFs are preferable because there is no possibility of error in character extraction and location.
-* File size must be less than 4 megabytes (MB)
-* For images, dimensions must be between 50x50 and 4200x4200 pixels
-* If scanned from paper documents, forms should be high-quality scans
-* Must use the Latin alphabet (English characters)
-* Printed data (not handwritten)
-* Must contain keys and values
+* Format must be JPG, PNG, or PDF (text or scanned). Text-embedded PDFs are best because there's no possibility of error in character extraction and location.
+* File size must be less than 4 megabytes (MB).
+* For images, dimensions must be between 50 x 50 pixels and 4200 x 4200 pixels.
+* If scanned from paper documents, forms should be high-quality scans.
+* Text must use the Latin alphabet (English characters).
+* Data must be printed (not handwritten).
+* Data must contain keys and values.
 * Keys can appear above or to the left of the values, but not below or to the right.

-Additionally, Form Recognizer does not yet support the following types of input data:
+Form Recognizer doesn't currently support these types of input data:

-* Complex tables (nested tables, merged headers or cells, and so on)
-* Checkboxes or radio buttons
-* PDF documents longer than 50 pages
+* Complex tables (nested tables, merged headers or cells, and so on).
+* Checkboxes or radio buttons.
+* PDF documents longer than 50 pages.

 ## Where do I start?

 **Step 1:** Create a Form Recognizer resource in the Azure portal.

 **Step 2:** Try a quickstart for hands-on experience:
-* [Quickstart: Train a Form Recognizer model and extract form data using REST API with cURL](quickstarts/curl-train-extract.md)
-* [Quickstart: Train a Form Recognizer model and extract form data using REST API with Python](quickstarts/python-train-extract.md)
+* [Quickstart: Train a Form Recognizer model and extract form data by using the REST API with cURL](quickstarts/curl-train-extract.md)
+* [Quickstart: Train a Form Recognizer model and extract form data by using the REST API with Python](quickstarts/python-train-extract.md)

-We recommend the Free service for learning purposes, but be aware that the number of free pages is limited to 500 pages per month.
+We recommend that you use the Free service when you're learning the technology, but keep in mind that the number of free pages is limited to 500 pages per month.
+
+**Step 3:** Review the REST APIs

-**Step 3:** Review the REST API
 Use the following APIs to train and extract structured data from forms.

 | REST API | Description |
 |-----|-------------|
-| Train | Train a new model to analyze your forms using 5 forms from the same type or an empty form. |
+| Train | Train a new model to analyze your forms by using five forms from the same type or an empty form. |
 | Analyze |Analyze a single document passed in as a stream to extract key-value pairs and tables from the form with your custom model. |

 Explore the [REST API reference document](https://aka.ms/form-recognizer/api).

 ## Data privacy and security

-The service is offered as a [Preview](https://azure.microsoft.com/support/legal/preview-supplemental-terms/) of an Azure Service under the [Online Service Terms](https://www.microsoftvolumelicensing.com/DocumentSearch.aspx?Mode=3&DocumentTypeId=31). As with all the Cognitive Services, developers using the Form Recognizer service should be aware of Microsoft's policies on customer data. See the [Cognitive Services page](https://www.microsoft.com/trustcenter/cloudservices/cognitiveservices) on the Microsoft Trust Center to learn more.
+This service is offered as a [preview](https://azure.microsoft.com/support/legal/preview-supplemental-terms/) of an Azure service under the [Online Service Terms](https://www.microsoftvolumelicensing.com/DocumentSearch.aspx?Mode=3&DocumentTypeId=31). As with all the cognitive services, developers using the Form Recognizer service should be aware of Microsoft policies on customer data. See the [Cognitive Services page](https://www.microsoft.com/trustcenter/cloudservices/cognitiveservices) on the Microsoft Trust Center to learn more.

 ## Next steps

-Follow a [quickstart](quickstarts/curl-train-extract.md) to get started using the [Form Recognizer APIs](https://aka.ms/form-recognizer/api).
+Complete a [quickstart](quickstarts/curl-train-extract.md) to get started with the [Form Recognizer APIs](https://aka.ms/form-recognizer/api).
````
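Two of the input requirements in the overview diff, the file format (JPG, PNG, or PDF) and the size limit (less than 4 MB), can be checked locally before you upload training or analysis files. The following bash sketch isn't part of the service or the quickstarts. `check_input` and `MAX_INPUT_BYTES` are hypothetical names, the format check trusts the file extension rather than inspecting file contents, and the sketch assumes "4 megabytes" means 4,000,000 bytes. It doesn't check image dimensions, PDF page count, or scan quality.

```bash
#!/usr/bin/env bash
# Hypothetical pre-upload check for two of the documented input limits:
# file format (JPG, PNG, or PDF) and file size (under 4 MB).
# Image dimensions, PDF page count, and scan quality are NOT checked.

# Assumption: "4 megabytes" is read as 4,000,000 bytes, the stricter reading.
MAX_INPUT_BYTES=4000000

# Return 0 if the file passes both checks; otherwise print a reason to
# stderr and return 1.
check_input() {
  local file="$1"
  local ext size

  if [ ! -f "$file" ]; then
    echo "not a regular file: $file" >&2
    return 1
  fi

  # Compare the extension case-insensitively (tr keeps this bash 3.2 friendly).
  ext=$(printf '%s' "${file##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    jpg|jpeg|png|pdf) ;;
    *)
      echo "unsupported format: $file" >&2
      return 1
      ;;
  esac

  # Arithmetic expansion drops the padding that some wc implementations add.
  size=$(( $(wc -c < "$file") ))
  if [ "$size" -ge "$MAX_INPUT_BYTES" ]; then
    echo "too large ($size bytes): $file" >&2
    return 1
  fi
}
```

For example, `for f in forms/*; do check_input "$f" || echo "skipping $f"; done` screens a hypothetical `forms/` folder before you upload it to Blob storage.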

articles/cognitive-services/form-recognizer/quickstarts/curl-train-extract.md

Lines changed: 25 additions & 24 deletions
````diff
@@ -1,7 +1,7 @@
 ---
-title: "Quickstart: Train a model and extract form data using cURL - Form Recognizer"
+title: "Quickstart: Train a model and extract form data by using cURL - Form Recognizer"
 titleSuffix: Azure Cognitive Services
-description: In this quickstart, you will use the Form Recognizer REST API with cURL to train a model and extract data from forms.
+description: In this quickstart, you'll use the Form Recognizer REST API with cURL to train a model and extract data from forms.
 author: PatrickFarley
 manager: nitinme

````
````diff
@@ -13,34 +13,34 @@ ms.author: pafarley
 #Customer intent: As a developer or data scientist familiar with cURL, I want to learn how to use Form Recognizer to extract my form data.
 ---

-# Quickstart: Train a Form Recognizer model and extract form data using REST API with cURL
+# Quickstart: Train a Form Recognizer model and extract form data by using the REST API with cURL

-In this quickstart, you will use the Form Recognizer's REST API with cURL to train and score forms to extract key-value pairs and tables.
+In this quickstart, you'll use the Azure Form Recognizer REST API with cURL to train and score forms to extract key-value pairs and tables.

 If you don't have an Azure subscription, create a [free account](https://azure.microsoft.com/free/?WT.mc_id=A261C142F) before you begin.

 ## Prerequisites
-
-- You got access to the Form Recognizer limited-access preview. To get access to the preview, please fill out and submit the [Cognitive Services Form Recognizer access request](https://aka.ms/FormRecognizerRequestAccess) form.
-- You must have [cURL](https://curl.haxx.se/windows/).
-- You must have a subscription key for Form Recognizer. Follow the single-service subscription instructions in [Create a Cognitive Services account](https://docs.microsoft.com/azure/cognitive-services/cognitive-services-apis-create-account#single-service-subscription) to subscribe to Form Recognizer and get your key. Do not use the multi-service subscription, as this will not include the Form Recognizer service.
-- You must have a minimum set of five forms of the same type. You can use a [sample dataset](https://go.microsoft.com/fwlink/?linkid=2090451) for this quickstart.
+To complete this quickstart, you must have:
+- Access to the Form Recognizer limited-access preview. To get access to the preview, fill out and submit the [Form Recognizer access request](https://aka.ms/FormRecognizerRequestAccess) form.
+- [cURL](https://curl.haxx.se/windows/) installed.
+- A subscription key for Form Recognizer. Follow the single-service subscription instructions in [Create a Cognitive Services account](https://docs.microsoft.com/azure/cognitive-services/cognitive-services-apis-create-account#single-service-subscription) to subscribe to Form Recognizer and get your key. Don't use a multi-service subscription, because it won't include the Form Recognizer service.
+- A set of at least five forms of the same type. You can use a [sample dataset](https://go.microsoft.com/fwlink/?linkid=2090451) for this quickstart.

 ## Train a Form Recognizer model

-First, you will need a set of training data. You can use data in an Azure Blob or your own local training data. You should have a minimum of five sample forms (PDF documents and/or images) of the same type/structure as your main input data. Alternatively, you can use a single empty form; the form's filename includes the word "empty."
+First, you'll need a set of training data. You can use data in an Azure blob or your own local training data. You should have a minimum of five sample forms (PDF documents and/or images) of the same type/structure as your main input data. Or you can use a single empty form. The form's file name needs to include the word "empty."

-To train a Form Recognizer model using the documents in your Azure Blob container, call the **Train** API by executing the cURL command below. Before running the command, make the following changes:
+To train a Form Recognizer model by using the documents in your Azure blob container, call the **Train** API by running the cURL command that follows. Before you run the command, make these changes:

-* Replace `<Endpoint>` with the endpoint you obtained from your Form Recognizer subscription key. You can find it in your Form Recognizer resource overview tab.
-* Replace `<SAS URL>` with an Azure Blob Storage container shared access signature (SAS) URL where the training data is located.
-* Replace `<subscription key>` with your subscription key.
+1. Replace `<Endpoint>` with the endpoint that you obtained from your Form Recognizer subscription key. You can find it on your Form Recognizer resource **Overview** tab.
+1. Replace `<SAS URL>` with an Azure Blob storage container shared access signature (SAS) URL of the location of the training data.
+1. Replace `<subscription key>` with your subscription key.

 ```bash
 curl -X POST "https://<Endpoint>/formrecognizer/v1.0-preview/custom/train" -H "Content-Type: application/json" -H "Ocp-Apim-Subscription-Key: <subscription key>" --data-ascii "{ \"source\": \""<SAS URL>"\"}"
 ```

-You will receive a `200 (Success)` response with the following JSON output:
+You'll receive a `200 (Success)` response with the following JSON output:

 ```json
 {
````
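The Train command in the quickstart diff builds its JSON body with nested escaped quotes (`--data-ascii "{ \"source\": \""<SAS URL>"\"}"`), which is easy to break when you paste in a long SAS URL. A minimal bash sketch, assuming you keep the values in shell variables, builds the body with `printf` instead and pulls out the `"modelId"` value that the quickstart says you'll need in later steps. The function names `train_body` and `model_id_from` and the variables `FR_ENDPOINT`, `FR_KEY`, `SAS_URL`, and `MODEL_ID` are hypothetical. The sketch assumes the SAS URL contains no double quotes or backslashes (so it does no JSON escaping) and that the response carries `modelId` as a JSON string. If `jq` is installed and the field is at the top level, `jq -r '.modelId'` is a sturdier way to read it.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the Train step; not part of the quickstart.

# Print the JSON body for the Train request.
# Assumption: the SAS URL has no double quotes or backslashes, so it is
# inserted without JSON escaping. Passing it as a printf argument (not as
# the format string) keeps percent-encoded characters such as %3D intact.
train_body() {
  printf '{"source": "%s"}' "$1"
}

# Read a Train response on stdin and print the first "modelId" string value.
# Assumption: modelId is a JSON string whose key and value share one line.
model_id_from() {
  grep -o '"modelId"[[:space:]]*:[[:space:]]*"[^"]*"' \
    | head -n 1 \
    | sed 's/.*"\([^"]*\)"$/\1/'
}

# Example call (placeholder variables; set them before running):
# response=$(curl -s -X POST "https://${FR_ENDPOINT}/formrecognizer/v1.0-preview/custom/train" \
#   -H "Content-Type: application/json" \
#   -H "Ocp-Apim-Subscription-Key: ${FR_KEY}" \
#   --data-ascii "$(train_body "$SAS_URL")")
# MODEL_ID=$(printf '%s' "$response" | model_id_from)
```

Keeping the SAS URL in a variable also avoids pasting a credential-bearing URL into shell history more than once.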
````diff
@@ -81,25 +81,26 @@ You will receive a `200 (Success)` response with the following JSON output:
 }
 ```

-Take note of the `"modelId"` value; you will need it for the following steps.
+Note the `"modelId"` value. You'll need it in the following steps.

 ## Extract key-value pairs and tables from forms

-Next, you will analyze a document and extract key-value pairs and tables from it. Call the **Model - Analyze** API by executing the cURL command below. Before running the command, make the following changes:
+Next, you'll analyze a document and extract key-value pairs and tables from it. Call the **Model - Analyze** API by running the cURL command that follows. Before you run the command, make these changes:
+
+1. Replace `<Endpoint>` with the endpoint that you obtained from your Form Recognizer subscription key. You can find it on your Form Recognizer resource **Overview** tab.
+1. Replace `<modelID>` with the model ID that you received in the previous section.
+1. Replace `<path to your form>` with the file path of your form.
+1. Replace `<file type>` with the file type. Supported types: pdf, image/jpeg, image/png.
+1. Replace `<subscription key>` with your subscription key.

-* Replace `<Endpoint>` with the endpoint you obtained from your Form Recognizer subscription key. You can find it in your Form Recognizer resource **Overview** tab.
-* Replace `<modelID>` with the model ID you received in the previous step of training the model.
-* Replace `<path to your form>` with the file path to your form.
-* Replace `<subscription key>` with your subscription key.
-* Replace `<file type>` with the file type - supported types pdf, image/jpeg, image/png.

 ```bash
 curl -X POST "https://<Endpoint>/formrecognizer/v1.0-preview/custom/models/<modelID>/analyze" -H "Content-Type: multipart/form-data" -F "form=@\"<path to your form>\";type=application/<file type>" -H "Ocp-Apim-Subscription-Key: <subscription key>"
 ```

 ### Examine the response

-A successful response is returned in JSON and represents the extracted key-value pairs and tables from the form.
+A success response is returned in JSON. It represents the key-value pairs and tables extracted from the form:

 ```bash
 {
````
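In the Analyze command, the multipart part type is written as `type=application/<file type>`, but the listed file types include `image/jpeg` and `image/png`; substituted literally, an image would be sent as `application/image/jpeg`. Assuming the service expects the standard MIME types (`application/pdf`, `image/jpeg`, `image/png`), a small bash helper can choose the type from the file extension and supply the whole `type=` value. `mime_type_for` and the variables in the commented call (`FR_ENDPOINT`, `FR_KEY`, `MODEL_ID`, `FORM_PATH`) are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper for the Analyze step; not part of the quickstart.
# Assumption: the service expects standard MIME types for the uploaded form.

# Print the MIME type for a form file based on its extension,
# or return 1 if the extension isn't one of the documented formats.
mime_type_for() {
  local ext
  # Lowercase the extension so SCAN.JPG and scan.jpg map the same way.
  ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    pdf) echo "application/pdf" ;;
    jpg|jpeg) echo "image/jpeg" ;;
    png) echo "image/png" ;;
    *) return 1 ;;
  esac
}

# Example call (placeholder variables; set them before running):
# FORM_PATH=./sample-form.pdf
# curl -X POST "https://${FR_ENDPOINT}/formrecognizer/v1.0-preview/custom/models/${MODEL_ID}/analyze" \
#   -H "Content-Type: multipart/form-data" \
#   -H "Ocp-Apim-Subscription-Key: ${FR_KEY}" \
#   -F "form=@\"${FORM_PATH}\";type=$(mime_type_for "${FORM_PATH}")"
```

Mapping from the extension keeps the sketch dependency-free; `file -b --mime-type` would inspect the file contents instead, at the cost of relying on the `file` utility.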
````diff
@@ -424,7 +425,7 @@ A successful response is returned in JSON and represents the extracted key-value

 ## Next steps

-In this guide, you used the Form Recognizer REST APIs with cURL to train a model and run it in a sample case. Next, see the reference documentation to explore the Form Recognizer API in more depth.
+In this quickstart, you used the Form Recognizer REST API with cURL to train a model and run it in a sample scenario. Next, see the reference documentation to explore the Form Recognizer API in more depth.

 > [!div class="nextstepaction"]
 > [REST API reference documentation](https://aka.ms/form-recognizer/api)
````
