articles/applied-ai-services/form-recognizer/concept-general-document.md
Lines changed: 6 additions & 38 deletions
@@ -7,16 +7,15 @@ manager: nitinme
 ms.service: applied-ai-services
 ms.subservice: forms-recognizer
 ms.topic: conceptual
-ms.date: 06/06/2022
+ms.date: 07/20/2022
 ms.author: lajanuar
 recommendations: false
 ---
 <!-- markdownlint-disable MD033 -->
 
 # Form Recognizer general document model (preview)
 
-The General document preview model combines powerful Optical Character Recognition (OCR) capabilities with deep learning models to extract key-value pairs, selection marks, and entities from documents. General document is only available with the preview (v3.0) API. For more information on using the preview (v3.0) API, see our [migration guide](v3-migration-guide.md).
-
+The General document preview model combines powerful Optical Character Recognition (OCR) capabilities with deep learning models to extract key-value pairs, tables, and selection marks from documents. General document is only available with the preview (v3.0) API. For more information on using the preview (v3.0) API, see our [migration guide](v3-migration-guide.md).
 
 The general document API supports most form types and will analyze your documents and extract keys and associated values. It's ideal for extracting common key-value pairs from documents. You can use the general document model as an alternative to training a custom model without labels.
 
@@ -27,7 +26,7 @@ The general document API supports most form types and will analyze your document
 
 * The general document model is a pre-trained model; it doesn't require labels or training.
 
-* A single API extracts key-value pairs, selection marks, entities, text, tables, and structure from documents.
+* A single API extracts key-value pairs, selection marks, text, tables, and structure from documents.
 
 * The general document model supports structured, semi-structured, and unstructured documents.
 
@@ -81,21 +80,11 @@ Key-value pairs are specific spans within the document that identify a label or
 
 Keys can also exist in isolation when the model detects that a key exists, with no associated value or when processing optional fields. For example, a middle name field may be left blank on a form in some instances. Key-value pairs are spans of text contained in the document. If you have documents where the same value is described in different ways, for example, customer and user, the associated key will be either customer or user based on context.
 
-## Entities
-
-Natural language processing models can identify parts of speech and classify each token or word. The named entity recognition model is able to identify entities like people, locations, and dates to provide for a richer experience. Identifying entities enables you to distinguish between customer types, for example, an individual or an organization.
-
-The key-value pair extraction model and entity identification model are run in parallel on the entire document—not just on the values of the extracted key-value pairs. This process ensures that complex structures where a key can't be identified are still enriched by identifying the entities referenced. You can still match keys or values to entities based on the offsets of the identified spans.
-
-* The general document is a pre-trained model and can be directly invoked via the REST API.
-
-* The general document model supports named entity recognition (NER) for several entity categories. NER is the ability to identify different entities in text and categorize them into pre-defined classes or types such as: person, location, event, product, and organization. Extracting entities can be useful in scenarios where you want to validate extracted values. The entities are extracted from the entire content and not just the extracted values.
-| Product | String |Physical objects of various categories. |
-| Skill | String | A capability, skill, or expertise. |
-| Address | String | Full mailing addresses. |
-| Phone number | String| Phone numbers. |
-| Email | String | Email address. |
-| URL | String | Website URLs and links. |
-| IP Address | String | Network IP addresses. |
-| DateTime | String | Dates and times of day. |
-| Quantity | String | Numerical measurements and units. |
-
 ## Considerations
 
-* Extracting entities can be useful in scenarios where you want to validate extracted values. The entities are extracted on the entire contents of the documents and not just the extracted values.
-
 * Keys are spans of text extracted from the document, for semi structured documents, keys may need to be mapped to an existing dictionary of keys.
 
 * Expect to see key-value pairs with a key, but no value. For example if a user chose to not provide an email address on the form.
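The context lines kept under `## Considerations` note that keys are spans of document text that may need to be mapped to an existing dictionary of keys, and that a key can come back with no value. Below is a minimal Python sketch of that post-processing step. It assumes each extracted pair exposes `key.content` and an optional `value` with `.content`, which matches `DocumentKeyValuePair` in the `azure-ai-formrecognizer` Python SDK but should be checked against the SDK version you use. `CANONICAL_KEYS`, `normalize_key`, and `map_key_value_pairs` are hypothetical names for illustration, and the demo uses `SimpleNamespace` stand-ins instead of a live service call.

```python
from types import SimpleNamespace
from typing import Dict, Iterable, Optional

# Hypothetical dictionary mapping normalized key text (as it might appear on
# a form) to the field name an application expects. Different wordings of the
# same concept, such as "customer" and "user", map to one field.
CANONICAL_KEYS: Dict[str, str] = {
    "customer": "customer_name",
    "user": "customer_name",
    "email": "email",
    "e-mail": "email",
    "middle name": "middle_name",
}


def normalize_key(text: str) -> str:
    """Lower-case a key span and drop surrounding whitespace and trailing colons."""
    return text.strip().rstrip(":").strip().lower()


def map_key_value_pairs(
    pairs: Iterable, canonical: Dict[str, str]
) -> Dict[str, Optional[str]]:
    """Map extracted key-value pairs onto canonical field names.

    Each pair must expose ``pair.key.content`` and ``pair.value``; ``value``
    is None when a key was found with no value (for example, a blank optional
    field). Keys missing from ``canonical`` are kept under their normalized
    text. If two pairs map to the same field, the later one wins.
    """
    fields: Dict[str, Optional[str]] = {}
    for pair in pairs:
        key = normalize_key(pair.key.content)
        name = canonical.get(key, key)
        fields[name] = pair.value.content if pair.value is not None else None
    return fields


if __name__ == "__main__":
    # Stand-ins for result.key_value_pairs from a prebuilt-document analysis.
    demo = [
        SimpleNamespace(
            key=SimpleNamespace(content="User:"),
            value=SimpleNamespace(content="Contoso"),
        ),
        SimpleNamespace(key=SimpleNamespace(content="E-mail"), value=None),
    ]
    print(map_key_value_pairs(demo, CANONICAL_KEYS))
    # prints {'customer_name': 'Contoso', 'email': None}
```

Keeping a key with a `None` value, rather than dropping it, lets downstream validation tell a blank field apart from a key that was never detected.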
articles/applied-ai-services/form-recognizer/quickstarts/try-v3-csharp-sdk.md
Lines changed: 0 additions & 1 deletion
@@ -168,7 +168,6 @@ Analyze and extract text, tables, structure, key-value pairs, and named entities
 > * For this example, you'll need a **form document file from a URI**. You can use our [sample form document](https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/sample-layout.pdf) for this quickstart.
 > * To analyze a given file at a URI, you'll use the `StartAnalyzeDocumentFromUri` method and pass `prebuilt-document` as the model ID. The returned value is an `AnalyzeResult` object containing data about the submitted document.
 > * We've added the file URI value to the `Uri fileUri` variable at the top of the script.
-> * For simplicity, all the entity fields that the service returns are not shown here. To see the list of all supported fields and corresponding types, see the [General document](../concept-general-document.md#named-entity-recognition-ner-categories) concept page.
 
 **Add the following code sample to the Program.cs file. Make sure you update the key and endpoint variables with values from your Azure portal Form Recognizer instance:**
articles/applied-ai-services/form-recognizer/quickstarts/try-v3-java-sdk.md
Lines changed: 0 additions & 1 deletion
@@ -147,7 +147,6 @@ Extract text, tables, structure, key-value pairs, and named entities from docume
 > * For this example, you'll need a **form document file at a URI**. You can use our [sample form document](https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/sample-layout.pdf) for this quickstart.
 > * To analyze a given file at a URI, you'll use the `beginAnalyzeDocumentFromUrl` method and pass `prebuilt-document` as the model Id. The returned value is an `AnalyzeResult` object containing data about the submitted document.
 > * We've added the file URI value to the `documentUrl` variable in the main method.
-> * For simplicity, all the entity fields that the service returns are not shown here. To see the list of all supported fields and corresponding types, see our [General document](../concept-general-document.md#named-entity-recognition-ner-categories) concept page.
 
 **Add the following code sample to the `FormRecognizer.java` file. Make sure you update the key and endpoint variables with values from your Azure portal Form Recognizer instance:**
articles/applied-ai-services/form-recognizer/quickstarts/try-v3-javascript-sdk.md
Lines changed: 0 additions & 1 deletion
@@ -112,7 +112,6 @@ Extract text, tables, structure, key-value pairs, and named entities from docume
 > * For this example, you'll need a **form document file from a URL**. You can use our [sample form document](https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/sample-layout.pdf) for this quickstart.
 > * To analyze a given file from a URL, you'll use the `beginAnalyzeDocuments` method and pass in `prebuilt-document` as the model Id.
 > * We've added the file URL value to the `formUrl` variable near the top of the file.
-> * To see the list of all supported fields and corresponding types, see our [General document](../concept-general-document.md#named-entity-recognition-ner-categories) concept page.
 
 **Add the following code sample to the `index.js` file. Make sure you update the key and endpoint variables with values from your Azure portal Form Recognizer instance:**
articles/applied-ai-services/form-recognizer/quickstarts/try-v3-python-sdk.md
Lines changed: 0 additions & 1 deletion
@@ -88,7 +88,6 @@ Extract text, tables, structure, key-value pairs, and named entities from docume
 > * For this example, you'll need a **form document file from a URL**. You can use our [sample form document](https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/sample-layout.pdf) for this quickstart.
 > * To analyze a given file at a URL, you'll use the `begin_analyze_document_from_url` method and pass in `prebuilt-document` as the model Id. The returned value is a `result` object containing data about the submitted document.
 > * We've added the file URL value to the `docUrl` variable in the `analyze_general_documents` function.
-> * For simplicity, all the entity fields that the service returns are not shown here. To see the list of all supported fields and corresponding types, see our [General document](../concept-general-document.md#named-entity-recognition-ner-categories) concept page.
 
 <!-- markdownlint-disable MD036 -->
 **Add the following code sample to your form_recognizer_quickstart.py application. Make sure you update the key and endpoint variables with values from your Azure portal Form Recognizer instance:**
articles/applied-ai-services/form-recognizer/v3-migration-guide.md
Lines changed: 2 additions & 2 deletions
@@ -7,7 +7,7 @@ manager: nitinme
 ms.service: applied-ai-services
 ms.subservice: forms-recognizer
 ms.topic: how-to
-ms.date: 06/06/2022
+ms.date: 07/20/2022
 ms.author: lajanuar
 recommendations: false
 ---
@@ -21,7 +21,7 @@ recommendations: false
 Form Recognizer v3.0 (preview) introduces several new features and capabilities:
 
 *[Form Recognizer REST API](quickstarts/try-v3-rest-api.md) has been redesigned for better usability.
-*[**General document (v3.0)**](concept-general-document.md) model is a new API that extracts text, tables, structure, key-value pairs, and named entities from forms and documents.
+*[**General document (v3.0)**](concept-general-document.md) model is a new API that extracts text, tables, structure, and key-value pairs, from forms and documents.
 *[**Custom document model (v3.0)**](concept-custom-neural.md) is a new custom model type to extract fields from structured and unstructured documents.
 *[**Receipt (v3.0)**](concept-receipt.md) model supports single-page hotel receipt processing.
 *[**ID document (v3.0)**](concept-id-document.md) model supports endorsements, restrictions, and vehicle classification extraction from US driver's licenses.