Commit a7b8aed

Merge pull request #205405 from laujan/remove-entities
remove entities
2 parents a1155f6 + 8d80b8d commit a7b8aed

6 files changed: 8 additions and 44 deletions

articles/applied-ai-services/form-recognizer/concept-general-document.md

Lines changed: 6 additions & 38 deletions
@@ -7,16 +7,15 @@ manager: nitinme
 ms.service: applied-ai-services
 ms.subservice: forms-recognizer
 ms.topic: conceptual
-ms.date: 06/06/2022
+ms.date: 07/20/2022
 ms.author: lajanuar
 recommendations: false
 ---
 <!-- markdownlint-disable MD033 -->

 # Form Recognizer general document model (preview)

-The General document preview model combines powerful Optical Character Recognition (OCR) capabilities with deep learning models to extract key-value pairs, selection marks, and entities from documents. General document is only available with the preview (v3.0) API. For more information on using the preview (v3.0) API, see our [migration guide](v3-migration-guide.md).
-
+The General document preview model combines powerful Optical Character Recognition (OCR) capabilities with deep learning models to extract key-value pairs, tables, and selection marks from documents. General document is only available with the preview (v3.0) API. For more information on using the preview (v3.0) API, see our [migration guide](v3-migration-guide.md).

 The general document API supports most form types and will analyze your documents and extract keys and associated values. It's ideal for extracting common key-value pairs from documents. You can use the general document model as an alternative to training a custom model without labels.

@@ -27,7 +26,7 @@ The general document API supports most form types and will analyze your document

 * The general document model is a pre-trained model; it doesn't require labels or training.

-* A single API extracts key-value pairs, selection marks, entities, text, tables, and structure from documents.
+* A single API extracts key-value pairs, selection marks, text, tables, and structure from documents.

 * The general document model supports structured, semi-structured, and unstructured documents.

@@ -81,21 +80,11 @@ Key-value pairs are specific spans within the document that identify a label or

 Keys can also exist in isolation when the model detects that a key exists, with no associated value or when processing optional fields. For example, a middle name field may be left blank on a form in some instances. Key-value pairs are spans of text contained in the document. If you have documents where the same value is described in different ways, for example, customer and user, the associated key will be either customer or user based on context.

-## Entities
-
-Natural language processing models can identify parts of speech and classify each token or word. The named entity recognition model is able to identify entities like people, locations, and dates to provide for a richer experience. Identifying entities enables you to distinguish between customer types, for example, an individual or an organization.
-
-The key-value pair extraction model and entity identification model are run in parallel on the entire document—not just on the values of the extracted key-value pairs. This process ensures that complex structures where a key can't be identified are still enriched by identifying the entities referenced. You can still match keys or values to entities based on the offsets of the identified spans.
-
-* The general document is a pre-trained model and can be directly invoked via the REST API.
-
-* The general document model supports named entity recognition (NER) for several entity categories. NER is the ability to identify different entities in text and categorize them into pre-defined classes or types such as: person, location, event, product, and organization. Extracting entities can be useful in scenarios where you want to validate extracted values. The entities are extracted from the entire content and not just the extracted values.
-
 ## Data extraction

-| **Model** | **Text extraction** |**Key-Value pairs** |**Selection Marks** | **Tables** |**Entities** |
-| --- | :---: |:---:| :---: | :---: |:---: |
-|General document ||||||
+| **Model** | **Text extraction** |**Key-Value pairs** |**Selection Marks** | **Tables** |
+| --- | :---: |:---:| :---: | :---: |
+|General document |||||

 ## Input requirements

@@ -114,29 +103,8 @@ The key-value pair extraction model and entity identification model are run in p
 |--------|:----------------------|:---------|
 |General document| <ul><li>English (United States)—en-US</li></ul>| English (United States)—en-US|

-### Named entity recognition (NER) categories
-
-| Category | Type | Description |
-|-----------|-------|--------------------|
-| Person | String | A person's partial or full name. |
-| PersonType | String | A person's job type or role. |
-| Location | String | Natural and human-made landmarks, structures, geographical features, and geopolitical entities. |
-| Organization | String | Companies, political groups, musical bands, sport clubs, government bodies, and public organizations. |
-| Event | String | Historical, social, and naturally occurring events. |
-| Product | String |Physical objects of various categories. |
-| Skill | String | A capability, skill, or expertise. |
-| Address | String | Full mailing addresses. |
-| Phone number | String| Phone numbers. |
-| Email | String | Email address. |
-| URL | String | Website URLs and links. |
-| IP Address | String | Network IP addresses. |
-| DateTime | String | Dates and times of day. |
-| Quantity | String | Numerical measurements and units. |
-
 ## Considerations

-* Extracting entities can be useful in scenarios where you want to validate extracted values. The entities are extracted on the entire contents of the documents and not just the extracted values.
-
 * Keys are spans of text extracted from the document, for semi structured documents, keys may need to be mapped to an existing dictionary of keys.

 * Expect to see key-value pairs with a key, but no value. For example if a user chose to not provide an email address on the form.
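
The two considerations that remain after this change (keys may need to be mapped to an existing dictionary of keys, and some pairs have a key but no value) could be handled in post-processing along the following lines. This is a minimal sketch, not the service's or the SDK's API: the `CANONICAL_KEYS` mapping, the `normalize_pairs` helper, and the plain-dict shape of each pair are all hypothetical and chosen only for illustration.

```python
# Hypothetical mapping from extracted key text to your own canonical field names.
# A real mapping depends entirely on the documents you process.
CANONICAL_KEYS = {
    "customer": "customer_name",
    "user": "customer_name",
    "e-mail": "email",
    "email address": "email",
}


def normalize_pairs(pairs):
    """Map extracted keys to canonical names; keys without a value map to None."""
    result = {}
    for pair in pairs:
        # Keys are raw text spans, so trim whitespace and a trailing colon first.
        raw_key = pair["key"].strip().rstrip(":").lower()
        canonical = CANONICAL_KEYS.get(raw_key, raw_key)
        value = pair.get("value")
        # A key can exist in isolation (for example, an optional field left blank).
        result[canonical] = value.strip() if value else None
    return result


# Assumed input shape: one dict per extracted pair, with "value" possibly None.
sample = [
    {"key": "Customer:", "value": "Contoso Ltd."},
    {"key": "Email address", "value": None},
]
normalized = normalize_pairs(sample)
```

Keeping value-less keys as `None` rather than dropping them preserves the fact that the field was present on the form but left empty.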

articles/applied-ai-services/form-recognizer/quickstarts/try-v3-csharp-sdk.md

Lines changed: 0 additions & 1 deletion
@@ -168,7 +168,6 @@ Analyze and extract text, tables, structure, key-value pairs, and named entities
 > * For this example, you'll need a **form document file from a URI**. You can use our [sample form document](https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/sample-layout.pdf) for this quickstart.
 > * To analyze a given file at a URI, you'll use the `StartAnalyzeDocumentFromUri` method and pass `prebuilt-document` as the model ID. The returned value is an `AnalyzeResult` object containing data about the submitted document.
 > * We've added the file URI value to the `Uri fileUri` variable at the top of the script.
-> * For simplicity, all the entity fields that the service returns are not shown here. To see the list of all supported fields and corresponding types, see the [General document](../concept-general-document.md#named-entity-recognition-ner-categories) concept page.

 **Add the following code sample to the Program.cs file. Make sure you update the key and endpoint variables with values from your Azure portal Form Recognizer instance:**

articles/applied-ai-services/form-recognizer/quickstarts/try-v3-java-sdk.md

Lines changed: 0 additions & 1 deletion
@@ -147,7 +147,6 @@ Extract text, tables, structure, key-value pairs, and named entities from docume
 > * For this example, you'll need a **form document file at a URI**. You can use our [sample form document](https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/sample-layout.pdf) for this quickstart.
 > * To analyze a given file at a URI, you'll use the `beginAnalyzeDocumentFromUrl` method and pass `prebuilt-document` as the model Id. The returned value is an `AnalyzeResult` object containing data about the submitted document.
 > * We've added the file URI value to the `documentUrl` variable in the main method.
-> * For simplicity, all the entity fields that the service returns are not shown here. To see the list of all supported fields and corresponding types, see our [General document](../concept-general-document.md#named-entity-recognition-ner-categories) concept page.

 **Add the following code sample to the `FormRecognizer.java` file. Make sure you update the key and endpoint variables with values from your Azure portal Form Recognizer instance:**

articles/applied-ai-services/form-recognizer/quickstarts/try-v3-javascript-sdk.md

Lines changed: 0 additions & 1 deletion
@@ -112,7 +112,6 @@ Extract text, tables, structure, key-value pairs, and named entities from docume
 > * For this example, you'll need a **form document file from a URL**. You can use our [sample form document](https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/sample-layout.pdf) for this quickstart.
 > * To analyze a given file from a URL, you'll use the `beginAnalyzeDocuments` method and pass in `prebuilt-document` as the model Id.
 > * We've added the file URL value to the `formUrl` variable near the top of the file.
-> * To see the list of all supported fields and corresponding types, see our [General document](../concept-general-document.md#named-entity-recognition-ner-categories) concept page.

 **Add the following code sample to the `index.js` file. Make sure you update the key and endpoint variables with values from your Azure portal Form Recognizer instance:**

articles/applied-ai-services/form-recognizer/quickstarts/try-v3-python-sdk.md

Lines changed: 0 additions & 1 deletion
@@ -88,7 +88,6 @@ Extract text, tables, structure, key-value pairs, and named entities from docume
 > * For this example, you'll need a **form document file from a URL**. You can use our [sample form document](https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/sample-layout.pdf) for this quickstart.
 > * To analyze a given file at a URL, you'll use the `begin_analyze_document_from_url` method and pass in `prebuilt-document` as the model Id. The returned value is a `result` object containing data about the submitted document.
 > * We've added the file URL value to the `docUrl` variable in the `analyze_general_documents` function.
-> * For simplicity, all the entity fields that the service returns are not shown here. To see the list of all supported fields and corresponding types, see our [General document](../concept-general-document.md#named-entity-recognition-ner-categories) concept page.

 <!-- markdownlint-disable MD036 -->
 **Add the following code sample to your form_recognizer_quickstart.py application. Make sure you update the key and endpoint variables with values from your Azure portal Form Recognizer instance:**

articles/applied-ai-services/form-recognizer/v3-migration-guide.md

Lines changed: 2 additions & 2 deletions
@@ -7,7 +7,7 @@ manager: nitinme
 ms.service: applied-ai-services
 ms.subservice: forms-recognizer
 ms.topic: how-to
-ms.date: 06/06/2022
+ms.date: 07/20/2022
 ms.author: lajanuar
 recommendations: false
 ---
@@ -21,7 +21,7 @@ recommendations: false
 Form Recognizer v3.0 (preview) introduces several new features and capabilities:

 * [Form Recognizer REST API](quickstarts/try-v3-rest-api.md) has been redesigned for better usability.
-* [**General document (v3.0)**](concept-general-document.md) model is a new API that extracts text, tables, structure, key-value pairs, and named entities from forms and documents.
+* [**General document (v3.0)**](concept-general-document.md) model is a new API that extracts text, tables, structure, and key-value pairs, from forms and documents.
 * [**Custom document model (v3.0)**](concept-custom-neural.md) is a new custom model type to extract fields from structured and unstructured documents.
 * [**Receipt (v3.0)**](concept-receipt.md) model supports single-page hotel receipt processing.
 * [**ID document (v3.0)**](concept-id-document.md) model supports endorsements, restrictions, and vehicle classification extraction from US driver's licenses.
