Commit 9a680ba

Merge pull request #206146 from laujan/edit-lu-pr-205935
edit lu pr #205935
2 parents 7df4ec5 + 43c5851 commit 9a680ba

10 files changed: +16 -71 lines changed

articles/applied-ai-services/form-recognizer/concept-general-document.md

Lines changed: 1 addition & 8 deletions
@@ -88,14 +88,7 @@ Keys can also exist in isolation when the model detects that a key exists, with
 
 ## Input requirements
 
-* For best results, provide one clear photo or high-quality scan per document.
-* Supported file formats: JPEG/JPG, PNG, BMP, TIFF, and PDF (text-embedded or scanned). Text-embedded PDFs are best to eliminate the possibility of error in character extraction and location.
-* For PDF and TIFF, up to 2000 pages can be processed (with a free tier subscription, only the first two pages are processed).
-* The file size must be less than 500 MB for paid (S0) tier and 4 MB for free (F0) tier.
-* Image dimensions must be between 50 x 50 pixels and 10,000 x 10,000 pixels.
-* PDF dimensions are up to 17 x 17 inches, corresponding to Legal or A3 paper size, or smaller.
-* The total size of the training data is 500 pages or less.
-* If your PDFs are password-locked, you must remove the lock before submission.
+[!INCLUDE [input requirements](./includes/input-requirements.md)]
 
 ## Supported languages and locales

articles/applied-ai-services/form-recognizer/concept-id-document.md

Lines changed: 1 addition & 8 deletions
@@ -76,14 +76,7 @@ You'll need an ID document. You can use our [sample ID document](https://raw.git
 
 ## Input requirements
 
-* For best results, provide one clear photo or high-quality scan per document.
-* Supported file formats: JPEG/JPG, PNG, BMP, TIFF, and PDF (text-embedded or scanned). Text-embedded PDFs are best to eliminate the possibility of error in character extraction and location.
-* For PDF and TIFF, up to 2000 pages can be processed (with a free tier subscription, only the first two pages are processed).
-* The file size must be less than 500 MB for paid (S0) tier and 4 MB for free (F0) tier.
-* Image dimensions must be between 50 x 50 pixels and 10,000 x 10,000 pixels.
-* PDF dimensions are up to 17 x 17 inches, corresponding to Legal or A3 paper size, or smaller.
-* The total size of the training data is 500 pages or less.
-* If your PDFs are password-locked, you must remove the lock before submission.
+[!INCLUDE [input requirements](./includes/input-requirements.md)]
 
 > [!NOTE]
 > The [Sample Labeling tool](https://fott-2-1.azurewebsites.net/) does not support the BMP file format. This is a limitation of the tool not the Form Recognizer Service.

articles/applied-ai-services/form-recognizer/concept-invoice.md

Lines changed: 1 addition & 8 deletions
@@ -75,14 +75,7 @@ You'll need an invoice document. You can use our [sample invoice document](https
 
 ## Input requirements
 
-* For best results, provide one clear photo or high-quality scan per document.
-* Supported file formats: JPEG/JPG, PNG, BMP, TIFF, and PDF (text-embedded or scanned). Text-embedded PDFs are best to eliminate the possibility of error in character extraction and location.
-* For PDF and TIFF, up to 2000 pages can be processed (with a free tier subscription, only the first two pages are processed).
-* The file size must be less than 500 MB for paid (S0) tier and 4 MB for free (F0) tier.
-* Image dimensions must be between 50 x 50 pixels and 10,000 x 10,000 pixels.
-* PDF dimensions are up to 17 x 17 inches, corresponding to Legal or A3 paper size, or smaller.
-* The total size of the training data is 500 pages or less.
-* If your PDFs are password-locked, you must remove the lock before submission.
+[!INCLUDE [input requirements](./includes/input-requirements.md)]
 
 > [!NOTE]
 > The [Sample Labeling tool](https://fott-2-1.azurewebsites.net/) does not support the BMP file format. This is a limitation of the tool not the Form Recognizer Service.

articles/applied-ai-services/form-recognizer/concept-layout.md

Lines changed: 2 additions & 7 deletions
@@ -34,7 +34,7 @@ The Form Recognizer Layout API extracts text, tables, selection marks, and struc
 | Layout ||||||
 
 **Supported paragraph roles**:
-The paragraph roles are best used with unstructured documents. PAragraph roles help analyze the structure of the extracted content for better semantic search and analysis.
+The paragraph roles are best used with unstructured documents. Paragraph roles help analyze the structure of the extracted content for better semantic search and analysis.
 
 * title
 * sectionHeading
@@ -89,12 +89,7 @@ Try extracting data from forms and documents using the Form Recognizer Studio. Y
 
 ## Input requirements
 
-* For best results, provide one clear photo or high-quality scan per document.
-* Supported file formats: JPEG/JPG, PNG, BMP, TIFF, and PDF (text-embedded or scanned).
-* For PDF and TIFF, up to 2000 pages can be processed (with a free tier subscription, only the first two pages are processed).
-* The file size must be less than 500 MB for paid (S0) tier and 4 MB for free (F0) tier.
-* Image dimensions must be between 50 x 50 pixels and 10,000 x 10,000 pixels.
-* The minimum height of the text to be extracted is 12 pixels for a 1024 X 768 image. This dimension corresponds to about eight font point text at 150 DPI.
+[!INCLUDE [input requirements](./includes/input-requirements.md)]
 
 ## Supported languages and locales

articles/applied-ai-services/form-recognizer/concept-model-overview.md

Lines changed: 1 addition & 7 deletions
@@ -193,13 +193,7 @@ A composed model is created by taking a collection of custom models and assignin
 
 ## Input requirements
 
-* For best results, provide one clear photo or high-quality scan per document.
-* Supported file formats: JPEG/JPG, PNG, BMP, TIFF, and PDF (text-embedded or scanned). Additionally, the Read API supports Microsoft Word (DOCX), Excel (XLS), PowerPoint (PPT), and HTML files.
-* For PDF and TIFF, up to 2000 pages can be processed (with a free tier subscription, only the first two pages are processed).
-* The file size must be less than 500 MB for paid (S0) tier and 4 MB for free (F0) tier.
-* Image dimensions must be between 50 x 50 pixels and 10,000 x 10,000 pixels.
-* The total size of the training data is 500 pages or less.
-* If your PDFs are password-locked, you must remove the lock before submission.
+[!INCLUDE [input requirements](./includes/input-requirements.md)]
 
 > [!NOTE]
 > The [Sample Labeling tool](https://fott-2-1.azurewebsites.net/) does not support the BMP file format. This is a limitation of the tool not the Form Recognizer Service.

articles/applied-ai-services/form-recognizer/concept-read.md

Lines changed: 1 addition & 5 deletions
@@ -69,11 +69,7 @@ Try extracting text from forms and documents using the Form Recognizer Studio. Y
 
 ## Input requirements
 
-* Supported file formats: These include JPEG/JPG, PNG, BMP, TIFF, PDF (text-embedded or scanned). Additionally, the newest API version `2022-06-30-preview` supports Microsoft Word (DOCX), Excel (XLS), PowerPoint (PPT), and HTML files.
-* For PDF and TIFF, up to 2000 pages can be processed (with a free tier subscription, only the first two pages are processed).
-* The file size must be less than 500 MB for paid (S0) tier and 4 MB for free (F0) tier.
-* Image dimensions must be between 50 x 50 pixels and 10,000 x 10,000 pixels.
-* The minimum height of the text to be extracted is 12 pixels for a 1024X768 image. This dimension corresponds to about eight font point text at 150 DPI.
+[!INCLUDE [input requirements](./includes/input-requirements.md)]
 
 ## Supported languages and locales

articles/applied-ai-services/form-recognizer/concept-receipt.md

Lines changed: 1 addition & 8 deletions
@@ -77,14 +77,7 @@ You'll need a receipt document. You can use our [sample receipt document](https:
 
 ## Input requirements
 
-* For best results, provide one clear photo or high-quality scan per document.
-* Supported file formats: JPEG/JPG, PNG, BMP, TIFF, and PDF (text-embedded or scanned). Text-embedded PDFs are best to eliminate the possibility of error in character extraction and location.
-* For PDF and TIFF, up to 2000 pages can be processed (with a free tier subscription, only the first two pages are processed).
-* The file size must be less than 500 MB for paid (S0) tier and 4 MB for free (F0) tier.
-* Image dimensions must be between 50 x 50 pixels and 10,000 x 10,000 pixels.
-* PDF dimensions are up to 17 x 17 inches, corresponding to Legal or A3 paper size, or smaller.
-* The total size of the training data is 500 pages or less.
-* If your PDFs are password-locked, you must remove the lock before submission.
+[!INCLUDE [input requirements](./includes/input-requirements.md)]
 
 ## Supported languages and locales v2.1

articles/applied-ai-services/form-recognizer/concept-w2.md

Lines changed: 1 addition & 8 deletions
@@ -58,14 +58,7 @@ Try extracting data from W-2 forms using the Form Recognizer Studio. You'll need
 
 ## Input requirements
 
-* For best results, provide one clear photo or high-quality scan per document.
-* Supported file formats: JPEG/JPG, PNG, BMP, TIFF, and PDF (text-embedded or scanned). Text-embedded PDFs are best to eliminate the possibility of error in character extraction and location.
-* For PDF and TIFF, up to 2000 pages can be processed (with a free tier subscription, only the first two pages are processed).
-* The file size must be less than 500 MB for paid (S0) tier and 4 MB for free (F0) tier.
-* Image dimensions must be between 50 x 50 pixels and 10,000 x 10,000 pixels.
-* PDF dimensions are up to 17 x 17 inches, corresponding to Legal or A3 paper size, or smaller.
-* The total size of the training data is 500 pages or less.
-* If your PDFs are password-locked, you must remove the lock before submission.
+[!INCLUDE [input requirements](./includes/input-requirements.md)]
 
 ## Supported languages and locales

articles/applied-ai-services/form-recognizer/faq.yml

Lines changed: 0 additions & 7 deletions
@@ -199,13 +199,6 @@ sections:
 Which file formats does Form Recognizer support? Are there size limitations for input documents?
 answer: |
 
-- Form Recognizer extracts data from document images JPEG/JPG, PNG, BMP, TIFF, and PDF (text-embedded or scanned) formats and returns a structured output.
-- For PDF and TIFF, up to 2000 pages can be processed (with a free tier subscription, only the first two pages are processed).
-- Your file size must be less than 500 MB for paid (S0) tier and 4 MB for free (F0) tier.
-- Image dimensions must be between 50 x 50 pixels and 10000 x 10,000 pixels.
-- PDF dimensions can be a maximum of 17 x 17 inches (corresponding to Legal or A3 paper size) or smaller.
-- The total allowable size of training data is 500 pages or less.
-
 To ensure the best results, see [input requirements](concept-model-overview.md#input-requirements).
 
 - question: |

articles/applied-ai-services/form-recognizer/includes/input-requirements.md

Lines changed: 7 additions & 5 deletions
@@ -3,17 +3,19 @@ author: laujan
 ms.service: applied-ai-services
 ms.subservice: forms-recognizer
 ms.topic: include
-ms.date: 04/14/2022
+ms.date: 07/27/2022
 ms.author: lajanuar
 ms.custom: ignite-fall-2021
 ---
 <!-- markdownlint-disable MD041 -->
 
 * For best results, provide one clear photo or high-quality scan per document.
-* Supported file formats: JPEG/JPG, PNG, BMP, TIFF, and PDF (text-embedded or scanned). Text-embedded PDFs are best to eliminate the possibility of error in character extraction and location.
+* Supported file formats: JPEG/JPG, PNG, BMP, TIFF, and PDF (text-embedded or scanned). Text-embedded PDFs are best to eliminate the possibility of error in character extraction and location. Additionally, the newest API version `2022-06-30-preview` supports Microsoft Word (DOCX), Excel (XLS), PowerPoint (PPT), and HTML files in Read model.
 * For PDF and TIFF, up to 2000 pages can be processed (with a free tier subscription, only the first two pages are processed).
-* The file size must be less than 500 MB for paid (S0) tier and 4 MB for free (F0) tier.
-* Image dimensions must be between 50 x 50 pixels and 10,000 x 10,000 pixels.
+* The file size for analyzing documents must be _less than_ 500 MB for paid (S0) tier and 4 MB for free (F0) tier.
+* Image dimensions must be between 50 x 50 pixels and 10,000 px x 10,000 pixels.
 * PDF dimensions are up to 17 x 17 inches, corresponding to Legal or A3 paper size, or smaller.
-* The total size of the training data is 500 pages or less.
 * If your PDFs are password-locked, you must remove the lock before submission.
+* The minimum height of the text to be extracted is 12 pixels for a 1024 x 768 pixel image. This dimension corresponds to about 8-point text at 150 dots per inch (DPI).
+* For custom model training, the maximum number of pages for training data is 500 for the custom template model and 50,000 for the custom neural model.
+* For custom model training, the total size of training data is 50 MB for template model and 1G-MB for the neural model.
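
The analysis limits consolidated into `includes/input-requirements.md` can be checked on the client before a document is submitted. The following is a minimal sketch of such a pre-flight check, not part of this commit or of any Form Recognizer SDK. The function name `check_input`, the constant names, the `.tif` alias, and the reading of "MB" as 1,048,576 bytes are all assumptions made for illustration; the numeric limits are the ones stated in the include file.

```python
import os
from typing import List, Optional, Tuple

MB = 1024 * 1024  # assumption: "MB" is read as 1,048,576 bytes

# Limits copied from the include file; all names here are hypothetical.
MAX_FILE_BYTES = {"S0": 500 * MB, "F0": 4 * MB}
# ".tif" is an assumed alias for TIFF; the include file lists only "TIFF".
SUPPORTED_EXTENSIONS = {".jpeg", ".jpg", ".png", ".bmp", ".tiff", ".tif", ".pdf"}
MIN_SIDE_PX, MAX_SIDE_PX = 50, 10_000
MAX_PAGES = 2000
FREE_TIER_PAGES = 2


def check_input(
    filename: str,
    size_bytes: int,
    tier: str = "S0",
    image_size: Optional[Tuple[int, int]] = None,
    page_count: Optional[int] = None,
) -> List[str]:
    """Return a list of problems; an empty list means no limit was hit."""
    problems: List[str] = []

    # File format, judged by extension only (the content is not inspected).
    ext = os.path.splitext(filename)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file format: {ext or '(none)'}")

    # The size must be strictly less than the tier limit.
    limit = MAX_FILE_BYTES[tier]
    if size_bytes >= limit:
        problems.append(f"file must be smaller than {limit // MB} MB on tier {tier}")

    # Image dimensions, given as (width, height) in pixels.
    if image_size is not None:
        width, height = image_size
        if min(width, height) < MIN_SIDE_PX or max(width, height) > MAX_SIDE_PX:
            problems.append("image dimensions must be between 50 x 50 and 10,000 x 10,000 pixels")

    # Page count for PDF and TIFF input.
    if page_count is not None:
        if page_count > MAX_PAGES:
            problems.append(f"at most {MAX_PAGES} pages can be processed")
        elif tier == "F0" and page_count > FREE_TIER_PAGES:
            problems.append(f"free tier processes only the first {FREE_TIER_PAGES} pages")

    return problems
```

A caller would pass the file name and size, plus the image size or page count when known, and treat any returned string as a reason to fix the input before upload. The training-data limits and the PDF page-dimension limit are left out of the sketch because checking them requires inspecting the document itself.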
