---
title: "How-to: Analyze documents, label forms, train a model, and analyze forms with Form Recognizer"
titleSuffix: Azure Applied AI Services
description: How to use the Form Recognizer sample tool to analyze documents, invoices, receipts, and more. Label and create a custom model to extract text, tables, selection marks, structure, and key-value pairs from documents.
---

# Train a custom model using the Sample Labeling tool

> [!TIP]
> * For an enhanced experience and advanced model quality, try the [Form Recognizer v3.0 Studio (preview)](https://formrecognizer.appliedai.azure.com/studio).
> * The v3.0 Studio supports any model trained with v2.1 labeled data.
> * You can refer to the API migration guide for detailed information about migrating from v2.1 to v3.0.
> * *See* our [**REST API**](quickstarts/try-v3-rest-api.md) or [**C#**](quickstarts/try-v3-csharp-sdk.md), [**Java**](quickstarts/try-v3-java-sdk.md), [**JavaScript**](quickstarts/try-v3-javascript-sdk.md), or [**Python**](quickstarts/try-v3-python-sdk.md) SDK quickstarts to get started with the v3.0 preview.

In this article, you'll use the Form Recognizer REST API with the Sample Labeling tool to train a custom model with manually labeled data.

You'll need the following resources to complete this project:
* Azure subscription - [Create one for free](https://azure.microsoft.com/free/cognitive-services)
* Once you have your Azure subscription, <a href="https://portal.azure.com/#create/Microsoft.CognitiveServicesFormRecognizer" title="Create a Form Recognizer resource" target="_blank">create a Form Recognizer resource</a> in the Azure portal to get your key and endpoint. After it deploys, select **Go to resource**.
* You'll need the key and endpoint from the resource you create to connect your application to the Form Recognizer API. You'll paste your key and endpoint into the code below later in the quickstart.
* You can use the free pricing tier (`F0`) to try the service, and upgrade later to a paid tier for production.
* A set of at least six forms of the same type. You'll use this data to train the model and test a form. You can use a [sample data set](https://go.microsoft.com/fwlink/?linkid=2090451) (download and extract *sample_data.zip*) for this quickstart. Upload the training files to the root of a blob storage container in a standard-performance-tier Azure Storage account.
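
If you also want to use your key and endpoint from code rather than only through the Sample Labeling tool, the following is a minimal sketch, assuming the `azure-ai-formrecognizer` Python package (version 3.1.x, the SDK generation that targets the v2.1 API). The endpoint and key values are placeholders for your own.

```python
# Minimal sketch: create Form Recognizer clients with your key and endpoint.
# Assumes `pip install azure-ai-formrecognizer==3.1.2` (targets the v2.1 API).
from azure.core.credentials import AzureKeyCredential
from azure.ai.formrecognizer import FormRecognizerClient, FormTrainingClient

endpoint = "https://<your-resource-name>.cognitiveservices.azure.com/"  # from the Azure portal
key = "<your-key>"                                                       # from the Azure portal

# Used later to analyze forms with a trained model.
form_client = FormRecognizerClient(endpoint, AzureKeyCredential(key))

# Used later to train a custom model from documents in blob storage.
training_client = FormTrainingClient(endpoint, AzureKeyCredential(key))
```

The `form_client` and `training_client` created here are reused in the later sketches in this article.
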

Try out the [**Form Recognizer Sample Labeling tool**](https://fott-2-1.azurewebsites.net/):

You'll need an Azure subscription ([create one for free](https://azure.microsoft.com/free/cognitive-services)) and a [Form Recognizer resource](https://portal.azure.com/#create/Microsoft.CognitiveServicesFormRecognizer) endpoint and key to try out the Form Recognizer service.
## Set up the Sample Labeling tool

When you create or open a project, the main tag editor window opens. The tag editor includes the following panes:
* The main editor pane that allows you to apply tags.
* The tags editor pane that allows users to modify, lock, reorder, and delete tags.
### Identify text and tables
Select **Run Layout on unvisited documents** on the left pane to get the text and table layout information for each document. The labeling tool will draw bounding boxes around each text element.
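
The layout pass the tool runs here corresponds to the layout analysis that's also exposed through the REST API and SDKs. As a rough sketch (not the tool's own code), this is how you could run the same analysis with the `form_client` created earlier; the file name is a placeholder.

```python
# Sketch: run layout analysis on a local document, similar in spirit to the
# "Run Layout on unvisited documents" step in the Sample Labeling tool.
# `form_client` is the FormRecognizerClient from the earlier authentication sketch.
with open("sample-form.pdf", "rb") as f:  # placeholder file name
    poller = form_client.begin_recognize_content(f)

for page in poller.result():
    print(f"Page {page.page_number}: {len(page.lines)} lines, {len(page.tables)} tables")
    for table in page.tables:
        for cell in table.cells:
            print(f"  cell[{cell.row_index}][{cell.column_index}] = {cell.text!r}")
```
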

The labeling tool will also show which tables have been automatically extracted. Select the table/grid icon on the left-hand side of the document to see the extracted table. In this quickstart, because the table content is automatically extracted, we won't label the table content; we'll rely on the automated extraction instead.

:::image type="content" source="media/label-tool/table-extraction.png" alt-text="Table visualization in Sample Labeling tool.":::

In v2.1, if your training document doesn't have a value filled in, you can draw a box where the value should be. Use **Draw region** in the upper-left corner of the window to make the region taggable.
### Apply labels to text

The following value types and variations are currently supported:

* `number`
    * default, `currency`
    * Formatted as a floating-point value.
    * Example: 1234.98 on the document will be formatted into 1234.98 in the output.
* `date`
    * default, `dmy`, `mdy`, `ymd`
* `time`
* `integer`
    * Formatted as an integer value.
    * Example: 1234.98 on the document will be formatted into 123498 in the output.
* `selectionMark`

> [!NOTE]
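
These value types control how a labeled field is normalized in the trained model's output. As a hedged sketch (the helper below is hypothetical, not part of the tool), this is roughly how typed fields surface when you later analyze a form with the Python SDK:

```python
# Hypothetical helper: show how typed labels come back in an analysis result.
from azure.ai.formrecognizer import RecognizedForm

def print_typed_fields(recognized_form: RecognizedForm) -> None:
    """Print each labeled field with its normalized type, value, and confidence."""
    for name, field in recognized_form.fields.items():
        raw_text = field.value_data.text if field.value_data else None
        # value_type mirrors the tag's value type, for example "float", "integer",
        # "date", "time", "string", or "selectionMark".
        print(f"{name}: type={field.value_type}, value={field.value!r}, "
              f"raw text={raw_text!r}, confidence={field.confidence}")
```

For example, per the `integer` entry above, a field whose text reads 1234.98 would come back with `value_type` of `"integer"` and a `value` of 123498.
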
### Label tables (v2.1 only)
At times, your data might lend itself better to being labeled as a table rather than as key-value pairs. In this case, you can create a table tag by selecting **Add a new table tag**. Specify whether the table will have a fixed number of rows or a variable number of rows depending on the document, and define the schema.

:::image type="content" source="media/label-tool/table-tag.png" alt-text="Configuring a table tag.":::

Once you've defined your table tag, tag the cell values.

:::image type="content" source="media/table-labeling.png" alt-text="Labeling a table.":::
Choose the Train icon on the left pane to open the Training page. Then select the **Train** button to begin training the model. Once the training process completes, you'll see the following information:
* **Model ID** - The ID of the model that was created and trained. Each training call creates a new model with its own ID. Copy this string to a secure location; you'll need it if you want to do prediction calls through the [REST API](./quickstarts/try-sdk-rest-api.md?pivots=programming-language-rest-api&tabs=preview%2cv2-1) or [client library guide](./quickstarts/try-sdk-rest-api.md).
* **Average Accuracy** - The model's average accuracy. You can improve model accuracy by adding and labeling more forms, then retraining to create a new model. We recommend starting by labeling five forms and adding more forms as needed.
* The list of tags, and the estimated accuracy per tag.
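
The **Train** button starts a training operation that's also available programmatically. Here's a hedged sketch using the `training_client` from the earlier authentication sketch; the SAS URL is a placeholder for the blob container that holds your labeled training files.

```python
# Sketch: train a custom model with labeled data, then inspect the values
# reported above (model ID, average accuracy, accuracy per tag).
# `training_client` is the FormTrainingClient from the earlier authentication sketch.
container_sas_url = "https://<storage-account>.blob.core.windows.net/<container>?<sas-token>"  # placeholder

poller = training_client.begin_training(container_sas_url, use_training_labels=True)
model = poller.result()

print(f"Model ID: {model.model_id}")
for submodel in model.submodels:
    print(f"Submodel accuracy: {submodel.accuracy}")
    for name, field in submodel.fields.items():
        print(f"  {name}: accuracy={field.accuracy}")
```
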

Select the Analyze (light bulb) icon on the left to test your model.

## Improve results
Depending on the reported accuracy, you may want to do further training to improve the model. After you've done a prediction, examine the confidence values for each of the applied tags. If the average accuracy training value is high, but the confidence scores are low (or the results are inaccurate), add the prediction file to the training set, label it, and train again.
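
As a concrete sketch of that check (placeholder model ID and file name, using the `form_client` from the earlier authentication sketch), you can run a prediction and print the confidence of each applied tag:

```python
# Sketch: analyze a test form with the trained model and review per-tag confidence.
# `form_client` is the FormRecognizerClient from the earlier authentication sketch.
model_id = "<your-model-id>"            # the Model ID reported after training

with open("test-form.pdf", "rb") as f:  # placeholder test document
    poller = form_client.begin_recognize_custom_forms(model_id=model_id, form=f)

for recognized_form in poller.result():
    for name, field in recognized_form.fields.items():
        # Low confidence (or an incorrect value) is a signal to label this file and retrain.
        print(f"{name}: {field.value!r} (confidence: {field.confidence})")
```
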
The reported average accuracy, confidence scores, and actual accuracy can be inconsistent when the analyzed documents differ from documents used in training. Keep in mind that some documents look similar when viewed by people but can look distinct to the AI model. For example, you might train with a form type that has two variations, where the training set consists of 20% variation A and 80% variation B. During prediction, the confidence scores for documents of variation A are likely to be lower.

Go to your project settings page (slider icon) and take note of the security token name.

### Restore project credentials
When you want to resume your project, you first need to create a connection to the same blob storage container. To do so, repeat the steps above. Then, go to the application settings page (gear icon) and see if your project's security token is there. If it isn't, add a new security token and copy over your token name and key from the previous step. Select **Save** to retain your settings.
### Resume a project
Finally, go to the main page (house icon) and select **Open Cloud Project**. Then select the blob storage connection, and select your project's `.fott` file. The application will load all of the project's settings because it has the security token.