articles/cognitive-services/form-recognizer/build-training-data-set.md
Lines changed: 11 additions & 6 deletions
@@ -15,9 +15,11 @@ ms.author: pafarley
15
15
16
16
# Build a training data set for a custom model
17
17
18
-
When you use the Form Recognizer custom model, you provide your own training data so the model can train to your industry-specific forms. You can train a model with five filled-in forms or an empty form (you must include the word "empty" in the file name) plus two filled-in forms. Even if you have enough filled-in forms to train with, adding an empty form to your training data set can improve the accuracy of the model.
18
+
When you use the Form Recognizer custom model, you provide your own training data so the model can train to your industry-specific forms.
19
19
20
-
If you want to use manually labeled training data, you should start with at least five forms of the same type. You can still use unlabeled forms and an empty form in the same data set.
20
+
If you're training without manual labels, you can use five filled-in forms, or an empty form (you must include the word "empty" in the file name) plus two filled-in forms. Even if you have enough filled-in forms, adding an empty form to your training data set can improve the accuracy of the model.
21
+
22
+
If you want to use manually labeled training data, you must start with at least five filled-in forms of the same type. You can still use unlabeled forms and an empty form in addition to the required data set.
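The minimums above can be checked before uploading. The following is a minimal sketch, not part of the Form Recognizer API: the function name `meets_training_minimum` is hypothetical, and it assumes an empty form is any file whose name contains "empty" in any letter case (the article does not say whether the match is case-sensitive). It cannot verify that the forms are of the same type.

```python
from pathlib import PurePosixPath


def meets_training_minimum(file_names, labeled=False):
    """Return True if the file list satisfies the documented minimum counts.

    Hypothetical helper: an "empty" form is assumed to be any file whose
    name contains the word "empty" (case-insensitive).
    """
    empty = [n for n in file_names if "empty" in PurePosixPath(n).name.lower()]
    filled = [n for n in file_names if n not in empty]

    if labeled:
        # Labeled training: at least five filled-in forms are required.
        return len(filled) >= 5

    # Unlabeled training: five filled-in forms, or one empty form plus two filled-in forms.
    return len(filled) >= 5 or (len(empty) >= 1 and len(filled) >= 2)
```

For example, `meets_training_minimum(["empty_form.pdf", "a.pdf", "b.pdf"])` passes for training without labels, but the same list fails when `labeled=True`.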
21
23
22
24
## Training data tips
23
25
@@ -39,9 +41,11 @@ Make sure your training data set also follows the input requirements for all For
39
41
40
42
When you've put together the set of form documents that you'll use for training, you need to upload it to an Azure blob storage container. If you don't know how to create an Azure storage account with a container, follow the [Azure Storage quickstart for Azure portal](https://docs.microsoft.com/azure/storage/blobs/storage-quickstart-blobs-portal).
41
43
44
+
If you want to use manually labeled data, you'll also have to upload the *.labels.json* and *.ocr.json* files that correspond to your training documents. You can use the [Sample labeling tool](./quickstarts/label-tool.md) (or your own UI) to generate these files.
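Before training with labels, it can help to confirm that every document has both companion files. The sketch below works on a plain list of blob names and makes no Azure calls. It assumes, as an illustration only, that the companion files are named by appending `.labels.json` and `.ocr.json` to the full document name (for example, `invoice1.pdf.labels.json`). Check the names your labeling tool actually produces. The function name `missing_label_files` is hypothetical.

```python
# Assumed naming pattern: "<document name>" + suffix, e.g. "invoice1.pdf.ocr.json".
LABEL_SUFFIXES = (".labels.json", ".ocr.json")


def missing_label_files(blob_names):
    """Map each training document to the companion files it is missing."""
    names = set(blob_names)
    # Everything that is not itself a companion file is treated as a document.
    documents = sorted(n for n in names if not n.endswith(LABEL_SUFFIXES))

    missing = {}
    for doc in documents:
        absent = [doc + suffix for suffix in LABEL_SUFFIXES if doc + suffix not in names]
        if absent:
            missing[doc] = absent
    return missing
```

An empty result means every document in the list has both files under the assumed naming pattern.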
45
+
42
46
### Organize your data in subfolders (optional)
43
47
44
-
By default, the [Train Custom Model](https://westus2.dev.cognitive.microsoft.com/docs/services/form-recognizer-api-v2-preview/operations/TrainCustomModelAsync) API will only use form documents that are located at the root of your storage container. However, you can train with data in subfolders if you specify it in the API call. Normally, the body of the [Train Custom Model](https://westus2.dev.cognitive.microsoft.com/docs/services/form-recognizer-api-v2-preview/operations/TrainCustomModelAsync) call has the following form, where `<SAS URL>` is the Shared access signature URL of your container:
48
+
By default, the [Train Custom Model](https://westus2.dev.cognitive.microsoft.com/docs/services/form-recognizer-api-v2-preview/operations/TrainCustomModelAsync) API uses only the form documents located at the root of your storage container. However, you can train with data in subfolders if you specify them in the API call. Normally, the body of the [Train Custom Model](https://westus2.dev.cognitive.microsoft.com/docs/services/form-recognizer-api-v2-preview/operations/TrainCustomModelAsync) call has the following format, where `<SAS URL>` is the shared access signature (SAS) URL of your container:
45
49
46
50
```json
47
51
{
@@ -66,6 +70,7 @@ If you add the following content to the request body, the API will train with do
66
70
67
71
Now that you've learned how to build a training data set, follow a quickstart to train a custom Form Recognizer model and start using it on your forms.
68
72
69
-
* [Quickstart: Train a model and extract form data by using cURL](./quickstarts/curl-train-extract.md)
70
-
* [Quickstart: Train a model and extract form data using the REST API with Python](./quickstarts/python-train-extract.md)
71
-
* [Train with labels using the REST API and Python](./quickstarts/python-labeled-data.md)
73
+
* [Train a model and extract form data using cURL](./quickstarts/curl-train-extract.md)
74
+
* [Train a model and extract form data using the REST API and Python](./quickstarts/python-train-extract.md)
75
+
* [Train with labels using the sample labeling tool](./quickstarts/label-tool.md)
76
+
* [Train with labels using the REST API and Python](./quickstarts/python-labeled-data.md)
articles/cognitive-services/form-recognizer/quickstarts/label-tool.md
Lines changed: 22 additions & 0 deletions
@@ -139,6 +139,7 @@ Next, you'll create tags (labels) and apply them to the text elements that you w
139
139
> * Label values as they appear on the form; don't try to split a value into two parts with two different tags. For example, an address field should be labeled with a single tag even if it spans multiple lines.
140
140
> * Don't include keys in your tagged fields—only the values.
141
141
> * Table data should be detected automatically and will be available in the final output JSON file. However, if the model fails to detect all of your table data, you can manually tag these fields as well. Tag each cell in the table with a different label. If your forms have tables with varying numbers of rows, make sure you tag at least one form with the largest possible table.
142
+
> * To delete an applied tag, select its rectangle in the document view and press the Delete key.
142
143
143
144

144
145
@@ -161,6 +162,27 @@ The following value types and variations are currently supported:
161
162
* `time`
162
163
* `integer`
163
164
165
+
> [!NOTE]
166
+
> See these rules for date formatting:
167
+
>
168
+
> The following characters can be used as DMY date delimiters: `, - / . \`. Whitespace cannot be used as a delimiter. For example:
169
+
> * 01,01,2020
170
+
> * 01-01-2020
171
+
> * 01/01/2020
172
+
>
173
+
> The day and month can each be written as one or two digits, and the year can be two or four digits:
174
+
> * 1-1-2020
175
+
> * 1-01-20
176
+
>
177
+
> If a DMY date string has eight digits, the delimiter is optional:
178
+
> * 01012020
179
+
> * 01 01 2020
180
+
>
181
+
> The month can also be written as its full or short name. If the name is used, delimiter characters are optional:
182
+
> * 01/Jan/2020
183
+
> * 01Jan2020
184
+
> * 01 Jan 2020
185
+
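The date rules in the note above can be approximated with a few regular expressions. This is an illustrative sketch only, not the parser that Form Recognizer uses. The function name `parse_dmy` is hypothetical. Mapping two-digit years to the 2000s, accepting mixed delimiters within one date, and recognizing only English month names are assumptions that the note does not confirm.

```python
import re
from datetime import date

# Delimiters listed in the note: , - / . \
_DELIM = r"[,\-/.\\]"

_FULL_NAMES = ["january", "february", "march", "april", "may", "june", "july",
               "august", "september", "october", "november", "december"]
_MONTHS = {name: i for i, name in enumerate(_FULL_NAMES, start=1)}
_MONTHS.update({name[:3]: i for name, i in list(_MONTHS.items())})  # short names

# Day and month of 1-2 digits, year of 2 or 4 digits, delimiters required.
_NUMERIC = re.compile(rf"^(\d{{1,2}}){_DELIM}(\d{{1,2}})(?:{_DELIM})(\d{{2}}|\d{{4}})$")
# Eight digits: delimiter optional; the note's examples also show spaces.
_EIGHT_DIGITS = re.compile(r"^(\d{2}) ?(\d{2}) ?(\d{4})$")
# Month written as a name: delimiters (or spaces) optional.
_NAMED = re.compile(rf"^(\d{{1,2}})(?:{_DELIM}| )?([A-Za-z]+)(?:{_DELIM}| )?(\d{{2}}|\d{{4}})$")


def parse_dmy(text, century=2000):
    """Parse a day-month-year string; return a date, or None if it does not fit the rules."""
    text = text.strip()
    match = _NUMERIC.match(text) or _EIGHT_DIGITS.match(text)
    if match:
        day, month, year = (int(group) for group in match.groups())
    else:
        match = _NAMED.match(text)
        if not match or match.group(2).lower() not in _MONTHS:
            return None
        day, month, year = int(match.group(1)), _MONTHS[match.group(2).lower()], int(match.group(3))

    if len(match.group(3)) == 2:
        year += century  # assumption: two-digit years fall in the 2000s

    try:
        return date(year, month, day)
    except ValueError:  # for example, day 31 in a 30-day month
        return None
```

Under these assumptions, every example in the note parses to 1 January 2020, while a string such as `1 1 2020` is rejected because whitespace alone is not a delimiter for one-digit day and month values.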
164
186
## Train a custom model
165
187
166
188
Click the Train icon on the left pane to open the Training page. Then click the **Train** button to begin training the model. Once the training process completes, you'll see the following information: