articles/cognitive-services/form-recognizer/build-training-data-set.md
+11 -6 (11 additions & 6 deletions)
@@ -15,9 +15,11 @@ ms.author: pafarley
# Build a training data set for a custom model

-When you use the Form Recognizer custom model, you provide your own training data so the model can train to your industry-specific forms. You can train a model with five filled-in forms or an empty form (you must include the word "empty" in the file name) plus two filled-in forms. Even if you have enough filled-in forms to train with, adding an empty form to your training data set can improve the accuracy of the model.
+When you use the Form Recognizer custom model, you provide your own training data so the model can train to your industry-specific forms.

-If you want to use manually labeled training data, you should start with at least five forms of the same type. You can still use unlabeled forms and an empty form in the same data set.
+If you're training without manual labels, you can use five filled-in forms, or an empty form (you must include the word "empty" in the file name) plus two filled-in forms. Even if you have enough filled-in forms, adding an empty form to your training data set can improve the accuracy of the model.
+
+If you want to use manually labeled training data, you must start with at least five filled-in forms of the same type. You can still use unlabeled forms and an empty form in addition to the required data set.
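
The minimums for training without labels can be checked before uploading anything. The following is a minimal sketch, not part of the article or the service: the function name `meets_unlabeled_minimum` is hypothetical, and it assumes that any file whose name contains "empty" (case-insensitive) is an empty form and every other file is a filled-in form.

```python
from typing import Iterable


def meets_unlabeled_minimum(file_names: Iterable[str]) -> bool:
    """Return True if the files satisfy the unlabeled-training minimum.

    Assumption: a file is an empty form if and only if its name
    contains the word "empty" (compared case-insensitively).
    """
    names = [name.lower() for name in file_names]
    empty_forms = [name for name in names if "empty" in name]
    filled_forms = [name for name in names if "empty" not in name]

    # Either five filled-in forms, or one empty form plus two filled-in forms.
    enough_filled = len(filled_forms) >= 5
    empty_plus_two = len(empty_forms) >= 1 and len(filled_forms) >= 2
    return enough_filled or empty_plus_two


if __name__ == "__main__":
    print(meets_unlabeled_minimum(["empty_form.pdf", "form1.pdf", "form2.pdf"]))
```

The check only counts files. It does not verify that the forms are of the same type or that they meet the service's input requirements.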
## Training data tips
@@ -39,9 +41,11 @@ Make sure your training data set also follows the input requirements for all For
When you've put together the set of form documents that you'll use for training, you need to upload it to an Azure blob storage container. If you don't know how to create an Azure storage account with a container, follow the [Azure Storage quickstart for Azure portal](https://docs.microsoft.com/azure/storage/blobs/storage-quickstart-blobs-portal).

+If you want to use manually labeled data, you'll also have to upload the *.labels.json* and *.ocr.json* files that correspond to your training documents. You can use the [Sample labeling tool](./quickstarts/label-tool.md) (or your own UI) to generate these files.
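
Before uploading labeled data, it can help to confirm that every training document has both companion files. The sketch below is illustrative only: the function name `missing_label_files` is hypothetical, and it assumes the companion files are named by appending `.labels.json` and `.ocr.json` to the full document file name (for example, `invoice.pdf.labels.json`). Check the files your labeling tool actually produces before relying on that convention.

```python
from typing import Dict, Iterable, List


def missing_label_files(file_names: Iterable[str]) -> Dict[str, List[str]]:
    """Map each training document to the companion files it lacks.

    Assumption: companion files are named "<document>.labels.json" and
    "<document>.ocr.json", and every non-JSON file is a training document.
    """
    names = set(file_names)
    documents = sorted(name for name in names if not name.endswith(".json"))

    missing: Dict[str, List[str]] = {}
    for document in documents:
        expected = [document + ".labels.json", document + ".ocr.json"]
        absent = [name for name in expected if name not in names]
        if absent:
            missing[document] = absent
    return missing


if __name__ == "__main__":
    files = ["a.pdf", "a.pdf.labels.json", "a.pdf.ocr.json", "b.jpg", "b.jpg.ocr.json"]
    print(missing_label_files(files))
```

An empty result means every document in the list has both files under the assumed naming scheme.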
+
### Organize your data in subfolders (optional)

-By default, the [Train Custom Model](https://westus2.dev.cognitive.microsoft.com/docs/services/form-recognizer-api-v2-preview/operations/TrainCustomModelAsync) API will only use form documents that are located at the root of your storage container. However, you can train with data in subfolders if you specify it in the API call. Normally, the body of the [Train Custom Model](https://westus2.dev.cognitive.microsoft.com/docs/services/form-recognizer-api-v2-preview/operations/TrainCustomModelAsync) call has the following form, where `<SAS URL>` is the Shared access signature URL of your container:
+By default, the [Train Custom Model](https://westus2.dev.cognitive.microsoft.com/docs/services/form-recognizer-api-v2-preview/operations/TrainCustomModelAsync) API will only use form documents that are located at the root of your storage container. However, you can train with data in subfolders if you specify it in the API call. Normally, the body of the [Train Custom Model](https://westus2.dev.cognitive.microsoft.com/docs/services/form-recognizer-api-v2-preview/operations/TrainCustomModelAsync) call has the following format, where `<SAS URL>` is the Shared access signature URL of your container:
```json
{
@@ -66,6 +70,7 @@ If you add the following content to the request body, the API will train with do
Now that you've learned how to build a training data set, follow a quickstart to train a custom Form Recognizer model and start using it on your forms.

-* [Quickstart: Train a model and extract form data by using cURL](./quickstarts/curl-train-extract.md)
-* [Quickstart: Train a model and extract form data using the REST API with Python](./quickstarts/python-train-extract.md)
-* [Train with labels using the REST API and Python](./quickstarts/python-labeled-data.md)
+* [Train a model and extract form data using cURL](./quickstarts/curl-train-extract.md)
+* [Train a model and extract form data using the REST API and Python](./quickstarts/python-train-extract.md)
+* [Train with labels using the sample labeling tool](./quickstarts/label-tool.md)
+* [Train with labels using the REST API and Python](./quickstarts/python-labeled-data.md)