Commit e7b621c

add more labeled data instructions
1 parent 49a6d56 commit e7b621c

1 file changed, +11 -6 lines changed

articles/cognitive-services/form-recognizer/build-training-data-set.md

Lines changed: 11 additions & 6 deletions
@@ -15,9 +15,11 @@ ms.author: pafarley
 
 # Build a training data set for a custom model
 
-When you use the Form Recognizer custom model, you provide your own training data so the model can train to your industry-specific forms. You can train a model with five filled-in forms or an empty form (you must include the word "empty" in the file name) plus two filled-in forms. Even if you have enough filled-in forms to train with, adding an empty form to your training data set can improve the accuracy of the model.
+When you use the Form Recognizer custom model, you provide your own training data so the model can train to your industry-specific forms.
 
-If you want to use manually labeled training data, you should start with at least five forms of the same type. You can still use unlabeled forms and an empty form in the same data set.
+If you're training without manual labels, you can use five filled-in forms, or an empty form (you must include the word "empty" in the file name) plus two filled-in forms. Even if you have enough filled-in forms, adding an empty form to your training data set can improve the accuracy of the model.
+
+If you want to use manually labeled training data, you must start with at least five filled-in forms of the same type. You can still use unlabeled forms and an empty form in addition to the required data set.
 
 ## Training data tips
 
@@ -39,9 +41,11 @@ Make sure your training data set also follows the input requirements for all For
 
 When you've put together the set of form documents that you'll use for training, you need to upload it to an Azure blob storage container. If you don't know how to create an Azure storage account with a container, following the [Azure Storage quickstart for Azure portal](https://docs.microsoft.com/azure/storage/blobs/storage-quickstart-blobs-portal).
 
+If you want to use manually labeled data, you'll also have to upload the *.labels.json* and *.ocr.json* files that correspond to your training documents. You can use the [Sample labeling tool](./quickstarts/label-tool.md) (or your own UI) to generate these files.
+
 ### Organize your data in subfolders (optional)
 
-By default, the [Train Custom Model](https://westus2.dev.cognitive.microsoft.com/docs/services/form-recognizer-api-v2-preview/operations/TrainCustomModelAsync) API will only use form documents that are located at the root of your storage container. However, you can train with data in subfolders if you specify it in the API call. Normally, the body of the [Train Custom Model](https://westus2.dev.cognitive.microsoft.com/docs/services/form-recognizer-api-v2-preview/operations/TrainCustomModelAsync) call has the following form, where `<SAS URL>` is the Shared access signature URL of your container:
+By default, the [Train Custom Model](https://westus2.dev.cognitive.microsoft.com/docs/services/form-recognizer-api-v2-preview/operations/TrainCustomModelAsync) API will only use form documents that are located at the root of your storage container. However, you can train with data in subfolders if you specify it in the API call. Normally, the body of the [Train Custom Model](https://westus2.dev.cognitive.microsoft.com/docs/services/form-recognizer-api-v2-preview/operations/TrainCustomModelAsync) call has the following format, where `<SAS URL>` is the Shared access signature URL of your container:
 
 ```json
 {
@@ -66,6 +70,7 @@ If you add the following content to the request body, the API will train with do
 
 Now that you've learned how to build a training data set, follow a quickstart to train a custom Form Recognizer model and start using it on your forms.
 
-* [Quickstart: Train a model and extract form data by using cURL](./quickstarts/curl-train-extract.md)
-* [Quickstart: Train a model and extract form data using the REST API with Python](./quickstarts/python-train-extract.md)
-* [Train with labels using the REST API and Python](./quickstarts/python-labeled-data.md)
+* [Train a model and extract form data using cURL](./quickstarts/curl-train-extract.md)
+* [Train a model and extract form data using the REST API and Python](./quickstarts/python-train-extract.md)
+* [Train with labels using the sample labeling tool](./quickstarts/label-tool.md)
+* [Train with labels using the REST API and Python](./quickstarts/python-labeled-data.md)
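
The revised introduction in this commit sets minimum sizes for a training set. Without labels, it asks for five filled-in forms, or one empty form (with "empty" in the file name) plus two filled-in forms. With labels, it asks for at least five filled-in forms, and the new upload paragraph adds that each labeled document needs its *.labels.json* and *.ocr.json* files. The Python sketch below checks a list of blob names against those rules before upload. The function name `check_training_set`, the set of form file extensions, and the convention that companion files are named `<document name>.labels.json` and `<document name>.ocr.json` are assumptions made for illustration; none of them is stated in the diff.

```python
from pathlib import PurePosixPath

# Assumption: which extensions count as form documents. Adjust to the
# input requirements of the service version you are using.
FORM_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png"}


def check_training_set(blob_names, use_labels=False):
    """Return a list of problems; an empty list means the minimums are met."""
    names = set(blob_names)
    forms = [n for n in blob_names
             if PurePosixPath(n).suffix.lower() in FORM_EXTENSIONS]
    # An empty form is recognized by the word "empty" in its file name.
    empty = {n for n in forms if "empty" in PurePosixPath(n).name.lower()}
    filled = [n for n in forms if n not in empty]

    problems = []
    if use_labels:
        # Assumption: companion files are "<document>.labels.json" and
        # "<document>.ocr.json", stored next to the document.
        labeled = [n for n in filled
                   if n + ".labels.json" in names and n + ".ocr.json" in names]
        if len(labeled) < 5:
            problems.append(
                f"need at least 5 labeled filled-in forms, found {len(labeled)}")
    else:
        enough_filled = len(filled) >= 5
        empty_plus_two = bool(empty) and len(filled) >= 2
        if not (enough_filled or empty_plus_two):
            problems.append(
                "need 5 filled-in forms, or 1 empty form plus 2 filled-in forms")
    return problems


if __name__ == "__main__":
    print(check_training_set(["empty-invoice.pdf", "inv1.pdf", "inv2.pdf"]))
```

The check only counts files. It cannot tell whether the forms are of the same type, which the labeled-data rule also requires.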

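The subfolder section in the diff describes a Train Custom Model request body that carries the container's SAS URL and can optionally direct the API to documents in subfolders, but both JSON examples are cut off in this view. The Python sketch below shows one way such a body might be assembled. The field names `source`, `sourceFilter`, `prefix`, `includeSubFolders`, and `useLabelFile` are assumptions based on the v2.0 preview REST API and do not appear in the diff, so check them against the API reference before relying on them. The helper name `build_train_request` is hypothetical.

```python
import json


def build_train_request(sas_url, prefix=None, include_subfolders=False,
                        use_label_file=False):
    """Assemble a request body for the Train Custom Model call.

    All field names below are assumptions; verify them against the
    API reference for your service version.
    """
    body = {"source": sas_url}
    # Only add a filter when the caller asks for something other than
    # the default behavior (documents at the container root).
    if prefix is not None or include_subfolders:
        body["sourceFilter"] = {
            "prefix": prefix or "",
            "includeSubFolders": include_subfolders,
        }
    if use_label_file:
        body["useLabelFile"] = True
    return body


if __name__ == "__main__":
    # "<SAS URL>" is the placeholder used in the documentation text.
    request = build_train_request("<SAS URL>", prefix="invoices/",
                                  include_subfolders=True)
    print(json.dumps(request, indent=2))
```

Leaving out the filter when no option is set keeps the default request identical to the minimal body the documentation describes, with only the SAS URL.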