Commit d7ad411

Merge pull request #116485 from PatrickFarley/formre-small-tasks
[cog serv] add date value formatting rules
2 parents 2abbc2c + e7b621c commit d7ad411

2 files changed

+33
-6
lines changed

articles/cognitive-services/form-recognizer/build-training-data-set.md

Lines changed: 11 additions & 6 deletions
@@ -15,9 +15,11 @@ ms.author: pafarley
1515

1616
# Build a training data set for a custom model
1717

18-
When you use the Form Recognizer custom model, you provide your own training data so the model can train to your industry-specific forms. You can train a model with five filled-in forms or an empty form (you must include the word "empty" in the file name) plus two filled-in forms. Even if you have enough filled-in forms to train with, adding an empty form to your training data set can improve the accuracy of the model.
18+
When you use the Form Recognizer custom model, you provide your own training data so the model can train to your industry-specific forms.
1919

20-
If you want to use manually labeled training data, you should start with at least five forms of the same type. You can still use unlabeled forms and an empty form in the same data set.
20+
If you're training without manual labels, you can use five filled-in forms, or an empty form (you must include the word "empty" in the file name) plus two filled-in forms. Even if you have enough filled-in forms, adding an empty form to your training data set can improve the accuracy of the model.
21+
22+
If you want to use manually labeled training data, you must start with at least five filled-in forms of the same type. You can still use unlabeled forms and an empty form in addition to the required data set.
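
The minimum counts described in the added paragraphs can be checked before upload. The following is a minimal sketch, not part of the commit: the function name `meets_minimum` is hypothetical, and the case-insensitive match on "empty" is an assumption (the text only says the file name must include the word).

```python
def meets_minimum(file_names, labeled=False):
    """Check a list of form file names against the minimum counts in the text.

    Assumption: a form counts as empty if its name contains "empty"
    (case-insensitive); every other file is treated as a filled-in form.
    """
    empty = [n for n in file_names if "empty" in n.lower()]
    filled = [n for n in file_names if "empty" not in n.lower()]

    if labeled:
        # Labeled training: at least five filled-in forms are required.
        return len(filled) >= 5

    # Unlabeled training: five filled-in forms, or one empty plus two filled-in.
    return len(filled) >= 5 or (len(empty) >= 1 and len(filled) >= 2)
```

For example, `meets_minimum(["a.pdf", "b.pdf", "empty_form.pdf"])` passes the unlabeled rule, while the same list with `labeled=True` does not.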
2123

2224
## Training data tips
2325

@@ -39,9 +41,11 @@ Make sure your training data set also follows the input requirements for all For
3941

4042
When you've put together the set of form documents that you'll use for training, you need to upload it to an Azure blob storage container. If you don't know how to create an Azure storage account with a container, follow the [Azure Storage quickstart for Azure portal](https://docs.microsoft.com/azure/storage/blobs/storage-quickstart-blobs-portal).
4143

44+
If you want to use manually labeled data, you'll also have to upload the *.labels.json* and *.ocr.json* files that correspond to your training documents. You can use the [Sample labeling tool](./quickstarts/label-tool.md) (or your own UI) to generate these files.
45+
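
A quick pre-upload check can catch training documents that lack their companion files. This is a sketch under stated assumptions, not part of the commit: it assumes each companion file is named by appending `.labels.json` or `.ocr.json` to the full document file name, and the list of document extensions is illustrative. The helper name `missing_label_files` is hypothetical.

```python
def missing_label_files(blob_names,
                        doc_exts=(".pdf", ".jpg", ".jpeg", ".png", ".tiff")):
    """Map each training document to the companion files it is missing.

    Assumptions: companions are named "<document name>.labels.json" and
    "<document name>.ocr.json"; doc_exts is an illustrative extension list.
    """
    names = set(blob_names)
    docs = sorted(n for n in names if n.lower().endswith(doc_exts))

    missing = {}
    for doc in docs:
        absent = [doc + suffix
                  for suffix in (".labels.json", ".ocr.json")
                  if doc + suffix not in names]
        if absent:
            missing[doc] = absent
    return missing
```

An empty result means every document in the list has both companion files under the assumed naming scheme.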
4246
### Organize your data in subfolders (optional)
4347

44-
By default, the [Train Custom Model](https://westus2.dev.cognitive.microsoft.com/docs/services/form-recognizer-api-v2-preview/operations/TrainCustomModelAsync) API will only use form documents that are located at the root of your storage container. However, you can train with data in subfolders if you specify it in the API call. Normally, the body of the [Train Custom Model](https://westus2.dev.cognitive.microsoft.com/docs/services/form-recognizer-api-v2-preview/operations/TrainCustomModelAsync) call has the following form, where `<SAS URL>` is the Shared access signature URL of your container:
48+
By default, the [Train Custom Model](https://westus2.dev.cognitive.microsoft.com/docs/services/form-recognizer-api-v2-preview/operations/TrainCustomModelAsync) API will only use form documents that are located at the root of your storage container. However, you can train with data in subfolders if you specify it in the API call. Normally, the body of the [Train Custom Model](https://westus2.dev.cognitive.microsoft.com/docs/services/form-recognizer-api-v2-preview/operations/TrainCustomModelAsync) call has the following format, where `<SAS URL>` is the Shared access signature URL of your container:
4549

4650
```json
4751
{
@@ -66,6 +70,7 @@ If you add the following content to the request body, the API will train with do
6670

6771
Now that you've learned how to build a training data set, follow a quickstart to train a custom Form Recognizer model and start using it on your forms.
6872

69-
* [Quickstart: Train a model and extract form data by using cURL](./quickstarts/curl-train-extract.md)
70-
* [Quickstart: Train a model and extract form data using the REST API with Python](./quickstarts/python-train-extract.md)
71-
* [Train with labels using the REST API and Python](./quickstarts/python-labeled-data.md)
73+
* [Train a model and extract form data using cURL](./quickstarts/curl-train-extract.md)
74+
* [Train a model and extract form data using the REST API and Python](./quickstarts/python-train-extract.md)
75+
* [Train with labels using the sample labeling tool](./quickstarts/label-tool.md)
76+
* [Train with labels using the REST API and Python](./quickstarts/python-labeled-data.md)

articles/cognitive-services/form-recognizer/quickstarts/label-tool.md

Lines changed: 22 additions & 0 deletions
@@ -139,6 +139,7 @@ Next, you'll create tags (labels) and apply them to the text elements that you w
139139
> * Label values as they appear on the form; don't try to split a value into two parts with two different tags. For example, an address field should be labeled with a single tag even if it spans multiple lines.
140140
> * Don't include keys in your tagged fields&mdash;only the values.
141141
> * Table data should be detected automatically and will be available in the final output JSON file. However, if the model fails to detect all of your table data, you can manually tag these fields as well. Tag each cell in the table with a different label. If your forms have tables with varying numbers of rows, make sure you tag at least one form with the largest possible table.
142+
> * To delete an applied tag, select the rectangle on the document view and press the delete key.
142143
143144
![Main editor window of sample labeling tool](../media/label-tool/main-editor.png)
144145
@@ -161,6 +162,27 @@ The following value types and variations are currently supported:
161162
* `time`
162163
* `integer`
163164
165+
> [!NOTE]
166+
> See these rules for date formatting:
167+
>
168+
> The following characters can be used as DMY date delimiters: `, - / . \`. Whitespace cannot be used as a delimiter. For example:
169+
> * 01,01,2020
170+
> * 01-01-2020
171+
> * 01/01/2020
172+
>
173+
> The day and month can each be written as one or two digits, and the year can be two or four digits:
174+
> * 1-1-2020
175+
> * 1-01-20
176+
>
177+
> If a DMY date string has eight digits, the delimiter is optional:
178+
> * 01012020
179+
> * 01 01 2020
180+
>
181+
> The month can also be written as its full or short name. If the name is used, delimiter characters are optional:
182+
> * 01/Jan/2020
183+
> * 01Jan2020
184+
> * 01 Jan 2020
185+
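
The date rules in the added note can be approximated with regular expressions. The sketch below is one reading of those rules, not the service's actual parser: it treats whitespace as acceptable only in the eight-digit and month-name cases (following the examples), uses English month names, and does not check that the day and month are in range or that both delimiters are the same character. The name `looks_like_dmy` is hypothetical.

```python
import re

# Delimiters listed in the note: , - / . \
DELIM = r"[,\-/.\\]"

FULL_MONTHS = ["january", "february", "march", "april", "may", "june", "july",
               "august", "september", "october", "november", "december"]
# Full names plus three-letter short names (assumed English).
MONTHS = "|".join(FULL_MONTHS + [m[:3] for m in FULL_MONTHS])

# Day and month of one or two digits, year of two or four digits, with delimiters.
NUMERIC = re.compile(rf"(\d{{1,2}}){DELIM}(\d{{1,2}}){DELIM}(\d{{2}}|\d{{4}})")
# Exactly eight digits: the delimiter is optional (whitespace allowed, per the examples).
EIGHT_DIGITS = re.compile(r"(\d{2})\s?(\d{2})\s?(\d{4})")
# Month written as a name: delimiters are optional.
NAMED = re.compile(
    rf"(\d{{1,2}})(?:{DELIM}|\s)?({MONTHS})(?:{DELIM}|\s)?(\d{{2}}|\d{{4}})",
    re.IGNORECASE,
)


def looks_like_dmy(text):
    """Return True if text matches one of the DMY shapes described in the note."""
    return any(p.fullmatch(text) for p in (NUMERIC, EIGHT_DIGITS, NAMED))
```

Under this reading, every example in the note is accepted, while a whitespace-delimited date with fewer than eight digits, such as `1 1 2020`, is rejected.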
164186
## Train a custom model
165187
166188
Click the Train icon on the left pane to open the Training page. Then click the **Train** button to begin training the model. Once the training process completes, you'll see the following information:
