Merge pull request #116485 from PatrickFarley/formre-small-tasks

GitHubber17 · web-flow · commit d7ad41144384 · 2020-05-26T11:49:54.000-07:00
[cog serv] add date value formatting rules
diff --git a/articles/cognitive-services/form-recognizer/build-training-data-set.md b/articles/cognitive-services/form-recognizer/build-training-data-set.md
@@ -15,9 +15,11 @@ ms.author: pafarley
 
 # Build a training data set for a custom model
 
-When you use the Form Recognizer custom model, you provide your own training data so the model can train to your industry-specific forms. You can train a model with five filled-in forms or an empty form (you must include the word "empty" in the file name) plus two filled-in forms. Even if you have enough filled-in forms to train with, adding an empty form to your training data set can improve the accuracy of the model.
+When you use the Form Recognizer custom model, you provide your own training data so the model can train on your industry-specific forms.
 
-If you want to use manually labeled training data, you should start with at least five forms of the same type. You can still use unlabeled forms and an empty form in the same data set.
+If you're training without manual labels, you can use five filled-in forms, or an empty form (you must include the word "empty" in the file name) plus two filled-in forms. Even if you have enough filled-in forms, adding an empty form to your training data set can improve the accuracy of the model.
+
+If you want to use manually labeled training data, you must start with at least five filled-in forms of the same type. You can still use unlabeled forms and an empty form in addition to the required data set.
 
 ## Training data tips
 
@@ -39,9 +41,11 @@ Make sure your training data set also follows the input requirements for all For
 
 When you've put together the set of form documents that you'll use for training, you need to upload it to an Azure blob storage container. If you don't know how to create an Azure storage account with a container, follow the [Azure Storage quickstart for Azure portal](https://docs.microsoft.com/azure/storage/blobs/storage-quickstart-blobs-portal).
 
+If you want to use manually labeled data, you'll also have to upload the *.labels.json* and *.ocr.json* files that correspond to your training documents. You can use the [Sample labeling tool](./quickstarts/label-tool.md) (or your own UI) to generate these files.
+
 ### Organize your data in subfolders (optional)
 
-By default, the [Train Custom Model](https://westus2.dev.cognitive.microsoft.com/docs/services/form-recognizer-api-v2-preview/operations/TrainCustomModelAsync) API will only use form documents that are located at the root of your storage container. However, you can train with data in subfolders if you specify it in the API call. Normally, the body of the [Train Custom Model](https://westus2.dev.cognitive.microsoft.com/docs/services/form-recognizer-api-v2-preview/operations/TrainCustomModelAsync) call has the following form, where `<SAS URL>` is the Shared access signature URL of your container:
+By default, the [Train Custom Model](https://westus2.dev.cognitive.microsoft.com/docs/services/form-recognizer-api-v2-preview/operations/TrainCustomModelAsync) API uses only the form documents that are located at the root of your storage container. However, you can train with data in subfolders if you specify it in the API call. Normally, the body of the [Train Custom Model](https://westus2.dev.cognitive.microsoft.com/docs/services/form-recognizer-api-v2-preview/operations/TrainCustomModelAsync) call has the following format, where `<SAS URL>` is the shared access signature (SAS) URL of your container:
 
 ```json
 {
@@ -66,6 +70,7 @@ If you add the following content to the request body, the API will train with do
 
 Now that you've learned how to build a training data set, follow a quickstart to train a custom Form Recognizer model and start using it on your forms.
 
-* [Quickstart: Train a model and extract form data by using cURL](./quickstarts/curl-train-extract.md)
-* [Quickstart: Train a model and extract form data using the REST API with Python](./quickstarts/python-train-extract.md)
-* [Train with labels using the REST API and Python](./quickstarts/python-labeled-data.md)
+* [Train a model and extract form data using cURL](./quickstarts/curl-train-extract.md)
+* [Train a model and extract form data using the REST API and Python](./quickstarts/python-train-extract.md)
+* [Train with labels using the sample labeling tool](./quickstarts/label-tool.md)
+* [Train with labels using the REST API and Python](./quickstarts/python-labeled-data.md)
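
Review note on the hunks above: the minimum data set sizes in the revised intro (five filled-in forms, or one empty form plus two filled-in forms, or at least five labeled forms) can be checked before upload. The following is a minimal sketch of that check, not part of the patch and not a Form Recognizer API. The function name `check_training_set`, the case-insensitive match on "empty", and the assumption that label files are named `<document>.labels.json` and `<document>.ocr.json` are all illustrative guesses.

```python
from pathlib import PurePosixPath

# Assumed sidecar naming: "<document>.labels.json" and "<document>.ocr.json".
SIDECAR_SUFFIXES = (".labels.json", ".ocr.json")


def check_training_set(blob_names, labeled=False):
    """Return a list of problems; an empty list means the minimums are met."""
    names = set(blob_names)
    # Everything that is not a sidecar file is treated as a form document.
    docs = [n for n in blob_names if not n.endswith(SIDECAR_SUFFIXES)]
    # Assumption: "empty" anywhere in the file name marks the empty form.
    empty = [n for n in docs if "empty" in PurePosixPath(n).name.lower()]
    filled = [n for n in docs if n not in empty]

    problems = []
    if labeled:
        # Count only filled-in forms that have both sidecar files.
        with_labels = [
            d for d in filled
            if all(d + suffix in names for suffix in SIDECAR_SUFFIXES)
        ]
        if len(with_labels) < 5:
            problems.append(
                f"labeled training needs at least 5 labeled filled-in forms, "
                f"found {len(with_labels)}"
            )
    elif len(filled) < 5 and not (empty and len(filled) >= 2):
        problems.append(
            "unlabeled training needs 5 filled-in forms, "
            "or 1 empty form plus 2 filled-in forms"
        )
    return problems
```

For example, `check_training_set(["a.pdf", "b.pdf", "empty.pdf"])` returns an empty list, while the same call with `labeled=True` reports that no labeled forms were found.
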
diff --git a/articles/cognitive-services/form-recognizer/quickstarts/label-tool.md b/articles/cognitive-services/form-recognizer/quickstarts/label-tool.md
@@ -139,6 +139,7 @@ Next, you'll create tags (labels) and apply them to the text elements that you w
     > * Label values as they appear on the form; don't try to split a value into two parts with two different tags. For example, an address field should be labeled with a single tag even if it spans multiple lines.
     > * Don't include keys in your tagged fields&mdash;only the values.
     > * Table data should be detected automatically and will be available in the final output JSON file. However, if the model fails to detect all of your table data, you can manually tag these fields as well. Tag each cell in the table with a different label. If your forms have tables with varying numbers of rows, make sure you tag at least one form with the largest possible table.
+    > * To delete an applied tag, select the tag's rectangle in the document view and press the Delete key.
 
 ![Main editor window of sample labeling tool](../media/label-tool/main-editor.png)
 
@@ -161,6 +162,27 @@ The following value types and variations are currently supported:
 * `time`
 * `integer`
 
+> [!NOTE]
+> The following rules apply to date formatting:
+>
+> The following characters can be used as DMY date delimiters: `, - / . \`. Whitespace cannot be used as a delimiter, except in the eight-digit and month-name cases described later in this note. For example:
+> * 01,01,2020
+> * 01-01-2020
+> * 01/01/2020
+>
+> The day and month can each be written as one or two digits, and the year can be two or four digits:
+> * 1-1-2020
+> * 1-01-20
+>
+> If a DMY date string has eight digits, the delimiter is optional:
+> * 01012020
+> * 01 01 2020
+>
+> The month can also be written as its full or short name. If the name is used, delimiter characters are optional:
+> * 01/Jan/2020
+> * 01Jan2020
+> * 01 Jan 2020
+
 ## Train a custom model
 
 Click the Train icon on the left pane to open the Training page. Then click the **Train** button to begin training the model. Once the training process completes, you'll see the following information:
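
Review note on the date formatting rules added to label-tool.md: one way to sanity-check the examples in the note is to encode the rules as regular expressions. The sketch below is one reading of the rules as written, not the service's actual parser. It checks only the shape of the string (not whether the day or month is in range), and the names `DELIM`, `MONTH`, `YEAR`, and `looks_like_dmy_date` are hypothetical. It also assumes that month names are matched case-insensitively and that the two delimiters in a date need not be the same character, neither of which the note states.

```python
import re

# Delimiters listed in the note: , - / . \
DELIM = r"[,\-/.\\]"
# Full or three-letter English month names (assumed spelling set).
MONTH = (
    r"(?:jan(?:uary)?|feb(?:ruary)?|mar(?:ch)?|apr(?:il)?|may|june?|july?"
    r"|aug(?:ust)?|sep(?:tember)?|oct(?:ober)?|nov(?:ember)?|dec(?:ember)?)"
)
# Two- or four-digit year.
YEAR = r"(?:\d{4}|\d{2})"

PATTERNS = [
    # Day and month of one or two digits, separated by listed delimiters.
    re.compile(rf"\d{{1,2}}{DELIM}\d{{1,2}}{DELIM}{YEAR}"),
    # Exactly eight digits: delimiter optional, whitespace appears in the examples.
    re.compile(r"\d{2}\s?\d{2}\s?\d{4}"),
    # Month written as a name: delimiter or whitespace is optional.
    re.compile(
        rf"\d{{1,2}}(?:{DELIM}|\s)?{MONTH}(?:{DELIM}|\s)?{YEAR}",
        re.IGNORECASE,
    ),
]


def looks_like_dmy_date(text):
    """Return True if text has one of the DMY shapes described in the note."""
    return any(p.fullmatch(text) for p in PATTERNS)
```

Under this reading, every example listed in the note is accepted, while a string such as `1 1 2020` is rejected because whitespace separates a date that does not have eight digits.
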