
Commit 5930d37

Update best-practices.md
I'd also like to add in a picture of the toggles in the Language Studio, but I don't know how to add pictures in Markdown files.
1 parent a61b5d2 commit 5930d37

File tree

1 file changed (+17 −3 lines)
  • articles/ai-services/language-service/conversational-language-understanding/concepts


articles/ai-services/language-service/conversational-language-understanding/concepts/best-practices.md

Lines changed: 17 additions & 3 deletions
@@ -73,17 +73,31 @@ To resolve this, you would label a learned component in your training data for a
 If you require the learned component, make sure that *ticket quantity* is only returned when the learned component predicts it in the right context. If you also require the prebuilt component, you can then guarantee that the returned *ticket quantity* entity is both a number and in the correct position.
 
-## Addressing casing inconsistencies
+## Addressing model inconsistencies
 
-If you have poor AI quality and determine the casing used in your training data is dissimilar to the testing data, you can use the `normalizeCasing` project setting. This normalizes the casing of utterances when training and testing the model. If you've migrated from LUIS, you might recognize that LUIS did this by default.
+If your model is overly sensitive to small grammatical changes, like casing or diacritics, you can systematically manipulate your dataset directly in the Language Studio. To use these features, click on the Settings tab on the left toolbar and locate the **Advanced project settings** section. First, you can ***Enable data transformation for casing***, which normalizes the casing of utterances when training, testing, and implementing your model. If you've migrated from LUIS, you might recognize that LUIS did this normalization by default. To access this feature via the API, set the `"normalizeCasing"` parameter to `true`. See an example below:
 
 ```json
 {
     "projectFileVersion": "2022-10-01-preview",
     ...
     "settings": {
-        "confidenceThreshold": 0.5,
+        ...
         "normalizeCasing": true
+        ...
+    }
+    ...
+```
+Second, you can also leverage the **Advanced project settings** to ***Enable data augmentation for diacritics*** to generate variations of your training data for possible diacritic variations used in natural language. This feature is available for all languages, but it is especially useful for Germanic and Slavic languages, where users often write words using classic English characters instead of the correct characters. For example, the phrase "Navigate to the sports channel" in French is "Accédez à la chaîne sportive". When this feature is enabled, the phrase "Accedez a la chaine sportive" (without diacritic characters) is also included in the training dataset. If you enable this feature, please note that the utterance count of your training set will increase, and you may need to adjust your training data size accordingly. The current maximum utterance count after augmentation is 25,000. To access this feature via the API, set the `"augmentDiacritics"` parameter to `true`. See an example below:
+
+```json
+{
+    "projectFileVersion": "2022-10-01-preview",
+    ...
+    "settings": {
+        ...
+        "augmentDiacritics": true
+        ...
 }
 ...
 ```
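To make the two settings in this diff concrete, here is a minimal offline sketch of what they do to a training utterance. This is an illustration of the described behavior, not the service's actual implementation; the helper names `strip_diacritics` and `augment` are hypothetical, and the assumption that casing normalization amounts to lowercasing is mine:

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Drop combining accent marks, e.g. 'chaîne' -> 'chaine'."""
    # NFD splits accented characters into base letter + combining mark.
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)


def augment(utterances, normalize_casing=False, augment_diacritics=False):
    """Sketch of the dataset transformations the two settings describe."""
    out = []
    for utterance in utterances:
        # "Enable data transformation for casing" ~ normalize the casing
        # (assumed here to mean lowercasing).
        base = utterance.lower() if normalize_casing else utterance
        out.append(base)
        # "Enable data augmentation for diacritics" ~ also include an
        # accent-free variant when it differs from the original.
        if augment_diacritics:
            plain = strip_diacritics(base)
            if plain != base:
                out.append(plain)
    return out


print(augment(["Accédez à la chaîne sportive"],
              normalize_casing=True, augment_diacritics=True))
# ['accédez à la chaîne sportive', 'accedez a la chaine sportive']
```

Note how augmentation grows the utterance count (one input produced two training examples here), which matches the diff's warning about the 25,000-utterance cap after augmentation.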
