
Commit 928196b

Merge pull request #115304 from diberry/diberry/luis-0514-best-practices
[Cogsvcs] PostKeynote-Best practices
2 parents 25da845 + af181d9 commit 928196b

5 files changed: +94 −7 lines changed

articles/cognitive-services/LUIS/luis-concept-best-practices.md

Lines changed: 32 additions & 2 deletions
@@ -2,7 +2,7 @@
 title: Best practices for building your LUIS app
 description: Learn the best practices to get the best results from your LUIS app's model.
 ms.topic: conceptual
-ms.date: 05/06/2020
+ms.date: 05/17/2020
 ms.author: diberry
 ---
 # Best practices for building a language understanding (LUIS) app
@@ -25,14 +25,28 @@ The following list includes best practices for LUIS apps:

 |Do|Don't|
 |--|--|
-|[Define distinct intents](#do-define-distinct-intents)<br>[Add features to intents](#do-add-features-to-intents) |[Add many example utterances to intents](#dont-add-many-example-utterances-to-intents)<br>[Use few or simple entities](#dont-use-few-or-simple-entities) |
+|[Plan your schema](#do-plan-your-schema)|[Build and publish without a plan](#dont-publish-too-quickly)|
+|[Define distinct intents](#do-define-distinct-intents)<br>[Add features to intents](#do-add-features-to-intents)<br>[Use machine learned entities](#do-use-machine-learned-entities)|[Add many example utterances to intents](#dont-add-many-example-utterances-to-intents)<br>[Use few or simple entities](#dont-use-few-or-simple-entities)|
 |[Find a sweet spot between too generic and too specific for each intent](#do-find-sweet-spot-for-intents)|[Use LUIS as a training platform](#dont-use-luis-as-a-training-platform)|
 |[Build your app iteratively with versions](#do-build-your-app-iteratively-with-versions)<br>[Build entities for model decomposition](#do-build-for-model-decomposition)|[Add many example utterances of the same format, ignoring other formats](#dont-add-many-example-utterances-of-the-same-format-ignoring-other-formats)|
 |[Add patterns in later iterations](#do-add-patterns-in-later-iterations)|[Mix the definition of intents and entities](#dont-mix-the-definition-of-intents-and-entities)|
 |[Balance your utterances across all intents](#balance-your-utterances-across-all-intents) except the None intent.<br>[Add example utterances to None intent](#do-add-example-utterances-to-none-intent)|[Create phrase lists with all possible values](#dont-create-phrase-lists-with-all-the-possible-values)|
 |[Leverage the suggest feature for active learning](#do-leverage-the-suggest-feature-for-active-learning)|[Add too many patterns](#dont-add-many-patterns)|
 |[Monitor the performance of your app with batch testing](#do-monitor-the-performance-of-your-app)|[Train and publish with every single example utterance added](#dont-train-and-publish-with-every-single-example-utterance)|

+## Do plan your schema
+
+Before you start building your app's schema, identify what you plan to use this app for, and where. The more thorough and specific your planning, the better your app becomes.
+
+* Research targeted users.
+* Define end-to-end personas to represent your app: voice, avatar, issue handling (proactive, reactive).
+* Identify the user interactions (text, speech) and channels, and whether you're handing off to existing solutions or creating a new solution for this app.
+* Map the end-to-end user journey.
+* What should you expect this app to do, and not do?
+* What are the priorities of what it should do?
+* What are the main use cases?
+* Collect data: [learn](data-collection.md) about collecting and preparing data.
+
 ## Do define distinct intents
 Make sure the vocabulary for each intent is just for that intent and not overlapping with a different intent. For example, if you want to have an app that handles travel arrangements such as airline flights and hotels, you can choose to have these subject areas as separate intents or the same intent with entities for specific data inside the utterance.

@@ -54,6 +68,14 @@ Features describe concepts for an intent. A feature can be a phrase list of word
 ## Do find sweet spot for intents
 Use prediction data from LUIS to determine if your intents are overlapping. Overlapping intents confuse LUIS. The result is that the top scoring intent is too close to another intent. Because LUIS does not use the exact same path through the data for training each time, an overlapping intent has a chance of being first or second in training. You want the utterance's score for each intent to be farther apart so this flip/flop doesn't happen. Good distinction for intents should result in the expected top intent every time.

+## Do use machine learned entities
+
+Machine learned entities are tailored to your app and require labeling to be successful. If you are not using machine learned entities, you might be using the wrong tool.
+
+Machine learned entities can use other entities as features. These other entities can be custom entities, such as regular expression entities or list entities, or you can use prebuilt entities as features.
+
+Learn about [effective machine learned entities](luis-concept-entity-types.md#effective-machine-learned-entities).
+
 <a name="#do-build-the-app-iteratively"></a>

 ## Do build your app iteratively with versions
@@ -116,6 +138,14 @@ Monitor the prediction accuracy using a [batch test](luis-concept-batch-test.md)

 Keep a separate set of utterances that aren't used as [example utterances](luis-concept-utterance.md) or endpoint utterances. Keep improving the app for your test set. Adapt the test set to reflect real user utterances. Use this test set to evaluate each iteration or version of the app.

+## Don't publish too quickly
+
+Publishing your app too quickly, without [proper planning](#do-plan-your-schema), may lead to several issues:
+
+* Your app won't work in your actual scenario at an acceptable level of performance.
+* The schema (intents and entities) might not be appropriate. If you've developed client app logic that follows the schema, you may need to rewrite it from scratch, causing unexpected delays and extra cost for your project.
+* Utterances you add to the model might bias it toward the example utterance set, which is hard to debug and identify. Committing to a schema early also makes it difficult to remove ambiguity later.
+
 ## Don't add many example utterances to intents

 After the app is published, only add utterances from active learning in the development lifecycle process. If utterances are too similar, add a pattern.
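The "sweet spot" advice above (keeping the top two intent scores far apart) can be checked programmatically. The sketch below assumes a LUIS v3-style prediction payload where `intents` maps intent names to score objects; treat the exact shape as an assumption to adapt to your own responses:

```python
# Sketch: measure how far apart the top two intent scores are in a
# v3-style prediction response (assumed shape, adapt to your payload).
def intent_score_gap(prediction: dict) -> float:
    """Return the score gap between the two highest-scoring intents."""
    scores = sorted(
        (intent["score"] for intent in prediction["intents"].values()),
        reverse=True,
    )
    if len(scores) < 2:
        return scores[0] if scores else 0.0
    return scores[0] - scores[1]

# A small gap suggests overlapping intents that may flip/flop between
# training runs and warrant schema review.
response = {
    "topIntent": "BookFlight",
    "intents": {
        "BookFlight": {"score": 0.85},
        "CancelFlight": {"score": 0.82},
        "None": {"score": 0.02},
    },
}
print(f"gap={intent_score_gap(response):.2f}")
```

Run this over your batch-test results to flag utterances whose top two intents score within some threshold of each other.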

articles/cognitive-services/LUIS/luis-concept-entity-types.md

Lines changed: 19 additions & 2 deletions
@@ -2,7 +2,7 @@
 title: Entity types - LUIS
 description: An entity extracts data from a user utterance at prediction runtime. An _optional_, secondary purpose is to boost the prediction of the intent or other entities by using the entity as a feature.
 ms.topic: conceptual
-ms.date: 04/30/2020
+ms.date: 05/17/2020
 ---

 # Extract data with entities
@@ -11,7 +11,7 @@ An entity extracts data from a user utterance at prediction runtime. An _optiona

 There are several types of entities:

-* [Machine-learned entity](reference-entity-machine-learned-entity.md)
+* [Machine-learned entity](reference-entity-machine-learned-entity.md) - this is the primary entity. Design your schema with this entity type before using other entities.
 * Non-machine-learned used as a required [feature](luis-concept-feature.md) - for exact text matches, pattern matches, or detection by prebuilt entities
 * [Pattern.any](#patternany-entity) - to extract free-form text such as book titles from a [Pattern](reference-entity-pattern-any.md)

@@ -59,6 +59,14 @@ A machine-learned entity triggers based on the context learned through example u

 [**Machine-learned entities**](tutorial-machine-learned-entity.md) are the top-level extractors. Subentities are child entities of machine-learned entities.

+## Effective machine learned entities
+
+To build machine learned entities effectively:
+
+* Label consistently across intents, including utterances you provide in the **None** intent that contain this entity. Otherwise the model won't be able to determine the sequences effectively.
+* If you have a machine learned entity with subentities, make sure the different orders and variants of the entity and subentities are present in the labeled utterances. Labeled example utterances should include all valid forms, including utterances where subentities appear, are absent, or are reordered.
+* Avoid overfitting the entities to a very fixed set. **Overfitting** happens when the model doesn't generalize well, and is a common problem in machine learning models. It means the app wouldn't work adequately on new data. Instead, vary the labeled example utterances so the app can generalize beyond the limited examples you provide. Vary the subentities with enough change for the model to learn the concept instead of just the examples shown.
+
 <a name="composite-entity"></a>
 <a name="list-entity"></a>
 <a name="patternany-entity"></a>
@@ -80,6 +88,15 @@ Choose the entity based on how the data should be extracted and how it should be
 |[**Prebuilt**](luis-reference-prebuilt-entities.md)|Already trained to extract specific kind of data such as URL or email. Some of these prebuilt entities are defined in the open-source [Recognizers-Text](https://github.com/Microsoft/Recognizers-Text) project. If your specific culture or entity isn't currently supported, contribute to the project.|
 |[**Regular Expression**](reference-entity-regular-expression.md)|Uses regular expression for **exact text match**.|

+
+## Extraction versus resolution
+
+Entities extract data as the data appears in the utterance. Entities don't change or resolve the data, and an entity won't indicate whether the text is a valid value for the entity.
+
+There are ways to bring resolution into the extraction, but be aware that this limits the app's ability to be resilient to variations and mistakes.
+
+List entities and regular expression (text-matching) entities can be used as [required features](luis-concept-feature.md#required-features) of a subentity, which acts as a filter on the extraction. Use this carefully so it doesn't hinder the app's ability to predict.
+
 ## Extracting contextually related data

 An utterance may contain two or more occurrences of an entity where the meaning of the data is based on context within the utterance. An example is an utterance for booking a flight that has two geographical locations, origin and destination.
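The extraction-versus-resolution distinction added in this file can be illustrated with a toy sketch. The helper names and list data below are hypothetical (not a LUIS API): extraction returns the span exactly as it appears in the utterance, while resolution, like a list entity's normalized value, is a separate step:

```python
# Toy sketch (hypothetical helpers, not a LUIS API): extraction returns
# the span as it appears; resolution normalizes it afterwards.

# List-entity style data: normalized value -> synonyms.
SIZE_LIST = {
    "small": ["small", "tiny", "smallest"],
    "large": ["large", "big", "huge"],
}

def extract(utterance: str, vocabulary: list) -> list:
    """Return matching spans exactly as they appear; no validation, no change."""
    return [word for word in utterance.lower().split() if word in vocabulary]

def resolve(span: str) -> str:
    """Separate resolution step: map an extracted span to its normalized value."""
    for normalized, synonyms in SIZE_LIST.items():
        if span in synonyms:
            return normalized
    return span  # no resolution available; keep the raw text

all_synonyms = [s for synonyms in SIZE_LIST.values() for s in synonyms]
spans = extract("order a huge pizza", all_synonyms)
print(spans)                         # extracted as-is
print([resolve(s) for s in spans])   # normalized by a separate step
```

The point of the sketch: `extract` never alters or validates the text, which is what keeps the app resilient to variations; resolution is the client's (or a list entity's) job.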

articles/cognitive-services/LUIS/luis-concept-feature.md

Lines changed: 13 additions & 1 deletion
@@ -2,7 +2,7 @@
 title: Features - LUIS
 description: Add features to a language model to provide hints about how to recognize input that you want to label or classify.
 ms.topic: conceptual
-ms.date: 04/23/2020
+ms.date: 05/14/2020
 ---
 # Machine-learning (ML) features

@@ -82,10 +82,22 @@ For example, if a shipping address entity contained a street address subentity,
 * Country (subentity)
 * Postal code (subentity)

+## Nested subentities with features
+
+A machine learned subentity indicates to its parent that a concept is present, whether that parent is another subentity or the top entity. The value of the subentity acts as a feature to its parent.
+
+A subentity can have both a phrase list as a feature and a model (another entity) as a feature.
+
+When the subentity has a phrase list, it boosts the vocabulary of the concept but doesn't add any information to the JSON response of the prediction.
+
+When the subentity has another entity as a feature, the JSON response includes the extracted data of that other entity.
+
 ## Required features

 A required feature has to be found in order for the model to be returned from the prediction endpoint. Use a required feature when you know your incoming data must match the feature.

+If the utterance text doesn't match the required feature, it isn't extracted.
+
 **A required feature uses a non-machine learned entity**:
 * Regular expression entity
 * List entity
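The required-feature behavior added above acts like a filter: candidate text that doesn't satisfy the non-machine-learned entity (here, a regular expression) is simply not extracted. The sketch below illustrates the concept only; it is not how LUIS is implemented internally, and the flight-number pattern is a made-up example:

```python
import re

# Sketch of a required feature acting as a filter on extraction.
# Candidates that don't match the regular expression are dropped,
# mirroring "if the utterance text doesn't match the required
# feature, it isn't extracted." (Concept illustration only.)
FLIGHT_NUMBER = re.compile(r"[A-Z]{2}\d{3,4}")  # hypothetical pattern, e.g. "AB123"

def extract_with_required_feature(candidates: list) -> list:
    """Keep only candidate spans that fully satisfy the required feature."""
    return [c for c in candidates if FLIGHT_NUMBER.fullmatch(c)]

candidates = ["AB123", "tomorrow", "XY9999", "flight"]
print(extract_with_required_feature(candidates))
```

Used this way, a regular expression or list entity turns a fuzzy machine-learned match into an exact-match gate, which is why the doc cautions against overusing it.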

articles/cognitive-services/LUIS/luis-glossary.md

Lines changed: 5 additions & 1 deletion
@@ -195,6 +195,10 @@ A (machine learned) model is a function that makes a prediction on input data. I

 You add values to your [list](#list-entity) entities. Each of those values can have a list of one or more synonyms. Only the normalized value is returned in the response.

+## Overfitting
+
+Overfitting happens when the model fixates on the specific examples and can't generalize well.
+
 ## Owner

 Each app has one owner who is the person that created the app. The owner manages permissions to the application in the Azure portal.
@@ -255,7 +259,7 @@ LUIS quota is the limitation of the Azure subscription tier. The LUIS quota can

 ## Schema

-Your schema includes your intents and entities along with the subentities. The schema is initially planned for then iterated over time. The schema doesn't include app settings, features, or example utterances.
+Your schema includes your intents and entities along with the subentities. The schema is initially planned for then iterated over time. The schema doesn't include app settings, features, or example utterances.

 ## Sentiment Analysis
 Sentiment analysis provides positive or negative values of the utterances provided by [Text Analytics](../text-analytics/overview.md).

articles/cognitive-services/LUIS/luis-how-plan-your-app.md

Lines changed: 25 additions & 1 deletion
@@ -2,7 +2,7 @@
 title: Plan your app - LUIS
 description: Outline relevant app intents and entities, and then create your application plans in Language Understanding Intelligent Services (LUIS).
 ms.topic: conceptual
-ms.date: 04/14/2020
+ms.date: 05/14/2020
 ---

 # Plan your LUIS app schema with subject domain and data extraction
@@ -44,6 +44,30 @@ When determining which entities to use in your app, keep in mind that there are
 > [!TIP]
 > LUIS offers [prebuilt entities](luis-prebuilt-entities.md) for common, conversational user scenarios. Consider using prebuilt entities as a starting point for your application development.

+## Resolution with intent or entity?
+
+In many cases, especially when working with natural conversation, users provide an utterance that can contain more than one function or intent. To address this, understand that the output can be represented in both intents and entities. This representation should be mappable to your client application's actions, and it doesn't need to be limited to the intents.
+
+**Int-ent-ties** is the concept that actions (usually understood as intents) can also be captured as entities and relied on in this form in the output JSON, where you can map them to specific actions. _Negation_ is a common scenario that leverages both intent and entity for full extraction.
+
+Consider the following two utterances, which are close in word choice but have different results:
+
+|Utterance|
+|--|
+|`Please schedule my flight from Cairo to Seattle`|
+|`Cancel my flight from Cairo to Seattle`|
+
+Instead of having two separate intents, create a single intent with a `FlightAction` machine learning entity. The machine learning entity should extract the details of the action, for both a scheduling and a canceling request, as well as an origin or destination location.
+
+The `FlightAction` entity would be structured in the following pseudo-schema of machine learning entity and subentities:
+
+* FlightAction
+    * Action
+    * Origin
+    * Destination
+
+To help the extraction, add features to the subentities. Choose features based on the vocabulary you expect to see in user utterances and the values you want returned in the prediction response.
+
 ## Next steps

 > [!div class="nextstepaction"]
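For the `FlightAction` design added above, the client maps the extracted entity data, not just the intent, to a code path. The response shape below is a hypothetical sketch (machine-learned entity output varies with your schema and prediction API version), not a verbatim LUIS response:

```python
# Hypothetical prediction result for "Cancel my flight from Cairo to
# Seattle" using a single intent plus a FlightAction machine-learned
# entity with Action/Origin/Destination subentities. The exact JSON
# shape depends on your schema; this is a sketch.
prediction = {
    "topIntent": "FlightRequest",
    "entities": {
        "FlightAction": [
            {"Action": ["cancel"], "Origin": ["Cairo"], "Destination": ["Seattle"]}
        ]
    },
}

def to_client_action(prediction: dict) -> tuple:
    """Map the extracted FlightAction entity to a client-side action."""
    flight = prediction["entities"]["FlightAction"][0]
    return (flight["Action"][0], flight["Origin"][0], flight["Destination"][0])

print(to_client_action(prediction))
```

Because the action ("schedule" vs "cancel") arrives as entity data, the two near-identical utterances dispatch to different client code paths without needing separate intents.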
