
Commit 928196b

Merge pull request #115304 from diberry/diberry/luis-0514-best-practices
[Cogsvcs] PostKeynote-Best practices
2 parents 25da845 + af181d9 commit 928196b

5 files changed: +94 −7 lines changed

articles/cognitive-services/LUIS/luis-concept-best-practices.md

Lines changed: 32 additions & 2 deletions
@@ -2,7 +2,7 @@
 title: Best practices for building your LUIS app
 description: Learn the best practices to get the best results from your LUIS app's model.
 ms.topic: conceptual
-ms.date: 05/06/2020
+ms.date: 05/17/2020
 ms.author: diberry
 ---
 # Best practices for building a language understanding (LUIS) app
@@ -25,14 +25,28 @@ The following list includes best practices for LUIS apps:

 |Do|Don't|
 |--|--|
-|[Define distinct intents](#do-define-distinct-intents)<br>[Add features to intents](#do-add-features-to-intents) |[Add many example utterances to intents](#dont-add-many-example-utterances-to-intents)<br>[Use few or simple entities](#dont-use-few-or-simple-entities) |
+|[Plan your schema](#do-plan-your-schema)|[Build and publish without a plan](#dont-publish-too-quickly)|
+|[Define distinct intents](#do-define-distinct-intents)<br>[Add features to intents](#do-add-features-to-intents)<br>[Use machine learned entities](#do-use-machine-learned-entities)|[Add many example utterances to intents](#dont-add-many-example-utterances-to-intents)<br>[Use few or simple entities](#dont-use-few-or-simple-entities)|
 |[Find a sweet spot between too generic and too specific for each intent](#do-find-sweet-spot-for-intents)|[Use LUIS as a training platform](#dont-use-luis-as-a-training-platform)|
 |[Build your app iteratively with versions](#do-build-your-app-iteratively-with-versions)<br>[Build entities for model decomposition](#do-build-for-model-decomposition)|[Add many example utterances of the same format, ignoring other formats](#dont-add-many-example-utterances-of-the-same-format-ignoring-other-formats)|
 |[Add patterns in later iterations](#do-add-patterns-in-later-iterations)|[Mix the definition of intents and entities](#dont-mix-the-definition-of-intents-and-entities)|
 |[Balance your utterances across all intents](#balance-your-utterances-across-all-intents) except the None intent.<br>[Add example utterances to None intent](#do-add-example-utterances-to-none-intent)|[Create phrase lists with all possible values](#dont-create-phrase-lists-with-all-the-possible-values)|
 |[Leverage the suggest feature for active learning](#do-leverage-the-suggest-feature-for-active-learning)|[Add too many patterns](#dont-add-many-patterns)|
 |[Monitor the performance of your app with batch testing](#do-monitor-the-performance-of-your-app)|[Train and publish with every single example utterance added](#dont-train-and-publish-with-every-single-example-utterance)|

+## Do plan your schema
+
+Before you start building your app's schema, identify what you plan to use this app for, and where. The more thorough and specific your planning, the better your app becomes.
+
+* Research targeted users.
+* Define end-to-end personas to represent your app: voice, avatar, issue handling (proactive, reactive).
+* Identify the user interactions (text, speech) and channels, and whether you're handing off to existing solutions or creating a new solution for this app.
+* Map the end-to-end user journey.
+* What should you expect this app to do, and not do?
+* What are the priorities of what it should do?
+* What are the main use cases?
+* Collect data: [learn](data-collection.md) about collecting and preparing data.
+
 ## Do define distinct intents
 Make sure the vocabulary for each intent is just for that intent and not overlapping with a different intent. For example, if you want to have an app that handles travel arrangements such as airline flights and hotels, you can choose to have these subject areas as separate intents or the same intent with entities for specific data inside the utterance.

@@ -54,6 +68,14 @@ Features describe concepts for an intent. A feature can be a phrase list of word
 ## Do find sweet spot for intents
 Use prediction data from LUIS to determine if your intents are overlapping. Overlapping intents confuse LUIS. The result is that the top scoring intent is too close to another intent. Because LUIS does not use the exact same path through the data for training each time, an overlapping intent has a chance of being first or second in training. You want the utterance's score for each intent to be farther apart so this flip/flop doesn't happen. Good distinction for intents should result in the expected top intent every time.

+## Do use machine learned entities
+
+Machine learned entities are tailored to your app and require labeling to be successful. If you are not using machine learned entities, you might be using the wrong tool.
+
+Machine learned entities can use other entities as features. These other entities can be custom entities, such as regular expression entities or list entities, or you can use prebuilt entities as features.
+
+Learn about [effective machine learned entities](luis-concept-entity-types.md#effective-machine-learned-entities).
+
 <a name="#do-build-the-app-iteratively"></a>

 ## Do build your app iteratively with versions
@@ -116,6 +138,14 @@ Monitor the prediction accuracy using a [batch test](luis-concept-batch-test.md)

 Keep a separate set of utterances that aren't used as [example utterances](luis-concept-utterance.md) or endpoint utterances. Keep improving the app for your test set. Adapt the test set to reflect real user utterances. Use this test set to evaluate each iteration or version of the app.

+## Don't publish too quickly
+
+Publishing your app too quickly, without [proper planning](#do-plan-your-schema), may lead to several issues:
+
+* Your app won't work in your actual scenario at an acceptable level of performance.
+* The schema (intents and entities) might not be appropriate. If you've developed client app logic that follows the schema, you may need to rewrite it from scratch, causing unexpected delays and extra cost for your project.
+* Utterances you add to the model might bias it toward the example utterance set, which is hard to debug and identify. Committing to a schema early also makes it difficult to remove ambiguity later.
+
 ## Don't add many example utterances to intents

 After the app is published, only add utterances from active learning in the development lifecycle process. If utterances are too similar, add a pattern.
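The "sweet spot" advice above (keeping the top two intent scores far apart) can be checked programmatically. The sketch below assumes a LUIS v3-style prediction payload where `intents` maps intent names to score objects; treat the exact shape as an assumption to adapt to your own responses:

```python
# Sketch: measure how far apart the top two intent scores are in a
# v3-style prediction response (assumed shape, adapt to your payload).
def intent_score_gap(prediction: dict) -> float:
    """Return the score gap between the two highest-scoring intents."""
    scores = sorted(
        (intent["score"] for intent in prediction["intents"].values()),
        reverse=True,
    )
    if len(scores) < 2:
        return scores[0] if scores else 0.0
    return scores[0] - scores[1]

# A small gap suggests overlapping intents that may flip/flop between
# training runs and warrant schema review.
response = {
    "topIntent": "BookFlight",
    "intents": {
        "BookFlight": {"score": 0.85},
        "CancelFlight": {"score": 0.82},
        "None": {"score": 0.02},
    },
}
print(f"gap={intent_score_gap(response):.2f}")
```

Run this over your batch-test results to flag utterances whose top two intents score within some threshold of each other.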

articles/cognitive-services/LUIS/luis-concept-entity-types.md

Lines changed: 19 additions & 2 deletions
@@ -2,7 +2,7 @@
 title: Entity types - LUIS
 description: An entity extracts data from a user utterance at prediction runtime. An _optional_, secondary purpose is to boost the prediction of the intent or other entities by using the entity as a feature.
 ms.topic: conceptual
-ms.date: 04/30/2020
+ms.date: 05/17/2020
 ---

 # Extract data with entities
@@ -11,7 +11,7 @@ An entity extracts data from a user utterance at prediction runtime. An _optiona

 There are several types of entities:

-* [Machine-learned entity](reference-entity-machine-learned-entity.md)
+* [Machine-learned entity](reference-entity-machine-learned-entity.md) - this is the primary entity. Design your schema with this entity type before using other entities.
 * Non-machine-learned used as a required [feature](luis-concept-feature.md) - for exact text matches, pattern matches, or detection by prebuilt entities
 * [Pattern.any](#patternany-entity) - to extract free-form text such as book titles from a [Pattern](reference-entity-pattern-any.md)

@@ -59,6 +59,14 @@ A machine-learned entity triggers based on the context learned through example u

 [**Machine-learned entities**](tutorial-machine-learned-entity.md) are the top-level extractors. Subentities are child entities of machine-learned entities.

+## Effective machine learned entities
+
+To build machine learned entities effectively:
+
+* Label consistently across intents, including utterances you provide in the **None** intent that contain this entity. Otherwise the model won't be able to determine the sequences effectively.
+* If you have a machine learned entity with subentities, make sure the different orders and variants of the entity and subentities are present in the labeled utterances. Labeled example utterances should include all valid forms, including utterances where subentities appear, are absent, or are reordered.
+* Avoid overfitting the entities to a very fixed set. **Overfitting** happens when the model doesn't generalize well, and is a common problem in machine learning models. It means the app wouldn't work adequately on new data. Instead, vary the labeled example utterances so the app can generalize beyond the limited examples you provide. Vary the subentities with enough change for the model to learn the concept instead of just the examples shown.
+
 <a name="composite-entity"></a>
 <a name="list-entity"></a>
 <a name="patternany-entity"></a>
@@ -80,6 +88,15 @@ Choose the entity based on how the data should be extracted and how it should be
 |[**Prebuilt**](luis-reference-prebuilt-entities.md)|Already trained to extract specific kind of data such as URL or email. Some of these prebuilt entities are defined in the open-source [Recognizers-Text](https://github.com/Microsoft/Recognizers-Text) project. If your specific culture or entity isn't currently supported, contribute to the project.|
 |[**Regular Expression**](reference-entity-regular-expression.md)|Uses regular expression for **exact text match**.|

+
+## Extraction versus resolution
+
+Entities extract data as the data appears in the utterance. Entities don't change or resolve the data, and an entity won't indicate whether the text is a valid value for the entity.
+
+There are ways to bring resolution into the extraction, but be aware that this limits the app's ability to be resilient to variations and mistakes.
+
+List entities and regular expression (text-matching) entities can be used as [required features](luis-concept-feature.md#required-features) of a subentity, which acts as a filter on the extraction. Use this carefully so it doesn't hinder the app's ability to predict.
+
 ## Extracting contextually related data

 An utterance may contain two or more occurrences of an entity where the meaning of the data is based on context within the utterance. An example is an utterance for booking a flight that has two geographical locations, origin and destination.
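The extraction-versus-resolution distinction added in this file can be illustrated with a toy sketch. The helper names and list data below are hypothetical (not a LUIS API): extraction returns the span exactly as it appears in the utterance, while resolution, like a list entity's normalized value, is a separate step:

```python
# Toy sketch (hypothetical helpers, not a LUIS API): extraction returns
# the span as it appears; resolution normalizes it afterwards.

# List-entity style data: normalized value -> synonyms.
SIZE_LIST = {
    "small": ["small", "tiny", "smallest"],
    "large": ["large", "big", "huge"],
}

def extract(utterance: str, vocabulary: list) -> list:
    """Return matching spans exactly as they appear; no validation, no change."""
    return [word for word in utterance.lower().split() if word in vocabulary]

def resolve(span: str) -> str:
    """Separate resolution step: map an extracted span to its normalized value."""
    for normalized, synonyms in SIZE_LIST.items():
        if span in synonyms:
            return normalized
    return span  # no resolution available; keep the raw text

all_synonyms = [s for synonyms in SIZE_LIST.values() for s in synonyms]
spans = extract("order a huge pizza", all_synonyms)
print(spans)                         # extracted as-is
print([resolve(s) for s in spans])   # normalized by a separate step
```

The point of the sketch: `extract` never alters or validates the text, which is what keeps the app resilient to variations; resolution is the client's (or a list entity's) job.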

articles/cognitive-services/LUIS/luis-concept-feature.md

Lines changed: 13 additions & 1 deletion
@@ -2,7 +2,7 @@
 title: Features - LUIS
 description: Add features to a language model to provide hints about how to recognize input that you want to label or classify.
 ms.topic: conceptual
-ms.date: 04/23/2020
+ms.date: 05/14/2020
 ---
 # Machine-learning (ML) features

@@ -82,10 +82,22 @@ For example, if a shipping address entity contained a street address subentity,
 * Country (subentity)
 * Postal code (subentity)

+## Nested subentities with features
+
+A machine learned subentity indicates to its parent that a concept is present, whether that parent is another subentity or the top entity. The value of the subentity acts as a feature to its parent.
+
+A subentity can have both a phrase list as a feature and a model (another entity) as a feature.
+
+When the subentity has a phrase list, it boosts the vocabulary of the concept but doesn't add any information to the JSON response of the prediction.
+
+When the subentity has another entity as a feature, the JSON response includes the extracted data of that other entity.
+
 ## Required features

 A required feature has to be found in order for the model to be returned from the prediction endpoint. Use a required feature when you know your incoming data must match the feature.

+If the utterance text doesn't match the required feature, it isn't extracted.
+
 **A required feature uses a non-machine learned entity**:
 * Regular expression entity
 * List entity
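The required-feature behavior added above acts like a filter: candidate text that doesn't satisfy the non-machine-learned entity (here, a regular expression) is simply not extracted. The sketch below illustrates the concept only; it is not how LUIS is implemented internally, and the flight-number pattern is a made-up example:

```python
import re

# Sketch of a required feature acting as a filter on extraction.
# Candidates that don't match the regular expression are dropped,
# mirroring "if the utterance text doesn't match the required
# feature, it isn't extracted." (Concept illustration only.)
FLIGHT_NUMBER = re.compile(r"[A-Z]{2}\d{3,4}")  # hypothetical pattern, e.g. "AB123"

def extract_with_required_feature(candidates: list) -> list:
    """Keep only candidate spans that fully satisfy the required feature."""
    return [c for c in candidates if FLIGHT_NUMBER.fullmatch(c)]

candidates = ["AB123", "tomorrow", "XY9999", "flight"]
print(extract_with_required_feature(candidates))
```

Used this way, a regular expression or list entity turns a fuzzy machine-learned match into an exact-match gate, which is why the doc cautions against overusing it.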

articles/cognitive-services/LUIS/luis-glossary.md

Lines changed: 5 additions & 1 deletion
@@ -195,6 +195,10 @@ A (machine learned) model is a function that makes a prediction on input data. I

 You add values to your [list](#list-entity) entities. Each of those values can have a list of one or more synonyms. Only the normalized value is returned in the response.

+## Overfitting
+
+Overfitting happens when the model fixates on the specific examples and can't generalize well.
+
 ## Owner

 Each app has one owner who is the person that created the app. The owner manages permissions to the application in the Azure portal.
@@ -255,7 +259,7 @@ LUIS quota is the limitation of the Azure subscription tier. The LUIS quota can

 ## Schema

-Your schema includes your intents and entities along with the subentities. The schema is initially planned for then iterated over time. The schema doesn't include app settings, features, or example utterances.
+Your schema includes your intents and entities along with the subentities. The schema is initially planned for then iterated over time. The schema doesn't include app settings, features, or example utterances.

 ## Sentiment Analysis
 Sentiment analysis provides positive or negative values of the utterances provided by [Text Analytics](../text-analytics/overview.md).

articles/cognitive-services/LUIS/luis-how-plan-your-app.md

Lines changed: 25 additions & 1 deletion
@@ -2,7 +2,7 @@
 title: Plan your app - LUIS
 description: Outline relevant app intents and entities, and then create your application plans in Language Understanding Intelligent Services (LUIS).
 ms.topic: conceptual
-ms.date: 04/14/2020
+ms.date: 05/14/2020
 ---

 # Plan your LUIS app schema with subject domain and data extraction
@@ -44,6 +44,30 @@ When determining which entities to use in your app, keep in mind that there are
 > [!TIP]
 > LUIS offers [prebuilt entities](luis-prebuilt-entities.md) for common, conversational user scenarios. Consider using prebuilt entities as a starting point for your application development.

+## Resolution with intent or entity?
+
+In many cases, especially when working with natural conversation, users provide an utterance that can contain more than one function or intent. To address this, understand that the output can be represented in both intents and entities. This representation should be mappable to your client application's actions, and it doesn't need to be limited to the intents.
+
+**Int-ent-ties** is the concept that actions (usually understood as intents) can also be captured as entities and relied on in this form in the output JSON, where you can map them to specific actions. _Negation_ is a common scenario that leverages both intent and entity for full extraction.
+
+Consider the following two utterances, which are close in word choice but have different results:
+
+|Utterance|
+|--|
+|`Please schedule my flight from Cairo to Seattle`|
+|`Cancel my flight from Cairo to Seattle`|
+
+Instead of having two separate intents, create a single intent with a `FlightAction` machine learning entity. The machine learning entity should extract the details of the action, for both a scheduling and a canceling request, as well as an origin or destination location.
+
+The `FlightAction` entity would be structured in the following pseudo-schema of machine learning entity and subentities:
+
+* FlightAction
+    * Action
+    * Origin
+    * Destination
+
+To help the extraction, add features to the subentities. Choose features based on the vocabulary you expect to see in user utterances and the values you want returned in the prediction response.
+
 ## Next steps

 > [!div class="nextstepaction"]
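For the `FlightAction` design added above, the client maps the extracted entity data, not just the intent, to a code path. The response shape below is a hypothetical sketch (machine-learned entity output varies with your schema and prediction API version), not a verbatim LUIS response:

```python
# Hypothetical prediction result for "Cancel my flight from Cairo to
# Seattle" using a single intent plus a FlightAction machine-learned
# entity with Action/Origin/Destination subentities. The exact JSON
# shape depends on your schema; this is a sketch.
prediction = {
    "topIntent": "FlightRequest",
    "entities": {
        "FlightAction": [
            {"Action": ["cancel"], "Origin": ["Cairo"], "Destination": ["Seattle"]}
        ]
    },
}

def to_client_action(prediction: dict) -> tuple:
    """Map the extracted FlightAction entity to a client-side action."""
    flight = prediction["entities"]["FlightAction"][0]
    return (flight["Action"][0], flight["Origin"][0], flight["Destination"][0])

print(to_client_action(prediction))
```

Because the action ("schedule" vs "cancel") arrives as entity data, the two near-identical utterances dispatch to different client code paths without needing separate intents.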
