Commit 1aa45de

Merge pull request #79396 from diberry/0612-luis-regex-word-boundary
[Cogsvcs] LUIS - entity concepts
2 parents b10395f + e162077 commit 1aa45de

1 file changed

+44
-6
lines changed

articles/cognitive-services/LUIS/luis-concept-entity-types.md

Lines changed: 44 additions & 6 deletions
@@ -9,7 +9,7 @@ ms.custom: seodec18
99
ms.service: cognitive-services
1010
ms.subservice: language-understanding
1111
ms.topic: conceptual
12-
ms.date: 04/01/2019
12+
ms.date: 06/12/2019
1313
ms.author: diberry
1414
---
1515
# Entity types and their purposes in LUIS
@@ -103,6 +103,30 @@ Pattern.any entities need to be marked in the [Pattern](luis-how-to-model-intent
103103

104104
Mixed entities use a combination of entity detection methods.
105105

106+
## Machine-learned entities use context
107+
108+
Machine-learned entities learn from context in the utterance. Because of this, varying where the entity appears in example utterances is significant.
109+
110+
## Non-machine-learned entities don't use context
111+
112+
The following non-machine-learned entities do not take utterance context into account when matching:
113+
114+
* [Prebuilt entities](#prebuilt-entity)
115+
* [Regex entities](#regular-expression-entity)
116+
* [List entities](#list-entity)
117+
118+
These entities do not require labeling or training the model. Once you add or configure the entity, it is extracted. The tradeoff is that these entities can overmatch: they match text that would not have matched if context were taken into account.
119+
120+
This happens frequently with list entities on new models. You build and test your model with a list entity, but when you publish the model and receive queries at the endpoint, you realize the model is overmatching due to lack of context.
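As a rough illustration of overmatching, the following sketch mimics context-free list matching in plain JavaScript. It is not the LUIS API: `cityList`, `matchListEntity`, and both utterances are hypothetical names and data chosen for the example, and the whole-word matching rule is an assumption made to keep the sketch small.

```javascript
// Hypothetical list entity: canonical name -> exact-match terms.
// Illustration only; this is not how LUIS stores or evaluates list entities.
const cityList = { Paris: ["Paris"], Seattle: ["Seattle"] };

// Context-free matching: a term matches wherever it appears as a whole word,
// regardless of what the rest of the utterance says.
function matchListEntity(list, utterance) {
  const words = utterance.split(/\W+/);
  return Object.keys(list).filter((canonical) =>
    list[canonical].some((term) => words.includes(term))
  );
}

// The city is intended here.
const intended = matchListEntity(cityList, "Book a flight to Paris");
// The same term is matched here even though it is part of a person's name.
const overmatched = matchListEntity(cityList, "Is Paris Hilton on the guest list?");

console.log(intended);    // [ 'Paris' ]
console.log(overmatched); // [ 'Paris' ]
```

Both utterances produce the same match because nothing in the matcher looks at the surrounding words, which is the kind of result that tends to show up only once real endpoint queries arrive.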
121+
122+
If you want to match words or phrases and take context into account, you have two options. The first is to use a simple entity paired with a phrase list. The phrase list is not used for matching; instead, it helps signal relatively similar words (an interchangeable list). If you need an exact match rather than a phrase list's variations, the second option is to use a list entity with a role, as described below.
123+
124+
### Context with non-machine-learned entities
125+
126+
If you want the context of the utterance to matter for non-machine-learned entities, use [roles](luis-concept-roles.md).
127+
128+
If you have a non-machine-learned entity, such as a [prebuilt entity](#prebuilt-entity), a [regex](#regular-expression-entity) entity, or a [list](#list-entity) entity, that matches beyond the instance you want, consider creating one entity with two roles. One role captures what you are looking for, and the other role captures what you are not looking for. Both roles need to be labeled in example utterances.
129+
106130
## Composite entity
107131

108132
A composite entity is made up of other entities, such as prebuilt entities, simple, regular expression, and list entities. The separate entities form a whole entity.
@@ -127,8 +151,9 @@ List entities represent a fixed, closed set of related words along with their sy
127151
The entity is a good fit when the text data:
128152

129153
* Are a known set.
154+
* Doesn't change often. If you need to change the list often or want the list to self-expand, a simple entity boosted with a phrase list is a better choice.
130155
* The set doesn't exceed the maximum LUIS [boundaries](luis-boundaries.md) for this entity type.
131-
* The text in the utterance is an exact match with a synonym or the canonical name. LUIS doesn't use the list beyond exact text matches. Stemming, plurals, and other variations are not resolved with a list entity. To manage variations, consider using a [pattern](luis-concept-patterns.md#syntax-to-mark-optional-text-in-a-template-utterance) with the optional text syntax.
156+
* The text in the utterance is an exact match with a synonym or the canonical name. LUIS doesn't use the list beyond exact text matches. Fuzzy matching, case-insensitivity, stemming, plurals, and other variations are not resolved with a list entity. To manage variations, consider using a [pattern](luis-concept-patterns.md#syntax-to-mark-optional-text-in-a-template-utterance) with the optional text syntax.
132157

133158
![list entity](./media/luis-concept-entities/list-entity.png)
134159

@@ -152,10 +177,11 @@ In the following table, each row has two versions of the utterance. The top utte
152177

153178
|Utterance|
154179
|--|
155-
|`Was The Man Who Mistook His Wife for a Hat and Other Clinical Tales written by an American this year?<br>Was **The Man Who Mistook His Wife for a Hat and Other Clinical Tales** written by an American this year?|
156-
|`Was Half Asleep in Frog Pajamas written by an American this year?`<br>`Was **Half Asleep in Frog Pajamas** written by an American this year?`|
157-
|`Was The Particular Sadness of Lemon Cake: A Novel written by an American this year?`<br>`Was **The Particular Sadness of Lemon Cake: A Novel** written by an American this year?`|
158-
|`Was There's A Wocket In My Pocket! written by an American this year?`<br>`Was **There's A Wocket In My Pocket!** written by an American this year?`|
180+
|Was The Man Who Mistook His Wife for a Hat and Other Clinical Tales written by an American this year?<br><br>Was **The Man Who Mistook His Wife for a Hat and Other Clinical Tales** written by an American this year?|
181+
|Was Half Asleep in Frog Pajamas written by an American this year?<br><br>Was **Half Asleep in Frog Pajamas** written by an American this year?|
182+
|Was The Particular Sadness of Lemon Cake: A Novel written by an American this year?<br><br>Was **The Particular Sadness of Lemon Cake: A Novel** written by an American this year?|
183+
|Was There's A Wocket In My Pocket! written by an American this year?<br><br>Was **There's A Wocket In My Pocket!** written by an American this year?|
184+
||
159185

160186
## Prebuilt entity
161187

@@ -220,6 +246,18 @@ The entity is a good fit when:
220246
[Tutorial](luis-quickstart-intents-regex-entity.md)<br>
221247
[Example JSON response for entity](luis-concept-data-extraction.md#regular-expression-entity-data)<br>
222248

249+
Regular expressions may match more than you expect. An example of this is matching number words such as `one` and `two`. The following regex matches the number word `one` along with other number words:
250+
251+
```javascript
252+
(plus )?(zero|one|two|three|four|five|six|seven|eight|nine)(\s+(zero|one|two|three|four|five|six|seven|eight|nine))*
253+
```
254+
255+
This regex also matches inside any word that contains one of these number words, such as `phone`. To fix issues like this, make sure the regex takes word boundaries into account. The following regex adds word boundaries to this example:
256+
257+
```javascript
258+
\b(plus )?(zero|one|two|three|four|five|six|seven|eight|nine)(\s+(zero|one|two|three|four|five|six|seven|eight|nine))*\b
259+
```
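To see the difference, here is a minimal sketch that runs both patterns over one utterance in plain JavaScript, outside of LUIS. The helper `findMatches` and the sample utterance are hypothetical, introduced only for this illustration, and the sketch assumes the JavaScript regex engine treats these patterns the way the LUIS regex entity does.

```javascript
// Hypothetical helper: returns every substring matched by a pattern.
function findMatches(pattern, text) {
  // matchAll requires the global ("g") flag.
  return Array.from(text.matchAll(new RegExp(pattern, "g")), (m) => m[0]);
}

const numberWords = "(zero|one|two|three|four|five|six|seven|eight|nine)";
// Same pattern as the first regex above, assembled from its repeated part.
const unbounded = `(plus )?${numberWords}(\\s+${numberWords})*`;
// Same pattern wrapped in word boundaries, as in the second regex above.
const bounded = `\\b${unbounded}\\b`;

// Hypothetical sample utterance.
const utterance = "my phone number is plus one two three";

const unboundedMatches = findMatches(unbounded, utterance);
const boundedMatches = findMatches(bounded, utterance);

console.log(unboundedMatches); // [ 'one', 'plus one two three' ]
console.log(boundedMatches);   // [ 'plus one two three' ]
```

Without boundaries, the `one` inside `phone` is reported as a match. With `\b` on both ends, only the standalone number phrase is returned.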
260+
223261
## Simple entity
224262
225263
A simple entity is a generic entity that describes a single concept and is learned from the machine-learned context. Because simple entities are generally names such as company names, product names, or other categories of names, add a [phrase list](luis-concept-feature.md) when using a simple entity to boost the signal of the names used.
