articles/cognitive-services/language-service/conversational-language-understanding/concepts/data-formats.md
36 additions & 12 deletions
@@ -8,7 +8,7 @@ manager: nitinme
 ms.service: cognitive-services
 ms.subservice: language-service
 ms.topic: conceptual
-ms.date: 05/13/2022
+ms.date: 10/14/2022
 ms.author: aahi
 ms.custom: language-service-custom-clu
 ---
@@ -24,14 +24,17 @@ If you're [importing a project](../how-to/create-project.md#import-project) into
 
 ```json
 {
-  "projectFileVersion": "2022-05-01",
+  "projectFileVersion": "2022-10-01-preview",
   "stringIndexType": "Utf16CodeUnit",
   "metadata": {
     "projectKind": "Conversation",
     "projectName": "{PROJECT-NAME}",
     "multilingual": true,
     "description": "DESCRIPTION",
-    "language": "{LANGUAGE-CODE}"
+    "language": "{LANGUAGE-CODE}",
+    "settings": {
+      "confidenceThreshold": 0
+    }
   },
   "assets": {
     "projectKind": "Conversation",
@@ -43,7 +46,7 @@ If you're [importing a project](../how-to/create-project.md#import-project) into
     "entities": [
       {
         "category": "entity1",
-        "compositionSetting": "requireExactOverlap",
+        "compositionSetting": "{COMPOSITION-SETTING}",
         "list": {
           "sublists": [
             {
@@ -61,8 +64,20 @@ If you're [importing a project](../how-to/create-project.md#import-project) into
         },
         "prebuilts": [
           {
-            "category": "PREBUILT1"
+            "category": "{PREBUILT-COMPONENTS}"
           }
+        ],
+        "regex": {
+          "expressions": [
+            {
+              "regexKey": "regex1",
+              "language": "{LANGUAGE-CODE}",
+              "regexPattern": "{REGEX-PATTERN}"
+            }
+          ]
+        },
+        "requiredComponents": [
+          "{REQUIRED-COMPONENTS}"
         ]
       }
     ],
@@ -89,19 +104,28 @@ If you're [importing a project](../how-to/create-project.md#import-project) into
 |Key |Placeholder |Value | Example |
 |---------|---------|----------|--|
 |`api-version`|`{API-VERSION}`| The version of the API you're calling. The value referenced here is for the latest released [model version](../../concepts/model-lifecycle.md#choose-the-model-version-used-on-your-data). |`2022-05-01`|
-|`confidenceThreshold`|`{CONFIDENCE-THRESHOLD}`|This is the threshold score below which the intent will be predicted as [none intent](none-intent.md)|`0.7`|
+|`confidenceThreshold`|`{CONFIDENCE-THRESHOLD}`|The threshold score below which the intent is predicted as the [none intent](none-intent.md). Values range from `0` to `1`.|`0.7`|
 |`projectName`|`{PROJECT-NAME}`| The name of your project. This value is case-sensitive. |`EmailApp`|
-|`multilingual`|`true`| A boolean value that enables you to have documents in multiple languages in your dataset and when your model is deployed you can query the model in any supported language (not necessarily included in your training documents. See [Language support](../language-support.md#multi-lingual-option) for more information about supported language codes. |`true`|
-|`sublists`|`[]`|Array containing a sublists|`[]`|
+|`multilingual`|`true`| A boolean value that lets you include utterances in multiple languages in your dataset. When your model is deployed, you can query it in any supported language (not necessarily one included in your training documents). See [Language support](../language-support.md#multi-lingual-option) for more information about supported language codes. |`true`|
+|`sublists`|`[]`|Array containing sublists. Each sublist is a key and its associated values.|`[]`|
+|`compositionSetting`|`{COMPOSITION-SETTING}`|Rule that defines how to manage multiple components in your entity. Options are `combineComponents` or `separateComponents`. |`combineComponents`|
 |`synonyms`|`[]`|Array containing all the synonyms|synonym|
-|`language`|`{LANGUAGE-CODE}`| A string specifying the language code for the utterances used in your project. If your project is a multilingual project, choose the [language code](../language-support.md) of the majority of the utterances. |`en-us`|
-|`intents`|`[]`| Array containing all the intents you have in the project. These are the intent types that will be extracted from your utterances.|`[]`|
-|`entities`|`[]`| Array containing all the entities in your project. These are the entities that will be extracted from your utterances.|`[]`|
+|`language`|`{LANGUAGE-CODE}`| A string specifying the language code for the utterances, synonyms, and regular expressions used in your project. If your project is a multilingual project, choose the [language code](../language-support.md) of the majority of the utterances. |`en-us`|
+|`intents`|`[]`| Array containing all the intents you have in the project. These are the intents that will be classified from your utterances.|`[]`|
+|`entities`|`[]`| Array containing all the entities in your project. These are the entities that will be extracted from your utterances. Each entity can have additional optional components defined with it: list, prebuilt, or regex. |`[]`|
 |`dataset`|`{DATASET}`| The set this utterance is assigned to when the data is split before training. Learn more about data splitting [here](../how-to/train-model.md#data-splitting). Possible values for this field are `Train` and `Test`. |`Train`|
 |`category`|``| The type of entity associated with the span of text specified. |`Entity1`|
 |`offset`|``| The inclusive character position of the start of the entity. |`5`|
 |`length`|``| The character length of the entity. |`5`|
-|`language`|`{LANGUAGE-CODE}`| A string specifying the language code for the utterances used in your project. If your project is a multilingual project, choose the [language code](../language-support.md) of the majority of the utterances. |`en-us`|
+|`listKey`|``| A normalized value for the list of synonyms to map back to in prediction. |`Microsoft`|
+|`values`|`{VALUES-FOR-LIST}`| A list of comma-separated strings that will be matched exactly for extraction and map to the list key. |`"msft", "microsoft", "MS"`|
+|`regexKey`|``| The name of the regular expression. |`ProductPattern1`|
+|`regexPattern`|`{REGEX-PATTERN}`| A regular expression. |`^pre`|
+|`prebuilts`|`{PREBUILT-COMPONENTS}`| The prebuilt components that can extract common types. You can find the list of prebuilts you can add [here](../prebuilt-component-reference.md). |`Quantity.Number`|
+|`requiredComponents`|`{REQUIRED-COMPONENTS}`| A setting that requires a specific component to be present for the entity to be returned. You can learn more [here](./entity-components.md#required-components). The possible values are `learned`, `regex`, `list`, or `prebuilts`.|`"learned", "prebuilt"`|
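
As a rough illustration of the constraints this change adds to the table (threshold range, composition options, required components), the following Python sketch checks a project file before import. It is not part of the service or any SDK: `validate_project` and the two constant sets are hypothetical names, and the allowed values are taken only from the table rows in this diff. The table lists `prebuilts` as a required component while its example uses `prebuilt`, so accepting both spellings here is an assumption.

```python
from typing import Any

# Options listed for `compositionSetting` in the table.
COMPOSITION_SETTINGS = {"combineComponents", "separateComponents"}
# Values listed for `requiredComponents`; "prebuilt" is accepted as well
# because the table's example uses that spelling (an assumption).
REQUIRED_COMPONENTS = {"learned", "regex", "list", "prebuilts", "prebuilt"}


def validate_project(project: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a project file (hypothetical helper)."""
    problems: list[str] = []

    # `confidenceThreshold` lives under metadata.settings and ranges from 0 to 1.
    settings = project.get("metadata", {}).get("settings", {})
    threshold = settings.get("confidenceThreshold", 0)
    is_number = isinstance(threshold, (int, float)) and not isinstance(threshold, bool)
    if not is_number or not 0 <= threshold <= 1:
        problems.append(f"confidenceThreshold must be between 0 and 1, got {threshold!r}")

    # Each entity may carry a composition setting and required components.
    for entity in project.get("assets", {}).get("entities", []):
        name = entity.get("category", "<unnamed>")
        setting = entity.get("compositionSetting")
        if setting is not None and setting not in COMPOSITION_SETTINGS:
            problems.append(f"{name}: unknown compositionSetting {setting!r}")
        for component in entity.get("requiredComponents", []):
            if component not in REQUIRED_COMPONENTS:
                problems.append(f"{name}: unknown required component {component!r}")

    return problems


example = {
    "projectFileVersion": "2022-10-01-preview",
    "stringIndexType": "Utf16CodeUnit",
    "metadata": {
        "projectKind": "Conversation",
        "projectName": "EmailApp",
        "multilingual": True,
        "language": "en-us",
        "settings": {"confidenceThreshold": 0.7},
    },
    "assets": {
        "projectKind": "Conversation",
        "entities": [
            {
                "category": "entity1",
                "compositionSetting": "combineComponents",
                "requiredComponents": ["learned", "list"],
            }
        ],
    },
}

print(validate_project(example))
```

The helper collects every problem instead of raising on the first one, so a single pass over a file reports all fields that need attention.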