articles/search/cognitive-search-skill-entity-recognition.md
Lines changed: 12 additions & 11 deletions
Original file line number
Diff line number
Diff line change
@@ -11,7 +11,7 @@ ms.topic: conceptual
11
11
ms.date: 11/04/2019
12
12
---
13
13
14
-
#Entity Recognition cognitive skill
14
+
# Entity Recognition cognitive skill
15
15
16
16
The **Entity Recognition** skill extracts entities of different types from text. This skill uses the machine learning models provided by [Text Analytics](https://docs.microsoft.com/azure/cognitive-services/text-analytics/overview) in Cognitive Services.
17
17
@@ -31,29 +31,29 @@ The maximum size of a record should be 50,000 characters as measured by [`String
31
31
32
32
Parameters are case-sensitive and are all optional.
33
33
34
-
| Parameter name| Description |
34
+
| Parameter name| Description |
35
35
|--------------------|-------------|
36
-
| categories| Array of categories that should be extracted. Possible category types: `"Person"`, `"Location"`, `"Organization"`, `"Quantity"`, `"Datetime"`, `"URL"`, `"Email"`. If no category is provided, all types are returned.|
37
-
|defaultLanguageCode |Language code of the input text. The following languages are supported: `ar, cs, da, de, en, es, fi, fr, hu, it, ja, ko, nl, no, pl, pt-BR, pt-PT, ru, sv, tr, zh-hans`. Not all entity categories are supported for all languages; see note below.|
36
+
| categories| Array of categories that should be extracted. Possible category types: `"Person"`, `"Location"`, `"Organization"`, `"Quantity"`, `"Datetime"`, `"URL"`, `"Email"`. If no category is provided, all types are returned.|
37
+
|defaultLanguageCode |Language code of the input text. The following languages are supported: `ar, cs, da, de, en, es, fi, fr, hu, it, ja, ko, nl, no, pl, pt-BR, pt-PT, ru, sv, tr, zh-hans`. Not all entity categories are supported for all languages; see note below.|
38
38
|minimumPrecision | A value between 0 and 1. If the confidence score (in the `namedEntities` output) is lower than this value, the entity is not returned. The default is 0. |
39
39
|includeTypelessEntities | Set to `true` if you want to recognize well-known entities that don't fit the current categories. Recognized entities are returned in the `entities` complex output field. For example, "Windows 10" is a well-known entity (a product), but since "Products" is not a supported category, this entity would be included in the `entities` output field. The default is `false`. |
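
The effect of `minimumPrecision` described in the table above can be sketched as a simple client-side filter. This is only an illustration of the documented rule (entities whose confidence is lower than the threshold are dropped), not the service's implementation. The function name `filter_by_precision` and the sample entities are hypothetical.

```python
def filter_by_precision(named_entities, minimum_precision=0.0):
    """Keep entities whose confidence is not lower than minimum_precision.

    Hypothetical helper: mirrors the documented rule, not the service code.
    """
    if not 0.0 <= minimum_precision <= 1.0:
        raise ValueError("minimum_precision must be between 0 and 1")
    return [e for e in named_entities if e["confidence"] >= minimum_precision]


# Made-up entries shaped like the documented `namedEntities` output fields.
sample = [
    {"category": "Person", "value": "Ada", "offset": 0, "confidence": 0.92},
    {"category": "Location", "value": "Paris", "offset": 12, "confidence": 0.41},
]

kept = filter_by_precision(sample, 0.5)
print([e["value"] for e in kept])
```

With the default of 0, nothing is filtered out, which matches the documented default behavior.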
40
40
41
41
42
42
## Skill inputs
43
43
44
-
| Input name| Description |
44
+
| Input name| Description |
45
45
|---------------|-------------------------------|
46
-
| languageCode| Optional. Default is `"en"`. |
46
+
| languageCode| Optional. Default is `"en"`. |
47
47
| text | The text to analyze. |
48
48
49
49
## Skill outputs
50
50
51
51
> [!NOTE]
52
52
> Not all entity categories are supported for all languages. The `"Person"`, `"Location"`, and `"Organization"` entity category types are supported for the full list of languages above. Only _de_, _en_, _es_, _fr_, and _zh-hans_ support extraction of `"Quantity"`, `"Datetime"`, `"URL"`, and `"Email"` types. For more information, see [Language and region support for the Text Analytics API](https://docs.microsoft.com/azure/cognitive-services/text-analytics/language-support).
53
53
54
-
| Output name| Description |
54
+
| Output name| Description |
55
55
|---------------|-------------------------------|
56
-
| persons| An array of strings where each string represents the name of a person. |
56
+
| persons| An array of strings where each string represents the name of a person. |
57
57
| locations | An array of strings where each string represents a location. |
58
58
| organizations | An array of strings where each string represents an organization. |
59
59
| quantities | An array of strings where each string represents a quantity. |
@@ -63,7 +63,7 @@ Parameters are case-sensitive and are all optional.
63
63
| namedEntities | An array of complex types that contains the following fields: <ul><li>category</li> <li>value (The actual entity name)</li><li>offset (The location where it was found in the text)</li><li>confidence (Higher value means it's more likely to be a real entity)</li></ul> |
64
64
| entities | An array of complex types that contains rich information about the entities extracted from text, with the following fields: <ul><li> name (the actual entity name, in a "normalized" form)</li><li> wikipediaId</li><li>wikipediaLanguage</li><li>wikipediaUrl (a link to the Wikipedia page for the entity)</li><li>bingId</li><li>type (the category of the entity recognized)</li><li>subType (available only for certain categories; gives a more granular view of the entity type)</li><li> matches (a complex collection that contains the following fields)<ul><li>text (the raw text for the entity)</li><li>offset (the location where it was found)</li><li>length (the length of the raw entity text)</li></ul></li></ul> |
65
65
66
-
##Sample definition
66
+
## Sample definition
67
67
68
68
```json
69
69
{
@@ -93,7 +93,7 @@ Parameters are case-sensitive and are all optional.
93
93
]
94
94
}
95
95
```
96
-
##Sample input
96
+
## Sample input
97
97
98
98
```json
99
99
{
@@ -110,7 +110,7 @@ Parameters are case-sensitive and are all optional.
110
110
}
111
111
```
112
112
113
-
##Sample output
113
+
## Sample output
114
114
115
115
```json
116
116
{
@@ -183,6 +183,7 @@ Parameters are case-sensitive and are all optional.
183
183
}
184
184
```
185
185
186
+
The offsets returned for entities in the output of this skill come directly from the [Text Analytics API](https://docs.microsoft.com/azure/cognitive-services/text-analytics/overview). If you use them to index into the original string, use the [StringInfo](https://docs.microsoft.com/dotnet/api/system.globalization.stringinfo?view=netframework-4.8) class in .NET to extract the correct content. [More details can be found here.](https://docs.microsoft.com/azure/cognitive-services/text-analytics/concepts/text-offsets)
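
To illustrate why text-element offsets matter, here is a rough Python sketch. It assumes that offsets count text elements in the sense of .NET's `StringInfo`, and it only approximates a text element as a base character plus any combining marks that follow it. The helper names `text_elements` and `extract` and the sample string are hypothetical. For production use in .NET, `StringInfo` remains the recommended route.

```python
import unicodedata


def text_elements(text):
    """Split text into approximate text elements.

    Approximation (an assumption of this sketch): a base character plus any
    combining marks that follow it. More complex clusters are not handled.
    """
    elements = []
    for ch in text:
        if elements and unicodedata.combining(ch):
            elements[-1] += ch  # attach the combining mark to its base
        else:
            elements.append(ch)
    return elements


def extract(text, offset, length):
    """Return `length` text elements starting at text-element `offset`."""
    return "".join(text_elements(text)[offset:offset + length])


# "é" is written here as "e" + U+0301 (combining acute accent).
text = "Cafe\u0301 Lumi\u00e8re is in Paris"

print(extract(text, 19, 5))  # text-element indexing
print(text[19:24])           # naive code-point slicing is off by one
```

In this sample the naive slice starts one position too early, because the combining accent occupies its own code point but belongs to the same text element as the preceding letter.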
186
187
187
188
## Error cases
188
189
If the language code for the document is unsupported, an error is returned and no entities are extracted.
articles/search/cognitive-search-skill-pii-detection.md
Lines changed: 10 additions & 9 deletions
Original file line number
Diff line number
Diff line change
@@ -11,7 +11,7 @@ ms.topic: conceptual
11
11
ms.date: 1/27/2020
12
12
---
13
13
14
-
#PII Detection cognitive skill
14
+
# PII Detection cognitive skill
15
15
16
16
> [!IMPORTANT]
17
17
> This skill is currently in public preview. Preview functionality is provided without a service level agreement, and is not recommended for production workloads. For more information, see [Supplemental Terms of Use for Microsoft Azure Previews](https://azure.microsoft.com/support/legal/preview-supplemental-terms/). There is currently no portal or .NET SDK support.
@@ -34,29 +34,29 @@ The maximum size of a record should be 50,000 characters as measured by [`String
34
34
35
35
Parameters are case-sensitive and all are optional.
36
36
37
-
| Parameter name| Description |
37
+
| Parameter name| Description |
38
38
|--------------------|-------------|
39
-
| defaultLanguageCode |Language code of the input text. For now, only `en` is supported. |
39
+
| defaultLanguageCode |Language code of the input text. For now, only `en` is supported. |
40
40
| minimumPrecision | A value between 0.0 and 1.0. If the confidence score (in the `piiEntities` output) is lower than the set `minimumPrecision` value, the entity is not returned or masked. The default is 0.0. |
41
41
| maskingMode | A parameter that provides various ways to mask the detected PII in the input text. The following options are supported: <ul><li>`none` (default): This means that no masking will be performed and the `maskedText` output will not be returned. </li><li> `redact`: This option will remove the detected entities from the input text and not replace them with anything. Note that in this case, the offset in the `piiEntities` output will be in relation to the original text, and not the masked text. </li><li> `replace`: This option will replace the detected entities with the character given in the `maskingCharacter` parameter. The character will be repeated to the length of the detected entity so that the offsets will correctly correspond to both the input text as well as the output `maskedText`.</li></ul> |
42
42
| maskingCharacter | The character that will be used to mask the text if the `maskingMode` parameter is set to `replace`. The following options are supported: `*` (default), `#`, `X`. This parameter can only be `null` if `maskingMode` is not set to `replace`. |
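
The three `maskingMode` options can be sketched as follows. This is a minimal illustration of the behavior described in the table, not the skill's actual implementation. It assumes non-overlapping entities and plain ASCII input so that offsets equal string indexes. The function name `mask_text` and the sample text are hypothetical.

```python
def mask_text(text, pii_entities, masking_mode="none", masking_character="*"):
    """Illustrative masking; assumes non-overlapping entities and ASCII text."""
    if masking_mode == "none":
        return None  # mirrors the doc: no maskedText output in this mode
    if masking_mode not in ("redact", "replace"):
        raise ValueError("unsupported maskingMode")
    if masking_mode == "replace" and masking_character not in ("*", "#", "X"):
        raise ValueError("unsupported maskingCharacter")

    pieces, cursor = [], 0
    for entity in sorted(pii_entities, key=lambda e: e["offset"]):
        start = entity["offset"]
        pieces.append(text[cursor:start])  # keep text before the entity
        if masking_mode == "replace":
            # Repeat the character to the entity length so offsets still line up.
            pieces.append(masking_character * entity["length"])
        cursor = start + entity["length"]  # skip the entity itself
    pieces.append(text[cursor:])
    return "".join(pieces)


# Made-up input and entities, using the documented text/offset/length fields.
text = "Call 555-0100 or mail ann@example.com"
entities = [
    {"text": "555-0100", "offset": 5, "length": 8},
    {"text": "ann@example.com", "offset": 22, "length": 15},
]

redacted = mask_text(text, entities, "redact")
replaced = mask_text(text, entities, "replace", "*")
print(repr(redacted))
print(repr(replaced))
```

Because `replace` preserves the length of the text, the entity offsets remain valid against the masked string. With `redact` they refer only to the original input.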
43
43
44
44
45
45
## Skill inputs
46
46
47
-
| Input name| Description |
47
+
| Input name| Description |
48
48
|---------------|-------------------------------|
49
-
| languageCode| Optional. Default is `en`. |
49
+
| languageCode| Optional. Default is `en`. |
50
50
| text | The text to analyze. |
51
51
52
52
## Skill outputs
53
53
54
-
| Output name| Description |
54
+
| Output name| Description |
55
55
|---------------|-------------------------------|
56
56
| piiEntities | An array of complex types that contains the following fields: <ul><li>text (The actual PII as extracted)</li> <li>type</li><li>subType</li><li>score (Higher value means it's more likely to be a real entity)</li><li>offset (into the input text)</li><li>length</li></ul> <br/> [Possible types and subTypes can be found here.](https://docs.microsoft.com/azure/cognitive-services/text-analytics/named-entity-types?tabs=personal)|
57
57
| maskedText | If `maskingMode` is set to a value other than `none`, this output will be the string result of the masking performed on the input text as described by the selected `maskingMode`. If `maskingMode` is set to `none`, this output will not be present. |
58
58
59
-
##Sample definition
59
+
## Sample definition
60
60
61
61
```json
62
62
{
@@ -81,7 +81,7 @@ Parameters are case-sensitive and all are optional.
81
81
]
82
82
}
83
83
```
84
-
##Sample input
84
+
## Sample input
85
85
86
86
```json
87
87
{
@@ -97,7 +97,7 @@ Parameters are case-sensitive and all are optional.
97
97
}
98
98
```
99
99
100
-
##Sample output
100
+
## Sample output
101
101
102
102
```json
103
103
{
@@ -123,6 +123,7 @@ Parameters are case-sensitive and all are optional.
123
123
}
124
124
```
125
125
126
+
The offsets returned for entities in the output of this skill come directly from the [Text Analytics API](https://docs.microsoft.com/azure/cognitive-services/text-analytics/overview). If you use them to index into the original string, use the [StringInfo](https://docs.microsoft.com/dotnet/api/system.globalization.stringinfo?view=netframework-4.8) class in .NET to extract the correct content. [More details can be found here.](https://docs.microsoft.com/azure/cognitive-services/text-analytics/concepts/text-offsets)
126
127
127
128
## Error and warning cases
128
129
If the language code for the document is unsupported, a warning is returned and no entities are extracted.