
Commit 68b78a8

Merge pull request #267062 from HeidiSteen/heidist-fresh
[azure search] Addressed verbatim feedback in a second PR
2 parents 35f9944 + d881fa6 commit 68b78a8

6 files changed

+57
-57
lines changed

articles/search/cognitive-search-quickstart-blob.md

Lines changed: 4 additions & 4 deletions
@@ -10,7 +10,7 @@ ms.service: cognitive-search
1010
ms.custom:
1111
- ignite-2023
1212
ms.topic: quickstart
13-
ms.date: 11/30/2023
13+
ms.date: 02/22/2024
1414
---
1515
# Quickstart: Create a skillset in the Azure portal
1616

@@ -31,13 +31,13 @@ Before you begin, have the following prerequisites in place:
3131
+ Azure Storage account with Blob Storage.
3232

3333
> [!NOTE]
34-
> This quickstart uses [Azure AI services](https://azure.microsoft.com/services/cognitive-services/) for the AI. Because the workload is so small, Azure AI services is tapped behind the scenes for free processing for up to 20 transactions. You can complete this exercise without having to create an Azure AI multi-service resource.
34+
> This quickstart uses [Azure AI services](https://azure.microsoft.com/services/cognitive-services/) for the AI transformations. Because the workload is so small, Azure AI services is tapped behind the scenes for free processing for up to 20 transactions. You can complete this exercise without having to create an Azure AI multi-service resource.
3535
3636
## Set up your data
3737

3838
In the following steps, set up a blob container in Azure Storage to store heterogeneous content files.
3939

40-
1. [Download sample data](https://1drv.ms/f/s!As7Oy81M_gVPa-LCb5lC_3hbS-4) consisting of a small file set of different types. Unzip the files.
40+
1. [Download sample data](https://github.com/Azure-Samples/azure-search-sample-data/tree/main/ai-enrichment-mixed-media) consisting of a small file set of different types.
4141

4242
1. Sign in to the [Azure portal](https://portal.azure.com/) with your Azure account.
4343

@@ -187,7 +187,7 @@ When you're working in your own subscription, it's a good idea at the end of a p
187187

188188
You can find and manage resources in the portal, using the **All resources** or **Resource groups** link in the left-navigation pane.
189189

190-
If you use a free service, remember that you're limited to three indexes, indexers, and data sources. You can delete individual items in the portal to stay under the limit.
190+
If you used a free service, remember that you're limited to three indexes, indexers, and data sources. You can delete individual items in the portal to stay under the limit.
191191

192192
## Next steps
193193

articles/search/cognitive-search-skill-pii-detection.md

Lines changed: 13 additions & 13 deletions
@@ -10,7 +10,7 @@ ms.service: cognitive-search
1010
ms.custom:
1111
- ignite-2023
1212
ms.topic: reference
13-
ms.date: 12/09/2021
13+
ms.date: 02/22/2024
1414
---
1515

1616
# Personally Identifiable Information (PII) Detection cognitive skill
@@ -27,35 +27,35 @@ Microsoft.Skills.Text.PIIDetectionSkill
2727

2828
## Data limits
2929

30-
The maximum size of a record should be 50,000 characters as measured by [`String.Length`](/dotnet/api/system.string.length). If you need to chunk your data before sending it to the skill, consider using the [Text Split skill](cognitive-search-skill-textsplit.md). If you do use a text split skill, set the page length to 5000 for the best performance.
30+
The maximum size of a record should be 50,000 characters as measured by [`String.Length`](/dotnet/api/system.string.length). You can use [Text Split skill](cognitive-search-skill-textsplit.md) for data chunking. Set the page length to 5000 for the best results.
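As a rough illustration of the limits above, the following Python sketch measures a record the way `String.Length` does (in UTF-16 code units) and splits oversized text into pages of 5,000 characters. The helper names `utf16_length` and `split_into_pages` are hypothetical, and the fixed-size split is only a stand-in: it is not how the Text Split skill chunks text.

```python
MAX_RECORD_CHARS = 50_000  # record limit stated in the Data limits section
PAGE_LENGTH = 5_000        # page length suggested in the Data limits section


def utf16_length(text: str) -> int:
    """Count UTF-16 code units, which is what .NET String.Length reports."""
    return len(text.encode("utf-16-le")) // 2


def split_into_pages(text: str, page_length: int = PAGE_LENGTH) -> list[str]:
    """Naive fixed-size split (hypothetical helper, not the Text Split skill).

    Slices by Python code points, so a page holding characters outside the
    Basic Multilingual Plane can exceed page_length when measured in UTF-16.
    """
    if page_length <= 0:
        raise ValueError("page_length must be positive")
    return [text[i:i + page_length] for i in range(0, len(text), page_length)]


def needs_chunking(text: str) -> bool:
    """True when the record is over the 50,000-character limit."""
    return utf16_length(text) > MAX_RECORD_CHARS
```

In a real skillset, the chunking would normally be done by the Text Split skill upstream of the PII Detection skill; the sketch only shows the arithmetic behind the two numbers.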
3131

3232
## Skill parameters
3333

3434
Parameters are case-sensitive and all are optional.
3535

3636
| Parameter name | Description |
3737
|--------------------|-------------|
38-
| `defaultLanguageCode` | (Optional) The language code to apply to documents that don't specify language explicitly. If the default language code is not specified, English (en) will be used as the default language code. <br/> See the [full list of supported languages](../ai-services/language-service/personally-identifiable-information/language-support.md). |
39-
| `minimumPrecision` | A value between 0.0 and 1.0. If the confidence score (in the `piiEntities` output) is lower than the set `minimumPrecision` value, the entity is not returned or masked. The default is 0.0. |
40-
| `maskingMode` | A parameter that provides various ways to mask the personal information detected in the input text. The following options are supported: <ul><li>`"none"` (default): No masking occurs and the `maskedText` output will not be returned. </li><li> `"replace"`: Replaces the detected entities with the character given in the `maskingCharacter` parameter. The character will be repeated to the length of the detected entity so that the offsets will correctly correspond to both the input text and the output `maskedText`.</li></ul> <br/> When this skill was in public preview, the `maskingMode` option `redact` was also supported, which allowed removing the detected entities entirely without replacement. The `redact` option has since been deprecated and are longer be supported. |
41-
| `maskingCharacter` | The character used to mask the text if the `maskingMode` parameter is set to `replace`. The following option is supported: `*` (default). This parameter can only be `null` if `maskingMode` is not set to `replace`. <br/><br/> When this skill was in public preview, there was support for the `maskingCharacter` options, `X` and `#`. Both `X` and `#` options have since been deprecated and are longer be supported. |
42-
| `domain` | (Optional) A string value, if specified, will set the domain to include only a subset of the entity categories. Possible values include: `"phi"` (detect confidential health information only), `"none"`. |
43-
| `piiCategories` | (Optional) If you want to specify which entities will be detected and returned, use this optional parameter (defined as a list of strings) with the appropriate entity categories. This parameter can also let you detect entities that aren't enabled by default for your document language. See [Supported Personally Identifiable Information entity categories](../ai-services/language-service/personally-identifiable-information/concepts/entity-categories.md) for the full list. |
44-
| `modelVersion` | (Optional) Specifies the [version of the model](../ai-services/language-service/concepts/model-lifecycle.md) to use when calling personally identifiable information detection. It will default to the most recent version when not specified. We recommend you do not specify this value unless it's necessary. |
38+
| `defaultLanguageCode` | (Optional) The language code to apply to documents that don't specify language explicitly. If the default language code isn't specified, English (en) is the default language code. <br/> See the [full list of supported languages](../ai-services/language-service/personally-identifiable-information/language-support.md). |
39+
| `minimumPrecision` | A value between 0.0 and 1.0. If the confidence score (in the `piiEntities` output) is lower than the set `minimumPrecision` value, the entity isn't returned or masked. The default is 0.0. |
40+
| `maskingMode` | A parameter that provides various ways to mask the personal information detected in the input text. The following options are supported: <ul><li>`"none"` (default): No masking occurs and the `maskedText` output isn't returned. </li><li> `"replace"`: Replaces the detected entities with the character given in the `maskingCharacter` parameter. The character is repeated to the length of the detected entity so that the offsets will correctly correspond to both the input text and the output `maskedText`.</li></ul> |
41+
| `maskingCharacter` | The character used to mask the text if the `maskingMode` parameter is set to `replace`. The following option is supported: `*` (default). This parameter can only be `null` if `maskingMode` isn't set to `replace`. |
42+
| `domain` | (Optional) A string value, if specified, sets the domain to a subset of the entity categories. Possible values include: `"phi"` (detect confidential health information only), `"none"`. |
43+
| `piiCategories` | (Optional) If you want to specify which entities are detected and returned, use this optional parameter (defined as a list of strings) with the appropriate entity categories. This parameter can also let you detect entities that aren't enabled by default for your document language. See [Supported Personally Identifiable Information entity categories](../ai-services/language-service/personally-identifiable-information/concepts/entity-categories.md) for the full list. |
44+
| `modelVersion` | (Optional) Specifies the [version of the model](../ai-services/language-service/concepts/model-lifecycle.md) to use when calling personally identifiable information detection. It defaults to the most recent version when not specified. We recommend you don't specify this value unless it's necessary. |
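The interaction between `minimumPrecision`, `maskingMode`, and `maskingCharacter` described in the table can be sketched in a few lines of Python. This is an illustration of the documented behavior, not the service's implementation: `mask_entities` is a hypothetical helper, the entity dictionaries mimic the `piiEntities` fields (`offset`, `length`, `score`), and offsets are assumed to line up with Python string indices, which holds for plain ASCII text but not necessarily for text containing emoji or combining characters.

```python
def mask_entities(text, entities, masking_character="*", minimum_precision=0.0):
    """Replace each detected entity with a repeated masking character.

    Entities scoring below minimum_precision are skipped, mirroring the
    documented rule that such entities are neither returned nor masked.
    Each replacement has the same length as the entity, so offsets into
    the masked text still match offsets into the input text.
    """
    chars = list(text)
    for entity in entities:
        if entity["score"] < minimum_precision:
            continue
        start = entity["offset"]
        end = min(start + entity["length"], len(chars))  # clamp to the text
        chars[start:end] = masking_character * (end - start)
    return "".join(chars)


# Hypothetical input and entity, shaped like one piiEntities item.
text = "Call 555-0100 now"
entities = [{"offset": 5, "length": 8, "score": 0.9}]
masked = mask_entities(text, entities, minimum_precision=0.5)
```

Because the masked string keeps the original length, downstream code can reuse the same `offset` and `length` values against either the input or `maskedText`.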
4545

4646
## Skill inputs
4747

4848
| Input name | Description |
4949
|---------------|-------------------------------|
50-
| `languageCode` | A string indicating the language of the records. If this parameter is not specified, the default language code will be used to analyze the records. <br/>See the [full list of supported languages](../ai-services/language-service/personally-identifiable-information/language-support.md). |
50+
| `languageCode` | A string indicating the language of the records. If this parameter isn't specified, the default language code is used to analyze the records. <br/>See the [full list of supported languages](../ai-services/language-service/personally-identifiable-information/language-support.md). |
5151
| `text` | The text to analyze. |
5252

5353
## Skill outputs
5454

5555
| Output name | Description |
5656
|---------------|-------------------------------|
5757
| `piiEntities` | An array of complex types that contains the following fields: <ul><li>`"text"` (The actual personally identifiable information as extracted)</li> <li>`"type"`</li><li>`"subType"`</li><li>`"score"` (Higher value means it's more likely to be a real entity)</li><li>`"offset"` (into the input text)</li><li>`"length"`</li></ul> </br> See [Supported Personally Identifiable Information entity categories](../ai-services/language-service/personally-identifiable-information/concepts/entity-categories.md) for the full list. |
58-
| `maskedText` | If `maskingMode` is set to a value other than `none`, this output will be the string result of the masking performed on the input text as described by the selected `maskingMode`. If `maskingMode` is set to `none`, this output will not be present. |
58+
| `maskedText` | This output varies depending on `maskingMode`. If `maskingMode` is `replace`, output is the string result of the masking performed over the input text, as described by the `maskingMode`. If `maskingMode` is `none`, there's no output. |
5959

6060
## Sample definition
6161

@@ -125,13 +125,13 @@ Parameters are case-sensitive and all are optional.
125125
}
126126
```
127127

128-
The offsets returned for entities in the output of this skill are directly returned from the [Language Service APIs](../ai-services/language-service/overview.md), which means if you are using them to index into the original string, you should use the [StringInfo](/dotnet/api/system.globalization.stringinfo) class in .NET in order to extract the correct content. For more information, see [Multilingual and emoji support in Language service features](../ai-services/language-service/concepts/multilingual-emoji-support.md).
128+
The offsets returned for entities in the output of this skill are directly returned from the [Language Service APIs](../ai-services/language-service/overview.md), which means if you're using them to index into the original string, you should use the [StringInfo](/dotnet/api/system.globalization.stringinfo) class in .NET in order to extract the correct content. For more information, see [Multilingual and emoji support in Language service features](../ai-services/language-service/concepts/multilingual-emoji-support.md).
129129

130130
## Errors and warnings
131131

132132
If the language code for the document is unsupported, a warning is returned and no entities are extracted.
133133
If your text is empty, a warning is returned.
134-
If your text is larger than 50,000 characters, only the first 50,000 characters will be analyzed and a warning will be issued.
134+
If your text is larger than 50,000 characters, only the first 50,000 characters are analyzed and a warning is issued.
135135

136136
If the skill returns a warning, the output `maskedText` may be empty, which can impact any downstream skills that expect the output. For this reason, be sure to investigate all warnings related to missing output when writing your skillset definition.
137137
