Commit b14de65

Merge pull request #202680 from whhender/short-term-language-updates3
Short term language updates3
2 parents f6eafb7 + f2cb833 commit b14de65

4 files changed

+56
-59
lines changed

articles/purview/concept-best-practices-classification.md

Lines changed: 20 additions & 23 deletions
@@ -1,6 +1,6 @@
11
---
2-
title: Microsoft Purview classification best practices
3-
description: This article provides best practices for classification in Microsoft Purview.
2+
title: Classification best practices for the Microsoft Purview governance portal
3+
description: This article provides best practices for classification in the Microsoft Purview governance portal so you can effectively identify sensitive data across your environment.
44
author: amberz
55
ms.author: amberz
66
ms.service: purview
@@ -9,25 +9,21 @@ ms.topic: conceptual
99
ms.date: 11/18/2021
1010
---
1111

12-
# Microsoft Purview classification best practices
12+
# Classification best practices in the Microsoft Purview governance portal
1313

14-
Data classification, in the context of Microsoft Purview, is a way of categorizing data assets by assigning unique logical labels or classes to the data assets. Classification is based on the business context of the data. For example, you might classify assets by *Passport Number*, *Driver's License Number*, *Credit Card Number*, *SWIFT Code*, *Person’s Name*, and so on.
14+
Data classification in the Microsoft Purview governance portal is a way of categorizing data assets by assigning unique logical labels or classes to the data assets. Classification is based on the business context of the data. For example, you might classify assets by *Passport Number*, *Driver's License Number*, *Credit Card Number*, *SWIFT Code*, *Person’s Name*, and so on. To learn more about classification itself, see our [classification article](concept-classification.md).
1515

16-
To learn more about classification, see [Classification](concept-classification.md).
16+
This article describes best practices to adopt when you're classifying data assets, so that your scans will be more effective and you have the most complete information possible about your entire data estate.
1717

18-
## Classification best practices
19-
20-
This section describes best practices to adopt when you're classifying data assets.
21-
22-
### Scan rule set
18+
## Scan rule set
2319

2420
By using a *scan rule set*, you can configure the relevant classifications that should be applied to the particular scan for the data source. Select the relevant system classifications, or select custom classifications if you've created one for the data you're scanning.
2521

2622
For example, in the following image, only the specific selected system and custom classifications will be applied for the data source you're scanning (for example, financial data).
2723

2824
:::image type="content" source="./media/concept-best-practices/classification-select-classification-rules-example-3.png" alt-text="Screenshot that shows a selected classification rule." lightbox="./media/concept-best-practices/classification-select-classification-rules-example-3.png":::
2925

30-
### Annotation management
26+
## Annotation management
3127

3228
While you're deciding on which classifications to apply, we recommend that you:
3329

@@ -41,7 +37,7 @@ While you're deciding on which classifications to apply, we recommend that you:
4137

4238
:::image type="content" source="./media/concept-best-practices/classification-classification-rules-example-2.png" alt-text="Screenshot that shows the 'Classification rules' pane." lightbox="./media/concept-best-practices/classification-classification-rules-example-2.png":::
4339

44-
### Custom classifications
40+
## Custom classifications
4541

4642
Create custom classifications only if the available system classifications don't meet your needs.
4743

@@ -55,7 +51,7 @@ When you create and configure the classification rules for a custom classificati
5551

5652
* Select the appropriate classification name for which the classification rule is to be created.
5753

58-
* Microsoft Purview supports the following two methods for creating custom classification rules:
54+
* The Microsoft Purview governance portal supports the following two methods for creating custom classification rules:
5955
* Use the **Regular expression** (regex) method if you can consistently express the data element by using a regular expression pattern or you can generate the pattern by using a data file. Ensure that the sample data reflects the population.
6056
* Use the **Dictionary** method only if the list of values in the dictionary file represents all possible values of data to be classified and is expected to conform to a given set of data (considering future values as well).
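
One way to choose between the two methods is to measure how much of a representative data sample a candidate dictionary actually covers. The following is a minimal sketch and not part of any Microsoft Purview tooling; the function name `dictionary_coverage` and the sample values are hypothetical.

```python
def dictionary_coverage(sample_values, dictionary_values):
    """Return the fraction of non-empty sample values found in the dictionary."""
    allowed = set(dictionary_values)
    non_empty = [value for value in sample_values if value]
    if not non_empty:
        return 0.0
    covered = sum(1 for value in non_empty if value in allowed)
    return covered / len(non_empty)


# Hypothetical sample: one value ("XX") is missing from the dictionary.
sample = ["US", "DE", "FR", "XX"]
dictionary = ["US", "DE", "FR"]
print(dictionary_coverage(sample, dictionary))  # 0.75
```

Coverage below 1.0 suggests that the dictionary doesn't represent all possible values, in which case a regular expression may be the better fit.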
6157

@@ -85,9 +81,9 @@ When you create and configure the classification rules for a custom classificati
8581

8682
* This method supports .csv and .tsv files, with a file size limit of 30 megabytes (MB).
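
A dictionary file can be checked against these two constraints before it's uploaded. This is a sketch under stated assumptions: the helper name `dictionary_file_problems` is hypothetical, and 1 MB is assumed to mean 1024 × 1024 bytes, which the article doesn't specify.

```python
from pathlib import PurePath

ALLOWED_EXTENSIONS = {".csv", ".tsv"}
# Assumption: 1 MB is treated as 1024 * 1024 bytes.
MAX_SIZE_BYTES = 30 * 1024 * 1024


def dictionary_file_problems(filename, size_bytes):
    """Return a list of reasons why the file would not be accepted."""
    problems = []
    if PurePath(filename).suffix.lower() not in ALLOWED_EXTENSIONS:
        problems.append("unsupported file type")
    if size_bytes > MAX_SIZE_BYTES:
        problems.append("file exceeds 30 MB")
    return problems


print(dictionary_file_problems("cities.csv", 1024))   # []
print(dictionary_file_problems("cities.xlsx", 1024))  # ['unsupported file type']
```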
8783

88-
### Custom classification archetypes
84+
## Custom classification archetypes
8985

90-
**How the "threshold" parameter works in the regular expression**
86+
### How the "threshold" parameter works in the regular expression
9187

9288
* Consider the sample source data in the following image. There are five columns, and the custom classification rule should be applied to columns **Sample_col1**, **Sample_col2**, and **Sample_col3** for the data pattern *N{Digit}{Digit}{Digit}AN*.
9389

@@ -103,11 +99,11 @@ When you create and configure the classification rules for a custom classificati
10399

104100
:::image type="content" source="./media/concept-best-practices/classification-custom-classification-rule-threshold-11.png" alt-text="Screenshot that shows thresholds of a custom classification rule." lightbox="./media/concept-best-practices/classification-custom-classification-rule-threshold-11.png":::
105101

106-
If you have a threshold of 55%, only columns **Sample_col1** and **Sample_col2** will be classified. **Sample_col3** will not be classified, because it doesn't meet the 55% threshold criterion.
102+
If you have a threshold of 55%, only columns **Sample_col1** and **Sample_col2** will be classified. **Sample_col3** won't be classified, because it doesn't meet the 55% threshold criterion.
107103

108104
:::image type="content" source="./media/concept-best-practices/classification-test-custom-classification-rule-12.png" alt-text="Screenshot that shows the result of a high-threshold criterion." lightbox="./media/concept-best-practices/classification-test-custom-classification-rule-12.png":::
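
The threshold logic can be approximated as the percentage of values in a column that match the data pattern. The sketch below assumes that null values are ignored and that a column is classified when its match percentage is at least the threshold; the scanner's exact rules aren't stated in this article. The regular expression is a plain-regex reading of *N{Digit}{Digit}{Digit}AN*, and the column values are hypothetical.

```python
import re

# Plain-regex reading of the pattern N{Digit}{Digit}{Digit}AN.
DATA_PATTERN = re.compile(r"N[0-9]{3}AN")


def match_percentage(values, pattern=DATA_PATTERN):
    """Percentage of non-null values that fully match the pattern."""
    non_null = [value for value in values if value is not None]
    if not non_null:
        return 0.0
    matches = sum(1 for value in non_null if pattern.fullmatch(value))
    return 100.0 * matches / len(non_null)


def is_classified(values, threshold_percent):
    """Assumption: classify when the match percentage reaches the threshold."""
    return match_percentage(values) >= threshold_percent


# Hypothetical column contents.
mostly_matching = ["N123AN", "N456AN", "N789AN", "N000AN", "X1"]  # 80% match
rarely_matching = ["N123AN", "N456AN", "abc", "def", "ghi"]       # 40% match

print(is_classified(mostly_matching, 55))  # True
print(is_classified(rarely_matching, 55))  # False
```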
109105

110-
**How to use both data and column patterns**
106+
### How to use both data and column patterns
111107

112108
* For the given sample data, where both column **B** and column **C** have similar data patterns, you can classify on column **B** based on the data pattern "^P[0-9]{3}[A-Z]{2}$".
113109

@@ -124,7 +120,7 @@ When you create and configure the classification rules for a custom classificati
124120

125121
:::image type="content" source="./media/concept-best-practices/classification-custom-classification-rule-column-pattern-15.png" alt-text="Screenshot that shows a column pattern." lightbox="./media/concept-best-practices/classification-custom-classification-rule-column-pattern-15.png":::
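
Combining a data pattern with a column pattern amounts to requiring both conditions before a column is classified. The following sketch illustrates the idea only. The column pattern `^B$` is a hypothetical stand-in for the one in the screenshot, and requiring every value to match is a simplification of the threshold behavior.

```python
import re

DATA_PATTERN = re.compile(r"^P[0-9]{3}[A-Z]{2}$")
# Hypothetical column pattern: apply the rule only to a column named "B".
COLUMN_PATTERN = re.compile(r"^B$")


def rule_applies(column_name, values):
    """True when the column name and all of its values match their patterns."""
    name_ok = COLUMN_PATTERN.search(column_name) is not None
    data_ok = bool(values) and all(DATA_PATTERN.search(v) for v in values)
    return name_ok and data_ok


# Columns B and C hold similar data, but only B satisfies the column pattern.
values = ["P123AB", "P456CD"]
print(rule_applies("B", values))  # True
print(rule_applies("C", values))  # False
```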
126122

127-
**How to use multiple column patterns**
123+
### How to use multiple column patterns
128124

129125
If there are multiple column patterns to be classified for the same classification rule, use pipe (|) character-separated column names. For example, for columns **Product ID**, **Product_ID**, **ProductID**, and so on, write the column pattern as shown in the following image:
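
In text form, a pipe-separated column pattern behaves like a regular expression alternation. The sketch below assumes case-sensitive, whole-name matching, which may differ from how the scanner evaluates column patterns; `column_matches` is a hypothetical helper.

```python
import re

# Pipe-separated alternatives for the same logical column.
COLUMN_PATTERN = re.compile(r"Product ID|Product_ID|ProductID")


def column_matches(column_name):
    """True when the whole column name equals one of the alternatives."""
    return COLUMN_PATTERN.fullmatch(column_name) is not None


for name in ["Product ID", "Product_ID", "ProductID", "Customer ID"]:
    print(name, column_matches(name))
```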
130126

@@ -141,15 +137,15 @@ Here are some considerations to bear in mind as you're defining classifications:
141137
* Set priorities and develop a plan to achieve the security and compliance needs of an organization.
142138
* Describe the phases in the data preparation processes (raw zone, landing zone, and so on) and assign the classifications to specific assets to mark the phase in the process.
143139

144-
* With Microsoft Purview, you can assign classifications at the asset or column level automatically by including relevant classifications in the scan rule, or you can assign them manually after you ingest the metadata into Microsoft Purview.
145-
* For automatic assignment, see [Supported data stores in Microsoft Purview](./azure-purview-connector-overview.md).
146-
* Before you scan your data sources in Microsoft Purview, it is important to understand your data and configure the appropriate scan rule set for it (for example, by selecting relevant system classification, custom classifications, or a combination of both), because it could affect your scan performance. For more information, see [Supported classifications in Microsoft Purview](./supported-classifications.md).
147-
* The Microsoft Purview scanner applies data sampling rules for deep scans (subject to classification) for both system and custom classifications. The sampling rule is based on the type of data sources. For more information, see the "Sampling within a file" section in [Supported data sources and file types in Microsoft Purview](./sources-and-scans.md#sampling-within-a-file).
140+
* You can assign classifications at the asset or column level automatically by including relevant classifications in the scan rule, or you can assign them manually after you ingest the metadata into the Microsoft Purview Data Map.
141+
* For automatic assignment, see [supported data stores in the Microsoft Purview governance portal](./azure-purview-connector-overview.md).
142+
* Before you scan your data sources in the Microsoft Purview Data Map, it's important to understand your data and configure the appropriate scan rule set for it (for example, by selecting relevant system classification, custom classifications, or a combination of both), because it could affect your scan performance. For more information, see [supported classifications in the Microsoft Purview governance portal](./supported-classifications.md).
143+
* The Microsoft Purview scanner applies data sampling rules for deep scans (subject to classification) for both system and custom classifications. The sampling rule is based on the type of data sources. For more information, see the "Sampling within a file" section in [Supported data sources and file types in Microsoft Purview](./sources-and-scans.md#sampling-within-a-file).
148144

149145
> [!Note]
150146
> **Distinct data threshold**: This is the total number of distinct data values that need to be found in a column before the scanner runs the data pattern on it. The distinct data threshold isn't part of pattern matching, but it's a prerequisite for it. System classification rules require at least eight distinct values in a column before that column is subject to classification. The system requires this value to make sure that the column contains enough data for the scanner to classify it accurately. For example, a column that contains multiple rows that all contain the value 1 won't be classified. A column that contains one row with a value while the rest of the rows are null also won't be classified. If you specify multiple patterns, this value applies to each of them.
151147
152-
* The sampling rules apply to resource sets as well. For more information, see the "Resource set file sampling" section in [Supported data sources and file types in Microsoft Purview](./sources-and-scans.md#resource-set-file-sampling).
148+
* The sampling rules apply to resource sets as well. For more information, see the "Resource set file sampling" section in [supported data sources and file types in the Microsoft Purview governance portal](./sources-and-scans.md#resource-set-file-sampling).
153149
* Custom classifications can't be applied on document type assets using custom classification rules. Classifications for such types can be applied manually only.
154150
* Custom classifications aren't included in any default scan rules. Therefore, if automatic assignment of custom classifications is expected, you must deploy and use a custom scan rule that includes the custom classification to run the scan.
155151
* If you apply classifications manually from the Microsoft Purview governance portal, such classifications are retained in subsequent scans.
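
The distinct data threshold described in the note above can be pictured as a precheck that runs before any pattern matching. This is a minimal sketch, assuming that null values don't count toward the distinct total; the helper name is hypothetical.

```python
MIN_DISTINCT_VALUES = 8  # minimum stated for system classification rules


def has_enough_distinct_values(values, minimum=MIN_DISTINCT_VALUES):
    """True when the column holds at least `minimum` distinct non-null values."""
    distinct = {value for value in values if value is not None}
    return len(distinct) >= minimum


print(has_enough_distinct_values([1] * 100))           # False: one distinct value
print(has_enough_distinct_values(["x"] + [None] * 20)) # False: mostly nulls
print(has_enough_distinct_values(list(range(8))))      # True: eight distinct values
```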
@@ -158,6 +154,7 @@ Here are some considerations to bear in mind as you're defining classifications:
158154

159155

160156
## Next steps
157+
161158
- [Apply system classification](./apply-classifications.md)
162159
- [Create custom classification](./create-a-custom-classification-and-classification-rule.md)
163160

articles/purview/concept-classification.md

Lines changed: 9 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
---
2-
title: Understand data classification feature in Microsoft Purview
3-
description: This article explains the concept of data classification in Microsoft Purview.
2+
title: Understand data classification in the Microsoft Purview governance portal
3+
description: This article explains the concepts behind data classification in the Microsoft Purview governance portal.
44
author: ankitscribbles
55
ms.author: ankitgup
66
ms.service: purview
@@ -9,17 +9,17 @@ ms.topic: conceptual
99
ms.date: 01/04/2022
1010
---
1111

12-
# Data Classification in Microsoft Purview
12+
# Data classification in the Microsoft Purview governance portal
1313

14-
Data classification, in the context of Microsoft Purview, is a way of categorizing data assets by assigning unique logical tags or classes to the data assets. Classification is based on the business context of the data. For example, you might classify assets by *Passport Number*, *Driver's License Number*, *Credit Card Number*, *SWIFT Code*, *Person’s Name*, and so on.
14+
Data classification in the Microsoft Purview governance portal is a way of categorizing data assets by assigning unique logical tags or classes to the data assets. Classification is based on the business context of the data. For example, you might classify assets by *Passport Number*, *Driver's License Number*, *Credit Card Number*, *SWIFT Code*, *Person’s Name*, and so on.
1515

1616
When you classify data assets, you make them easier to understand, search, and govern. Classifying data assets also helps you understand the risks associated with them. This in turn can help you implement measures to protect sensitive or important data from ungoverned proliferation and unauthorized access across the data estate.
1717

18-
Microsoft Purview provides an automated classification capability while you scan your data sources. You get more than 200+ built-in system classifications and the ability to create custom classifications for your data. You can classify assets automatically when they're configured as part of a scan, or you can edit them manually in the Microsoft Purview governance portal after they're scanned and ingested.
18+
The Microsoft Purview Data Map provides an automated classification capability while you scan your data sources. You get more than 200 built-in system classifications and the ability to create custom classifications for your data. You can classify assets automatically when they're configured as part of a scan, or you can edit them manually in the Microsoft Purview governance portal after they're scanned and ingested.
1919

2020
## Use of classification
2121

22-
Classification is the process of organizing data into *logical categories* that make the data easy to retrieve, sort, and identify for future use. This can be particularly important for data governance. Among other reasons, classifying data assets is important because it helps you:
22+
Classification is the process of organizing data into *logical categories* that make the data easy to retrieve, sort, and identify for future use. This can be important for data governance. Among other reasons, classifying data assets is important because it helps you:
2323

2424
* Narrow down the search for data assets that you're interested in.
2525
* Organize and understand the variety of data classes that are important in your organization and where they're stored.
@@ -31,9 +31,9 @@ As shown in the following image, it's possible to apply classifications at both
3131

3232
## Types of classification
3333

34-
Microsoft Purview supports both system and custom classifications.
34+
The Microsoft Purview governance portal supports both system and custom classifications.
3535

36-
* **System classifications**: Microsoft Purview supports 200+ system classifications out of the box. For the entire list of available system classifications, see [Supported classifications in Microsoft Purview](./supported-classifications.md).
36+
* **System classifications**: More than 200 system classifications are supported out of the box. For the entire list of available system classifications, see [supported classifications in the Microsoft Purview governance portal](./supported-classifications.md).
3737

3838
In the example in the preceding image, *Person’s Name* is a system classification.
3939

@@ -43,7 +43,7 @@ Custom classification rules can be based on a *regular expression* pattern or *d
4343
Let's say that the *Employee ID* column follows the EMPLOYEE{GUID} pattern (for example, EMPLOYEE9c55c474-9996-420c-a285-0d0fc23f1f55). You can create your own custom classification by using a regular expression, such as `^EMPLOYEE[A-Za-z0-9]{8}-[A-Za-z0-9]{4}-[A-Za-z0-9]{4}-[A-Za-z0-9]{4}-[A-Za-z0-9]{12}$`.
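
A pattern like this can be tried locally against sample values before the classification rule is created. The following is a minimal sketch that uses Python's `re` module, whose regex dialect may differ in details from the one the scanner uses. The prefix is written in uppercase so that it matches the sample value, and the names are hypothetical.

```python
import re

# EMPLOYEE followed by a GUID-shaped suffix (8-4-4-4-12 alphanumerics).
EMPLOYEE_ID = re.compile(
    r"^EMPLOYEE[A-Za-z0-9]{8}-[A-Za-z0-9]{4}-[A-Za-z0-9]{4}"
    r"-[A-Za-z0-9]{4}-[A-Za-z0-9]{12}$"
)


def is_employee_id(value):
    """True when the value follows the EMPLOYEE{GUID} shape."""
    return EMPLOYEE_ID.match(value) is not None


print(is_employee_id("EMPLOYEE9c55c474-9996-420c-a285-0d0fc23f1f55"))  # True
print(is_employee_id("EMPLOYEE-1234"))                                 # False
```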
4444

4545
> [!NOTE]
46-
> Sensitivity labels are different from classifications. Sensitivity labels categorize assets in the context of data security and privacy, such as *Highly Confidential*, *Restricted*, *Public*, and so on. To use sensitivity labels in the Microsoft Purview data map, you'll need at least one Microsoft 365 license or account within the same Azure Active Directory (Azure AD) tenant as your Microsoft Purview data map. For more information about the differences between sensitivity labels and classifications, see [Sensitivity labels in Microsoft Purview FAQ](sensitivity-labels-frequently-asked-questions.yml#what-is-the-difference-between-classifications-and-sensitivity-labels).
46+
> Sensitivity labels are different from classifications. Sensitivity labels categorize assets in the context of data security and privacy, such as *Highly Confidential*, *Restricted*, *Public*, and so on. To use sensitivity labels in the Microsoft Purview Data Map, you'll need at least one Microsoft 365 license or account within the same Azure Active Directory (Azure AD) tenant as your Microsoft Purview Data Map. For more information about the differences between sensitivity labels and classifications, see [sensitivity labels in the Microsoft Purview governance portal FAQ](sensitivity-labels-frequently-asked-questions.yml#what-is-the-difference-between-classifications-and-sensitivity-labels).
4747
4848
## Next steps
4949
