|
| 1 | +The Microsoft Purview Information Protection scanner extends labeling and protection to on-premises files. Before running scans, you need to define scanner settings, install the service, configure authentication, and optionally enable data loss prevention (DLP) rules. |
| 2 | + |
| 3 | +## Step 1: Configure scanner settings in the portal |
| 4 | + |
| 5 | +Start in the Microsoft Purview portal. First, define a scanner cluster, which acts as a logical container for the scanner instance. Then, create one or more content scan jobs to specify the schedule, labeling behavior, and repositories to scan. |
| 6 | + |
| 7 | +You need one of these Microsoft Purview roles to configure the scanner in the portal: |
| 8 | + |
| 9 | +- Compliance Administrator |
| 10 | +- Compliance Data Administrator |
| 11 | +- Security Administrator |
| 12 | +- Organization Management |
| 13 | + |
| 14 | +### Create the scanner cluster |
| 15 | + |
| 16 | +1. Go to the [Microsoft Purview portal](https://purview.microsoft.com/?azure-portal=true) > **Settings** > **Information Protection** > **Information protection scanner**. |
| 17 | +1. On the **Clusters** tab, select **Add**. |
| 18 | +1. Give your cluster a **name** and optional **description**. |
| 19 | +1. Select **Save** to save your changes. |
| 20 | + |
| 21 | +The cluster name is required later when you install the scanner. |
| 22 | + |
| 23 | +### Create the content scan job |
| 24 | + |
| 25 | +1. On the **Content scan jobs** tab, select **Add**. |
| 26 | +1. Configure the scan job settings: |
| 27 | + |
| 28 | + - **Schedule**: Controls how often the scan runs. The default setting is **Manual**. Change it to **Always** to enable continuous scanning. |
| 29 | + - **Info types to be discovered**: Determines which information types are identified during scanning. Choose **Policy only** to scan only for types defined in your labeling policy, or **All** to include all built-in types. |
| 30 | + - **Enable DLP rules**: Activates enforcement of data loss prevention policies. Set to **On** only if a DLP policy is already configured in Microsoft Purview. |
| 31 | + - **Enforce sensitivity labeling policy**: Turns automatic labeling on or off. Set this to **On** if you want the scanner to apply sensitivity labels based on content. |
| 32 | + - **Relabel files**: Specifies whether files that already have a label can be relabeled. Set this to **On** if you want labels to be updated during scanning. |
| 33 | + - **Preserve metadata**: Keeps original file attributes such as Date modified, Last modified, and Modified by during scanning. This setting is **On** by default. |
| 34 | + |
| 35 | + :::image type="content" source="../media/content-scan-job.png" alt-text="Screenshot showing where to edit a content scan job for the Microsoft Purview Information Protection scanner." lightbox="../media/content-scan-job.png"::: |
| 36 | + |
| 37 | +1. Add repositories on the **Repositories** tab: |
| 38 | + - Use UNC paths, for example: `\\Server\Folder` |
| 39 | + - Use local file paths, for example: `C:\Folder` |
| 40 | + - Add SharePoint Server libraries, for example: `http://sp2016/Shared Documents/Reports` |
| 41 | + - To scan an entire SharePoint root site, for example `http://sp2016`, the scanner account must have **Site Collection Auditor** permissions |
| 42 | + |
| 43 | + :::image type="content" source="../media/repository-settings.png" alt-text="Screenshot showing the repository settings for the Microsoft Purview Information Protection scanner." lightbox="../media/repository-settings.png"::: |
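
Before adding repositories, it can help to confirm that the file system paths are reachable from the scanner server. The following is a minimal sketch, not part of the scanner module: `Get-RepositoryKind` is a hypothetical helper that sorts a path into the three forms listed above, and `Test-Path` then checks only the UNC and local paths. The sample paths are the placeholders from the list.

```powershell
# Hypothetical helper (not a scanner cmdlet): classifies a repository path
# into the three forms the Repositories tab accepts.
function Get-RepositoryKind {
    param([Parameter(Mandatory)][string]$Path)

    if ($Path -match '^https?://')      { return 'SharePoint' }
    if ($Path -match '^\\\\[^\\]+\\.+') { return 'UNC' }
    if ($Path -match '^[A-Za-z]:\\')    { return 'Local' }
    return 'Unknown'
}

# Placeholder paths taken from the examples in the list.
$repositories = @(
    '\\Server\Folder',
    'C:\Folder',
    'http://sp2016/Shared Documents/Reports'
)

foreach ($repo in $repositories) {
    $kind = Get-RepositoryKind -Path $repo
    # Test-Path only applies to file system paths, so SharePoint URLs are skipped.
    $reachable = if ($kind -in 'UNC', 'Local') { Test-Path -LiteralPath $repo } else { $null }
    [pscustomobject]@{ Path = $repo; Kind = $kind; Reachable = $reachable }
}
```

Run the check while signed in as the scanner service account if possible, because reachability depends on that account's permissions rather than your own.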
| 44 | + |
| 45 | +## Step 2: Install the scanner service |
| 46 | + |
| 47 | +Once you've configured your cluster and scan job, install the scanner service on a supported Windows Server computer. This step uses PowerShell and has a few setup requirements. |
| 48 | + |
| 49 | +### Prerequisites |
| 50 | + |
| 51 | +Before you begin installation, make sure the following requirements are met: |
| 52 | + |
| 53 | +- You're installing on a **64-bit Windows Server 2016 or later**. |
| 54 | +- The server has at least **4 CPU cores**, **8 GB of RAM**, and **10 GB of available disk space**. |
| 55 | +- The **Microsoft Purview Information Protection client** (full version) is already installed on the server. |
| 56 | +- A **SQL Server 2016 or later** instance is available to store the scanner's configuration database. This can be a local or remote SQL Server. |
| 57 | +- You have the **scanner cluster name** you defined earlier in the Microsoft Purview portal. This is required during installation. |
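
The hardware minimums above can be checked from PowerShell before installation. This is a rough sketch under stated assumptions: `Test-ScannerHardware` is a hypothetical helper, the live check assumes the scanner is installed on drive `C:`, and it covers only the CPU, RAM, and disk requirements, not the client, SQL Server, or operating system version.

```powershell
# Hypothetical helper: compares measured values with the minimums listed above
# and outputs one message per unmet requirement (no output means all are met).
function Test-ScannerHardware {
    param(
        [Parameter(Mandatory)][int]$Cores,
        [Parameter(Mandatory)][double]$RamGB,
        [Parameter(Mandatory)][double]$FreeDiskGB
    )

    if ($Cores -lt 4)       { "CPU cores: $Cores (minimum 4)" }
    if ($RamGB -lt 8)       { "RAM: $RamGB GB (minimum 8 GB)" }
    if ($FreeDiskGB -lt 10) { "Free disk space: $FreeDiskGB GB (minimum 10 GB)" }
}

# CIM classes are only available on Windows, so skip the live check elsewhere.
if ($PSVersionTable.PSEdition -eq 'Desktop' -or $IsWindows) {
    $cores = (Get-CimInstance -ClassName Win32_Processor |
        Measure-Object -Property NumberOfCores -Sum).Sum
    # Installed RAM is reported slightly below its nominal size, so round it.
    $ramGB = [math]::Round((Get-CimInstance -ClassName Win32_ComputerSystem).TotalPhysicalMemory / 1GB)
    # Assumption: the scanner is installed on drive C:.
    $diskGB = [math]::Round((Get-CimInstance -ClassName Win32_LogicalDisk -Filter "DeviceID='C:'").FreeSpace / 1GB, 1)

    $problems = @(Test-ScannerHardware -Cores $cores -RamGB $ramGB -FreeDiskGB $diskGB)
    if ($problems.Count -eq 0) { 'Hardware minimums met.' } else { $problems }
}
```

Keeping the comparison in its own function separates the thresholds from the way the values are gathered, so the same check can be reused with numbers collected from a remote server.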
| 58 | + |
| 59 | +### Install using PowerShell |
| 60 | + |
| 61 | +Open a PowerShell session with the **Run as administrator** option on the Windows Server computer that hosts the scanner, and then run this command to install the scanner: |
| 62 | + |
| 63 | +``` powershell |
| 64 | +Install-Scanner -SqlServerInstance <SQLServerName> -Cluster <ClusterName> |
| 65 | +``` |
| 66 | + |
| 67 | +Example: |
| 68 | + |
| 69 | +``` powershell |
| 70 | +Install-Scanner -SqlServerInstance SQL01\SCANNER -Cluster Europe |
| 71 | +``` |
| 72 | + |
| 73 | +After you run the command, you'll be prompted to enter the credentials for the scanner service account. This should be an Active Directory account that is synced to Microsoft Entra ID. |
| 74 | + |
| 75 | +When the installation completes, a Windows service named **Microsoft Purview Information Protection Scanner** is created and set to run under the service account you provided. |
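
To confirm that the service exists and which account it runs under, you can query the standard `Win32_Service` CIM class. This is a sketch only: `Get-ScannerServiceInfo` is a hypothetical helper, and the wildcard pattern assumes the service's display name contains the text quoted above.

```powershell
# Hypothetical helper: looks up services whose display name matches the pattern.
# The default pattern is an assumption based on the service name given above.
function Get-ScannerServiceInfo {
    param([string]$DisplayNamePattern = '%Information Protection Scanner%')

    Get-CimInstance -ClassName Win32_Service -Filter "DisplayName LIKE '$DisplayNamePattern'" |
        Select-Object Name, DisplayName, State, StartMode, StartName
}

# CIM classes are only available on Windows, so skip the lookup elsewhere.
if ($PSVersionTable.PSEdition -eq 'Desktop' -or $IsWindows) {
    Get-ScannerServiceInfo
}
```

`StartName` shows the logon account, which should match the service account you entered during installation. No output means that no matching service was found.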
| 76 | + |
| 77 | +## Step 3: Authenticate with Microsoft Entra ID |
| 78 | + |
| 79 | +To allow the scanner to run unattended and apply sensitivity labels, it needs to authenticate with Microsoft Entra ID using an app registration. This step connects the scanner to Microsoft Purview services securely and enables policy enforcement. |
| 80 | + |
| 81 | +### Set up API permissions in Microsoft Entra ID |
| 82 | + |
| 83 | +The app registration used by the scanner must be granted specific permissions: |
| 84 | + |
| 85 | +- Azure Rights Management Service |
| 86 | + - `Content.DelegatedReader`: allows the scanner to read sensitivity labels and policies. |
| 87 | + - `Content.DelegatedWriter`: allows the scanner to apply or remove labels and protection. |
| 88 | + |
| 89 | +- Microsoft Information Protection Sync Service |
| 90 | + - `UnifiedPolicy.Tenant.Read`: allows the scanner to retrieve labeling policies. |
| 91 | + |
| 92 | +These permissions must be added and admin consent granted in the Microsoft Entra admin center before proceeding. |
| 93 | + |
| 94 | +### Authenticate using PowerShell |
| 95 | + |
| 96 | +Once permissions are configured, use this PowerShell command to authenticate the scanner: |
| 97 | + |
| 98 | +``` powershell |
| 99 | +Set-Authentication -AppId <AppID> -AppSecret <Secret> -TenantId <TenantID> -DelegatedUser <EntraUser> |
| 100 | +``` |
| 101 | + |
| 102 | +Example: |
| 103 | + |
| 104 | +``` powershell |
| 105 | +Set-Authentication -AppId "77c3c1c3-abf9-404e-8b2b-4652836c8c66" ` |
| 106 | + -AppSecret "OAkk+rnuYc/u+]ah2kNxVbtrDGbS47L4" ` |
| 107 | + -TenantId "9c11c87a-ac8b-46a3-8d5c-f4d0b72ee29a" ` |
| 108 | + -DelegatedUser "<EntraUser>" |
| 109 | +``` |
| 110 | + |
| 111 | +If the scanner service account isn't allowed to sign in locally, use the `-OnBehalfOf` parameter along with credentials for the service account: |
| 112 | + |
| 113 | +Example: |
| 114 | + |
| 115 | +``` powershell |
| 116 | +$creds = Get-Credential CONTOSO\scanner |
| 117 | +Set-Authentication -AppId "<AppID>" -AppSecret "<Secret>" -TenantId "<TenantID>" ` |
| 118 | + -DelegatedUser "<EntraUser>" -OnBehalfOf $creds |
| 119 | +``` |
| 120 | + |
| 121 | +After this step, the scanner can authenticate to Microsoft Entra ID without user interaction and is ready to classify and protect content according to your configured policies. |
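
The examples above pass the app secret as a literal string, which leaves it in the command history. One possible alternative, sketched here under assumptions, is to prompt for the secret and convert it only at the moment of the call. Both function names are hypothetical. The sketch uses only the `Set-Authentication` parameters shown above and assumes `-AppSecret` expects plain text, as in those examples.

```powershell
# Hypothetical helper: converts a SecureString to plain text in memory.
function ConvertTo-PlainText {
    param([Parameter(Mandatory)][securestring]$Secure)

    [System.Net.NetworkCredential]::new('', $Secure).Password
}

# Hypothetical wrapper: prompts for the app secret instead of taking it as an
# argument, then calls Set-Authentication with the parameters shown above.
function Set-ScannerAuthenticationFromPrompt {
    param(
        [Parameter(Mandatory)][string]$AppId,
        [Parameter(Mandatory)][string]$TenantId,
        [Parameter(Mandatory)][string]$DelegatedUser
    )

    # The typed secret is masked and isn't written to the command history.
    $secure = Read-Host -Prompt 'App secret' -AsSecureString
    Set-Authentication -AppId $AppId -AppSecret (ConvertTo-PlainText -Secure $secure) `
        -TenantId $TenantId -DelegatedUser $DelegatedUser
}
```

To use it, call `Set-ScannerAuthenticationFromPrompt -AppId "<AppID>" -TenantId "<TenantID>" -DelegatedUser "<EntraUser>"` on the scanner server and enter the secret when prompted.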
| 122 | + |
| 123 | +## Step 4: Turn on policy enforcement |
| 124 | + |
| 125 | +New content scan jobs run in discovery mode by default: the scanner reports what it finds but doesn't apply labels or enforce protection. To move from discovery to enforcement, update the scan job configuration. |
| 126 | + |
| 127 | +### Update the scan job in the Microsoft Purview portal |
| 128 | + |
| 129 | +Follow these steps to enable protection: |
| 130 | + |
| 131 | +1. In the Microsoft Purview portal, go to the **Content scan jobs** tab. |
| 132 | +1. Select the scan job you created earlier. |
| 133 | +1. Change the **Schedule** setting from **Manual** to **Always** so that the scan runs continuously. |
| 134 | +1. Turn on **Enforce sensitivity labeling policy**. This allows the scanner to apply labels and protection automatically based on the policy. |
| 135 | +1. Once a scanner node is online, select **Scan now** to begin the job. |
| 136 | + |
| 137 | +> [!TIP] |
| 138 | +> The **Scan now** button only appears when a scanner node is connected and available. |
| 139 | +
|
| 140 | +### Use PowerShell to enable enforcement |
| 141 | + |
| 142 | +If you're configuring the scanner using PowerShell, use this command: |
| 143 | + |
| 144 | +``` powershell |
| 145 | +Set-ScannerContentScan -Schedule Always -Enforce On |
| 146 | +``` |
| 147 | + |
| 148 | +This command changes the scan job from manual to continuous and enables policy enforcement, allowing the scanner to apply classification and protection to the scanned files. |
| 149 | + |
| 150 | +## Step 5: Run your first scan |
| 151 | + |
| 152 | +With configuration complete, go back to the Microsoft Purview portal and start your first scan by selecting **Scan now** on your content scan job. This action starts the scan using your defined settings. |
| 153 | + |
| 154 | +It's good practice to run the first scan in discovery mode so that you can review what the scanner finds without changing any files. Once you're confident in the results, switch to enforcement (Step 4) to apply labeling and protection. |