
Commit 617b443

Merge pull request #190465 from Blackmist/1885496-cmk-current
1885496 cmk current
2 parents e9d4ece + cc2740f commit 617b443

7 files changed, 306 additions and 68 deletions
Lines changed: 115 additions & 0 deletions
@@ -0,0 +1,115 @@
1+
---
2+
title: Customer-managed keys
3+
titleSuffix: Azure Machine Learning
4+
description: 'Learn about using customer-managed keys to improve data security with Azure Machine Learning.'
5+
services: machine-learning
6+
ms.service: machine-learning
7+
ms.subservice: enterprise-readiness
8+
ms.topic: conceptual
9+
ms.author: jhirono
10+
author: jhirono
11+
ms.reviewer: larryfr
12+
ms.date: 03/17/2022
13+
---
14+
# Customer-managed keys for Azure Machine Learning
15+
16+
Azure Machine Learning is built on top of multiple Azure services. While the data is stored securely using encryption keys that Microsoft provides, you can enhance security by also providing your own (customer-managed) keys. The keys you provide are stored securely using Azure Key Vault.
17+
18+
[!INCLUDE [machine-learning-customer-managed-keys.md](../../includes/machine-learning-customer-managed-keys.md)]
19+
20+
In addition to customer-managed keys, Azure Machine Learning also provides a [hbi_workspace flag](/python/api/azureml-core/azureml.core.workspace%28class%29#create-name--auth-none--subscription-id-none--resource-group-none--location-none--create-resource-group-true--sku--basic---friendly-name-none--storage-account-none--key-vault-none--app-insights-none--container-registry-none--cmk-keyvault-none--resource-cmk-uri-none--hbi-workspace-false--default-cpu-compute-target-none--default-gpu-compute-target-none--exist-ok-false--show-output-true-). Enabling this flag reduces the amount of data Microsoft collects for diagnostic purposes and enables [extra encryption in Microsoft-managed environments](../security/fundamentals/encryption-atrest.md). This flag also enables the following behaviors:
21+
22+
* Starts encrypting the local scratch disk in your Azure Machine Learning compute cluster, provided you haven’t created any previous clusters in that subscription. Otherwise, you need to raise a support ticket to enable encryption of the scratch disk of your compute clusters.
23+
* Cleans up your local scratch disk between runs.
24+
* Securely passes credentials for your storage account, container registry, and SSH account from the execution layer to your compute clusters using your key vault.
25+
26+
> [!TIP]
27+
> The `hbi_workspace` flag does not impact encryption in transit, only encryption at rest.
28+
29+
## Prerequisites
30+
31+
* An Azure subscription.
32+
* An Azure Key Vault instance. The key vault contains the key(s) used to encrypt your services.
33+
34+
* The key vault instance must enable soft delete and purge protection.
35+
* The managed identity for the services secured by a customer-managed key must have the following permissions in key vault:
36+
37+
* wrap key
38+
* unwrap key
39+
* get
40+
41+
For example, the managed identity for Azure Cosmos DB would need to have those permissions to the key vault.
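
The permission requirement above can be checked programmatically before you create a workspace. The following is a minimal sketch, not an official tool: the function name `missing_key_permissions` is hypothetical, and the permission spellings (`get`, `wrapKey`, `unwrapKey`) are assumed to match how key permissions appear in a Key Vault access policy.

```python
# Permissions the article lists for the managed identity: wrap key, unwrap key, get.
# The camelCase spellings below are an assumption about how an access policy names them.
REQUIRED_KEY_PERMISSIONS = frozenset({"get", "wrapKey", "unwrapKey"})


def missing_key_permissions(granted):
    """Return the required key permissions that are absent from `granted`.

    The comparison is case-insensitive, and the result is sorted so that
    the output is stable.
    """
    granted_lower = {permission.lower() for permission in granted}
    return sorted(
        permission
        for permission in REQUIRED_KEY_PERMISSIONS
        if permission.lower() not in granted_lower
    )


if __name__ == "__main__":
    # Hypothetical list of permissions granted to a managed identity.
    print(missing_key_permissions(["get", "wrapKey"]))
```

An empty result means the identity holds every permission listed in the prerequisites; any other result names what still needs to be granted.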
42+
43+
## Limitations
44+
45+
* The customer-managed key for resources the workspace depends on can’t be updated after workspace creation.
46+
* Ownership of the resources that Microsoft manages in your subscription can’t be transferred to you.
47+
* You can't delete Microsoft-managed resources used for customer-managed keys without also deleting your workspace.
48+
49+
## How workspace metadata is stored
50+
51+
The following resources store metadata for your workspace:
52+
53+
| Service | How it’s used |
54+
| ----- | ----- |
55+
| Azure Cosmos DB | Stores run history data. |
56+
| Azure Cognitive Search | Stores indices that are used to help query your machine learning content. |
57+
| Azure Storage Account | Stores other metadata such as Azure Machine Learning pipelines data. |
58+
59+
Your Azure Machine Learning workspace reads and writes data using its managed identity. This identity is granted access to the resources using a role assignment (Azure role-based access control) on the data resources. The encryption key you provide is used to encrypt data that is stored on Microsoft-managed resources. It's also used to create indices for Azure Cognitive Search, which are created at runtime.
60+
61+
## Customer-managed keys
62+
63+
When you __don't use a customer-managed key__, Microsoft creates and manages these resources in a Microsoft-owned Azure subscription and uses a Microsoft-managed key to encrypt the data.
64+
65+
When you __use a customer-managed key__, these resources are _in your Azure subscription_ and encrypted with your key. While they exist in your subscription, these resources are __managed by Microsoft__. They're automatically created and configured when you create your Azure Machine Learning workspace.
66+
67+
> [!IMPORTANT]
68+
> When using a customer-managed key, the costs for your subscription will be higher because these resources are in your subscription. To estimate the cost, use the [Azure pricing calculator](https://azure.microsoft.com/pricing/calculator/).
69+
70+
These Microsoft-managed resources are located in a new Azure resource group that is created in your subscription. This group is in addition to the resource group for your workspace, and it contains the Microsoft-managed resources that your key is used with. The resource group is named using the formula `<Azure Machine Learning workspace resource group name><GUID>`.
71+
72+
> [!TIP]
73+
> * The [__Request Units__](/azure/cosmos-db/request-units) for the Azure Cosmos DB automatically scale as needed.
74+
> * If your Azure Machine Learning workspace uses a private endpoint, this resource group will also contain a Microsoft-managed Azure Virtual Network. This VNet is used to secure communications between the managed services and the workspace. You __cannot provide your own VNet for use with the Microsoft-managed resources__. You also __cannot modify the virtual network__. For example, you cannot change the IP address range that it uses.
75+
76+
> [!IMPORTANT]
77+
> If your subscription does not have enough quota for these services, a failure will occur.
78+
79+
> [!WARNING]
80+
> __Don't delete the resource group__ that contains this Azure Cosmos DB instance, or any of the resources automatically created in this group. If you need to delete the resource group or Microsoft-managed services in it, you must delete the Azure Machine Learning workspace that uses it. The resources in the resource group are deleted when the associated workspace is deleted.
81+
82+
## How compute data is stored
83+
84+
Azure Machine Learning uses compute resources to train and deploy machine learning models. The following table describes the compute options and how data is encrypted by each one:
85+
86+
| Compute | Encryption |
87+
| ----- | ----- |
88+
| Azure Container Instance | Data is encrypted by a Microsoft-managed key or a customer-managed key.<br/>For more information, see [Encrypt data with a customer-managed key](../container-instances/container-instances-encrypt-data.md). |
89+
| Azure Kubernetes Service | Data is encrypted by a Microsoft-managed key or a customer-managed key.<br/>For more information, see [Bring your own keys with Azure disks in Azure Kubernetes Services](/azure/aks/azure-disk-customer-managed-keys). |
90+
| Azure Machine Learning compute instance | Local scratch disk is encrypted if the `hbi_workspace` flag is enabled for the workspace. |
91+
| Azure Machine Learning compute cluster | OS disk encrypted in Azure Storage with Microsoft-managed keys. Temporary disk is encrypted if the `hbi_workspace` flag is enabled for the workspace. |
92+
93+
**Compute cluster**
94+
The OS disk for each compute node stored in Azure Storage is encrypted with Microsoft-managed keys in Azure Machine Learning storage accounts. This compute target is ephemeral, and clusters are typically scaled down when no runs are queued. The underlying virtual machine is de-provisioned, and the OS disk is deleted. Azure Disk Encryption isn't supported for the OS disk.
95+
96+
Each virtual machine also has a local temporary disk for OS operations. If you want, you can use the disk to stage training data. If the workspace was created with the `hbi_workspace` parameter set to `TRUE`, the temporary disk is encrypted. This environment is short-lived (only during your run) and encryption support is limited to system-managed keys only.
97+
98+
**Compute instance**
99+
The OS disk for a compute instance is encrypted with Microsoft-managed keys in Azure Machine Learning storage accounts. If the workspace was created with the `hbi_workspace` parameter set to `TRUE`, the local temporary disk on the compute instance is encrypted with Microsoft-managed keys. Customer-managed key encryption isn't supported for the OS and temporary disks.
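
The rules for compute clusters and compute instances can be summarized as a small lookup. This is only an illustrative sketch of the behavior described above. The `DiskEncryption` type, the `disk_encryption` function, and the string labels are hypothetical names, not part of any Azure SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiskEncryption:
    os_disk_key: str                      # key type that protects the OS disk
    temp_disk_encrypted: bool             # True only when the article states the disk is encrypted
    customer_managed_key_supported: bool  # whether a customer-managed key can be used for these disks


def disk_encryption(compute: str, hbi_workspace: bool) -> DiskEncryption:
    """Summarize disk encryption for Azure Machine Learning managed compute.

    `compute` is a hypothetical label: "compute_cluster" or "compute_instance".
    """
    if compute not in {"compute_cluster", "compute_instance"}:
        raise ValueError(f"unsupported compute type: {compute!r}")
    # For both compute types, the OS disk uses Microsoft-managed keys, the
    # temporary disk is encrypted when hbi_workspace is set, and
    # customer-managed keys aren't supported for either disk.
    return DiskEncryption(
        os_disk_key="Microsoft-managed",
        temp_disk_encrypted=hbi_workspace,
        customer_managed_key_supported=False,
    )


if __name__ == "__main__":
    print(disk_encryption("compute_instance", hbi_workspace=True))
```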
100+
101+
### HBI_workspace flag
102+
103+
* The `hbi_workspace` flag can only be set when a workspace is created. It can’t be changed for an existing workspace.
104+
* When this flag is set to True, it may increase the difficulty of troubleshooting issues because less telemetry data is sent to Microsoft. There’s less visibility into success rates or problem types. Microsoft may not be able to react as proactively when this flag is True.
105+
106+
To enable the `hbi_workspace` flag when creating an Azure Machine Learning workspace, follow the steps in one of the following articles:
107+
108+
* [How to create and manage a workspace](how-to-manage-workspace.md).
109+
* [How to create and manage a workspace using the Azure CLI](how-to-manage-workspace-cli.md).
110+
* [How to create a workspace using Hashicorp Terraform](how-to-manage-workspace-terraform.md).
111+
* [How to create a workspace using Azure Resource Manager templates](how-to-create-workspace-template.md).
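
As a rough illustration of what those articles cover for the Python SDK, the sketch below assembles the arguments for `Workspace.create` from the `azureml-core` package, using the `hbi_workspace`, `cmk_keyvault`, and `resource_cmk_uri` parameters from the method signature linked earlier. The helper names `build_workspace_create_args` and `create_workspace` are hypothetical. The URI check assumes the usual Key Vault key identifier layout of `https://<vault-host>/keys/<key-name>/<version>`.

```python
from urllib.parse import urlparse


def build_workspace_create_args(name, subscription_id, resource_group, location,
                                cmk_keyvault, resource_cmk_uri, hbi_workspace=True):
    """Build keyword arguments for Workspace.create with encryption settings.

    cmk_keyvault is the resource ID of the key vault, and resource_cmk_uri is
    the key URI. The check below assumes the URI must include a key version.
    """
    parsed = urlparse(resource_cmk_uri)
    segments = [segment for segment in parsed.path.split("/") if segment]
    if parsed.scheme != "https" or len(segments) != 3 or segments[0] != "keys":
        raise ValueError(
            "resource_cmk_uri should look like "
            "https://<vault-host>/keys/<key-name>/<version>"
        )
    return {
        "name": name,
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "location": location,
        "cmk_keyvault": cmk_keyvault,
        "resource_cmk_uri": resource_cmk_uri,
        # The flag can only be set at creation time.
        "hbi_workspace": hbi_workspace,
    }


def create_workspace(**create_args):
    """Create the workspace. Requires azureml-core and Azure credentials."""
    from azureml.core import Workspace  # imported lazily so the helper above has no dependency
    return Workspace.create(**create_args)
```

Because neither the flag nor the key settings can be changed after creation, validating the arguments before calling `create_workspace` avoids having to delete and recreate the workspace.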
112+
113+
## Next Steps
114+
115+
* [How to configure customer-managed keys with Azure Machine Learning](how-to-setup-customer-managed-keys.md).

articles/machine-learning/concept-data-encryption.md

Lines changed: 6 additions & 37 deletions
@@ -23,23 +23,11 @@ Azure Machine Learning uses a variety of Azure data storage services and compute
2323
2424
## Encryption at rest
2525

26-
> [!IMPORTANT]
27-
> If your workspace contains sensitive data we recommend setting the [hbi_workspace flag](/python/api/azureml-core/azureml.core.workspace%28class%29#create-name--auth-none--subscription-id-none--resource-group-none--location-none--create-resource-group-true--sku--basic---friendly-name-none--storage-account-none--key-vault-none--app-insights-none--container-registry-none--cmk-keyvault-none--resource-cmk-uri-none--hbi-workspace-false--default-cpu-compute-target-none--default-gpu-compute-target-none--exist-ok-false--show-output-true-) while creating your workspace. The `hbi_workspace` flag can only be set when a workspace is created. It cannot be changed for an existing workspace.
28-
29-
The `hbi_workspace` flag controls the amount of [data Microsoft collects for diagnostic purposes](#microsoft-collected-data) and enables [additional encryption in Microsoft-managed environments](../security/fundamentals/encryption-atrest.md). In addition, it enables the following actions:
30-
31-
* Starts encrypting the local scratch disk in your Azure Machine Learning compute cluster provided you have not created any previous clusters in that subscription. Else, you need to raise a support ticket to enable encryption of the scratch disk of your compute clusters
32-
* Cleans up your local scratch disk between runs
33-
* Securely passes credentials for your storage account, container registry, and SSH account from the execution layer to your compute clusters using your key vault
34-
35-
When this flag is set to True, one possible impact is increased difficulty troubleshooting issues. This could happen because some telemetry isn't sent to Microsoft and there is less visibility into success rates or problem types, and therefore may not be able to react as proactively when this flag is True.
36-
37-
> [!TIP]
38-
> The `hbi_workspace` flag does not impact encryption in transit, only encryption at rest.
26+
Azure Machine Learning relies on multiple Azure services, each of which has its own encryption capabilities.
3927

4028
### Azure Blob storage
4129

42-
Azure Machine Learning stores snapshots, output, and logs in the Azure Blob storage account that's tied to the Azure Machine Learning workspace and your subscription. All the data stored in Azure Blob storage is encrypted at rest with Microsoft-managed keys.
30+
Azure Machine Learning stores snapshots, output, and logs in the Azure Blob storage account (default storage account) that's tied to the Azure Machine Learning workspace and your subscription. All the data stored in Azure Blob storage is encrypted at rest with Microsoft-managed keys.
4331

4432
For information on how to use your own keys for data stored in Azure Blob storage, see [Azure Storage encryption with customer-managed keys in Azure Key Vault](../storage/common/customer-managed-keys-configure-key-vault.md).
4533

@@ -53,29 +41,7 @@ For information on regenerating the access keys, see [Regenerate storage access
5341

5442
Azure Machine Learning stores metadata in an Azure Cosmos DB instance. This instance is associated with a Microsoft subscription managed by Azure Machine Learning. All the data stored in Azure Cosmos DB is encrypted at rest with Microsoft-managed keys.
5543

56-
To use your own (customer-managed) keys to encrypt the Azure Cosmos DB instance, you can create a dedicated Cosmos DB instance for use with your workspace. We recommend this approach if you want to store your data, such as run history information, outside of the multi-tenant Cosmos DB instance hosted in our Microsoft subscription.
57-
58-
To enable provisioning a Cosmos DB instance in your subscription with customer-managed keys, perform the following actions:
59-
60-
* Register the Microsoft.MachineLearning and Microsoft.DocumentDB resource providers in your subscription, if not done already.
61-
62-
* Use the following parameters when creating the Azure Machine Learning workspace. Both parameters are mandatory and supported in SDK, Azure CLI, REST APIs, and Resource Manager templates.
63-
64-
* `cmk_keyvault`: This parameter is the resource ID of the key vault in your subscription. This key vault needs to be in the same region and subscription that you will use for the Azure Machine Learning workspace.
65-
66-
* `resource_cmk_uri`: This parameter is the full resource URI of the customer managed key in your key vault, including the [version information for the key](../key-vault/general/about-keys-secrets-certificates.md#objects-identifiers-and-versioning).
67-
68-
> [!NOTE]
69-
> Enabling soft delete and purge protection on the CMK key vault instance is required before creating an encrypted machine learning workspace to protect against accidental data loss in case of vault deletion.
70-
71-
> [!NOTE]
72-
> This key vault instance can be different than the key vault that is created by Azure Machine Learning when you provision the workspace. If you want to use the same key vault instance for the workspace, pass the same key vault while provisioning the workspace by using the [key_vault parameter](/python/api/azureml-core/azureml.core.workspace%28class%29#create-name--auth-none--subscription-id-none--resource-group-none--location-none--create-resource-group-true--sku--basic---friendly-name-none--storage-account-none--key-vault-none--app-insights-none--container-registry-none--cmk-keyvault-none--resource-cmk-uri-none--hbi-workspace-false--default-cpu-compute-target-none--default-gpu-compute-target-none--exist-ok-false--show-output-true-).
73-
74-
[!INCLUDE [machine-learning-customer-managed-keys.md](../../includes/machine-learning-customer-managed-keys.md)]
75-
76-
If you need to __rotate or revoke__ your key, you can do so at any time. When rotating a key, Cosmos DB will start using the new key (latest version) to encrypt data at rest. When revoking (disabling) a key, Cosmos DB takes care of failing requests. It usually takes an hour for the rotation or revocation to be effective.
77-
78-
For more information on customer-managed keys with Cosmos DB, see [Configure customer-managed keys for your Azure Cosmos DB account](../cosmos-db/how-to-setup-cmk.md).
44+
When using your own (customer-managed) keys to encrypt the Azure Cosmos DB instance, a Microsoft-managed Azure Cosmos DB instance is created in your subscription. This instance is created in a Microsoft-managed resource group, which is different from the resource group for your workspace. For more information, see [Customer-managed keys](concept-customer-managed-keys.md).
7945

8046
### Azure Container Registry
8147

@@ -131,6 +97,8 @@ Each virtual machine also has a local temporary disk for OS operations. If you w
13197
**Compute instance**
13298
The OS disk for compute instance is encrypted with Microsoft-managed keys in Azure Machine Learning storage accounts. If the workspace was created with the `hbi_workspace` parameter set to `TRUE`, the local temporary disk on compute instance is encrypted with Microsoft managed keys. Customer managed key encryption is not supported for OS and temp disk.
13399

100+
For more information, see [Customer-managed keys](concept-customer-managed-keys.md).
101+
134102
### Azure Databricks
135103

136104
Azure Databricks can be used in Azure Machine Learning pipelines. By default, the Databricks File System (DBFS) used by Azure Databricks is encrypted using a Microsoft-managed key. To configure Azure Databricks to use customer-managed keys, see [Configure customer-managed keys on default (root) DBFS](/azure/databricks/security/customer-managed-keys-dbfs).
@@ -175,3 +143,4 @@ Each workspace has an associated system-assigned managed identity that has the s
175143
* [Get data from a datastore](how-to-create-register-datasets.md)
176144
* [Connect to data](how-to-connect-data-ui.md)
177145
* [Train with datasets](how-to-train-with-datasets.md)
146+
* [Customer-managed keys](concept-customer-managed-keys.md).

articles/machine-learning/how-to-create-workspace-template.md

Lines changed: 4 additions & 4 deletions
@@ -9,7 +9,7 @@ ms.topic: how-to
99
ms.custom: devx-track-azurecli, devx-track-azurepowershell
1010
ms.author: larryfr
1111
author: Blackmist
12-
ms.date: 10/21/2021
12+
ms.date: 03/08/2022
1313

1414

1515
# Customer intent: As a DevOps person, I need to automate or customize the creation of Azure Machine Learning by using templates.
@@ -169,18 +169,18 @@ The following example template demonstrates how to create a workspace with three
169169
* Enable encryption for the workspace.
170170
* Uses an existing Azure Key Vault to retrieve customer-managed keys. Customer-managed keys are used to create a new Cosmos DB instance for the workspace.
171171

172-
[!INCLUDE [machine-learning-customer-managed-keys.md](../../includes/machine-learning-customer-managed-keys.md)]
173-
174172
> [!IMPORTANT]
175173
> Once a workspace has been created, you cannot change the settings for confidential data, encryption, key vault ID, or key identifiers. To change these values, you must create a new workspace using the new values.
176174
177-
For more information, see [Encryption at rest](concept-data-encryption.md#encryption-at-rest).
175+
For more information, see [Customer-managed keys](concept-customer-managed-keys.md).
178176

179177
> [!IMPORTANT]
180178
> There are some specific requirements your subscription must meet before using this template:
181179
> * You must have an existing Azure Key Vault that contains an encryption key.
182180
> * The Azure Key Vault must be in the same region where you plan to create the Azure Machine Learning workspace.
183181
> * You must specify the ID of the Azure Key Vault and the URI of the encryption key.
182+
>
183+
> For steps on creating the vault and key, see [Configure customer-managed keys](how-to-setup-customer-managed-keys.md).
184184
185185
__To get the values__ for the `cmk_keyvault` (ID of the Key Vault) and the `resource_cmk_uri` (key URI) parameters needed by this template, use the following steps:
186186
