Commit 75b0982

Merge pull request #263907 from deeikele/cmk-enhancements
Describe data lifecycle and data content for CMK workspace
2 parents ec361b0 + e84f6b0 commit 75b0982

1 file changed: +26 -16 lines changed

articles/machine-learning/concept-customer-managed-keys.md

Lines changed: 26 additions & 16 deletions
Original file line number | Diff line number | Diff line change
@@ -15,9 +15,7 @@ monikerRange: 'azureml-api-2 || azureml-api-1'
1515
---
1616
# Customer-managed keys for Azure Machine Learning
1717

18-
Azure Machine Learning is built on top of multiple Azure services. While the data is stored securely using encryption keys that Microsoft provides, you can enhance security by also providing your own (customer-managed) keys. The keys you provide are stored securely using Azure Key Vault.
19-
20-
[!INCLUDE [machine-learning-customer-managed-keys.md](includes/machine-learning-customer-managed-keys.md)]
18+
Azure Machine Learning is built on top of multiple Azure services. While the data is stored securely using encryption keys that Microsoft provides, you can enhance security by also providing your own (customer-managed) keys. The keys you provide are stored securely using Azure Key Vault. Your data is stored on a set of additional resources managed in your Azure subscription.
2119

2220
In addition to customer-managed keys, Azure Machine Learning also provides a [hbi_workspace flag](/python/api/azure-ai-ml/azure.ai.ml.entities.workspace). Enabling this flag reduces the amount of data Microsoft collects for diagnostic purposes and enables [extra encryption in Microsoft-managed environments](../security/fundamentals/encryption-atrest.md). This flag also enables the following behaviors:
2321

@@ -44,22 +42,29 @@ In addition to customer-managed keys, Azure Machine Learning also provides a [hb
4442

4543
## Limitations
4644

47-
* The customer-managed key for resources the workspace depends on can't be updated after workspace creation.
48-
* Resources managed by Microsoft in your subscription can't transfer ownership to you.
45+
* After workspace creation, the customer-managed encryption key for resources the workspace depends on can only be updated to another key in the original Azure Key Vault resource.
46+
* Encrypted data is stored on resources that live in a Microsoft-managed resource group in your subscription. You cannot create these resources upfront or transfer ownership of these to you. Data lifecycle is managed indirectly via the Azure ML APIs as you create objects in Azure Machine Learning service.
4947
* You can't delete Microsoft-managed resources used for customer-managed keys without also deleting your workspace.
48+
* The compute cluster OS disk can be encrypted only with Microsoft-managed keys, not with your customer-managed keys.
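
The first limitation above (a key can only be rotated to another key in the original Azure Key Vault) can be checked before attempting an update. The following is a minimal sketch, not part of the commit or of any Azure SDK: the function names are hypothetical, and it assumes key identifiers follow the usual `https://<vault-name>.vault.azure.net/keys/<key-name>/<version>` shape and that two keys are in the same vault when their URI host names match.

```python
from urllib.parse import urlparse


def key_vault_host(key_uri: str) -> str:
    """Return the Key Vault host name from a key identifier URI.

    Hypothetical helper: assumes the URI looks like
    https://<vault-name>.vault.azure.net/keys/<key-name>/<version>.
    """
    parsed = urlparse(key_uri)
    if parsed.scheme != "https" or not parsed.hostname:
        raise ValueError(f"not an https key URI: {key_uri!r}")
    # Path segments should start with "keys" followed by the key name.
    segments = [s for s in parsed.path.split("/") if s]
    if len(segments) < 2 or segments[0] != "keys":
        raise ValueError(f"not a Key Vault key identifier: {key_uri!r}")
    # urlparse already lower-cases the host name.
    return parsed.hostname


def is_allowed_key_update(current_key_uri: str, new_key_uri: str) -> bool:
    """True if the new key is in the same Key Vault as the current key."""
    return key_vault_host(current_key_uri) == key_vault_host(new_key_uri)
```

A check like this only compares URIs; it doesn't confirm that the key exists or that the workspace identity can access it.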
5049

51-
## How workspace metadata is stored
50+
## How and what workspace metadata is stored
5251

53-
The following resources store metadata for your workspace:
52+
When you bring your own encryption key, service metadata is stored on dedicated resources in your Azure subscription. Microsoft creates a separate resource group in your subscription for this purpose, named *"azureml-rg-workspacename_GUID"*. Resources in this managed resource group can only be modified by Microsoft.
5453

55-
| Service | How it's used |
56-
| ----- | ----- |
57-
| Azure Cosmos DB | Stores job history data. |
58-
| Azure AI Search | Stores indices that are used to help query your machine learning content. |
59-
| Azure Storage Account | Stores other metadata such as Azure Machine Learning pipelines data. |
54+
The following resources are created and store metadata for your workspace:
55+
56+
| Service | Usage | Example data |
57+
| ----- | ----- | ----- |
58+
| Azure Cosmos DB | Stores job history data, compute metadata, and asset metadata. | Job name, status, and sequence number; compute cluster name, number of cores, number of nodes; datastore names and tags, descriptions on assets like models; data label names. |
59+
| Azure AI Search | Stores indices that are used to help query your machine learning content. | These indices are built on top of the data stored in CosmosDB. |
60+
| Azure Storage Account | Stores metadata related to Azure Machine Learning pipelines data. | Designer pipeline names, pipeline layout, execution properties. |
61+
62+
From a data lifecycle management point of view, data in the above resources is created and deleted as you create and delete the corresponding objects in Azure Machine Learning.
6063

6164
Your Azure Machine Learning workspace reads and writes data using its managed identity. This identity is granted access to the resources using a role assignment (Azure role-based access control) on the data resources. The encryption key you provide is used to encrypt data that is stored on Microsoft-managed resources. It's also used to create indices for Azure AI Search, which are created at runtime.
6265

66+
Extra networking controls are configured when you create a private link endpoint on your workspace to allow for inbound connectivity. In this configuration, a private link endpoint connection will be created to the CosmosDB instance and network access will be restricted to only trusted Microsoft services.
67+
6368
## Customer-managed keys
6469

6570
When you __don't use a customer-managed key__, Microsoft creates and manages these resources in a Microsoft owned Azure subscription and uses a Microsoft-managed key to encrypt the data.
@@ -69,7 +74,7 @@ When you __use a customer-managed key__, these resources are _in your Azure subs
6974
> [!IMPORTANT]
7075
> When using a customer-managed key, the costs for your subscription will be higher because these resources are in your subscription. To estimate the cost, use the [Azure pricing calculator](https://azure.microsoft.com/pricing/calculator/).
7176
72-
These Microsoft-managed resources are located in a new Azure resource group that is created in your subscription. This group is in addition to the resource group for your workspace. This resource group will contain the Microsoft-managed resources that your key is used with. The resource group will be named using the formula of `<Azure Machine Learning workspace resource group name><GUID>`.
77+
These Microsoft-managed resources are located in a new Azure resource group that is created in your subscription. This group is in addition to the resource group for your workspace. This resource group contains the Microsoft-managed resources that your key is used with. The resource group will be named using the formula of `<Azure Machine Learning workspace resource group name><GUID>`.
7378

7479
> [!TIP]
7580
> * The [__Request Units__](../cosmos-db/request-units.md) for the Azure Cosmos DB automatically scale as needed.
@@ -102,9 +107,14 @@ Azure Machine Learning uses compute resources to train and deploy machine learni
102107
:::moniker-end
103108

104109
**Compute cluster**
105-
The OS disk for each compute node stored in Azure Storage is encrypted with Microsoft-managed keys in Azure Machine Learning storage accounts. This compute target is ephemeral, and clusters are typically scaled down when no jobs are queued. The underlying virtual machine is de-provisioned, and the OS disk is deleted. Azure Disk Encryption isn't supported for the OS disk.
106110

107-
Each virtual machine also has a local temporary disk for OS operations. If you want, you can use the disk to stage training data. If the workspace was created with the `hbi_workspace` parameter set to `TRUE`, the temporary disk is encrypted. This environment is short-lived (only during your job) and encryption support is limited to system-managed keys only.
111+
Compute clusters have local OS disk storage and can mount data from storage accounts in your subscription during the job.
112+
113+
When mounting data from your own storage account in a job, you can enable customer-managed keys on those storage accounts for encryption.
114+
115+
The OS disk for each compute node stored in Azure Storage is always encrypted with Microsoft-managed keys in Azure Machine Learning storage accounts, and not using customer-managed keys. This compute target is ephemeral, and hence data that is stored on the OS disk is deleted once the cluster scales down. Clusters are typically scaled down when no jobs are queued, autoscaling is on and the minimum node count is set to zero. The underlying virtual machine is deprovisioned, and the OS disk is deleted.
116+
117+
Azure Disk Encryption isn't supported for the OS disk. Each virtual machine also has a local temporary disk for OS operations. If you want, you can use the disk to stage training data. If the workspace was created with the `hbi_workspace` parameter set to `TRUE`, the temporary disk is encrypted. This environment is short-lived (only during your job) and encryption support is limited to system-managed keys only.
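
The scale-down condition described above (no queued jobs, autoscaling on, minimum node count of zero) can be written as a small predicate. This is an illustrative sketch only: `ClusterState` and `scales_to_zero` are hypothetical names, not Azure Machine Learning SDK types, and real scale-down timing also depends on service behavior that this sketch doesn't model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterState:
    """Hypothetical snapshot of a compute cluster's settings and queue."""
    queued_jobs: int
    autoscale_enabled: bool
    min_nodes: int


def scales_to_zero(state: ClusterState) -> bool:
    """True when the cluster is expected to release all nodes.

    When this holds, the underlying virtual machines are deprovisioned
    and data on their OS disks is deleted.
    """
    return (
        state.queued_jobs == 0
        and state.autoscale_enabled
        and state.min_nodes == 0
    )
```

For example, a cluster with a minimum node count of one keeps at least one node, and therefore that node's OS disk, even when no jobs are queued.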
108118

109119
**Compute instance**
110120
The OS disk for compute instance is encrypted with Microsoft-managed keys in Azure Machine Learning storage accounts. If the workspace was created with the `hbi_workspace` parameter set to `TRUE`, the local temporary disk on compute instance is encrypted with Microsoft managed keys. Customer managed key encryption isn't supported for OS and temp disk.
@@ -123,4 +133,4 @@ To enable the `hbi_workspace` flag when creating an Azure Machine Learning works
123133

124134
## Next Steps
125135

126-
* [How to configure customer-managed keys with Azure Machine Learning](how-to-setup-customer-managed-keys.md).
136+
* [How to configure customer-managed keys with Azure Machine Learning](how-to-setup-customer-managed-keys.md).
