Commit 416a6d1

Merge pull request #100591 from Blackmist/encryption-update
Encryption update
2 parents a22f89e + 66b91f7 commit 416a6d1

2 files changed

+98
-13
lines changed

articles/machine-learning/concept-enterprise-security.md

Lines changed: 98 additions & 13 deletions
Original file line number | Diff line number | Diff line change
@@ -9,7 +9,7 @@ ms.topic: conceptual
99
ms.author: aashishb
1010
author: aashishb
1111
ms.reviewer: larryfr
12-
ms.date: 12/17/2019
12+
ms.date: 01/09/2019
1313
---
1414

1515
# Enterprise security for Azure Machine Learning
@@ -18,6 +18,9 @@ In this article, you'll learn about security features available for Azure Machin
1818

1919
When you use a cloud service, a best practice is to restrict access to only the users who need it. Start by understanding the authentication and authorization model used by the service. You might also want to restrict network access or securely join resources in your on-premises network with the cloud. Data encryption is also vital, both at rest and while data moves between services. Finally, you need to be able to monitor the service and produce an audit log of all activity.
2020

21+
> [!NOTE]
22+
> The information in this article works with the Azure Machine Learning Python SDK version 1.0.83.1 or higher.
23+
2124
## Authentication
2225

2326
Multi-factor authentication is supported if Azure Active Directory (Azure AD) is configured to use it. Here's the authentication process:
@@ -28,7 +31,8 @@ Multi-factor authentication is supported if Azure Active Directory (Azure AD) is
2831

2932
[![Authentication in Azure Machine Learning](media/concept-enterprise-security/authentication.png)](media/concept-enterprise-security/authentication-expanded.png#lightbox)
3033

31-
See the [Set up authentication](how-to-setup-authentication.md) how-to for detailed examples and instructions on setting up authentication, including service principal authentication for automated workflows.
34+
For more information, see [Set up authentication for Azure Machine Learning resources and workflows](how-to-setup-authentication.md). This article provides information and examples on authentication, including using service principals and automated workflows.
35+
3236

3337
### Authentication for web service deployment
3438

@@ -39,7 +43,7 @@ Azure Machine Learning supports two forms of authentication for web services: ke
3943
|Key|Keys are static and do not need to be refreshed. Keys can be regenerated manually.|Disabled by default| Enabled by default|
4044
|Token|Tokens expire after a specified time period and need to be refreshed.| Not available| Disabled by default |
4145

42-
See the [web-service authentication section](how-to-setup-authentication.md#web-service-authentication) for code examples on authenticating to web-services in Azure Machine Learning.
46+
For code examples, see the [web-service authentication section](how-to-setup-authentication.md#web-service-authentication).
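As a minimal client-side sketch of the two schemes in the table above: a key or a token is sent the same way, as a bearer credential in the `Authorization` header. The scoring URL, the credential value, and the `build_scoring_request` helper are placeholders for illustration and are not part of the SDK.

```python
import json
import urllib.request


def build_scoring_request(scoring_uri: str, credential: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for a deployed web service.

    `credential` is either a static key or a short-lived token; both are
    passed as a bearer credential.
    """
    body = json.dumps(payload).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {credential}",
    }
    return urllib.request.Request(scoring_uri, data=body, headers=headers, method="POST")


# Placeholder values for illustration only.
request = build_scoring_request(
    "https://example.invalid/score", "<key-or-token>", {"data": [[1, 2, 3]]}
)
# To send the request: urllib.request.urlopen(request)
```

With token authentication, the client also needs to fetch a fresh token before the current one expires; with key authentication, the same key works until it is regenerated.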
4347

4448
## Authorization
4549

@@ -88,7 +92,7 @@ For more information on managed identities, see [Managed identities for Azure re
8892

8993
We don't recommend that admins revoke the access of the managed identity to the resources mentioned in the preceding table. You can restore access by using the resync keys operation.
9094

91-
Azure Machine Learning creates an additional application (the name starts with `aml-` or `Microsoft-AzureML-Support-App-`) with contributor-level access in your subscription for every workspace region. For example, if you have one workspace in East US and another workspace in North Europe in the same subscription, you'll see two of these applications. These applications enable Azure Machine Learning to help you manage compute resources.
95+
Azure Machine Learning creates an additional application (the name starts with `aml-` or `Microsoft-AzureML-Support-App-`) with contributor-level access in your subscription for every workspace region. For example, if you have one workspace in East US and one in North Europe in the same subscription, you'll see two of these applications. These applications enable Azure Machine Learning to help you manage compute resources.
9296

9397
## Network security
9498

@@ -100,29 +104,86 @@ For more information, see [How to run experiments and inference in a virtual net
100104

101105
### Encryption at rest
102106

107+
> [!IMPORTANT]
108+
> If your workspace contains sensitive data, we recommend setting the [hbi_workspace flag](https://docs.microsoft.com/python/api/azureml-core/azureml.core.workspace(class)?view=azure-ml-py#create-name--auth-none--subscription-id-none--resource-group-none--location-none--create-resource-group-true--sku--basic---friendly-name-none--storage-account-none--key-vault-none--app-insights-none--container-registry-none--cmk-keyvault-none--resource-cmk-uri-none--hbi-workspace-false--default-cpu-compute-target-none--default-gpu-compute-target-none--exist-ok-false--show-output-true-) while creating your workspace. This flag controls the amount of data Microsoft collects for diagnostic purposes and enables additional encryption in Microsoft-managed environments.
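The following is a minimal sketch of setting the flag with the Python SDK, assuming `azureml-core` 1.0.83.1 or higher is installed and you are signed in to Azure. The workspace name, resource group, and region are placeholders, and `hbi_workspace_kwargs` and `create_hbi_workspace` are hypothetical helpers that wrap `Workspace.create`.

```python
def hbi_workspace_kwargs(name, subscription_id, resource_group, location):
    """Collect keyword arguments for Workspace.create with the HBI flag set."""
    return {
        "name": name,
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "location": location,
        "hbi_workspace": True,  # the Python SDK takes a boolean here
    }


def create_hbi_workspace(**kwargs):
    # Imported lazily so the helper above works without the SDK installed.
    from azureml.core import Workspace

    return Workspace.create(**kwargs)


# Placeholder values for illustration only.
kwargs = hbi_workspace_kwargs(
    "my-workspace", "<subscription-id>", "my-resource-group", "eastus"
)
# workspace = create_hbi_workspace(**kwargs)  # requires Azure credentials
```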
109+
110+
103111
#### Azure Blob storage
104112

105113
Azure Machine Learning stores snapshots, output, and logs in the Azure Blob storage account that's tied to the Azure Machine Learning workspace and your subscription. All the data stored in Azure Blob storage is encrypted at rest with Microsoft-managed keys.
106114

107-
For information on how to use your own keys for data stored in Azure Blob storage, see [Azure Storage encryption with customer-managed keys in Azure Key Vault](https://docs.microsoft.com/azure/storage/common/storage-service-encryption-customer-managed-keys).
115+
For information on how to use your own keys for data stored in Azure Blob storage, see [Azure Storage encryption with customer-managed keys in Azure Key Vault](../storage/common/storage-encryption-keys-portal.md).
108116

109117
Training data is typically also stored in Azure Blob storage so that it's accessible to training compute targets. This storage isn't managed by Azure Machine Learning but mounted to compute targets as a remote file system.
110118

111-
For information on regenerating the access keys for the Azure storage accounts used with your workspace, see [Regenerate storage access keys](how-to-change-storage-access-key.md).
119+
For information on regenerating the access keys, see [Regenerate storage access keys](how-to-change-storage-access-key.md).
112120

113121
#### Azure Cosmos DB
114122

115-
Azure Machine Learning stores metrics and metadata in the Azure Cosmos DB instance associated with a Microsoft subscription managed by Azure Machine Learning. All the data stored in Azure Cosmos DB is encrypted at rest with Microsoft-managed keys.
123+
Azure Machine Learning stores metrics and metadata in an Azure Cosmos DB instance. This instance is associated with a Microsoft subscription managed by Azure Machine Learning. All the data stored in Azure Cosmos DB is encrypted at rest with Microsoft-managed keys.
124+
125+
To use your own (customer-managed) keys to encrypt the Azure Cosmos DB instance, you can create a dedicated Cosmos DB instance for use with your workspace. We recommend this approach if you want to store your data, such as run history information, outside of the multi-tenant Cosmos DB instance hosted in our Microsoft subscription.
126+
127+
> [!NOTE]
128+
> This feature is currently available only in the East US, West US 2, and South Central US regions.
129+
130+
To enable provisioning a Cosmos DB instance in your subscription with customer-managed keys, perform the following actions:
131+
132+
* Enable customer-managed key capabilities for Cosmos DB. At this time, you must request access to use this capability. To do so, please contact [[email protected]](mailto:[email protected]).
133+
134+
* Register the Azure Machine Learning and Azure Cosmos DB resource providers in your subscription, if not done already.
135+
136+
* Authorize the Machine Learning App (in Identity and Access Management) with contributor permissions on your subscription.
137+
138+
![Authorize the 'Azure Machine Learning App' in Identity and Access Management in the portal](./media/concept-enterprise-security/authorize-azure-machine-learning.png)
139+
140+
* Use the following parameters when creating the Azure Machine Learning workspace. Both parameters are mandatory and are supported in the SDK, CLI, REST APIs, and Resource Manager templates.
141+
142+
* `resource_cmk_uri`: This parameter is the full resource URI of the customer-managed key in your key vault, including the [version information for the key](../key-vault/about-keys-secrets-and-certificates.md#objects-identifiers-and-versioning).
143+
144+
* `cmk_keyvault`: This parameter is the resource ID of the key vault in your subscription. This key vault must be in the same region and subscription as the Azure Machine Learning workspace.
145+
146+
> [!NOTE]
147+
> This key vault instance can be different from the key vault that Azure Machine Learning creates when you provision the workspace. If you want to use the same key vault instance for the workspace, pass the same key vault while provisioning the workspace by using the [key_vault parameter](https://docs.microsoft.com/python/api/azureml-core/azureml.core.workspace(class)?view=azure-ml-py#create-name--auth-none--subscription-id-none--resource-group-none--location-none--create-resource-group-true--sku--basic---friendly-name-none--storage-account-none--key-vault-none--app-insights-none--container-registry-none--cmk-keyvault-none--resource-cmk-uri-none--hbi-workspace-false--default-cpu-compute-target-none--default-gpu-compute-target-none--exist-ok-false--show-output-true-).
148+
149+
This Cosmos DB instance is created in a Microsoft-managed resource group in your subscription.
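The two parameters described above can be sketched as follows. This is an illustrative helper, not part of the SDK: `cmk_workspace_kwargs` is a hypothetical name, the vault and key names are placeholders, and the format checks are deliberately loose assumptions (a versioned Key Vault key identifier has the shape `https://<vault-host>/keys/<key-name>/<key-version>`, and a key vault resource ID follows the usual Azure Resource Manager path).

```python
import re

# A key identifier that includes the version segment after the key name.
_VERSIONED_KEY_URI = re.compile(r"^https://[^/]+/keys/[^/]+/[^/]+$")

# Azure Resource Manager ID of a key vault.
_KEY_VAULT_ID = re.compile(
    r"^/subscriptions/[^/]+/resourceGroups/[^/]+"
    r"/providers/Microsoft\.KeyVault/vaults/[^/]+$",
    re.IGNORECASE,
)


def cmk_workspace_kwargs(cmk_keyvault: str, resource_cmk_uri: str) -> dict:
    """Validate the customer-managed key parameters and return them as kwargs."""
    if not _KEY_VAULT_ID.match(cmk_keyvault):
        raise ValueError("cmk_keyvault must be the resource ID of a key vault")
    if not _VERSIONED_KEY_URI.match(resource_cmk_uri):
        raise ValueError("resource_cmk_uri must include the key version")
    return {"cmk_keyvault": cmk_keyvault, "resource_cmk_uri": resource_cmk_uri}


# Placeholder values for illustration only.
vault_id = (
    "/subscriptions/00000000-0000-0000-0000-000000000000"
    "/resourceGroups/my-resource-group"
    "/providers/Microsoft.KeyVault/vaults/my-vault"
)
key_uri = "https://my-vault.vault.azure.net/keys/my-key/0123456789abcdef0123456789abcdef"
cmk_kwargs = cmk_workspace_kwargs(vault_id, key_uri)
# Workspace.create(name=..., subscription_id=..., resource_group=...,
#                  location=..., **cmk_kwargs)
```

Checking the values before the call catches the most common mistake, a key URI without its version, before any resources are provisioned.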
150+
151+
> [!IMPORTANT]
152+
> * If you need to delete this Cosmos DB instance, you must delete the Azure Machine Learning workspace that uses it.
153+
> * The default [__Request Units__](../cosmos-db/request-units.md) for this Cosmos DB account is set at __8000__. Changing this value is unsupported.
154+
155+
For more information on customer-managed keys with Cosmos DB, see [Configure customer-managed keys for your Azure Cosmos DB account](../cosmos-db/how-to-setup-cmk.md).
116156

117157
#### Azure Container Registry
118158

119-
All container images in your registry (Azure Container Registry) are encrypted at rest. Azure automatically encrypts an image before storing it and decrypts it on the fly when Azure Machine Learning pulls the image.
159+
All container images in your registry (Azure Container Registry) are encrypted at rest. Azure automatically encrypts an image before storing it and decrypts it when Azure Machine Learning pulls the image.
160+
161+
To use your own (customer-managed) keys to encrypt your Azure Container Registry, you can either create your own registry and attach it while provisioning the workspace, or encrypt the default instance that is created when the workspace is provisioned.
162+
163+
For an example of creating a workspace using an existing Azure Container Registry, see the following articles:
164+
165+
* [Create a workspace for Azure Machine Learning with Azure CLI](how-to-manage-workspace-cli.md)
166+
* [Use an Azure Resource Manager template to create a workspace for Azure Machine Learning](how-to-create-workspace-template.md)
167+
168+
#### Azure Container Instances
169+
170+
Azure Container Instances does not support disk encryption. If you need disk encryption, we recommend [deploying to an Azure Kubernetes Service instance](how-to-deploy-azure-kubernetes-service.md) instead. In this case, you may also want to use Azure Machine Learning’s support for role-based access control to prevent deployments to Azure Container Instances in your subscription.
171+
172+
#### Azure Kubernetes Service
173+
174+
You may encrypt a deployed Azure Kubernetes Service resource using customer-managed keys at any time. For more information, see [https://aka.ms/aks/byok](https://aka.ms/aks/byok).
175+
176+
This process allows you to encrypt both the data disks and the OS disk of the deployed virtual machines in the Kubernetes cluster.
177+
178+
> [!IMPORTANT]
179+
> This process works only with AKS clusters running Kubernetes version 1.16 or higher. Azure Machine Learning added support for AKS 1.16 on January 13, 2020.
120180
121181
#### Machine Learning Compute
122182

123183
The OS disk for each compute node stored in Azure Storage is encrypted with Microsoft-managed keys in Azure Machine Learning storage accounts. This compute target is ephemeral, and clusters are typically scaled down when no runs are queued. The underlying virtual machine is de-provisioned, and the OS disk is deleted. Azure Disk Encryption isn't supported for the OS disk.
124184

125-
Each virtual machine also has a local temporary disk for OS operations. If you want, you can use the disk to stage training data. The disk isn't encrypted.
185+
Each virtual machine also has a local temporary disk for OS operations. If you want, you can use the disk to stage training data. The disk is encrypted by default for workspaces with the `hbi_workspace` parameter set to `TRUE`. This environment exists only for the duration of your run, and encryption support is limited to system-managed keys.
186+
126187
For more information on how encryption at rest works in Azure, see [Azure data encryption at rest](https://docs.microsoft.com/azure/security/fundamentals/encryption-atrest).
127188

128189
### Encryption in transit
@@ -143,6 +204,22 @@ SSH passwords and keys to compute targets like Azure HDInsight and VMs are store
143204

144205
Each workspace has an associated system-assigned managed identity that has the same name as the workspace. This managed identity has access to all keys, secrets, and certificates in the key vault.
145206

207+
## Data collection and handling
208+
209+
### Microsoft-collected data
210+
211+
Microsoft may collect information that doesn't identify users, such as resource names (for example, the dataset name or the machine learning experiment name) or job environment variables, for diagnostic purposes. All such data is stored using Microsoft-managed keys in storage hosted in Microsoft-owned subscriptions and follows [Microsoft’s standard Privacy policy and data handling standards](https://privacy.microsoft.com/privacystatement).
212+
213+
Microsoft also recommends not storing sensitive information (such as account key secrets) in environment variables. Environment variables are logged, encrypted, and stored by us.
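One way to act on this recommendation is to check environment variable names before you submit a run. The sketch below is illustrative only: `find_sensitive_variables` is a hypothetical helper, and the list of name fragments is an assumption, not an official list.

```python
import re

# Name fragments that often indicate a credential (an assumed, non-exhaustive list).
_SUSPICIOUS_NAME = re.compile(
    r"(KEY|SECRET|TOKEN|PASSWORD|CONNECTION_STRING)", re.IGNORECASE
)


def find_sensitive_variables(environment_variables: dict) -> list:
    """Return the names of variables that look like they hold a secret."""
    return sorted(
        name for name in environment_variables if _SUSPICIOUS_NAME.search(name)
    )


# Placeholder values for illustration only.
env = {
    "BATCH_SIZE": "32",
    "STORAGE_ACCOUNT_KEY": "<redacted>",
    "LOG_LEVEL": "info",
}
flagged = find_sensitive_variables(env)
```

Values flagged this way can be stored in the key vault associated with the workspace instead, for example through `Workspace.get_default_keyvault()` and `Run.get_secret()` in the Python SDK.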
214+
215+
You may opt out of diagnostic data collection by setting the `hbi_workspace` parameter to `TRUE` while provisioning the workspace. This functionality is supported when using the AzureML Python SDK, CLI, REST APIs, or Azure Resource Manager templates.
216+
217+
### Microsoft-generated data
218+
219+
When you use services such as Automated Machine Learning, Microsoft may generate transient, pre-processed data for training multiple models. This data is stored in a datastore in your workspace, which allows you to enforce access controls and encryption appropriately.
220+
221+
You may also want to encrypt [diagnostic information logged from your deployed endpoint](how-to-enable-app-insights.md) into your Azure Application Insights instance.
222+
146223
## Monitoring
147224

148225
### Metrics
@@ -163,7 +240,15 @@ This screenshot shows the activity log of a workspace:
163240

164241
[![Screenshot showing the activity log of a workspace](media/concept-enterprise-security/workspace-activity-log.png)](media/concept-enterprise-security/workspace-activity-log-expanded.png#lightbox)
165242

166-
Scoring request details are stored in Application Insights. Application Insights is created in your subscription when you create a workspace. Logged information includes fields like HTTPMethod, UserAgent, ComputeType, RequestUrl, StatusCode, RequestId, and Duration.
243+
Scoring request details are stored in Application Insights. Application Insights is created in your subscription when you create a workspace. Logged information includes fields such as:
244+
245+
* HTTPMethod
246+
* UserAgent
247+
* ComputeType
248+
* RequestUrl
249+
* StatusCode
250+
* RequestId
251+
* Duration
167252

168253
> [!IMPORTANT]
169254
> Some actions in the Azure Machine Learning workspace don't log information to the activity log. For example, the start of a training run and the registration of a model aren't logged.
@@ -176,8 +261,8 @@ Scoring request details are stored in Application Insights. Application Insights
176261

177262
The following diagram shows the create workspace workflow.
178263

179-
* The user signs in to Azure AD from one of the supported Azure Machine Learning clients (Azure CLI, Python SDK, Azure portal) and requests the appropriate Azure Resource Manager token.
180-
* The user calls Azure Resource Manager to create the workspace.
264+
* You sign in to Azure AD from one of the supported Azure Machine Learning clients (Azure CLI, Python SDK, Azure portal) and request the appropriate Azure Resource Manager token.
265+
* You call Azure Resource Manager to create the workspace.
181266
* Azure Resource Manager contacts the Azure Machine Learning resource provider to provision the workspace.
182267

183268
Additional resources are created in the user's subscription during workspace creation:
@@ -205,7 +290,7 @@ The following diagram shows the training workflow.
205290

206291
* Azure Machine Learning is called with the snapshot ID for the code snapshot saved in the previous section.
207292
* Azure Machine Learning creates a run ID (optional) and a Machine Learning service token, which is later used by compute targets like Machine Learning Compute/VMs to communicate with the Machine Learning service.
208-
* You can choose either a managed compute target (like Machine Learning Compute) or an unmanaged compute target (like VMs) to run your training jobs. Here are the data flows for both scenarios:
293+
* You can choose either a managed compute target (like Machine Learning Compute) or an unmanaged compute target (like VMs) to run training jobs. Here are the data flows for both scenarios:
209294
* VMs/HDInsight, accessed by SSH credentials in a key vault in the Microsoft subscription. Azure Machine Learning runs management code on the compute target that:
210295

211296
1. Prepares the environment. (Docker is an option for VMs and local computers. See the following steps for Machine Learning Compute to understand how running experiments on Docker containers works.)