articles/machine-learning/concept-enterprise-security.md
Lines changed: 98 additions & 13 deletions
@@ -9,7 +9,7 @@ ms.topic: conceptual
9
9
ms.author: aashishb
10
10
author: aashishb
11
11
ms.reviewer: larryfr
12
-
ms.date: 12/17/2019
12
+
ms.date: 01/09/2019
13
13
---
14
14
15
15
# Enterprise security for Azure Machine Learning
@@ -18,6 +18,9 @@ In this article, you'll learn about security features available for Azure Machin
18
18
19
19
When you use a cloud service, a best practice is to restrict access to only the users who need it. Start by understanding the authentication and authorization model used by the service. You might also want to restrict network access or securely join resources in your on-premises network with the cloud. Data encryption is also vital, both at rest and while data moves between services. Finally, you need to be able to monitor the service and produce an audit log of all activity.
20
20
21
+
> [!NOTE]
22
+
> The information in this article works with the Azure Machine Learning Python SDK version 1.0.83.1 or higher.
23
+
21
24
## Authentication
22
25
23
26
Multi-factor authentication is supported if Azure Active Directory (Azure AD) is configured to use it. Here's the authentication process:
@@ -28,7 +31,8 @@ Multi-factor authentication is supported if Azure Active Directory (Azure AD) is
28
31
29
32
[](media/concept-enterprise-security/authentication-expanded.png#lightbox)
30
33
31
-
See the [Set up authentication](how-to-setup-authentication.md) how-to for detailed examples and instructions on setting up authentication, including service principal authentication for automated workflows.
34
+
For more information, see [Set up authentication for Azure Machine Learning resources and workflows](how-to-setup-authentication.md). This article provides information and examples on authentication, including using service principals and automated workflows.
35
+
32
36
33
37
### Authentication for web service deployment
34
38
@@ -39,7 +43,7 @@ Azure Machine Learning supports two forms of authentication for web services: ke
39
43
|Key|Keys are static and do not need to be refreshed. Keys can be regenerated manually.|Disabled by default| Enabled by default|
40
44
|Token|Tokens expire after a specified time period and need to be refreshed.| Not available| Disabled by default |
41
45
42
-
See the [web-service authentication section](how-to-setup-authentication.md#web-service-authentication) for code examples on authenticating to web-services in Azure Machine Learning.
46
+
For code examples, see the [web-service authentication section](how-to-setup-authentication.md#web-service-authentication).
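As a rough illustration of the two authentication forms in the table above, the sketch below builds a scoring request that carries either a key or a token as a bearer credential. It assumes that both are sent in an `Authorization: Bearer` header; the helper name `build_scoring_request`, the URI, and the credential value are hypothetical placeholders, and the request is built but never sent.

```python
import json
import urllib.request


def build_scoring_request(scoring_uri, credential, payload):
    """Return a POST request that carries a key or token as a bearer credential."""
    body = json.dumps(payload).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        # Assumption: keys and tokens are both passed as a bearer credential.
        "Authorization": f"Bearer {credential}",
    }
    return urllib.request.Request(scoring_uri, data=body, headers=headers, method="POST")


# Placeholder values for illustration only.
request = build_scoring_request(
    "https://example.invalid/score", "<key-or-token>", {"data": [[1, 2, 3]]}
)
# urllib.request.urlopen(request) would send it; omitted so the sketch runs offline.
```

Because tokens expire, a client that uses token authentication would also need to fetch a fresh token before the old one lapses, whereas a key can be reused until it is regenerated.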
43
47
44
48
## Authorization
45
49
@@ -88,7 +92,7 @@ For more information on managed identities, see [Managed identities for Azure re
88
92
89
93
We don't recommend that admins revoke the access of the managed identity to the resources mentioned in the preceding table. You can restore access by using the resync keys operation.
90
94
91
-
Azure Machine Learning creates an additional application (the name starts with `aml-` or `Microsoft-AzureML-Support-App-`) with contributor-level access in your subscription for every workspace region. For example, if you have one workspace in East US and another workspace in North Europe in the same subscription, you'll see two of these applications. These applications enable Azure Machine Learning to help you manage compute resources.
95
+
Azure Machine Learning creates an additional application (the name starts with `aml-` or `Microsoft-AzureML-Support-App-`) with contributor-level access in your subscription for every workspace region. For example, if you have one workspace in East US and one in North Europe in the same subscription, you'll see two of these applications. These applications enable Azure Machine Learning to help you manage compute resources.
92
96
93
97
## Network security
94
98
@@ -100,29 +104,86 @@ For more information, see [How to run experiments and inference in a virtual net
100
104
101
105
### Encryption at rest
102
106
107
+
> [!IMPORTANT]
108
+
> If your workspace contains sensitive data, we recommend setting the [hbi_workspace flag](https://docs.microsoft.com/python/api/azureml-core/azureml.core.workspace(class)?view=azure-ml-py#create-name--auth-none--subscription-id-none--resource-group-none--location-none--create-resource-group-true--sku--basic---friendly-name-none--storage-account-none--key-vault-none--app-insights-none--container-registry-none--cmk-keyvault-none--resource-cmk-uri-none--hbi-workspace-false--default-cpu-compute-target-none--default-gpu-compute-target-none--exist-ok-false--show-output-true-) while creating your workspace. This flag controls the amount of data Microsoft collects for diagnostic purposes and enables additional encryption in Microsoft-managed environments.
109
+
110
+
103
111
#### Azure Blob storage
104
112
105
113
Azure Machine Learning stores snapshots, output, and logs in the Azure Blob storage account that's tied to the Azure Machine Learning workspace and your subscription. All the data stored in Azure Blob storage is encrypted at rest with Microsoft-managed keys.
106
114
107
-
For information on how to use your own keys for data stored in Azure Blob storage, see [Azure Storage encryption with customer-managed keys in Azure Key Vault](https://docs.microsoft.com/azure/storage/common/storage-service-encryption-customer-managed-keys).
115
+
For information on how to use your own keys for data stored in Azure Blob storage, see [Azure Storage encryption with customer-managed keys in Azure Key Vault](../storage/common/storage-encryption-keys-portal.md).
108
116
109
117
Training data is typically also stored in Azure Blob storage so that it's accessible to training compute targets. This storage isn't managed by Azure Machine Learning but mounted to compute targets as a remote file system.
110
118
111
-
For information on regenerating the access keys for the Azure storage accounts used with your workspace, see [Regenerate storage access keys](how-to-change-storage-access-key.md).
119
+
For information on regenerating the access keys, see [Regenerate storage access keys](how-to-change-storage-access-key.md).
112
120
113
121
#### Azure Cosmos DB
114
122
115
-
Azure Machine Learning stores metrics and metadata in the Azure Cosmos DB instance associated with a Microsoft subscription managed by Azure Machine Learning. All the data stored in Azure Cosmos DB is encrypted at rest with Microsoft-managed keys.
123
+
Azure Machine Learning stores metrics and metadata in an Azure Cosmos DB instance. This instance is associated with a Microsoft subscription managed by Azure Machine Learning. All the data stored in Azure Cosmos DB is encrypted at rest with Microsoft-managed keys.
124
+
125
+
To use your own (customer-managed) keys to encrypt the Azure Cosmos DB instance, you can create a dedicated Cosmos DB instance for use with your workspace. We recommend this approach if you want to store your data, such as run history information, outside of the multi-tenant Cosmos DB instance hosted in our Microsoft subscription.
126
+
127
+
> [!NOTE]
128
+
> This feature is currently available only in East US, West US 2, and South Central US.
129
+
130
+
To enable provisioning a Cosmos DB instance in your subscription with customer-managed keys, perform the following actions:
131
+
132
+
* Enable customer-managed key capabilities for Cosmos DB. At this time, you must request access to use this capability. To do so, please contact [[email protected]](mailto:[email protected]).
133
+
134
+
* Register the Azure Machine Learning and Azure Cosmos DB resource providers in your subscription, if not done already.
135
+
136
+
* Authorize the Machine Learning App (in Identity and Access Management) with contributor permissions on your subscription.
137
+
138
+

139
+
140
+
* Use the following parameters when creating the Azure Machine Learning workspace. Both parameters are mandatory and supported in SDK, CLI, REST APIs, and Resource Manager templates.
141
+
142
+
* `resource_cmk_uri`: This parameter is the full resource URI of the customer-managed key in your key vault, including the [version information for the key](../key-vault/about-keys-secrets-and-certificates.md#objects-identifiers-and-versioning).
143
+
144
+
* `cmk_keyvault`: This parameter is the resource ID of the key vault in your subscription. This key vault needs to be in the same region and subscription that you will use for the Azure Machine Learning workspace.
145
+
146
+
> [!NOTE]
147
+
> This key vault instance can be different from the key vault that Azure Machine Learning creates when you provision the workspace. If you want to use the same key vault instance for the workspace, pass the same key vault while provisioning the workspace by using the [key_vault parameter](https://docs.microsoft.com/python/api/azureml-core/azureml.core.workspace(class)?view=azure-ml-py#create-name--auth-none--subscription-id-none--resource-group-none--location-none--create-resource-group-true--sku--basic---friendly-name-none--storage-account-none--key-vault-none--app-insights-none--container-registry-none--cmk-keyvault-none--resource-cmk-uri-none--hbi-workspace-false--default-cpu-compute-target-none--default-gpu-compute-target-none--exist-ok-false--show-output-true-).
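The two parameters described above can be sketched as keyword arguments for `Workspace.create` in the Python SDK. This is a minimal sketch, not a definitive implementation: the helper `cmk_workspace_args` is hypothetical, every value in angle brackets is a placeholder, and the resource ID and key URI shapes are assumptions based on the usual Azure formats. The SDK call itself is left as a comment so that the sketch runs without the `azureml-core` package.

```python
def cmk_workspace_args(name, subscription_id, resource_group, location,
                       cmk_keyvault, resource_cmk_uri):
    """Collect keyword arguments for creating a workspace with customer-managed keys."""
    # Both parameters are mandatory for this scenario, so reject a partial pair.
    if not cmk_keyvault or not resource_cmk_uri:
        raise ValueError("cmk_keyvault and resource_cmk_uri must be supplied together")
    return {
        "name": name,
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "location": location,
        "cmk_keyvault": cmk_keyvault,
        "resource_cmk_uri": resource_cmk_uri,
    }


# Placeholder values for illustration only.
args = cmk_workspace_args(
    name="<workspace-name>",
    subscription_id="<subscription-id>",
    resource_group="<resource-group>",
    location="eastus",
    # Resource ID of the key vault (same region and subscription as the workspace).
    cmk_keyvault=(
        "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
        "/providers/Microsoft.KeyVault/vaults/<vault-name>"
    ),
    # Full key URI, including the key version.
    resource_cmk_uri="https://<vault-name>.vault.azure.net/keys/<key-name>/<key-version>",
)

# With azureml-core installed, the workspace would be created with:
#   from azureml.core import Workspace
#   ws = Workspace.create(**args)
```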
148
+
149
+
This Cosmos DB instance is created in a Microsoft-managed resource group in your subscription.
150
+
151
+
> [!IMPORTANT]
152
+
> * If you need to delete this Cosmos DB instance, you must delete the Azure Machine Learning workspace that uses it.
153
+
> * The default [__Request Units__](../cosmos-db/request-units.md) for this Cosmos DB account is set at __8000__. Changing this value is unsupported.
154
+
155
+
For more information on customer-managed keys with Cosmos DB, see [Configure customer-managed keys for your Azure Cosmos DB account](../cosmos-db/how-to-setup-cmk.md).
116
156
117
157
#### Azure Container Registry
118
158
119
-
All container images in your registry (Azure Container Registry) are encrypted at rest. Azure automatically encrypts an image before storing it and decrypts it on the fly when Azure Machine Learning pulls the image.
159
+
All container images in your registry (Azure Container Registry) are encrypted at rest. Azure automatically encrypts an image before storing it and decrypts it when Azure Machine Learning pulls the image.
160
+
161
+
To use your own (customer-managed) keys to encrypt your Azure Container Registry, either create your own ACR and attach it while provisioning the workspace, or encrypt the default instance that is created when the workspace is provisioned.
162
+
163
+
For an example of creating a workspace using an existing Azure Container Registry, see the following articles:
164
+
165
+
* [Create a workspace for Azure Machine Learning with Azure CLI](how-to-manage-workspace-cli.md)
166
+
* [Use an Azure Resource Manager template to create a workspace for Azure Machine Learning](how-to-create-workspace-template.md)
167
+
168
+
#### Azure Container Instance
169
+
170
+
Azure Container Instance does not support disk encryption. If you need disk encryption, we recommend [deploying to an Azure Kubernetes Service instance](how-to-deploy-azure-kubernetes-service.md) instead. In this case, you may also want to use Azure Machine Learning’s support for role-based access controls to prevent deployments to an Azure Container Instance in your subscription.
171
+
172
+
#### Azure Kubernetes Service
173
+
174
+
You may encrypt a deployed Azure Kubernetes Service resource using customer-managed keys at any time. For more information, see [https://aka.ms/aks/byok](https://aka.ms/aks/byok).
175
+
176
+
This process allows you to encrypt both the data disks and the OS disks of the virtual machines deployed in the Kubernetes cluster.
177
+
178
+
> [!IMPORTANT]
179
+
> This process works only with AKS clusters running Kubernetes version 1.16 or higher. Azure Machine Learning added support for AKS 1.16 on Jan 13, 2020.
120
180
121
181
#### Machine Learning Compute
122
182
123
183
The OS disk for each compute node stored in Azure Storage is encrypted with Microsoft-managed keys in Azure Machine Learning storage accounts. This compute target is ephemeral, and clusters are typically scaled down when no runs are queued. The underlying virtual machine is de-provisioned, and the OS disk is deleted. Azure Disk Encryption isn't supported for the OS disk.
124
184
125
-
Each virtual machine also has a local temporary disk for OS operations. If you want, you can use the disk to stage training data. The disk isn't encrypted.
185
+
Each virtual machine also has a local temporary disk for OS operations. If you want, you can use the disk to stage training data. The disk is encrypted by default for workspaces with the `hbi_workspace` parameter set to `TRUE`. This environment exists only for the duration of your run, and encryption is supported only with system-managed keys.
186
+
126
187
For more information on how encryption at rest works in Azure, see [Azure data encryption at rest](https://docs.microsoft.com/azure/security/fundamentals/encryption-atrest).
127
188
128
189
### Encryption in transit
@@ -143,6 +204,22 @@ SSH passwords and keys to compute targets like Azure HDInsight and VMs are store
143
204
144
205
Each workspace has an associated system-assigned managed identity that has the same name as the workspace. This managed identity has access to all keys, secrets, and certificates in the key vault.
145
206
207
+
## Data collection and handling
208
+
209
+
### Microsoft collected data
210
+
211
+
Microsoft may collect non-user-identifying information, such as resource names (for example, the dataset name or the machine learning experiment name) or job environment variables, for diagnostic purposes. All such data is stored using Microsoft-managed keys in storage hosted in Microsoft-owned subscriptions and follows [Microsoft’s standard Privacy policy and data handling standards](https://privacy.microsoft.com/privacystatement).
212
+
213
+
Microsoft also recommends not storing sensitive information (such as account key secrets) in environment variables. Environment variables are logged, encrypted, and stored by us.
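One way to act on this recommendation is to check a job's environment variables before submitting it. The sketch below is a hypothetical pre-submission check, not part of the Azure Machine Learning SDK: the name pattern and the helper `find_suspect_variables` are assumptions chosen for illustration, and a real check would need a pattern tuned to your own naming conventions.

```python
import re

# Assumption: variable names containing these words are likely to hold secrets.
SUSPECT_NAME = re.compile(r"(key|secret|token|password|connection_?string)", re.IGNORECASE)


def find_suspect_variables(environment_variables):
    """Return the sorted names of variables that look like they hold secrets."""
    return sorted(name for name in environment_variables if SUSPECT_NAME.search(name))


# Placeholder job environment for illustration only.
job_env = {"EPOCHS": "10", "STORAGE_ACCOUNT_KEY": "placeholder", "DATA_DIR": "/mnt/data"}
suspects = find_suspect_variables(job_env)
```

Any name that the check flags is a candidate for moving into a key vault rather than passing through the job environment.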
214
+
215
+
You may opt out of diagnostic data collection by setting the `hbi_workspace` parameter to `TRUE` while provisioning the workspace. This functionality is supported when using the AzureML Python SDK, CLI, REST APIs, or Azure Resource Manager templates.
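With the Python SDK, the opt-out might look like the following sketch. It assumes the `hbi_workspace` keyword of `Workspace.create` (which takes the Python boolean `True`); the helper `hbi_workspace_args` is hypothetical and the values in angle brackets are placeholders. The SDK call is left as a comment so that the sketch runs without the `azureml-core` package.

```python
def hbi_workspace_args(name, subscription_id, resource_group, location):
    """Collect keyword arguments for creating a workspace with hbi_workspace enabled."""
    return {
        "name": name,
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "location": location,
        # The flag is set when the workspace is provisioned.
        "hbi_workspace": True,
    }


# Placeholder values for illustration only.
args = hbi_workspace_args(
    "<workspace-name>", "<subscription-id>", "<resource-group>", "eastus"
)

# With azureml-core installed, the workspace would be created with:
#   from azureml.core import Workspace
#   ws = Workspace.create(**args)
```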
216
+
217
+
### Microsoft-generated data
218
+
219
+
When using services such as Automated Machine Learning, Microsoft may generate transient, pre-processed data for training multiple models. This data is stored in a datastore in your workspace, which allows you to enforce access controls and encryption appropriately.
220
+
221
+
You may also want to encrypt [diagnostic information logged from your deployed endpoint](how-to-enable-app-insights.md) into your Azure Application Insights instance.
222
+
146
223
## Monitoring
147
224
148
225
### Metrics
@@ -163,7 +240,15 @@ This screenshot shows the activity log of a workspace:
163
240
164
241
[](media/concept-enterprise-security/workspace-activity-log-expanded.png#lightbox)
165
242
166
-
Scoring request details are stored in Application Insights. Application Insights is created in your subscription when you create a workspace. Logged information includes fields like HTTPMethod, UserAgent, ComputeType, RequestUrl, StatusCode, RequestId, and Duration.
243
+
Scoring request details are stored in Application Insights. Application Insights is created in your subscription when you create a workspace. Logged information includes fields such as:
244
+
245
+
* HTTPMethod
246
+
* UserAgent
247
+
* ComputeType
248
+
* RequestUrl
249
+
* StatusCode
250
+
* RequestId
251
+
* Duration
167
252
168
253
> [!IMPORTANT]
169
254
> Some actions in the Azure Machine Learning workspace don't log information to the activity log. For example, the start of a training run and the registration of a model aren't logged.
@@ -176,8 +261,8 @@ Scoring request details are stored in Application Insights. Application Insights
176
261
177
262
The following diagram shows the create workspace workflow.
178
263
179
-
* The user signs in to Azure AD from one of the supported Azure Machine Learning clients (Azure CLI, Python SDK, Azure portal) and requests the appropriate Azure Resource Manager token.
180
-
* The user calls Azure Resource Manager to create the workspace.
264
+
* You sign in to Azure AD from one of the supported Azure Machine Learning clients (Azure CLI, Python SDK, Azure portal) and request the appropriate Azure Resource Manager token.
265
+
* You call Azure Resource Manager to create the workspace.
181
266
* Azure Resource Manager contacts the Azure Machine Learning resource provider to provision the workspace.
182
267
183
268
Additional resources are created in the user's subscription during workspace creation:
@@ -205,7 +290,7 @@ The following diagram shows the training workflow.
205
290
206
291
* Azure Machine Learning is called with the snapshot ID for the code snapshot saved in the previous section.
207
292
* Azure Machine Learning creates a run ID (optional) and a Machine Learning service token, which is later used by compute targets like Machine Learning Compute/VMs to communicate with the Machine Learning service.
208
-
* You can choose either a managed compute target (like Machine Learning Compute) or an unmanaged compute target (like VMs) to run your training jobs. Here are the data flows for both scenarios:
293
+
* You can choose either a managed compute target (like Machine Learning Compute) or an unmanaged compute target (like VMs) to run training jobs. Here are the data flows for both scenarios:
209
294
* VMs/HDInsight, accessed by SSH credentials in a key vault in the Microsoft subscription. Azure Machine Learning runs management code on the compute target that:
210
295
211
296
1. Prepares the environment. (Docker is an option for VMs and local computers. See the following steps for Machine Learning Compute to understand how running experiments on Docker containers works.)