articles/machine-learning/tutorial-network-isolation-for-feature-store.md
ms.subservice: core
ms.topic: tutorial
author: ynpandey
ms.author: yogipandey
ms.date: 09/13/2023
ms.reviewer: franksolomon
ms.custom: sdkv2
#Customer intent: As a professional data scientist, I want to know how to build and deploy a model with Azure Machine Learning by using Python in a Jupyter Notebook.
An Azure Machine Learning managed feature store lets you discover, create, and operationalize features. Features serve as the connective tissue in the machine learning lifecycle, starting from the prototyping phase, where you experiment with various features. That lifecycle continues to the operationalization phase, where you deploy your models, and inference steps look up the feature data. For more information about feature stores, see the [feature store concepts](./concept-what-is-managed-feature-store.md) document.

This tutorial describes how to configure secure ingress through a private endpoint, and secure egress through a managed virtual network.
For more information, see [Configure for serverless spark job](./how-to-managed-network.md#configure-for-serverless-spark-jobs).

* Your user account must have the `Owner` or `Contributor` role assigned to the resource group where you create the feature store. Your user account also needs the `User Access Administrator` role.
> [!IMPORTANT]
> For your Azure Machine Learning workspace, set the `isolation_mode` to `allow_internet_outbound`. This is the only `isolation_mode` option available at this time. However, we are actively working to add `allow_only_approved_outbound` `isolation_mode` functionality. As a workaround, this tutorial shows how to connect to sources, the materialization store, and observation data securely through private endpoints.

## Set up
## Provision the necessary resources

You can create a new Azure Data Lake Storage (ADLS) Gen2 storage account and containers, or reuse existing storage account and container resources for the feature store. In a real-world situation, different storage accounts can host the ADLS Gen2 containers. Both options work, depending on your specific requirements.

For this tutorial, you create three separate storage containers in the same ADLS Gen2 storage account:

* Source data
* Offline store
* Observation data

1. Create an ADLS Gen2 storage account for source data, offline store, and observation data.

1. Provide the name of an Azure Data Lake Storage Gen2 storage account in the following code sample. You can run the code cell with the provided default settings, or optionally override them.
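The container layout described above can be sketched in Python. The storage account and container names here are hypothetical placeholders, not values from the tutorial; substitute your own.

```python
# Minimal sketch of the three ADLS Gen2 container paths this tutorial uses.
# The account and container names below are hypothetical placeholders.
storage_account_name = "myfeaturestorestorage"  # assumed name
containers = ["source-data", "offline-store", "observation-data"]  # assumed names

# ADLS Gen2 paths use the abfss scheme:
# abfss://<container>@<account>.dfs.core.windows.net/
paths = {
    c: f"abfss://{c}@{storage_account_name}.dfs.core.windows.net/"
    for c in containers
}

for name, uri in paths.items():
    print(name, "->", uri)
```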
|Storage account of feature store offline store |Storage Blob Data Contributor role|
|Storage accounts of source data |Storage Blob Data Contributor role|

The next CLI commands assign the **Storage Blob Data Contributor** role to the UAI. In this example, "Storage accounts of source data" doesn't apply because you read the sample data from public access blob storage. To use your own data sources, you must assign the required roles to the UAI. To learn more about access control, see role-based access control for [Azure storage accounts](../storage/blobs/data-lake-storage-access-control-model.md#role-based-access-control-azure-rbac) and [Azure Machine Learning workspace](./how-to-assign-roles.md).

[!notebook-python[] (~/azureml-examples-main/sdk/python/featurestore_sample/notebooks/sdk_and_cli/network_isolation/Network Isolation for Feature store.ipynb?name=uai-offline-role-cli)]
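The role assignment the notebook cell performs can be sketched with the Azure CLI. This is a hedged command fragment, not the tutorial's exact cell; every angle-bracketed identifier is a placeholder for your own resources.

```shell
# Hedged sketch: grant the Storage Blob Data Contributor role to the UAI
# on the offline-store storage account. All identifiers are placeholders.
az role assignment create \
    --role "Storage Blob Data Contributor" \
    --assignee-object-id "<uai-principal-id>" \
    --assignee-principal-type ServicePrincipal \
    --scope "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<storage-account>"
```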
216
216
@@ -248,7 +248,7 @@ For this tutorial, we'll create three separate storage containers in the same AD
248
248
249
249
[!notebook-python[] (~/azureml-examples-main/sdk/python/featurestore_sample/notebooks/sdk_and_cli/network_isolation/Network Isolation for Feature store.ipynb?name=uai-fs-role-cli)]
250
250
251
-
Follow these instructions to [get the Azure Active Directory Object ID for your user identity](/partner-center/find-ids-and-domain-names#find-the-user-object-id). Then, use your Azure Active Directory Object ID in the following command to assign **AzureML Data Scientist** role to your user identity on the created feature store.
251
+
Follow these instructions to [get the Azure AD Object ID for your user identity](/partner-center/find-ids-and-domain-names#find-the-user-object-id). Then, use your Azure AD Object ID in the following command to assign **AzureML Data Scientist** role to your user identity on the created feature store.
252
252
253
253
[!notebook-python[] (~/azureml-examples-main/sdk/python/featurestore_sample/notebooks/sdk_and_cli/network_isolation/Network Isolation for Feature store.ipynb?name=aad-fs-role-cli)]
254
254
@@ -290,7 +290,7 @@ For this tutorial, we'll create three separate storage containers in the same AD
290
290
291
291
### Create private endpoints for the defined outbound rules
292
292
293
-
A `provision-network` command creates private endpoints from the managed virtual network where the materialization job executes to the source, offline store, observation data, default storage account and the default key vault for the feature store. This command may need about 20 minutes to complete.
293
+
A `provision-network` command creates private endpoints from the managed virtual network where the materialization job executes to the source, offline store, observation data, default storage account, and the default key vault for the feature store. This command may need about 20 minutes to complete.
294
294
295
295
[!notebook-python[] (~/azureml-examples-main/sdk/python/featurestore_sample/notebooks/sdk_and_cli/network_isolation/Network Isolation for Feature store.ipynb?name=fs-vnet-provision-cli)]
296
296
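The equivalent step can also be run from a terminal. This is a hedged sketch, not the tutorial's exact cell; the resource group and workspace names are placeholders, and you should verify the subcommand against your installed Azure ML CLI extension version.

```shell
# Hedged sketch: provision the managed network (private endpoints) for the
# feature store workspace. Names are placeholders.
az ml workspace provision-network \
    --resource-group "<feature-store-rg>" \
    --name "<feature-store-name>"
```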
## Update the managed virtual network for the project workspace

Next, update the managed virtual network for the project workspace. First, get the subscription ID, resource group, and workspace name for the project workspace.

[!notebook-python[] (~/azureml-examples-main/sdk/python/featurestore_sample/notebooks/sdk_and_cli/network_isolation/Network Isolation for Feature store.ipynb?name=lookup-subid-rg-wsname)]
A feature set specification is a self-contained feature set definition that you can develop and test locally.

Create the following rolling window aggregate features:

* transactions three-day count
* transactions amount three-day sum
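The rolling window aggregates listed above can be approximated locally with pandas while you prototype; the DataFrame and column names here are illustrative assumptions, not the tutorial's schema.

```python
import pandas as pd

# Hypothetical sample of transaction events for one account.
df = pd.DataFrame(
    {
        "accountID": ["A1"] * 4,
        "timestamp": pd.to_datetime(
            ["2023-01-01", "2023-01-02", "2023-01-03", "2023-01-05"]
        ),
        "transactionAmount": [10.0, 20.0, 30.0, 40.0],
    }
).sort_values("timestamp")

# Rolling three-day window per account: event count and amount sum.
rolled = (
    df.set_index("timestamp")
    .groupby("accountID")["transactionAmount"]
    .rolling("3D")
    .agg(["count", "sum"])
    .rename(
        columns={
            "count": "transaction_3d_count",
            "sum": "transaction_amount_3d_sum",
        }
    )
    .reset_index()
)
print(rolled)
```

Each row gets the count and sum over the three days ending at (and including) that event's timestamp.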
## Register a feature-store entity

Entities help enforce use of the same join key definitions across feature sets that use the same logical entities. Entity examples could include account entities and customer entities. Entities are typically created once and then reused across feature sets. For more information, see the [top level feature store entities document](./concept-top-level-entities-in-managed-feature-store.md).

This code cell creates an account entity for the feature store.
### Load observation data

Start by exploring the observation data. The core data used for training and inference typically involves observation data, which is then joined with feature data to create a full training data resource. Observation data is the data captured during the time of the event. In this case, it has core transaction data, including transaction ID, account ID, and transaction amount values. Here, because the observation data is used for training, it also has the target variable appended (`is_fraud`).

[!notebook-python[] (~/azureml-examples-main/sdk/python/featurestore_sample/notebooks/sdk_and_cli/network_isolation/Network Isolation for Feature store.ipynb?name=load-obs-data)]
### Get the registered feature set, and list its features

Next, get the feature set by providing its name and version, and then list its features. Also, print some sample feature values.

[!notebook-python[] (~/azureml-examples-main/sdk/python/featurestore_sample/notebooks/sdk_and_cli/network_isolation/Network Isolation for Feature store.ipynb?name=get-txn-fset)]

[!notebook-python[] (~/azureml-examples-main/sdk/python/featurestore_sample/notebooks/sdk_and_cli/network_isolation/Network Isolation for Feature store.ipynb?name=print-txn-fset-sample-values)]
### Select features, and generate training data

Select features for the training data, and use the feature store SDK to generate the training data.

[!notebook-python[] (~/azureml-examples-main/sdk/python/featurestore_sample/notebooks/sdk_and_cli/network_isolation/Network Isolation for Feature store.ipynb?name=select-features-and-gen-training-data)]

You can see that a point-in-time join appended the features to the training data.
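The point-in-time join that the SDK performs can be approximated locally with `pandas.merge_asof`: each observation event picks up the latest feature value at or before its timestamp. The frames and column names here are illustrative assumptions, not the tutorial's schema.

```python
import pandas as pd

# Hypothetical observation events (with the is_fraud target) and
# hypothetical feature snapshots for the same account.
observations = pd.DataFrame(
    {
        "accountID": ["A1", "A1"],
        "timestamp": pd.to_datetime(["2023-01-02 12:00", "2023-01-04 09:00"]),
        "is_fraud": [0, 1],
    }
)
features = pd.DataFrame(
    {
        "accountID": ["A1", "A1", "A1"],
        "timestamp": pd.to_datetime(["2023-01-01", "2023-01-03", "2023-01-05"]),
        "transaction_3d_count": [1, 3, 2],
    }
)

# For each observation, join the latest feature row at or before the
# event time, matched per account. This avoids feature leakage from
# the future.
training = pd.merge_asof(
    observations.sort_values("timestamp"),
    features.sort_values("timestamp"),
    on="timestamp",
    by="accountID",
    direction="backward",
)
print(training)
```

The first observation (January 2) picks up the January 1 feature value; the second (January 4) picks up the January 3 value. No observation sees a feature computed after its own event time.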
## Optional next steps

Now that you created a secure feature store and submitted a successful materialization run, you can go through the tutorial series to build an understanding of the feature store.

This tutorial contains a mixture of steps from tutorials 1 and 2 of this series. For network isolation, remember to replace the public storage containers used in the other tutorial notebooks with the ones created in this tutorial notebook.

We have reached the end of the tutorial. Your training data uses features from a feature store. You can either save it to storage for later use, or directly run model training on it.