articles/machine-learning/tutorial-develop-feature-set-with-custom-source.md (10 additions, 9 deletions)
@@ -9,7 +9,7 @@ ms.subservice: core
 ms.topic: tutorial
 author: ynpandey
 ms.author: yogipandey
-ms.date: 10/27/2023
+ms.date: 11/28/2023
 ms.reviewer: franksolomon
 ms.custom:
 - sdkv2
@@ -22,7 +22,7 @@ ms.custom:
 
 An Azure Machine Learning managed feature store lets you discover, create, and operationalize features. Features serve as the connective tissue in the machine learning lifecycle, starting from the prototyping phase, where you experiment with various features. That lifecycle continues to the operationalization phase, where you deploy your models, and inference steps look up the feature data. For more information about feature stores, see [feature store concepts](./concept-what-is-managed-feature-store.md).
 
-Part 1 of this tutorial series showed how to create a feature set specification with custom transformations, enable materialization and perform a backfill. Part 2 of this tutorial series showed how to experiment with features in the experimentation and training flows. Part 4 described how to run batch inference.
+Part 1 of this tutorial series showed how to create a feature set specification with custom transformations, enable materialization and perform a backfill. Part 2 showed how to experiment with features in the experimentation and training flows. Part 3 explained recurrent materialization for the `transactions` feature set, and showed how to run a batch inference pipeline on the registered model. Part 4 described how to run batch inference.
 
 In this tutorial, you'll
@@ -36,26 +36,27 @@ In this tutorial, you'll
 > [!NOTE]
 > This tutorial uses an Azure Machine Learning notebook with **Serverless Spark Compute**.
 
-* Make sure you execute the notebook from Tutorial 1. That notebook includes creation of a feature store and a feature set, followed by enabling of materialization and performance of backfill.
+* Make sure you complete the previous tutorials in this series. This tutorial reuses the feature store and other resources created in those earlier tutorials.
 
 ## Set up
 
 This tutorial uses the Python feature store core SDK (`azureml-featurestore`). The Python SDK is used for create, read, update, and delete (CRUD) operations on feature stores, feature sets, and feature store entities.
 
 You don't need to explicitly install these resources for this tutorial, because in the set-up instructions shown here, the `conda.yml` file covers them.
 
-### Configure the Azure Machine Learning Spark notebook.
+### Configure the Azure Machine Learning Spark notebook
 
 You can create a new notebook and execute the instructions in this tutorial step by step. You can also open and run the existing notebook *featurestore_sample/notebooks/sdk_only/5. Develop a feature set with custom source.ipynb*. Keep this tutorial open and refer to it for documentation links and more explanation.
 
 1. On the top menu, in the **Compute** dropdown list, select **Serverless Spark Compute** under **Azure Machine Learning Serverless Spark**.
 
 2. Configure the session:
 
-   1. When the toolbar displays **Configure session**, select it.
-   2. On the **Python packages** tab, select **Upload Conda file**.
-   3. Upload the *conda.yml* file that you [uploaded in the first tutorial](./tutorial-get-started-with-feature-store.md#prepare-the-notebook-environment).
-   4. Optionally, increase the session time-out (idle time) to avoid frequent prerequisite reruns.
+   1. Select **Configure session** in the top status bar.
+   2. Select the **Python packages** tab.
+   3. Select **Upload Conda file**.
+   4. Upload the *conda.yml* file that you [uploaded in the first tutorial](./tutorial-get-started-with-feature-store.md#prepare-the-notebook-environment).
+   5. Optionally, increase the session time-out (idle time) to avoid frequent prerequisite reruns.
 
 ## Set up the root directory for the samples
 This code cell sets up the root directory for the samples. It needs about 10 minutes to install all dependencies and start the Spark session.
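For reference, the Python feature store core SDK (`azureml-featurestore`) described in the hunk above is typically initialized along the following lines. This is a minimal sketch, assuming placeholder subscription, resource group, and feature store names rather than values taken from this change:

```python
from azure.ai.ml.identity import AzureMLOnBehalfOfCredential
from azureml.featurestore import FeatureStoreClient

# Placeholder identifiers; substitute the values used in the earlier tutorials.
featurestore_subscription_id = "<SUBSCRIPTION_ID>"
featurestore_resource_group_name = "<RESOURCE_GROUP>"
featurestore_name = "<FEATURESTORE_NAME>"

# Core SDK client for read operations against the feature store.
featurestore = FeatureStoreClient(
    credential=AzureMLOnBehalfOfCredential(),
    subscription_id=featurestore_subscription_id,
    resource_group_name=featurestore_resource_group_name,
    name=featurestore_name,
)

# Example read: fetch a registered feature set by name and version.
transactions_featureset = featurestore.feature_sets.get(name="transactions", version="1")
```

Asset create and update operations go through `MLClient` instead, as the get-started tutorial later in this same change notes.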
@@ -159,4 +160,4 @@ If you created a resource group for the tutorial, you can delete that resource g
 ## Next steps
 
 * [Network isolation with feature store](./tutorial-network-isolation-for-feature-store.md)
articles/machine-learning/tutorial-enable-recurrent-materialization-run-batch-inference.md (7 additions, 6 deletions)
@@ -9,7 +9,7 @@ ms.subservice: core
 ms.topic: tutorial
 author: rsethur
 ms.author: seramasu
-ms.date: 10/27/2023
+ms.date: 11/28/2023
 ms.reviewer: franksolomon
 ms.custom: sdkv2, build-2023, ignite-2023
 #Customer intent: As a professional data scientist, I want to know how to build and deploy a model with Azure Machine Learning by using Python in a Jupyter Notebook.
@@ -37,14 +37,15 @@ Before you proceed with this tutorial, be sure to complete the first and second
 To run this tutorial, you can create a new notebook and execute the instructions step by step. You can also open and run the existing notebook named *3. Enable recurrent materialization and run batch inference*. You can find that notebook, and all the notebooks in this series, in the *featurestore_sample/notebooks* directory. You can choose *sdk_only* or *sdk_and_cli*. Keep this tutorial open and refer to it for documentation links and more explanation.
 
-1. On the top menu, in the **Compute** dropdown list, select **Serverless Spark Compute** under **Azure Machine Learning Serverless Spark**.
+1. In the **Compute** dropdown list in the top nav, select **Serverless Spark Compute** under **Azure Machine Learning Serverless Spark**.
 
 2. Configure the session:
 
-   1. When the toolbar displays **Configure session**, select it.
-   2. On the **Python packages** tab, select **Upload conda file**.
-   3. Upload the *conda.yml* file that you [uploaded in the first tutorial](./tutorial-get-started-with-feature-store.md#prepare-the-notebook-environment).
-   4. Optionally, increase the session time-out (idle time) to avoid frequent prerequisite reruns.
+   1. Select **Configure session** in the top status bar.
+   2. Select the **Python packages** tab.
+   3. Select **Upload conda file**.
+   4. Select the `azureml-examples/sdk/python/featurestore-sample/project/env/online.yml` file from your local machine.
+   5. Optionally, increase the session time-out (idle time) to avoid frequent prerequisite reruns.
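For context, the notebook that these steps configure (*3. Enable recurrent materialization and run batch inference*) attaches a recurrence schedule to the feature set's materialization settings. The following is a hedged sketch of that step; it assumes the `RecurrenceTrigger` entity from `azure.ai.ml.entities`, a feature set asset named `transactions_fset_config`, and a feature-store-scoped `fs_client` carried over from the earlier tutorials:

```python
from datetime import datetime

from azure.ai.ml.entities import RecurrenceTrigger

# Materialize the feature set on a recurring schedule, for example every three hours.
# The frequency string and start time below are illustrative values only.
transactions_fset_config.materialization_settings.schedule = RecurrenceTrigger(
    frequency="hour",
    interval=3,
    start_time=datetime(2023, 11, 28, 0, 0, 0),
)

# Persist the updated materialization settings on the feature set asset.
fs_poller = fs_client.feature_sets.begin_create_or_update(transactions_fset_config)
print(fs_poller.result())
```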
articles/machine-learning/tutorial-get-started-with-feature-store.md (30 additions, 7 deletions)
@@ -8,7 +8,7 @@ ms.subservice: core
 ms.topic: tutorial
 author: rsethur
 ms.author: seramasu
-ms.date: 11/01/2023
+ms.date: 11/28/2023
 ms.reviewer: franksolomon
 ms.custom: sdkv2, build-2023, ignite-2023
 #Customer intent: As a professional data scientist, I want to know how to build and deploy a model with Azure Machine Learning by using Python in a Jupyter Notebook.
@@ -45,7 +45,7 @@ Before you proceed with this tutorial, be sure to cover these prerequisites:
 
 * An Azure Machine Learning workspace. For more information about workspace creation, see [Quickstart: Create workspace resources](./quickstart-create-resources.md).
 
-* On your user account, the Owner or Contributor role for the resource group where the feature store is created.
+* On your user account, the Owner role for the resource group where the feature store is created.
 
 If you choose to use a new resource group for this tutorial, you can easily delete all the resources by deleting the resource group.
 
@@ -59,18 +59,25 @@ This tutorial uses an Azure Machine Learning Spark notebook for development.
 :::image type="content" source="media/tutorial-get-started-with-feature-store/clone-featurestore-example-notebooks.png" lightbox="media/tutorial-get-started-with-feature-store/clone-featurestore-example-notebooks.png" alt-text="Screenshot that shows selection of the sample directory in Azure Machine Learning studio.":::
 
-1. The **Select target directory** panel opens. Select the user directory (in this case, **testUser**), and then select **Clone**.
+1. The **Select target directory** panel opens. Select the **Users** directory, then select _your user name_, and finally select **Clone**.
 
 :::image type="content" source="media/tutorial-get-started-with-feature-store/select-target-directory.png" lightbox="media/tutorial-get-started-with-feature-store/select-target-directory.png" alt-text="Screenshot showing selection of the target directory location in Azure Machine Learning studio for the sample resource.":::
 
 1. To configure the notebook environment, you must upload the *conda.yml* file:
 
    1. Select **Notebooks** on the left pane, and then select the **Files** tab.
-   1. Browse to the *env* directory (select **Users** > **testUser** > **featurestore_sample** > **project** > **env**), and then select the *conda.yml* file. In this path, *testUser* is the user directory.
+   1. Browse to the *env* directory (select **Users** > **your_user_name** > **featurestore_sample** > **project** > **env**), and then select the *conda.yml* file.
    1. Select **Download**.
 
 :::image type="content" source="media/tutorial-get-started-with-feature-store/download-conda-file.png" lightbox="media/tutorial-get-started-with-feature-store/download-conda-file.png" alt-text="Screenshot that shows selection of the Conda YAML file in Azure Machine Learning studio.":::
 
+1. Select **Serverless Spark Compute** in the top navigation **Compute** dropdown. This operation might take one to two minutes. Wait for the status bar at the top to display **Configure session**.
+1. Select **Configure session** in the top status bar.
+1. Select **Python packages**.
+1. Select **Upload conda files**.
+1. Select the `conda.yml` file you downloaded to your local device.
+1. (Optional) Increase the session time-out (idle time in minutes) to reduce the serverless Spark cluster startup time.
+
 1. In the Azure Machine Learning environment, open the notebook, and then select **Configure session**.
 
 :::image type="content" source="media/tutorial-get-started-with-feature-store/open-configure-session.png" lightbox="media/tutorial-get-started-with-feature-store/open-configure-session.png" alt-text="Screenshot that shows selections for configuring a session for a notebook.":::
@@ -104,7 +111,7 @@ Not applicable.
 ### [SDK and CLI track](#tab/SDK-and-CLI-track)
 
-1. Install the Azure Machine Learning extension.
+1. Install the Azure Machine Learning CLI extension.
 
 [!notebook-python[] (~/azureml-examples-main/sdk/python/featurestore_sample/notebooks/sdk_and_cli/1. Develop a feature set and register with managed feature store.ipynb?name=install-ml-ext-cli)]
 
@@ -331,7 +338,7 @@ As a best practice, entities help enforce use of the same join key definition ac
 1. Initialize the feature store CRUD client.
 
-   As explained earlier in this tutorial, `MLClient` is used for creating, reading, updating, and deleting a feature store asset. The notebook code cell sample shown here searches for the feature store that you created in an earlier step. Here, you can't reuse the same `ml_client` value that you used earlier in this tutorial, because it is scoped at the resource group level. Proper scoping is a prerequisite for feature store creation.
+   As explained earlier in this tutorial, `MLClient` is used for creating, reading, updating, and deleting a feature store asset. The notebook code cell sample shown here searches for the feature store that you created in an earlier step. Here, you can't reuse the same `ml_client` value that you used earlier in this tutorial, because it's scoped at the resource group level. Proper scoping is a prerequisite for feature store creation.
 
    In this code sample, the client is scoped at feature store level.
 
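The paragraph in this hunk describes a feature-store-scoped `MLClient`. A minimal sketch of that initialization, assuming placeholder identifiers (the actual values come from earlier steps in the tutorial), looks roughly like this:

```python
from azure.ai.ml import MLClient
from azure.ai.ml.identity import AzureMLOnBehalfOfCredential

# Placeholder identifiers; substitute the values used earlier in the tutorial.
featurestore_subscription_id = "<SUBSCRIPTION_ID>"
featurestore_resource_group_name = "<RESOURCE_GROUP>"
featurestore_name = "<FEATURESTORE_NAME>"

# Unlike the resource-group-scoped ml_client used to create the feature store,
# this client is scoped to the feature store itself (passed as the workspace name).
fs_client = MLClient(
    AzureMLOnBehalfOfCredential(),
    featurestore_subscription_id,
    featurestore_resource_group_name,
    featurestore_name,
)
```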
@@ -433,10 +440,26 @@ The Storage Blob Data Reader role must be assigned to your user account on the o
 ### [SDK track](#tab/SDK-track)
 
+#### Set spark.sql.shuffle.partitions in the yaml file according to the feature data size
+
+The spark configuration `spark.sql.shuffle.partitions` is an optional parameter that can affect the number of parquet files generated (per day) when the feature set is materialized into the offline store. The default value of this parameter is 200. As a best practice, avoid generating many small parquet files. If offline feature retrieval becomes slow after feature set materialization, go to the corresponding folder in the offline store to check whether the issue involves too many small parquet files (per day), and adjust the value of this parameter accordingly.
+
+> [!NOTE]
+> The sample data used in this notebook is small. Therefore, this parameter is set to 1 in the
+> featureset_asset_offline_enabled.yaml file.
+
 [!notebook-python[] (~/azureml-examples-main/sdk/python/featurestore_sample/notebooks/sdk_only/1. Develop a feature set and register with managed feature store.ipynb?name=enable-offline-mat-txns-fset)]
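The text added above sets `spark.sql.shuffle.partitions` through the YAML asset file. For the SDK track, the same Spark setting can also be supplied through the feature set's materialization settings. This is a hedged sketch, not the notebook's exact cell; it assumes the `MaterializationSettings` and `MaterializationComputeResource` entities from `azure.ai.ml.entities`, a feature set asset named `transactions_fset_config`, and placeholder compute sizing:

```python
from azure.ai.ml.entities import (
    MaterializationComputeResource,
    MaterializationSettings,
)

# Enable offline materialization with an explicit Spark configuration.
# The sample data is small, so a single shuffle partition avoids producing
# many tiny parquet files per day in the offline store.
transactions_fset_config.materialization_settings = MaterializationSettings(
    offline_enabled=True,
    resource=MaterializationComputeResource(instance_type="standard_e8s_v3"),
    spark_configuration={
        "spark.driver.cores": 4,
        "spark.driver.memory": "36g",
        "spark.executor.cores": 4,
        "spark.executor.memory": "36g",
        "spark.executor.instances": 2,
        "spark.sql.shuffle.partitions": 1,
    },
    schedule=None,
)

fs_poller = fs_client.feature_sets.begin_create_or_update(transactions_fset_config)
print(fs_poller.result())
```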
 
 ### [SDK and CLI track](#tab/SDK-and-CLI-track)
 
+#### Set spark.sql.shuffle.partitions in the yaml file according to the feature data size
+
+The spark configuration `spark.sql.shuffle.partitions` is an optional parameter that can affect the number of parquet files generated (per day) when the feature set is materialized into the offline store. The default value of this parameter is 200. As a best practice, avoid generating many small parquet files. If offline feature retrieval becomes slow after feature set materialization, go to the corresponding folder in the offline store to check whether the issue involves too many small parquet files (per day), and adjust the value of this parameter accordingly.
+
+> [!NOTE]
+> The sample data used in this notebook is small. Therefore, this parameter is set to 1 in the
+> featureset_asset_offline_enabled.yaml file.
+
 [!notebook-python[] (~/azureml-examples-main/sdk/python/featurestore_sample/notebooks/sdk_and_cli/1. Develop a feature set and register with managed feature store.ipynb?name=enable-offline-mat-txns-fset-cli)]
 
---
@@ -503,7 +526,7 @@ You can explore feature materialization status for a feature set in the **Materi
 - The data can have a maximum of 2,000 *data intervals*. If your data contains more than 2,000 *data intervals*, create a new feature set version.
 - You can provide a list of more than one data status (for example, `["None", "Incomplete"]`) in a single backfill job.
 - During backfill, a new materialization job is submitted for each *data interval* that falls within the defined feature window.
-- If a materialization job is pending, or it is running for a *data interval* that hasn't yet been backfilled, a new job isn't submitted for that *data interval*.
+- If a materialization job is pending, or that job is running for a *data interval* that hasn't yet been backfilled, a new job isn't submitted for that *data interval*.
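The backfill behavior in these bullets corresponds to a backfill request along the following lines. This sketch assumes the feature-store-scoped `fs_client` from earlier in the tutorial, placeholder window bounds, and that the data-status list is passed in the form shown in the bullet's example:

```python
from datetime import datetime

# Placeholder feature window bounds; use the window that matches your data.
st = datetime(2023, 1, 1)
et = datetime(2023, 6, 30)

# Request backfill for every data interval in the window whose status is
# "None" or "Incomplete"; pending or running intervals are skipped.
backfill_poller = fs_client.feature_sets.begin_backfill(
    name="transactions",
    version="1",
    feature_window_start_time=st,
    feature_window_end_time=et,
    data_status=["None", "Incomplete"],
)
backfill_response = backfill_poller.result()
print(backfill_response)
```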
articles/machine-learning/tutorial-online-materialization-inference.md (6 additions, 6 deletions)
@@ -9,7 +9,7 @@ ms.subservice: core
 ms.topic: tutorial
 author: ynpandey
 ms.author: yogipandey
-ms.date: 10/27/2023
+ms.date: 11/28/2023
 ms.reviewer: franksolomon
 ms.custom: sdkv2, ignite-2023
 #Customer intent: As a professional data scientist, I want to know how to build and deploy a model with Azure Machine Learning by using Python in a Jupyter Notebook.
@@ -46,14 +46,14 @@ You don't need to explicitly install these resources for this tutorial, because
 You can create a new notebook and execute the instructions in this tutorial step by step. You can also open and run the existing notebook *featurestore_sample/notebooks/sdk_only/4. Enable online store and run online inference.ipynb*. Keep this tutorial open and refer to it for documentation links and more explanation.
 
-1. On the top menu, in the **Compute** dropdown list, select **Serverless Spark Compute** under **Azure Machine Learning Serverless Spark**.
+1. In the **Compute** dropdown list in the top nav, select **Serverless Spark Compute**.
 
 2. Configure the session:
 
-   1. Download the *featurestore-sample/project/env/online.yml* file to your local machine.
-   2. When the toolbar displays **Configure session**, select it.
-   3. On the **Python packages** tab, select **Upload Conda file**.
-   4. Upload the *online.yml* file in the same way as described in [uploading the *conda.yml* file in the first tutorial](./tutorial-get-started-with-feature-store.md#prepare-the-notebook-environment).
+   1. Download the *azureml-examples/sdk/python/featurestore-sample/project/env/online.yml* file to your local machine.
+   2. In **Configure session** in the top nav, select **Python packages**.
+   3. Select **Upload Conda file**.
+   4. Upload the *online.yml* file from your local machine, following the same steps as [uploading the *conda.yml* file in the first tutorial](./tutorial-get-started-with-feature-store.md#prepare-the-notebook-environment).
    5. Optionally, increase the session time-out (idle time) to avoid frequent prerequisite reruns.
 
 2. This code cell starts the Spark session. It needs about 10 minutes to install all dependencies and start the Spark session.
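As that last line notes, the first cell you run is what starts the serverless Spark session; any small cell is enough to trigger it, for example:

```python
# Running any cell on Serverless Spark Compute starts the session.
# The first run takes roughly 10 minutes while the packages from the
# uploaded conda environment (online.yml) are installed.
print("Starting the Spark session ...")
```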
0 commit comments