ms.subservice: core
ms.topic: tutorial
author: fbsolo-ms1
ms.author: franksolomon
ms.date: 11/21/2024
ms.reviewer: yogipandey
ms.custom: sdkv2, ignite-2023
#Customer intent: As a professional data scientist, I want to know how to build and deploy a model with Azure Machine Learning by using Python in a Jupyter Notebook.
---
# Tutorial 4: Enable online materialization and run online inference
An Azure Machine Learning managed feature store lets you discover, create, and operationalize features. Features serve as the connective tissue in the machine learning lifecycle, starting from the prototyping phase, where you experiment with various features. That lifecycle continues to the operationalization phase, where you deploy your models, and inference steps look up the feature data. For more information about feature stores, visit the [feature store concepts](./concept-what-is-managed-feature-store.md) resource.
Part 1 of this tutorial series showed how to create a feature set specification with custom transformations, and use that feature set to generate training data. Part 2 of the series showed how to enable materialization, and perform a backfill. Additionally, Part 2 showed how to experiment with features, as a way to improve model performance. Part 3 showed how a feature store increases agility in the experimentation and training flows. Part 3 also described how to run batch inference.
In this tutorial, you:
> [!div class="checklist"]
> * Set up an Azure Cache for Redis
> * Attach a cache to a feature store as the online materialization store, and grant the necessary permissions
> * Materialize a feature set to the online store
> * Test an online deployment with mock data
## Prerequisites
> [!NOTE]
> This tutorial uses an Azure Machine Learning notebook with **Serverless Spark Compute**.
Be sure to complete parts 1 through 3 of this tutorial series. This tutorial reuses the feature store and other resources created in those earlier tutorials.
## Set up
1. In the **Compute** dropdown list in the top nav, select **Serverless Spark Compute**.
1. Configure the session:
    1. Download the *azureml-examples/sdk/python/featurestore-sample/project/env/online.yml* file to your local machine
    1. In **configure session** in the top nav, select **Python packages**
    1. Select **Upload Conda file**
    1. Upload the *online.yml* file from your local machine, with the same steps as described in [uploading *conda.yml* file in the first tutorial](./tutorial-get-started-with-feature-store.md#prepare-the-notebook-environment)
    1. Optionally, increase the session time-out (idle time) to avoid frequent prerequisite reruns

1. This code cell starts the Spark session. It needs about 10 minutes to install all dependencies and start the session.
1. Initialize the `MLClient` for the project workspace where the tutorial notebook runs. The `MLClient` is used for the create, read, update, and delete (CRUD) operations.
1. Initialize the `MLClient` for the feature store workspace, for the create, read, update, and delete (CRUD) operations on the feature store workspace.
> A **feature store workspace** supports feature reuse across projects. A **project workspace** - the current workspace in use - uses features from a specific feature store to train and inference models. Many project workspaces can share and reuse the same feature store workspace.
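
    The following minimal sketch shows both initializations; the subscription, resource group, and workspace names are placeholders for your own values:

    ```python
    from azure.ai.ml import MLClient
    from azure.identity import DefaultAzureCredential

    credential = DefaultAzureCredential()

    # MLClient for the project workspace, where this notebook runs.
    ml_client = MLClient(
        credential,
        subscription_id="<SUBSCRIPTION_ID>",
        resource_group_name="<RESOURCE_GROUP>",
        workspace_name="<PROJECT_WORKSPACE_NAME>",
    )

    # MLClient for the feature store workspace, for CRUD operations on the
    # feature store itself.
    fs_client = MLClient(
        credential,
        subscription_id="<SUBSCRIPTION_ID>",
        resource_group_name="<RESOURCE_GROUP>",
        workspace_name="<FEATURE_STORE_NAME>",
    )
    ```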
1. As mentioned earlier, this tutorial uses the Python feature store core SDK (`azureml-featurestore`). This initialized SDK client is used for create, read, update, and delete (CRUD) operations on feature stores, feature sets, and feature store entities.
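
    A minimal sketch of that initialization, again with placeholder resource names:

    ```python
    from azureml.featurestore import FeatureStoreClient
    from azure.identity import DefaultAzureCredential

    featurestore = FeatureStoreClient(
        credential=DefaultAzureCredential(),
        subscription_id="<SUBSCRIPTION_ID>",
        resource_group_name="<RESOURCE_GROUP>",
        name="<FEATURE_STORE_NAME>",
    )
    ```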
1. You can create a new Redis instance. You would select the appropriate Redis Cache tier (basic, standard, premium, or enterprise). Choose an SKU family available for the cache tier you select. For more information about tiers and cache performance, visit [this resource](/azure/azure-cache-for-redis/cache-best-practices-performance). For more information about SKU tiers and Azure cache families, visit [this resource](https://azure.microsoft.com/pricing/details/cache/).
Execute this code cell to create an Azure Cache for Redis instance with the premium tier, SKU family `P`, and cache capacity 2. It might take 5 to 10 minutes to prepare the Redis instance.
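
The cell might resemble this hedged sketch, which uses the `azure-mgmt-redis` management SDK; the resource names and location are placeholders:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.redis import RedisManagementClient
from azure.mgmt.redis.models import RedisCreateParameters, Sku

management_client = RedisManagementClient(
    DefaultAzureCredential(), "<SUBSCRIPTION_ID>"
)

# Premium tier, SKU family P, cache capacity 2, as described above.
redis_poller = management_client.redis.begin_create(
    resource_group_name="<RESOURCE_GROUP>",
    name="<REDIS_NAME>",
    parameters=RedisCreateParameters(
        location="<AZURE_REGION>",
        sku=Sku(name="Premium", family="P", capacity=2),
    ),
)
redis_resource = redis_poller.result()
print(redis_resource.id)  # resource ID, used when attaching the cache
```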
The feature store needs the Azure Cache for Redis as an attached resource, for use as its online materialization store.
> [!NOTE]
> During a feature store update, setting `grant_materialization_permissions=True` alone will not grant the required RBAC permissions to the UAI. The role assignments to the UAI happen only when one of the following is updated:
> - Materialization identity
> - Offline store target
> - Online store target
For an example that shows how to do this with the SDK, visit the [Tutorial: Different Approaches for Provisioning a Managed Feature Store](https://github.com/Azure/azureml-examples/blob/main/sdk/python/featurestore_sample/notebooks/sdk_and_cli/4.Provision-feature-store.ipynb) resource.
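
As a hedged sketch, attaching the cache and granting permissions through `azure-ai-ml` entities might look like this; the feature store name is a placeholder, and `redis_resource.id` comes from the creation step:

```python
from azure.ai.ml.entities import FeatureStore, MaterializationStore

# Point the online store at the ARM resource ID of the cache.
online_store = MaterializationStore(
    type="redis",
    target=redis_resource.id,
)

fs = FeatureStore(
    name="<FEATURE_STORE_NAME>",
    online_store=online_store,
)

# Because the online store target is updated here, the service assigns the
# required RBAC roles to the materialization identity (see the note above).
poller = fs_client.feature_stores.begin_update(
    fs,
    grant_materialization_permissions=True,
)
print(poller.result())
```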
## Materialize the `accounts` feature set data to online store
### Enable materialization on the `accounts` feature set
Earlier in this tutorial series, you did **not** materialize the accounts feature set, because it had precomputed features, and only batch inference scenarios used it. This code cell enables online materialization, so that the features become available in the online store with low-latency access. For consistency, it also enables offline materialization. Enabling offline materialization is optional.
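
A hedged sketch of such a cell, assuming the `accounts` feature set is registered as version `1`; the compute and Spark settings are illustrative:

```python
from azure.ai.ml.entities import (
    MaterializationComputeResource,
    MaterializationSettings,
)

accounts_fset = fs_client.feature_sets.get(name="accounts", version="1")
accounts_fset.materialization_settings = MaterializationSettings(
    online_enabled=True,   # make features available in the Redis store
    offline_enabled=True,  # optional; kept here for consistency
    resource=MaterializationComputeResource(instance_type="standard_e8s_v3"),
    spark_configuration={
        "spark.driver.cores": "4",
        "spark.driver.memory": "36g",
        "spark.executor.cores": "4",
        "spark.executor.memory": "36g",
        "spark.executor.instances": "2",
    },
)
poller = fs_client.feature_sets.begin_create_or_update(accounts_fset)
print(poller.result())
```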
The `begin_backfill` function backfills data to all the materialization stores enabled for this feature set. Here, offline and online materialization are both enabled. This code cell backfills the data to both online and offline materialization stores.
> - The `feature_window_start_time` and `feature_window_end_time` granularity is limited to seconds. Any millisecond value provided in the `datetime` object will be ignored.
> - A materialization job will only be submitted if there is data in the feature window that matches the `data_status` defined while submitting the backfill job.
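
A hedged sketch of the backfill submission, consistent with the notes above; the feature window and `data_status` values are illustrative:

```python
from datetime import datetime

# Times are truncated to seconds granularity; milliseconds are ignored.
poller = fs_client.feature_sets.begin_backfill(
    name="accounts",
    version="1",
    feature_window_start_time=datetime(2023, 1, 1, 0, 0, 0),
    feature_window_end_time=datetime(2023, 6, 30, 0, 0, 0),
    data_status=["None"],
)
backfill_response = poller.result()
print(backfill_response.job_ids)
```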
This code cell tracks completion of the backfill job. With the Azure Cache for Redis premium tier provisioned earlier, this step might need approximately 10 minutes to complete.
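
For example, a minimal way to block until the job finishes, assuming `backfill_response` from the previous sketch holds the submitted job names:

```python
# Stream the logs of the first backfill job until it completes.
fs_client.jobs.stream(backfill_response.job_ids[0])
```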
## Further explore online feature materialization
You can explore the feature materialization status for a feature set from the **Materialization jobs** UI.
1. Open the [Azure Machine Learning global landing page](https://ml.azure.com/home)
1. Select **Feature stores** in the left pane
1. From the list of accessible feature stores, select the feature store for which you performed the backfill
1. Select the **Materialization jobs** tab
:::image type="content" source="media/tutorial-online-materialization-inference/feature-set-materialization-ui.png" lightbox="media/tutorial-online-materialization-inference/feature-set-materialization-ui.png" alt-text="Screenshot that shows the feature set Materialization jobs UI.":::
- Your data can have a maximum of 2,000 *data intervals*. If your data contains more than 2,000 *data intervals*, create a new feature set version.
- You can provide a list of more than one data status (for example, `["None", "Incomplete"]`) in a single backfill job.
- During backfill, a new materialization job is submitted for each *data interval* that falls in the defined feature window.
- A new job isn't submitted for a *data interval* if a materialization job is already pending, or if it's running for a *data interval* that hasn't yet been backfilled.
### Updating online materialization store
- To update an online materialization store at the feature store level, all feature sets in the feature store should have online materialization disabled.
- If online materialization is disabled on a feature set, the materialization status of the already-materialized data in the online materialization store is reset. This renders the already-materialized data unusable. You must resubmit your materialization jobs after you enable online materialization.
- If only offline materialization was initially enabled for a feature set, and online materialization is enabled later:

  - The default data materialization status of the data in the online store will be `None`.
  - When the first online materialization job is submitted, the data already materialized in the offline store, if available, is used to calculate online features.
  - If the *data interval* for online materialization partially overlaps the *data interval* of already materialized data located in the offline store, separate materialization jobs are submitted for the overlapping and nonoverlapping parts of the *data interval*.

Now, use your development environment to look up features from the online materialization store.
Prepare some observation data for testing, and use that data to look up features from the online materialization store. During the online look-up, the keys (`accountID`) defined in the observation sample data might not exist in Redis (because of the `TTL`). In this case:
1. Open the Azure portal
1. Navigate to the Redis instance
1. Open the console for the Redis instance, and check for existing keys with the `KEYS *` command
1. Replace the `accountID` values in the sample observation data with the existing keys
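
With valid keys in hand, the lookup might resemble this hedged sketch. It assumes the `init_online_lookup` and `get_online_features` helpers from `azureml-featurestore`, and that the retrieved feature set object exposes its feature list through a `features` property:

```python
import pandas as pd
from azure.identity import DefaultAzureCredential
from azureml.featurestore import get_online_features, init_online_lookup

accounts_fset = featurestore.feature_sets.get("accounts", "1")
features = accounts_fset.features  # assumed accessor for the feature list

# Use accountID values that exist as keys in the Redis instance.
obs_df = pd.DataFrame({"accountID": ["<EXISTING_REDIS_KEY>"]})

init_online_lookup(features, DefaultAzureCredential())
online_df = get_online_features(features, obs_df)
print(online_df.head())
```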
These steps looked up features from the online store. In the next step, you test online features with an Azure Machine Learning managed online endpoint.
## Test online features from Azure Machine Learning managed online endpoint
A managed online endpoint deploys and scores models for online/real-time inference. You can use any available inference technology - for example, Kubernetes.
This step involves these actions:
Deploy the model to the online endpoint with this code cell. The deployment might need several minutes to complete.
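
A hedged sketch of the endpoint and deployment definitions with `azure-ai-ml` entities; the endpoint name, model reference, and instance settings are placeholders, and an MLflow-registered model needs no separate scoring script:

```python
from azure.ai.ml.entities import ManagedOnlineDeployment, ManagedOnlineEndpoint

endpoint = ManagedOnlineEndpoint(name="<ENDPOINT_NAME>", auth_mode="key")
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

deployment = ManagedOnlineDeployment(
    name="green",
    endpoint_name="<ENDPOINT_NAME>",
    model="<REGISTERED_MODEL_ID>",  # for example, "azureml:<model-name>:<version>"
    instance_type="Standard_DS3_v2",
    instance_count=1,
)
ml_client.online_deployments.begin_create_or_update(deployment).result()
```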
### Test online deployment with mock data
Execute this code cell to test the online deployment with the mock data. The cell should show `0` or `1` as its output.
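
A minimal sketch of that invocation, assuming a JSON file that holds the mock observation data:

```python
response = ml_client.online_endpoints.invoke(
    endpoint_name="<ENDPOINT_NAME>",
    deployment_name="green",
    request_file="<PATH_TO_MOCK_DATA_JSON>",
)
print(response)  # expected: 0 or 1
```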