---
title: "Tutorial 5: Enable online materialization and run online inference (preview)"
titleSuffix: Azure Machine Learning managed feature store - basics
description: This is part 5 of a tutorial series on managed feature store.
services: machine-learning
ms.service: machine-learning
ms.subservice: core
ms.topic: tutorial
author: ynpandey
ms.author: yogipandey
ms.date: 09/12/2023
ms.reviewer: franksolomon
ms.custom: sdkv2
#Customer intent: As a professional data scientist, I want to know how to build and deploy a model with Azure Machine Learning by using Python in a Jupyter Notebook.
---

# Tutorial 5: Enable online materialization and run online inference (preview)

[!INCLUDE [preview disclaimer](includes/machine-learning-preview-generic-disclaimer.md)]

An Azure Machine Learning managed feature store lets you discover, create, and operationalize features. Features serve as the connective tissue in the machine learning lifecycle, starting from the prototyping phase, where you experiment with various features. That lifecycle continues to the operationalization phase, where you deploy your models and use inference to look up feature data. For more information about feature stores, see [feature store concepts](./concept-what-is-managed-feature-store.md).

Part 1 of this tutorial series showed how to create a feature set specification with custom transformations, and then use that feature set to generate training data. Part 2 showed how to enable materialization and perform a backfill. Part 3 showed how to experiment with features as a way to improve model performance, and how a feature store increases agility in the experimentation and training flows. Part 4 described how to run batch inference.

In this tutorial, you learn how to:

> [!div class="checklist"]
> * Set up an Azure Cache for Redis instance.
> * Attach the cache to a feature store as the online materialization store, and grant the necessary permissions.
> * Materialize a feature set to the online store.
> * Test an online deployment with mock data.

## Prerequisites

> [!NOTE]
> This tutorial uses an Azure Machine Learning notebook with **Serverless Spark Compute**.

* Make sure you complete parts 1 through 4 of this tutorial series. This tutorial reuses the feature store and other resources created in those earlier tutorials.

## Set up

This tutorial uses the Python feature store core SDK (`azureml-featurestore`). The Python SDK is used for create, read, update, and delete (CRUD) operations on feature stores, feature sets, and feature store entities.

You don't need to explicitly install these packages for this tutorial, because the `online.yml` conda file in the set-up instructions shown here covers them.

To prepare the notebook environment for development:

1. Clone the [azureml-examples](https://github.com/azure/azureml-examples) repository to your local machine with this command:

   `git clone --depth 1 https://github.com/Azure/azureml-examples`

   You can also download a zip file from the [azureml-examples](https://github.com/azure/azureml-examples) repository. On that page, first select the `code` dropdown, and then select `Download ZIP`. Then, unzip the contents into a folder on your local device.

1. Upload the feature store samples directory to the project workspace:

   1. In the Azure Machine Learning workspace, open the Azure Machine Learning studio UI.
   1. Select **Notebooks** in the left navigation panel.
   1. Select your user name in the directory listing.
   1. Select the ellipses (**...**), and then select **Upload folder**.
   1. Select the feature store samples folder from the cloned directory path: `azureml-examples/sdk/python/featurestore_sample`.

1. Run the tutorial:

   * Option 1: Create a new notebook, and execute the instructions in this document, step by step.
   * Option 2: Open the existing notebook `featurestore_sample/notebooks/sdk_only/5. Enable online store and run online inference.ipynb`. You can keep this document open and refer to it for more explanation and documentation links.

   1. Select **Serverless Spark Compute** in the top navigation **Compute** dropdown. This operation might take one to two minutes. Wait for the status bar at the top to display **Configure session**.
   1. Select **Configure session** in the top status bar.
   1. Select **Python packages**.
   1. Select **Upload conda file**.
   1. Select the `online.yml` file located at `azureml-examples/sdk/python/featurestore_sample/project/env/online.yml` on your local device.
   1. (Optional) Increase the session time-out (idle time in minutes) to reduce the serverless Spark compute startup time.

1. This code cell starts the Spark session. It needs about 10 minutes to install all dependencies and start the Spark session.

   [!notebook-python[] (~/azureml-examples-main/sdk/python/featurestore_sample/notebooks/sdk_only/5. Enable online store and run online inference.ipynb?name=start-spark-session)]

1. Set up the root directory for the samples:

   [!notebook-python[] (~/azureml-examples-main/sdk/python/featurestore_sample/notebooks/sdk_only/5. Enable online store and run online inference.ipynb?name=root-dir)]

1. Initialize the `MLClient` for the project workspace, where this tutorial notebook runs. The `MLClient` is used for the create, read, update, and delete (CRUD) operations.

   [!notebook-python[] (~/azureml-examples-main/sdk/python/featurestore_sample/notebooks/sdk_only/5. Enable online store and run online inference.ipynb?name=init-prj-ws-client)]

1. Initialize the `MLClient` for the feature store workspace, to handle create, read, update, and delete (CRUD) operations on the feature store.

   [!notebook-python[] (~/azureml-examples-main/sdk/python/featurestore_sample/notebooks/sdk_only/5. Enable online store and run online inference.ipynb?name=init-fs-ws-client)]

   > [!NOTE]
   > A **feature store workspace** supports feature reuse across projects. A **project workspace** - the current workspace in use - leverages features from a specific feature store, to train and inference models. Many project workspaces can share and reuse the same feature store workspace.

1. As mentioned earlier, this tutorial uses the Python feature store core SDK (`azureml-featurestore`). This initialized SDK client is used for create, read, update, and delete (CRUD) operations on feature stores, feature sets, and feature store entities. A consolidated sketch of these client initializations appears after this list.

   [!notebook-python[] (~/azureml-examples-main/sdk/python/featurestore_sample/notebooks/sdk_only/5. Enable online store and run online inference.ipynb?name=init-fs-core-sdk)]
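
The code cells above rely on three clients. As a reference, here's a minimal sketch of how they can be constructed; the credential type and the placeholder values (subscription, resource group, and workspace names) are assumptions you replace with your own:

```python
from azure.ai.ml import MLClient
from azure.ai.ml.identity import AzureMLOnBehalfOfCredential
from azureml.featurestore import FeatureStoreClient

credential = AzureMLOnBehalfOfCredential()

# Client scoped to the project workspace where this notebook runs.
ws_client = MLClient(
    credential,
    subscription_id="<SUBSCRIPTION_ID>",
    resource_group_name="<RESOURCE_GROUP>",
    workspace_name="<PROJECT_WORKSPACE_NAME>",
)

# Client scoped to the feature store workspace created earlier in this series.
fs_client = MLClient(
    credential,
    subscription_id="<SUBSCRIPTION_ID>",
    resource_group_name="<RESOURCE_GROUP>",
    workspace_name="<FEATURE_STORE_NAME>",
)

# Feature store core SDK client, used to develop and consume features.
featurestore = FeatureStoreClient(
    credential=credential,
    subscription_id="<SUBSCRIPTION_ID>",
    resource_group_name="<RESOURCE_GROUP>",
    name="<FEATURE_STORE_NAME>",
)
```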
## Prepare Azure Cache for Redis

This tutorial uses Azure Cache for Redis as the online materialization store. You can create a new Redis instance, or reuse an existing instance.

1. Set values for the Azure Cache for Redis resource to use as the online materialization store. In this code cell, define the name of the Azure Cache for Redis resource to create or reuse. You can override other default settings.

   [!notebook-python[] (~/azureml-examples-main/sdk/python/featurestore_sample/notebooks/sdk_only/5. Enable online store and run online inference.ipynb?name=redis-settings)]

1. You can create a new Redis instance. To do so, select the Redis cache tier (basic, standard, premium, or enterprise), and choose a SKU family available for that tier. For more information about tiers and cache performance, see [this resource](https://learn.microsoft.com/azure/azure-cache-for-redis/cache-best-practices-performance). For more information about SKU tiers and Azure cache families, see [this resource](https://azure.microsoft.com/en-us/pricing/details/cache/).

   Execute this code cell to create an Azure Cache for Redis instance with premium tier, SKU family `P`, and cache capacity 2. It can take five to 10 minutes to prepare the Redis instance.

   [!notebook-python[] (~/azureml-examples-main/sdk/python/featurestore_sample/notebooks/sdk_only/5. Enable online store and run online inference.ipynb?name=provision-redis)]

1. Alternatively, this code cell reuses an existing Redis instance with the previously defined name.

   [!notebook-python[] (~/azureml-examples-main/sdk/python/featurestore_sample/notebooks/sdk_only/5. Enable online store and run online inference.ipynb?name=reuse-redis)]

1. Retrieve the user-assigned managed identity (UAI) that the feature store uses for materialization. This code cell retrieves the principal ID, client ID, and ARM ID property values of that UAI.

   [!notebook-python[] (~/azureml-examples-main/sdk/python/featurestore_sample/notebooks/sdk_only/5. Enable online store and run online inference.ipynb?name=retrieve-uai)]

1. Grant the `Contributor` role to the UAI on the Azure Cache for Redis. This role is required to write data into Redis during materialization. This code cell handles that role assignment; a sketch of the provisioning and role-assignment calls appears after this list.

   [!notebook-python[] (~/azureml-examples-main/sdk/python/featurestore_sample/notebooks/sdk_only/5. Enable online store and run online inference.ipynb?name=uai-redis-rbac)]
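
For reference, here's a minimal sketch of what provisioning the cache and granting the role can look like with the Azure management SDKs. The resource names, region, and the `<UAI_PRINCIPAL_ID>` value are placeholders; the notebook cells above produce the real values:

```python
from uuid import uuid4

from azure.identity import DefaultAzureCredential
from azure.mgmt.authorization import AuthorizationManagementClient
from azure.mgmt.authorization.models import RoleAssignmentCreateParameters
from azure.mgmt.redis import RedisManagementClient
from azure.mgmt.redis.models import RedisCreateParameters, Sku

credential = DefaultAzureCredential()
subscription_id = "<SUBSCRIPTION_ID>"

# Create (or update) a premium-tier cache: SKU family P, capacity 2.
redis_client = RedisManagementClient(credential, subscription_id)
redis_cache = redis_client.redis.begin_create(
    resource_group_name="<RESOURCE_GROUP>",
    name="<REDIS_NAME>",
    parameters=RedisCreateParameters(
        location="<AZURE_REGION>",
        sku=Sku(name="Premium", family="P", capacity=2),
    ),
).result()

# Grant the built-in Contributor role to the feature store UAI, scoped to the cache.
contributor_role_definition_id = (
    f"/subscriptions/{subscription_id}/providers/Microsoft.Authorization/"
    "roleDefinitions/b24988ac-6180-42a0-ab88-20f7382dd24c"
)
auth_client = AuthorizationManagementClient(credential, subscription_id)
auth_client.role_assignments.create(
    scope=redis_cache.id,
    role_assignment_name=str(uuid4()),
    parameters=RoleAssignmentCreateParameters(
        role_definition_id=contributor_role_definition_id,
        principal_id="<UAI_PRINCIPAL_ID>",
        principal_type="ServicePrincipal",
    ),
)
```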

## Attach online materialization store to the feature store

The feature store needs the Azure Cache for Redis as an attached resource, for use as the online materialization store. This code cell handles that step.

[!notebook-python[] (~/azureml-examples-main/sdk/python/featurestore_sample/notebooks/sdk_only/5. Enable online store and run online inference.ipynb?name=attach-online-store)]
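
Under the hood, this attaches the Redis cache as the feature store's online store. A minimal sketch of that update, assuming the `azure-ai-ml` feature store entities and the `fs_client` from the setup section, with placeholder names:

```python
from azure.ai.ml.entities import FeatureStore, MaterializationStore

# Point the feature store at the Redis cache (by ARM resource ID) as its online store.
online_store = MaterializationStore(type="redis", target="<REDIS_CACHE_ARM_ID>")

fs = FeatureStore(
    name="<FEATURE_STORE_NAME>",
    online_store=online_store,
)

poller = fs_client.feature_stores.begin_update(fs, update_dependent_resources=True)
print(poller.result())
```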

## Materialize the `accounts` feature set data to the online store

### Enable materialization on the `accounts` feature set

Earlier in this tutorial series, you did **not** materialize the `accounts` feature set, because it had precomputed features and only batch inference scenarios used it. This code cell enables online materialization, so that the features become available in the online store with low-latency access. For consistency, it also enables offline materialization, although that's optional.

[!notebook-python[] (~/azureml-examples-main/sdk/python/featurestore_sample/notebooks/sdk_only/5. Enable online store and run online inference.ipynb?name=enable-accounts-material)]
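
The update that this cell performs amounts to setting materialization flags on the feature set asset. A minimal sketch, assuming the `azure-ai-ml` entities shown, an entity reference that matches the earlier tutorials, and placeholder values for the specification path and Spark compute size:

```python
from azure.ai.ml.entities import (
    FeatureSet,
    FeatureSetSpecification,
    MaterializationComputeResource,
    MaterializationSettings,
)

accounts_featureset_config = FeatureSet(
    name="accounts",
    version="1",
    description="accounts featureset",
    entities=["azureml:account:1"],
    stage="Development",
    specification=FeatureSetSpecification(path="<PATH_TO_ACCOUNTS_FEATURESET_SPEC_FOLDER>"),
    materialization_settings=MaterializationSettings(
        online_enabled=True,   # make features available for low-latency lookup
        offline_enabled=True,  # keep the offline store in sync as well (optional)
        resource=MaterializationComputeResource(instance_type="standard_e8s_v3"),
    ),
)

poller = fs_client.feature_sets.begin_create_or_update(accounts_featureset_config)
print(poller.result())
```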

### Backfill the `accounts` feature set

The `backfill` command backfills data to all the materialization stores that are enabled for this feature set. Because both offline and online materialization are now enabled, this code cell backfills both the offline and online materialization stores.

[!notebook-python[] (~/azureml-examples-main/sdk/python/featurestore_sample/notebooks/sdk_only/5. Enable online store and run online inference.ipynb?name=start-accounts-backfill)]

This code cell tracks completion of the backfill job. With the Azure Cache for Redis premium tier provisioned earlier, this step may take approximately 10 minutes to complete.

[!notebook-python[] (~/azureml-examples-main/sdk/python/featurestore_sample/notebooks/sdk_only/5. Enable online store and run online inference.ipynb?name=track-accounts-backfill)]
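
For reference, a minimal sketch of submitting a backfill and streaming its job logs, assuming the `begin_backfill` operation and an illustrative feature window (pick the window that matches your data):

```python
from datetime import datetime

# Submit the backfill for the accounts feature set over a chosen time window.
poller = fs_client.feature_sets.begin_backfill(
    name="accounts",
    version="1",
    feature_window_start_time=datetime(2023, 1, 1),
    feature_window_end_time=datetime(2023, 6, 30),
)
backfill_jobs = poller.result().job_ids

# Stream the logs of the materialization job; the call returns when the job finishes.
fs_client.jobs.stream(backfill_jobs[0])
```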

## Materialize `transactions` feature set data to the online store

Earlier in this tutorial series, you materialized `transactions` feature set data to the offline materialization store.

1. This code cell enables online materialization for the `transactions` feature set.

   [!notebook-python[] (~/azureml-examples-main/sdk/python/featurestore_sample/notebooks/sdk_only/5. Enable online store and run online inference.ipynb?name=enable-transact-material)]

1. This code cell backfills the data to both the online and offline materialization stores, to ensure that both stores have the latest data. The recurrent materialization job, which you set up in tutorial 2 of this series, now materializes data to both the online and offline materialization stores.

   [!notebook-python[] (~/azureml-examples-main/sdk/python/featurestore_sample/notebooks/sdk_only/5. Enable online store and run online inference.ipynb?name=start-transact-material)]

   This code cell tracks completion of the backfill job. With the premium-tier Azure Cache for Redis provisioned earlier, this step may take approximately three to four minutes to complete.

   [!notebook-python[] (~/azureml-examples-main/sdk/python/featurestore_sample/notebooks/sdk_only/5. Enable online store and run online inference.ipynb?name=track-transact-material)]

## Test locally

Use your development environment to look up features from the online materialization store. This notebook can serve as a valid development environment.

This code cell parses the list of features from the existing feature retrieval specification.

[!notebook-python[] (~/azureml-examples-main/sdk/python/featurestore_sample/notebooks/sdk_only/5. Enable online store and run online inference.ipynb?name=parse-feat-list)]

This code cell retrieves feature values from the online materialization store.

[!notebook-python[] (~/azureml-examples-main/sdk/python/featurestore_sample/notebooks/sdk_only/5. Enable online store and run online inference.ipynb?name=init-online-lookup)]

Prepare some observation data for testing, and use that data to look up features from the online materialization store. During the online lookup, the keys (`accountID`) defined in the sample observation data might not exist in Redis, because of their time to live (TTL). In this case:

1. Open the Azure portal.
1. Navigate to the Redis instance.
1. Open the console for the Redis instance, and check for existing keys with the `KEYS *` command.
1. Replace the `accountID` values in the sample observation data with the existing keys.

[!notebook-python[] (~/azureml-examples-main/sdk/python/featurestore_sample/notebooks/sdk_only/5. Enable online store and run online inference.ipynb?name=online-feat-loockup)]
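
The online lookup boils down to a few calls from the `azureml-featurestore` package. A minimal sketch, assuming the `resolve_feature_retrieval_spec`, `init_online_lookup`, and `get_online_features` helpers behave as in the sample notebook, the `featurestore` client from the setup section, and placeholder paths and key values:

```python
import pandas as pd
from azure.ai.ml.identity import AzureMLOnBehalfOfCredential
from azureml.featurestore import get_online_features, init_online_lookup

# Parse the feature list from the feature retrieval specification saved during training.
features = featurestore.resolve_feature_retrieval_spec(
    "<PATH_TO_FEATURE_RETRIEVAL_SPEC_FOLDER>"
)

# Initialize the online lookup client once, then look up features for sample accountIDs.
init_online_lookup(features, AzureMLOnBehalfOfCredential())

obs_df = pd.DataFrame({"accountID": ["<EXISTING_REDIS_KEY_1>", "<EXISTING_REDIS_KEY_2>"]})
print(get_online_features(features, obs_df))
```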

These steps looked up features from the online store. In the next step, you'll test online features by using an Azure Machine Learning managed online endpoint.

## Test online features from Azure Machine Learning managed online endpoint

A managed online endpoint deploys and scores models for online/real-time inference. You aren't limited to managed online endpoints; you can use any available inference technology - Kubernetes, for example.

This step involves these actions:

1. Create an Azure Machine Learning managed online endpoint.
1. Grant required role-based access control (RBAC) permissions.
1. Deploy the model that you trained in tutorial 3 of this tutorial series. The scoring script used in this step has the code to look up online features.
1. Score the model with sample data, to verify that the endpoint looks up the online features and that the model scoring completes successfully.

### Create Azure Machine Learning managed online endpoint

Visit [this resource](https://learn.microsoft.com/azure/machine-learning/how-to-deploy-online-endpoints?view=azureml-api-2&tabs=azure-cli) to learn more about managed online endpoints. With the managed feature store API, you can also look up online features from other inference platforms.

This code cell defines the `fraud-model` managed online endpoint.

[!notebook-python[] (~/azureml-examples-main/sdk/python/featurestore_sample/notebooks/sdk_only/5. Enable online store and run online inference.ipynb?name=define-endpoint)]

This code cell creates the managed online endpoint defined in the previous code cell.

[!notebook-python[] (~/azureml-examples-main/sdk/python/featurestore_sample/notebooks/sdk_only/5. Enable online store and run online inference.ipynb?name=create-endpoint)]
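
As a rough sketch, the endpoint definition and creation can look like the following, assuming the `azure-ai-ml` entities and the `ws_client` project workspace client from the setup section (the authentication mode is an assumption):

```python
from azure.ai.ml.entities import ManagedOnlineEndpoint

# Define the endpoint; key-based authentication keeps the sketch simple.
endpoint = ManagedOnlineEndpoint(
    name="fraud-model",
    auth_mode="key",
    description="Online endpoint that looks up features from the online store at scoring time.",
)

# Create the endpoint in the project workspace and wait for completion.
ws_client.online_endpoints.begin_create_or_update(endpoint).result()
```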

### Grant required RBAC permissions

Here, you grant required RBAC permissions to the managed online endpoint on the Redis instance and feature store. The scoring code in the model deployment needs these RBAC permissions to successfully look up features from the online store with the managed feature store API.

#### Get managed identity of the managed online endpoint

This code cell retrieves the managed identity of the managed online endpoint:

[!notebook-python[] (~/azureml-examples-main/sdk/python/featurestore_sample/notebooks/sdk_only/5. Enable online store and run online inference.ipynb?name=get-endpoint-identity)]
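
A minimal sketch of that retrieval: the endpoint's system-assigned managed identity carries a principal ID, which the role assignments below need.

```python
# Fetch the endpoint and read the principal ID of its system-assigned managed identity.
endpoint = ws_client.online_endpoints.get(name="fraud-model")
endpoint_principal_id = endpoint.identity.principal_id
print(endpoint_principal_id)
```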

#### Grant the `Contributor` role to the online endpoint managed identity on the Azure Cache for Redis

This code cell grants the `Contributor` role to the online endpoint managed identity on the Redis instance. This RBAC permission is needed so that the scoring code can look up feature data from the Redis online store.

[!notebook-python[] (~/azureml-examples-main/sdk/python/featurestore_sample/notebooks/sdk_only/5. Enable online store and run online inference.ipynb?name=endpoint-redis-rbac)]

#### Grant `AzureML Data Scientist` role to the online endpoint managed identity on the feature store

This code cell grants the `AzureML Data Scientist` role to the online endpoint managed identity on the feature store. This RBAC permission is required for successful deployment of the model to the online endpoint.

[!notebook-python[] (~/azureml-examples-main/sdk/python/featurestore_sample/notebooks/sdk_only/5. Enable online store and run online inference.ipynb?name=endpoint-fs-rbac)]

#### Deploy the model to the online endpoint

Review the scoring script `project/fraud_model/online_inference/src/scoring.py` (an illustrative outline of its structure follows this list). The scoring script:

1. Loads the feature metadata from the feature retrieval specification packaged with the model during model training. Tutorial 3 of this tutorial series covered this task. The specification has features from both the `transactions` and `accounts` feature sets.
1. When an inference request is received, looks up the online features by using the index keys from the request. In this case, for both feature sets, the index column is `accountID`.
1. Passes the features to the model to perform the inference, and returns the response. The response is a boolean value that represents the variable `is_fraud`.
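
The following is **not** the actual file; it's an illustrative outline of that structure, with assumed helper names (`resolve_feature_retrieval_spec`, `init_online_lookup`, `get_online_features`), an assumed MLflow model format, and placeholder feature store details:

```python
import json
import os

import mlflow
import pandas as pd
from azure.identity import ManagedIdentityCredential
from azureml.featurestore import FeatureStoreClient, get_online_features, init_online_lookup


def init():
    global model, features
    credential = ManagedIdentityCredential()

    # Load the model and the feature retrieval specification packaged alongside it.
    model_dir = os.environ["AZUREML_MODEL_DIR"]
    model = mlflow.pyfunc.load_model(os.path.join(model_dir, "fraud_model"))

    featurestore = FeatureStoreClient(
        credential=credential,
        subscription_id="<SUBSCRIPTION_ID>",
        resource_group_name="<RESOURCE_GROUP>",
        name="<FEATURE_STORE_NAME>",
    )
    features = featurestore.resolve_feature_retrieval_spec(
        os.path.join(model_dir, "fraud_model", "feature_retrieval_spec_folder")
    )
    init_online_lookup(features, credential)


def run(raw_data):
    # Look up online features with the accountID index keys from the request,
    # then score and return the is_fraud prediction.
    observations = pd.DataFrame(json.loads(raw_data)["data"])
    scoring_data = get_online_features(features, observations)
    return model.predict(scoring_data).tolist()
```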

Next, execute this code cell to create a managed online deployment definition for model deployment.

[!notebook-python[] (~/azureml-examples-main/sdk/python/featurestore_sample/notebooks/sdk_only/5. Enable online store and run online inference.ipynb?name=define-online-deployment)]

Deploy the model to the online endpoint with this code cell. The deployment may need four to five minutes.

[!notebook-python[] (~/azureml-examples-main/sdk/python/featurestore_sample/notebooks/sdk_only/5. Enable online store and run online inference.ipynb?name=begin-online-deployment)]
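
For orientation, a deployment definition along these lines pairs the registered model with the scoring script and an environment. The deployment name, model reference, environment image, conda file path, and instance size below are assumptions; use the values from the notebook cell:

```python
from azure.ai.ml.entities import CodeConfiguration, Environment, ManagedOnlineDeployment

deployment = ManagedOnlineDeployment(
    name="green",
    endpoint_name="fraud-model",
    model="<MODEL_ASSET_ID_FROM_TUTORIAL_3>",
    code_configuration=CodeConfiguration(
        code="<PATH_TO>/project/fraud_model/online_inference/src/",
        scoring_script="scoring.py",
    ),
    environment=Environment(
        conda_file="<PATH_TO_CONDA_FILE>.yml",
        image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest",
    ),
    instance_type="Standard_DS3_v2",
    instance_count=1,
)

# Create the deployment and wait for it to finish (roughly four to five minutes).
ws_client.online_deployments.begin_create_or_update(deployment).result()
```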

### Test online deployment with mock data

Execute this code cell to test the online deployment with the mock data. You should see `0` or `1` as the output of this cell.

[!notebook-python[] (~/azureml-examples-main/sdk/python/featurestore_sample/notebooks/sdk_only/5. Enable online store and run online inference.ipynb?name=test-online-deployment)]
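
A minimal sketch of that invocation, assuming a JSON request file whose payload shape matches what the scoring script expects (the file path and deployment name are placeholders):

```python
# Send the mock payload to the endpoint; the response contains the is_fraud prediction.
response = ws_client.online_endpoints.invoke(
    endpoint_name="fraud-model",
    request_file="<PATH_TO_MOCK_REQUEST>.json",
    deployment_name="green",
)
print(response)
```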

## Next steps

* [Network isolation with feature store (preview)](./tutorial-network-isolation-for-feature-store.md)
* [Azure Machine Learning feature stores samples repository](https://github.com/Azure/azureml-examples/tree/main/sdk/python/featurestore_sample)