
Commit 1f7dea7

Merge pull request #568 from fbsolo-ms1/document-freshness-maintenance
Freshness update for tutorial-experiment-train-models-using-features.md . . .
2 parents 6db1df0 + faf89d5 commit 1f7dea7

File tree: 1 file changed, +34 −34 lines changed

articles/machine-learning/tutorial-experiment-train-models-using-features.md

Lines changed: 34 additions & 34 deletions
@@ -9,7 +9,7 @@ ms.subservice: core
 ms.topic: tutorial
 author: fbsolo-ms1
 ms.author: franksolomon
-ms.date: 10/27/2023
+ms.date: 09/30/2024
 ms.reviewer: seramasu
 ms.custom: sdkv2, build-2023, ignite-2023, update-code
 #Customer intent: As a professional data scientist, I want to know how to build and deploy a model with Azure Machine Learning by using Python in a Jupyter Notebook.
@@ -19,12 +19,12 @@ ms.custom: sdkv2, build-2023, ignite-2023, update-code
 
 This tutorial series shows how features seamlessly integrate all phases of the machine learning lifecycle: prototyping, training, and operationalization.
 
-The first tutorial showed how to create a feature set specification with custom transformations, and then use that feature set to generate training data, enable materialization, and perform a backfill. This tutorial shows how to enable materialization, and perform a backfill. It also shows how to experiment with features, as a way to improve model performance.
+The first tutorial showed how to create a feature set specification with custom transformations. Then, it showed how to use that feature set to generate training data, enable materialization, and perform a backfill. This tutorial shows how to enable materialization and perform a backfill. It also shows how to experiment with features, as a way to improve model performance.
 
 In this tutorial, you learn how to:
 
 > [!div class="checklist"]
-> * Prototype a new `accounts` feature set specification, by using existing precomputed values as features. Then, register the local feature set specification as a feature set in the feature store. This process differs from the first tutorial, where you created a feature set that had custom transformations.
+> * Prototype a new `accounts` feature set specification, through use of existing precomputed values as features. Then, register the local feature set specification as a feature set in the feature store. This process differs from the first tutorial, where you created a feature set that had custom transformations.
 > * Select features for the model from the `transactions` and `accounts` feature sets, and save them as a feature retrieval specification.
 > * Run a training pipeline that uses the feature retrieval specification to train a new model. This pipeline uses the built-in feature retrieval component to generate the training data.
@@ -40,22 +40,22 @@ Before you proceed with this tutorial, be sure to complete the first tutorial in
 
 1. On the top menu, in the **Compute** dropdown list, select **Serverless Spark Compute** under **Azure Machine Learning Serverless Spark**.
 
-2. Configure the session:
+1. Configure the session:
 
 1. When the toolbar displays **Configure session**, select it.
-2. On the **Python packages** tab, select **Upload Conda file**.
-3. Upload the *conda.yml* file that you [uploaded in the first tutorial](./tutorial-get-started-with-feature-store.md#prepare-the-notebook-environment).
-4. Optionally, increase the session time-out (idle time) to avoid frequent prerequisite reruns.
+1. On the **Python packages** tab, select **Upload Conda file**.
+1. Upload the *conda.yml* file that you [uploaded in the first tutorial](./tutorial-get-started-with-feature-store.md#prepare-the-notebook-environment).
+1. As an option, you can increase the session time-out (idle time) to avoid frequent prerequisite reruns.
 
-2. Start the Spark session.
+1. Start the Spark session.
 
 [!notebook-python[] (~/azureml-examples-main/sdk/python/featurestore_sample/notebooks/sdk_only/2.Experiment-train-models-using-features.ipynb?name=start-spark-session)]
 
-3. Set up the root directory for the samples.
+1. Set up the root directory for the samples.
 
 [!notebook-python[] (~/azureml-examples-main/sdk/python/featurestore_sample/notebooks/sdk_only/2.Experiment-train-models-using-features.ipynb?name=root-dir)]
 
-4. Set up the CLI.
+1. Set up the CLI.
 ### [Python SDK](#tab/python)
 
 Not applicable.
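The session-configuration steps in this hunk upload a *conda.yml* file from the first tutorial. For orientation only, a conda environment file for such a session generally follows this shape; the package names and versions below are assumptions, not the tutorial's actual file:

```yaml
# Illustrative shape of a conda environment file for the Spark session.
# Package names and versions here are assumptions, not the tutorial's conda.yml.
name: featurestore-session
dependencies:
  - python=3.10
  - pip
  - pip:
      - azureml-featurestore
      - azure-ai-ml
```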
@@ -66,33 +66,33 @@ Before you proceed with this tutorial, be sure to complete the first tutorial in
 
 [!notebook-python[] (~/azureml-examples-main/sdk/python/featurestore_sample/notebooks/sdk_and_cli/2.Experiment-train-models-using-features.ipynb?name=install-ml-ext-cli)]
 
-2. Authenticate.
+1. Authenticate.
 
 [!notebook-python[] (~/azureml-examples-main/sdk/python/featurestore_sample/notebooks/sdk_and_cli/2.Experiment-train-models-using-features.ipynb?name=auth-cli)]
 
-3. Set the default subscription.
+1. Set the default subscription.
 
 [!notebook-python[] (~/azureml-examples-main/sdk/python/featurestore_sample/notebooks/sdk_and_cli/2.Experiment-train-models-using-features.ipynb?name=set-default-subs-cli)]
 
 ---
 
-5. Initialize the project workspace variables.
+1. Initialize the project workspace variables.
 
 This is the current workspace, and the tutorial notebook runs in this resource.
 
 [!notebook-python[] (~/azureml-examples-main/sdk/python/featurestore_sample/notebooks/sdk_only/2.Experiment-train-models-using-features.ipynb?name=init-ws-crud-client)]
 
-6. Initialize the feature store variables.
+1. Initialize the feature store variables.
 
-Be sure to update the `featurestore_name` and `featurestore_location` values to reflect what you created in the first tutorial.
+Be sure to update the `featurestore_name` and `featurestore_location` values, to reflect what you created in the first tutorial.
 
 [!notebook-python[] (~/azureml-examples-main/sdk/python/featurestore_sample/notebooks/sdk_only/2.Experiment-train-models-using-features.ipynb?name=init-fs-crud-client)]
 
-7. Initialize the feature store consumption client.
+1. Initialize the feature store consumption client.
 
 [!notebook-python[] (~/azureml-examples-main/sdk/python/featurestore_sample/notebooks/sdk_only/2.Experiment-train-models-using-features.ipynb?name=init-fs-core-sdk)]
 
-8. Create a compute cluster named `cpu-cluster` in the project workspace.
+1. Create a compute cluster named `cpu-cluster` in the project workspace.
 
 You need this compute cluster when you run the training/batch inference jobs.
 
@@ -104,12 +104,12 @@ In the first tutorial, you created a `transactions` feature set that had custom
 
 To onboard precomputed features, you can create a feature set specification without writing any transformation code. You use a feature set specification to develop and test a feature set in a fully local development environment.
 
-You don't need to connect to a feature store. In this procedure, you create the feature set specification locally, and then sample the values from it. For capabilities of managed feature store, you must use a feature asset definition to register the feature set specification with a feature store. Later steps in this tutorial provide more details.
+You don't need to connect to a feature store. In this procedure, you create the feature set specification locally, and then sample the values from it. To benefit from the capabilities of managed feature store, you must use a feature asset definition to register the feature set specification with a feature store. Later steps in this tutorial provide more details.
 
 1. Explore the source data for the accounts.
 
 > [!NOTE]
-> This notebook uses sample data hosted in a publicly accessible blob container. Only a `wasbs` driver can read it in Spark. When you create feature sets by using your own source data, host those feature sets in an Azure Data Lake Storage Gen2 account, and use an `abfss` driver in the data path.
+> This notebook uses sample data hosted in a publicly accessible blob container. Only a `wasbs` driver can read it in Spark. When you create feature sets through use of your own source data, host those feature sets in an Azure Data Lake Storage Gen2 account, and use an `abfss` driver in the data path.
 
 [!notebook-python[] (~/azureml-examples-main/sdk/python/featurestore_sample/notebooks/sdk_only/2.Experiment-train-models-using-features.ipynb?name=explore-accts-fset-src-data)]
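The note in this hunk contrasts the `wasbs` and `abfss` Spark drivers. As a quick illustration of the two URI shapes they read, here is a minimal sketch; the account and container names are made up, not the tutorial's storage:

```python
# Illustrative only: the two Azure storage URI schemes mentioned in the note.
# "contoso" and "sampledata" are made-up names, not the tutorial's storage account.
account = "contoso"
container = "sampledata"

# Blob endpoint, read with the wasbs driver (used for the public sample data).
wasbs_path = f"wasbs://{container}@{account}.blob.core.windows.net/accounts.parquet"

# Data Lake Storage Gen2 endpoint, read with the abfss driver (recommended for your own data).
abfss_path = f"abfss://{container}@{account}.dfs.core.windows.net/accounts.parquet"

print(wasbs_path)
print(abfss_path)
```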

@@ -133,7 +133,7 @@ You don't need to connect to a feature store. In this procedure, you create the
 
 - `index_columns`: The join keys required to access values from the feature set.
 
-To learn more, see [Understanding top-level entities in managed feature store](./concept-top-level-entities-in-managed-feature-store.md) and the [CLI (v2) feature set specification YAML schema](./reference-yaml-featureset-spec.md).
+To learn more, visit the [Understanding top-level entities in managed feature store](./concept-top-level-entities-in-managed-feature-store.md) and the [CLI (v2) feature set specification YAML schema](./reference-yaml-featureset-spec.md) resources.
 
 As an extra benefit, persisting supports source control.
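The `index_columns` described in this hunk act as join keys between observation data and precomputed feature values. A minimal, framework-free sketch of that idea (plain Python with made-up names, not the managed feature store API):

```python
# Sketch: index_columns as join keys. Plain Python, not the azureml-featurestore API.
# Feature values keyed by a hypothetical index column, "accountID".
account_features = {
    "A1001": {"account_age_days": 420, "account_country": "US"},
    "A1002": {"account_age_days": 35, "account_country": "GB"},
}

def retrieve_features(observations, index_column="accountID"):
    """Attach precomputed feature values to each observation row via the join key."""
    enriched = []
    for row in observations:
        features = account_features.get(row[index_column], {})
        enriched.append({**row, **features})
    return enriched

rows = retrieve_features([{"accountID": "A1001", "amount": 25.0}])
print(rows[0]["account_age_days"])  # the joined feature value: 420
```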

@@ -143,7 +143,7 @@ You don't need to connect to a feature store. In this procedure, you create the
 
 ## Locally experiment with unregistered features and register with feature store when ready
 
-As you develop features, you might want to locally test and validate them before you register them with the feature store or run training pipelines in the cloud. A combination of a local unregistered feature set (`accounts`) and a feature set registered in the feature store (`transactions`) generates training data for the machine learning model.
+As you develop features, you might want to locally test and validate them, before you register them with the feature store or run training pipelines in the cloud. A combination of a local unregistered feature set (`accounts`) and a feature set registered in the feature store (`transactions`) generates training data for the machine learning model.
 
 1. Select features for the model.
 
@@ -159,7 +159,7 @@ As you develop features, you might want to locally test and validate them before
 
 1. Register the `accounts` feature set with the feature store.
 
-After you locally experiment with feature definitions, and they seem reasonable, you can register a feature set asset definition with the feature store.
+After you locally experiment with feature definitions, and if they seem reasonable, you can register a feature set asset definition with the feature store.
 
 [!notebook-python[] (~/azureml-examples-main/sdk/python/featurestore_sample/notebooks/sdk_only/2.Experiment-train-models-using-features.ipynb?name=reg-accts-fset)]
 
@@ -169,7 +169,7 @@ As you develop features, you might want to locally test and validate them before
 
 ## Run a training experiment
 
-In the following steps, you select a list of features, run a training pipeline, and register the model. You can repeat these steps until the model performs as you want.
+In these steps, you select a list of features, run a training pipeline, and register the model. You can repeat these steps until the model performs as you want.
 
 1. Optionally, discover features from the feature store UI.
 
@@ -187,19 +187,19 @@ In the following steps, you select a list of features, run a training pipeline,
 
 1. Select features for the model, and export the model as a feature retrieval specification.
 
-In the previous steps, you selected features from a combination of registered and unregistered feature sets, for local experimentation and testing. You can now experiment in the cloud. Your model-shipping agility increases if you save the selected features as a feature retrieval specification, and then use the specification in the machine learning operations (MLOps) or continuous integration and continuous delivery (CI/CD) flow for training and inference.
+In the previous steps, you selected features from a combination of registered and unregistered feature sets for local experimentation and testing. You can now experiment in the cloud. Your model-shipping agility increases if you save the selected features as a feature retrieval specification, and then use the specification in the machine learning operations (MLOps) or continuous integration and continuous delivery (CI/CD) flow for training and inference.
 
 1. Select features for the model.
 
 [!notebook-python[] (~/azureml-examples-main/sdk/python/featurestore_sample/notebooks/sdk_only/2.Experiment-train-models-using-features.ipynb?name=select-reg-features)]
 
-2. Export the selected features as a feature retrieval specification.
+1. Export the selected features as a feature retrieval specification.
 
-A feature retrieval specification is a portable definition of the feature list associated with a model. It can help streamline the development and operationalization of a machine learning model. It becomes an input to the training pipeline that generates the training data. It's then packaged with the model.
+A feature retrieval specification is a portable definition of the feature list associated with a model. It can help streamline the development and operationalization of a machine learning model. It becomes an input to the training pipeline that generates the training data. Then, it's packaged with the model.
 
 The inference phase uses the feature retrieval to look up the features. It integrates all phases of the machine learning lifecycle. Changes to the training/inference pipeline can stay at a minimum as you experiment and deploy.
 
-Use of the feature retrieval specification and the built-in feature retrieval component is optional. You can directly use the `get_offline_features()` API, as shown earlier. The name of the specification should be *feature_retrieval_spec.yaml* when it's packaged with the model. This way, the system can recognize it.
+Use of the feature retrieval specification and the built-in feature retrieval component is optional. You can directly use the `get_offline_features()` API, as shown earlier. The name of the specification should be *feature_retrieval_spec.yaml* when you package it with the model. This way, the system can recognize it.
 
 [!notebook-python[] (~/azureml-examples-main/sdk/python/featurestore_sample/notebooks/sdk_only/2.Experiment-train-models-using-features.ipynb?name=export-as-frspec)]
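This hunk describes a feature retrieval specification as a named feature list that is resolved against observation data. The core idea, a point-in-time lookup of the latest feature value at or before each observation timestamp, can be sketched without any SDK; every name and value below is illustrative, not the built-in feature retrieval component:

```python
# Sketch of point-in-time feature retrieval: for each observation, take the latest
# feature value with timestamp <= the observation timestamp. Illustrative only.
from bisect import bisect_right

# Hypothetical feature history: (timestamp, value) pairs per account, sorted by timestamp.
history = {
    "A1001": [(100, 0.10), (200, 0.35), (300, 0.80)],
}

def lookup(account_id, ts):
    """Return the most recent feature value with timestamp <= ts, else None."""
    rows = history.get(account_id, [])
    i = bisect_right([t for t, _ in rows], ts)
    return rows[i - 1][1] if i else None

print(lookup("A1001", 250))  # value as of t=200 -> 0.35
print(lookup("A1001", 50))   # no value yet -> None
```

Using only values at or before the observation time is what keeps future information from leaking into the training data.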

@@ -211,7 +211,7 @@ In this procedure, you manually trigger the training pipeline. In a production s
 
 The training pipeline has these steps:
 
-1. Feature retrieval: For its input, this built-in component takes the feature retrieval specification, the observation data, and the time-stamp column name. It then generates the training data as output. It runs these steps as a managed Spark job.
+1. Feature retrieval: For its input, this built-in component takes the feature retrieval specification, the observation data, and the **time-stamp** column name. It then generates the training data as output. It runs these steps as a managed Spark job.
 
 1. Training: Based on the training data, this step trains the model and then generates a model (not yet registered).
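The pipeline steps in this hunk (feature retrieval, then training, then registration) can be sketched as plain functions to show the data flow. This is only a conceptual outline under made-up names and return shapes, not the Azure Machine Learning pipeline API:

```python
# Conceptual outline of the training pipeline's data flow. Function names and
# return shapes are made up; the real pipeline uses Azure ML components.

def feature_retrieval(retrieval_spec, observation_data, timestamp_column):
    """Step 1: join features onto observation rows, yielding training data."""
    return [{"features": retrieval_spec, "row": row} for row in observation_data]

def train(training_data):
    """Step 2: train a model from the generated training data (not yet registered)."""
    return {"model": "fraud_model", "n_rows": len(training_data)}

def register(model, retrieval_spec):
    """Step 3: register the model, packaging the retrieval spec with its artifacts."""
    return {**model, "artifacts": ["feature_retrieval_spec.yaml"], "spec": retrieval_spec}

spec = ["transactions.amount_7d_sum", "accounts.account_age_days"]  # hypothetical names
data = feature_retrieval(spec, [{"accountID": "A1001"}], "timestamp")
registered = register(train(data), spec)
print(registered["artifacts"])  # the spec ships with the model
```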

@@ -228,20 +228,20 @@ In this procedure, you manually trigger the training pipeline. In a production s
 
 - To display the pipeline steps, select the hyperlink for the **Web View** pipeline, and open it in a new window.
 
-2. Use the feature retrieval specification in the model artifacts:
+1. Use the feature retrieval specification in the model artifacts:
 
 1. On the left pane of the current workspace, select **Models** with the right mouse button.
-2. Select **Open in a new tab or window**.
-3. Select **fraud_model**.
-4. Select **Artifacts**.
+1. Select **Open in a new tab or window**.
+1. Select **fraud_model**.
+1. Select **Artifacts**.
 
-The feature retrieval specification is packaged along with the model. The model registration step in the training pipeline handled this step. You created the feature retrieval specification during experimentation. Now it's part of the model definition. In the next tutorial, you'll see how inferencing uses it.
+The feature retrieval specification is packaged along with the model. The model registration step in the training pipeline handled this step. You created the feature retrieval specification during experimentation. Now it's part of the model definition. In the next tutorial, you'll see how the inferencing process uses it.
 
 ## View the feature set and model dependencies
 
 1. View the list of feature sets associated with the model.
 
-On the same **Models** page, select the **Feature sets** tab. This tab shows both the `transactions` and `accounts` feature sets on which this model depends.
+On the same **Models** page, select the **Feature sets** tab. This tab shows both the `transactions` and `accounts` feature sets. This model depends on these feature sets.
 
 1. View the list of models that use the feature sets:
 