
Commit 2ed2c9f

Merge pull request #259276 from fbsolo-ms1/update-for-modified-notebooks
Update .MD files to match .IPYNB notebook updates . . .
2 parents e22f1c0 + bdf917e commit 2ed2c9f

4 files changed: 53 additions & 28 deletions

articles/machine-learning/tutorial-develop-feature-set-with-custom-source.md

Lines changed: 10 additions & 9 deletions
@@ -9,7 +9,7 @@ ms.subservice: core
 ms.topic: tutorial
 author: ynpandey
 ms.author: yogipandey
-ms.date: 10/27/2023
+ms.date: 11/28/2023
 ms.reviewer: franksolomon
 ms.custom:
 - sdkv2
@@ -22,7 +22,7 @@ ms.custom:

 An Azure Machine Learning managed feature store lets you discover, create, and operationalize features. Features serve as the connective tissue in the machine learning lifecycle, starting from the prototyping phase, where you experiment with various features. That lifecycle continues to the operationalization phase, where you deploy your models, and inference steps look up the feature data. For more information about feature stores, see [feature store concepts](./concept-what-is-managed-feature-store.md).

-Part 1 of this tutorial series showed how to create a feature set specification with custom transformations, enable materialization and perform a backfill. Part 2 of this tutorial series showed how to experiment with features in the experimentation and training flows. Part 4 described how to run batch inference.
+Part 1 of this tutorial series showed how to create a feature set specification with custom transformations, enable materialization and perform a backfill. Part 2 showed how to experiment with features in the experimentation and training flows. Part 3 explained recurrent materialization for the `transactions` feature set, and showed how to run a batch inference pipeline on the registered model. Part 4 described how to run batch inference.

 In this tutorial, you'll

@@ -36,26 +36,27 @@ In this tutorial, you'll
 > [!NOTE]
 > This tutorial uses an Azure Machine Learning notebook with **Serverless Spark Compute**.

-* Make sure you execute the notebook from Tutorial 1. That notebook includes creation of a feature store and a feature set, followed by enabling of materialization and performance of backfill.
+* Make sure you complete the previous tutorials in this series. This tutorial reuses feature store and other resources created in those earlier tutorials.

 ## Set up

 This tutorial uses the Python feature store core SDK (`azureml-featurestore`). The Python SDK is used for create, read, update, and delete (CRUD) operations on feature stores, feature sets, and feature store entities.

 You don't need to explicitly install these resources for this tutorial, because in the set-up instructions shown here, the `conda.yml` file covers them.

-### Configure the Azure Machine Learning Spark notebook.
+### Configure the Azure Machine Learning Spark notebook

 You can create a new notebook and execute the instructions in this tutorial step by step. You can also open and run the existing notebook *featurestore_sample/notebooks/sdk_only/5. Develop a feature set with custom source.ipynb*. Keep this tutorial open and refer to it for documentation links and more explanation.

 1. On the top menu, in the **Compute** dropdown list, select **Serverless Spark Compute** under **Azure Machine Learning Serverless Spark**.

 2. Configure the session:

-   1. When the toolbar displays **Configure session**, select it.
-   2. On the **Python packages** tab, select **Upload Conda file**.
-   3. Upload the *conda.yml* file that you [uploaded in the first tutorial](./tutorial-get-started-with-feature-store.md#prepare-the-notebook-environment).
-   4. Optionally, increase the session time-out (idle time) to avoid frequent prerequisite reruns.
+   1. Select **Configure session** in the top status bar.
+   2. Select the **Python packages** tab.
+   3. Select **Upload Conda file**.
+   4. Upload the *conda.yml* file that you [uploaded in the first tutorial](./tutorial-get-started-with-feature-store.md#prepare-the-notebook-environment).
+   5. Optionally, increase the session time-out (idle time) to avoid frequent prerequisite reruns.

 ## Set up the root directory for the samples
 This code cell sets up the root directory for the samples. It needs about 10 minutes to install all dependencies and start the Spark session.
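The `azureml-featurestore` core SDK mentioned in the Set up section above is typically initialized once near the top of the notebook. The following is a minimal sketch, assuming local authentication with `DefaultAzureCredential` (the sample notebooks use an on-behalf-of credential inside serverless Spark) and hypothetical placeholder names; check exact parameter names against the notebook for the SDK version in use:

```python
from azure.identity import DefaultAzureCredential
from azureml.featurestore import FeatureStoreClient

# Placeholders below are hypothetical; in the sample notebooks these values
# come from earlier setup cells.
featurestore = FeatureStoreClient(
    credential=DefaultAzureCredential(),
    subscription_id="<SUBSCRIPTION_ID>",
    resource_group_name="<RESOURCE_GROUP>",
    name="<FEATURE_STORE_NAME>",
)

# Read back a feature set registered in an earlier tutorial.
transactions_fset = featurestore.feature_sets.get("transactions", "1")
```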
@@ -159,4 +160,4 @@ If you created a resource group for the tutorial, you can delete that resource g
 ## Next steps

 * [Network isolation with feature store](./tutorial-network-isolation-for-feature-store.md)
-* [Azure Machine Learning feature stores samples repository](https://github.com/Azure/azureml-examples/tree/main/sdk/python/featurestore_sample)
+* [Azure Machine Learning feature stores samples repository](https://github.com/Azure/azureml-examples/tree/main/sdk/python/featurestore_sample)

articles/machine-learning/tutorial-enable-recurrent-materialization-run-batch-inference.md

Lines changed: 7 additions & 6 deletions
@@ -9,7 +9,7 @@ ms.subservice: core
 ms.topic: tutorial
 author: rsethur
 ms.author: seramasu
-ms.date: 10/27/2023
+ms.date: 11/28/2023
 ms.reviewer: franksolomon
 ms.custom: sdkv2, build-2023, ignite-2023
 #Customer intent: As a professional data scientist, I want to know how to build and deploy a model with Azure Machine Learning by using Python in a Jupyter Notebook.
@@ -37,14 +37,15 @@ Before you proceed with this tutorial, be sure to complete the first and second

 To run this tutorial, you can create a new notebook and execute the instructions step by step. You can also open and run the existing notebook named *3. Enable recurrent materialization and run batch inference*. You can find that notebook, and all the notebooks in this series, in the *featurestore_sample/notebooks* directory. You can choose *sdk_only* or *sdk_and_cli*. Keep this tutorial open and refer to it for documentation links and more explanation.

-1. On the top menu, in the **Compute** dropdown list, select **Serverless Spark Compute** under **Azure Machine Learning Serverless Spark**.
+1. In the **Compute** dropdown list in the top nav, select **Serverless Spark Compute** under **Azure Machine Learning Serverless Spark**.

 2. Configure the session:

-   1. When the toolbar displays **Configure session**, select it.
-   2. On the **Python packages** tab, select **Upload conda file**.
-   3. Upload the *conda.yml* file that you [uploaded in the first tutorial](./tutorial-get-started-with-feature-store.md#prepare-the-notebook-environment).
-   4. Optionally, increase the session time-out (idle time) to avoid frequent prerequisite reruns.
+   1. Select **Configure session** in the top status bar.
+   2. Select the **Python packages** tab.
+   3. Select **Upload conda file**.
+   4. Select the `azureml-examples/sdk/python/featurestore-sample/project/env/online.yml` file from your local machine.
+   5. Optionally, increase the session time-out (idle time) to avoid frequent prerequisite reruns.

 2. Start the Spark session.

articles/machine-learning/tutorial-get-started-with-feature-store.md

Lines changed: 30 additions & 7 deletions
@@ -8,7 +8,7 @@ ms.subservice: core
 ms.topic: tutorial
 author: rsethur
 ms.author: seramasu
-ms.date: 11/01/2023
+ms.date: 11/28/2023
 ms.reviewer: franksolomon
 ms.custom: sdkv2, build-2023, ignite-2023
 #Customer intent: As a professional data scientist, I want to know how to build and deploy a model with Azure Machine Learning by using Python in a Jupyter Notebook.
@@ -45,7 +45,7 @@ Before you proceed with this tutorial, be sure to cover these prerequisites:

 * An Azure Machine Learning workspace. For more information about workspace creation, see [Quickstart: Create workspace resources](./quickstart-create-resources.md).

-* On your user account, the Owner or Contributor role for the resource group where the feature store is created.
+* On your user account, the Owner role for the resource group where the feature store is created.

 If you choose to use a new resource group for this tutorial, you can easily delete all the resources by deleting the resource group.

@@ -59,18 +59,25 @@ This tutorial uses an Azure Machine Learning Spark notebook for development.

 :::image type="content" source="media/tutorial-get-started-with-feature-store/clone-featurestore-example-notebooks.png" lightbox="media/tutorial-get-started-with-feature-store/clone-featurestore-example-notebooks.png" alt-text="Screenshot that shows selection of the sample directory in Azure Machine Learning studio.":::

-1. The **Select target directory** panel opens. Select the user directory (in this case, **testUser**), and then select **Clone**.
+1. The **Select target directory** panel opens. Select the **Users** directory, then select _your user name_, and finally select **Clone**.

 :::image type="content" source="media/tutorial-get-started-with-feature-store/select-target-directory.png" lightbox="media/tutorial-get-started-with-feature-store/select-target-directory.png" alt-text="Screenshot showing selection of the target directory location in Azure Machine Learning studio for the sample resource.":::

 1. To configure the notebook environment, you must upload the *conda.yml* file:

    1. Select **Notebooks** on the left pane, and then select the **Files** tab.
-   1. Browse to the *env* directory (select **Users** > **testUser** > **featurestore_sample** > **project** > **env**), and then select the *conda.yml* file. In this path, *testUser* is the user directory.
+   1. Browse to the *env* directory (select **Users** > **your_user_name** > **featurestore_sample** > **project** > **env**), and then select the *conda.yml* file.
    1. Select **Download**.

 :::image type="content" source="media/tutorial-get-started-with-feature-store/download-conda-file.png" lightbox="media/tutorial-get-started-with-feature-store/download-conda-file.png" alt-text="Screenshot that shows selection of the Conda YAML file in Azure Machine Learning studio.":::

+1. Select **Serverless Spark Compute** in the top navigation **Compute** dropdown. This operation might take one to two minutes. Wait for the status bar at the top to display **Configure session**.
+1. Select **Configure session** in the top status bar.
+1. Select **Python packages**.
+1. Select **Upload conda files**.
+1. Select the `conda.yml` file you downloaded on your local device.
+1. (Optional) Increase the session time-out (idle time in minutes) to reduce the serverless Spark cluster startup time.
+
 1. In the Azure Machine Learning environment, open the notebook, and then select **Configure session**.

 :::image type="content" source="media/tutorial-get-started-with-feature-store/open-configure-session.png" lightbox="media/tutorial-get-started-with-feature-store/open-configure-session.png" alt-text="Screenshot that shows selections for configuring a session for a notebook.":::
@@ -104,7 +111,7 @@ Not applicable.

 ### [SDK and CLI track](#tab/SDK-and-CLI-track)

-1. Install the Azure Machine Learning extension.
+1. Install the Azure Machine Learning CLI extension.

 [!notebook-python[] (~/azureml-examples-main/sdk/python/featurestore_sample/notebooks/sdk_and_cli/1. Develop a feature set and register with managed feature store.ipynb?name=install-ml-ext-cli)]

@@ -331,7 +338,7 @@ As a best practice, entities help enforce use of the same join key definition ac

 1. Initialize the feature store CRUD client.

-   As explained earlier in this tutorial, `MLClient` is used for creating, reading, updating, and deleting a feature store asset. The notebook code cell sample shown here searches for the feature store that you created in an earlier step. Here, you can't reuse the same `ml_client` value that you used earlier in this tutorial, because it is scoped at the resource group level. Proper scoping is a prerequisite for feature store creation.
+   As explained earlier in this tutorial, `MLClient` is used for creating, reading, updating, and deleting a feature store asset. The notebook code cell sample shown here searches for the feature store that you created in an earlier step. Here, you can't reuse the same `ml_client` value that you used earlier in this tutorial, because it's scoped at the resource group level. Proper scoping is a prerequisite for feature store creation.

    In this code sample, the client is scoped at feature store level.

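To make the scoping distinction above concrete, here's a minimal sketch of the two client scopes, assuming `DefaultAzureCredential` and hypothetical placeholder names (the sample notebooks use an on-behalf-of credential inside serverless Spark):

```python
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Resource-group-scoped client: used earlier to create the feature store itself.
ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<SUBSCRIPTION_ID>",
    resource_group_name="<RESOURCE_GROUP>",
)

# Feature-store-scoped CRUD client: the feature store name is passed as the
# workspace name, because a managed feature store is a workspace-like resource.
fs_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<SUBSCRIPTION_ID>",
    resource_group_name="<RESOURCE_GROUP>",
    workspace_name="<FEATURE_STORE_NAME>",
)
```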

@@ -433,10 +440,26 @@ The Storage Blob Data Reader role must be assigned to your user account on the o

 ### [SDK track](#tab/SDK-track)

+#### Set spark.sql.shuffle.partitions in the yaml file according to the feature data size
+
+The spark configuration `spark.sql.shuffle.partitions` is an OPTIONAL parameter that can affect the number of parquet files generated (per day) when the feature set is materialized into the offline store. The default value of this parameter is 200. As a best practice, avoid generation of many small parquet files. If offline feature retrieval becomes slow after feature set materialization, go to the corresponding folder in the offline store to check whether the issue involves too many small parquet files (per day), and adjust the value of this parameter accordingly.
+
+> [!NOTE]
+> The sample data used in this notebook is small. Therefore, this parameter is set to 1 in the
+> featureset_asset_offline_enabled.yaml file.
+
 [!notebook-python[] (~/azureml-examples-main/sdk/python/featurestore_sample/notebooks/sdk_only/1. Develop a feature set and register with managed feature store.ipynb?name=enable-offline-mat-txns-fset)]

 ### [SDK and CLI track](#tab/SDK-and-CLI-track)

+#### Set spark.sql.shuffle.partitions in the yaml file according to the feature data size
+
+The spark configuration `spark.sql.shuffle.partitions` is an OPTIONAL parameter that can affect the number of parquet files generated (per day) when the feature set is materialized into the offline store. The default value of this parameter is 200. As a best practice, avoid generation of many small parquet files. If offline feature retrieval becomes slow after feature set materialization, go to the corresponding folder in the offline store to check whether the issue involves too many small parquet files (per day), and adjust the value of this parameter accordingly.
+
+> [!NOTE]
+> The sample data used in this notebook is small. Therefore, this parameter is set to 1 in the
+> featureset_asset_offline_enabled.yaml file.
+
 [!notebook-python[] (~/azureml-examples-main/sdk/python/featurestore_sample/notebooks/sdk_and_cli/1. Develop a feature set and register with managed feature store.ipynb?name=enable-offline-mat-txns-fset-cli)]

 ---
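For orientation, the yaml setting described above also has a Python SDK equivalent when materialization is enabled from a notebook. The sketch below is an approximation based on the `azure-ai-ml` entities used elsewhere in these tutorials; the names shown (`MaterializationSettings`, `spark_configuration`, the instance type, the `fs_client` feature-store-scoped client) are assumptions to verify against the sample notebook:

```python
from azure.ai.ml.entities import (
    MaterializationComputeResource,
    MaterializationSettings,
)

# fs_client is a feature-store-scoped MLClient; "transactions" / "1" are the
# sample feature set name and version used in these tutorials.
transactions_fset = fs_client.feature_sets.get("transactions", "1")

transactions_fset.materialization_settings = MaterializationSettings(
    offline_enabled=True,
    resource=MaterializationComputeResource(instance_type="standard_e8s_v3"),
    # The sample data is small, so a single shuffle partition avoids writing
    # many tiny parquet files per day into the offline store.
    spark_configuration={"spark.sql.shuffle.partitions": 1},
)
fs_client.feature_sets.begin_create_or_update(transactions_fset).result()
```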
@@ -503,7 +526,7 @@ You can explore feature materialization status for a feature set in the **Materi
 - The data can have a maximum of 2,000 *data intervals*. If your data contains more than 2,000 *data intervals*, create a new feature set version.
 - You can provide a list of more than one data statuses (for example, `["None", "Incomplete"]`) in a single backfill job.
 - During backfill, a new materialization job is submitted for each *data interval* that falls within the defined feature window.
-- If a materialization job is pending, or it is running for a *data interval* that hasn't yet been backfilled, a new job isn't submitted for that *data interval*.
+- If a materialization job is pending, or that job is running for a *data interval* that hasn't yet been backfilled, a new job isn't submitted for that *data interval*.
 - You can retry a failed materialization job.

 > [!NOTE]
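A backfill request that exercises the multi-status behavior described above might look roughly like the sketch below; the operation and parameter names are assumptions based on the SDK-track notebooks and should be verified there:

```python
from datetime import datetime

# fs_client is a feature-store-scoped MLClient; names and dates are illustrative.
poller = fs_client.feature_sets.begin_backfill(
    name="transactions",
    version="1",
    feature_window_start_time=datetime(2023, 1, 1),
    feature_window_end_time=datetime(2023, 6, 30),
    # More than one data status can be requested in a single backfill job.
    data_status=["None", "Incomplete"],
)
# One materialization job is submitted per data interval in the window.
print(poller.result().job_ids)
```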

articles/machine-learning/tutorial-online-materialization-inference.md

Lines changed: 6 additions & 6 deletions
@@ -9,7 +9,7 @@ ms.subservice: core
 ms.topic: tutorial
 author: ynpandey
 ms.author: yogipandey
-ms.date: 10/27/2023
+ms.date: 11/28/2023
 ms.reviewer: franksolomon
 ms.custom: sdkv2, ignite-2023
 #Customer intent: As a professional data scientist, I want to know how to build and deploy a model with Azure Machine Learning by using Python in a Jupyter Notebook.
@@ -46,14 +46,14 @@ You don't need to explicitly install these resources for this tutorial, because

 You can create a new notebook and execute the instructions in this tutorial step by step. You can also open and run the existing notebook *featurestore_sample/notebooks/sdk_only/4. Enable online store and run online inference.ipynb*. Keep this tutorial open and refer to it for documentation links and more explanation.

-1. On the top menu, in the **Compute** dropdown list, select **Serverless Spark Compute** under **Azure Machine Learning Serverless Spark**.
+1. In the **Compute** dropdown list in the top nav, select **Serverless Spark Compute**.

 2. Configure the session:

-   1. Download *featurestore-sample/project/env/online.yml* file to your local machine.
-   2. When the toolbar displays **Configure session**, select it.
-   3. On the **Python packages** tab, select **Upload Conda file**.
-   4. Upload the *online.yml* file in the same way as described in [uploading *conda.yml* file in the first tutorial](./tutorial-get-started-with-feature-store.md#prepare-the-notebook-environment).
+   1. Download the *azureml-examples/sdk/python/featurestore-sample/project/env/online.yml* file to your local machine.
+   2. In **Configure session** in the top nav, select **Python packages**.
+   3. Select **Upload Conda file**.
+   4. Upload the *online.yml* file from your local machine, with the same steps as described in [uploading *conda.yml* file in the first tutorial](./tutorial-get-started-with-feature-store.md#prepare-the-notebook-environment).
    5. Optionally, increase the session time-out (idle time) to avoid frequent prerequisite reruns.

 2. This code cell starts the Spark session. It needs about 10 minutes to install all dependencies and start the Spark session.
