articles/machine-learning/tutorial-develop-feature-set-with-custom-source.md (10 additions, 9 deletions)
@@ -9,7 +9,7 @@ ms.subservice: core
 ms.topic: tutorial
 author: ynpandey
 ms.author: yogipandey
-ms.date: 10/27/2023
+ms.date: 11/28/2023
 ms.reviewer: franksolomon
 ms.custom:
 - sdkv2
@@ -22,7 +22,7 @@ ms.custom:
 
 An Azure Machine Learning managed feature store lets you discover, create, and operationalize features. Features serve as the connective tissue in the machine learning lifecycle, starting from the prototyping phase, where you experiment with various features. That lifecycle continues to the operationalization phase, where you deploy your models, and inference steps look up the feature data. For more information about feature stores, see [feature store concepts](./concept-what-is-managed-feature-store.md).
 
-Part 1 of this tutorial series showed how to create a feature set specification with custom transformations, enable materialization and perform a backfill. Part 2 of this tutorial series showed how to experiment with features in the experimentation and training flows. Part 4 described how to run batch inference.
+Part 1 of this tutorial series showed how to create a feature set specification with custom transformations, enable materialization and perform a backfill. Part 2 showed how to experiment with features in the experimentation and training flows. Part 3 explained recurrent materialization for the `transactions` feature set, and showed how to run a batch inference pipeline on the registered model. Part 4 described how to run batch inference.
 
 In this tutorial, you'll
@@ -36,26 +36,27 @@ In this tutorial, you'll
 > [!NOTE]
 > This tutorial uses an Azure Machine Learning notebook with **Serverless Spark Compute**.
 
-* Make sure you execute the notebook from Tutorial 1. That notebook includes creation of a feature store and a feature set, followed by enabling of materialization and performance of backfill.
+* Make sure you complete the previous tutorials in this series. This tutorial reuses the feature store and other resources created in those earlier tutorials.
 
 ## Set up
 
 This tutorial uses the Python feature store core SDK (`azureml-featurestore`). The Python SDK is used for create, read, update, and delete (CRUD) operations on feature stores, feature sets, and feature store entities.
 
 You don't need to explicitly install these resources for this tutorial, because in the set-up instructions shown here, the `conda.yml` file covers them.
 
-### Configure the Azure Machine Learning Spark notebook.
+### Configure the Azure Machine Learning Spark notebook
 
 You can create a new notebook and execute the instructions in this tutorial step by step. You can also open and run the existing notebook *featurestore_sample/notebooks/sdk_only/5. Develop a feature set with custom source.ipynb*. Keep this tutorial open and refer to it for documentation links and more explanation.
 
 1. On the top menu, in the **Compute** dropdown list, select **Serverless Spark Compute** under **Azure Machine Learning Serverless Spark**.
 
 2. Configure the session:
 
-   1. When the toolbar displays **Configure session**, select it.
-   2. On the **Python packages** tab, select **Upload Conda file**.
-   3. Upload the *conda.yml* file that you [uploaded in the first tutorial](./tutorial-get-started-with-feature-store.md#prepare-the-notebook-environment).
-   4. Optionally, increase the session time-out (idle time) to avoid frequent prerequisite reruns.
+   1. Select **Configure session** in the top status bar.
+   2. Select the **Python packages** tab.
+   3. Select **Upload Conda file**.
+   4. Upload the *conda.yml* file that you [uploaded in the first tutorial](./tutorial-get-started-with-feature-store.md#prepare-the-notebook-environment).
+   5. Optionally, increase the session time-out (idle time) to avoid frequent prerequisite reruns.
 
 ## Set up the root directory for the samples
 This code cell sets up the root directory for the samples. It needs about 10 minutes to install all dependencies and start the Spark session.
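For reference, the Python feature store core SDK (`azureml-featurestore`) described in the hunk above is typically initialized along the following lines. This is a minimal sketch, assuming placeholder subscription, resource group, and feature store names rather than values taken from this change:

```python
from azure.ai.ml.identity import AzureMLOnBehalfOfCredential
from azureml.featurestore import FeatureStoreClient

# Placeholder identifiers; substitute the values used in the earlier tutorials.
featurestore_subscription_id = "<SUBSCRIPTION_ID>"
featurestore_resource_group_name = "<RESOURCE_GROUP>"
featurestore_name = "<FEATURESTORE_NAME>"

# Core SDK client for read operations against the feature store.
featurestore = FeatureStoreClient(
    credential=AzureMLOnBehalfOfCredential(),
    subscription_id=featurestore_subscription_id,
    resource_group_name=featurestore_resource_group_name,
    name=featurestore_name,
)

# Example read: fetch a registered feature set by name and version.
transactions_featureset = featurestore.feature_sets.get(name="transactions", version="1")
```

Asset create and update operations go through `MLClient` instead, as the get-started tutorial later in this same change notes.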
@@ -159,4 +160,4 @@ If you created a resource group for the tutorial, you can delete that resource g
 ## Next steps
 
 * [Network isolation with feature store](./tutorial-network-isolation-for-feature-store.md)
articles/machine-learning/tutorial-enable-recurrent-materialization-run-batch-inference.md (7 additions, 6 deletions)
@@ -9,7 +9,7 @@ ms.subservice: core
 ms.topic: tutorial
 author: rsethur
 ms.author: seramasu
-ms.date: 10/27/2023
+ms.date: 11/28/2023
 ms.reviewer: franksolomon
 ms.custom: sdkv2, build-2023, ignite-2023
 #Customer intent: As a professional data scientist, I want to know how to build and deploy a model with Azure Machine Learning by using Python in a Jupyter Notebook.
@@ -37,14 +37,15 @@ Before you proceed with this tutorial, be sure to complete the first and second
 To run this tutorial, you can create a new notebook and execute the instructions step by step. You can also open and run the existing notebook named *3. Enable recurrent materialization and run batch inference*. You can find that notebook, and all the notebooks in this series, in the *featurestore_sample/notebooks* directory. You can choose *sdk_only* or *sdk_and_cli*. Keep this tutorial open and refer to it for documentation links and more explanation.
 
-1. On the top menu, in the **Compute** dropdown list, select **Serverless Spark Compute** under **Azure Machine Learning Serverless Spark**.
+1. In the **Compute** dropdown list in the top nav, select **Serverless Spark Compute** under **Azure Machine Learning Serverless Spark**.
 
 2. Configure the session:
 
-   1. When the toolbar displays **Configure session**, select it.
-   2. On the **Python packages** tab, select **Upload conda file**.
-   3. Upload the *conda.yml* file that you [uploaded in the first tutorial](./tutorial-get-started-with-feature-store.md#prepare-the-notebook-environment).
-   4. Optionally, increase the session time-out (idle time) to avoid frequent prerequisite reruns.
+   1. Select **Configure session** in the top status bar.
+   2. Select the **Python packages** tab.
+   3. Select **Upload conda file**.
+   4. Select the `azureml-examples/sdk/python/featurestore-sample/project/env/online.yml` file from your local machine.
+   5. Optionally, increase the session time-out (idle time) to avoid frequent prerequisite reruns.
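For context, the notebook that these steps configure (*3. Enable recurrent materialization and run batch inference*) attaches a recurrence schedule to the feature set's materialization settings. The following is a hedged sketch of that step; it assumes the `RecurrenceTrigger` entity from `azure.ai.ml.entities`, a feature set asset named `transactions_fset_config`, and a feature-store-scoped `fs_client` carried over from the earlier tutorials:

```python
from datetime import datetime

from azure.ai.ml.entities import RecurrenceTrigger

# Materialize the feature set on a recurring schedule, for example every three hours.
# The frequency string and start time below are illustrative values only.
transactions_fset_config.materialization_settings.schedule = RecurrenceTrigger(
    frequency="hour",
    interval=3,
    start_time=datetime(2023, 11, 28, 0, 0, 0),
)

# Persist the updated materialization settings on the feature set asset.
fs_poller = fs_client.feature_sets.begin_create_or_update(transactions_fset_config)
print(fs_poller.result())
```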
articles/machine-learning/tutorial-get-started-with-feature-store.md (30 additions, 7 deletions)
@@ -8,7 +8,7 @@ ms.subservice: core
 ms.topic: tutorial
 author: rsethur
 ms.author: seramasu
-ms.date: 11/01/2023
+ms.date: 11/28/2023
 ms.reviewer: franksolomon
 ms.custom: sdkv2, build-2023, ignite-2023
 #Customer intent: As a professional data scientist, I want to know how to build and deploy a model with Azure Machine Learning by using Python in a Jupyter Notebook.
@@ -45,7 +45,7 @@ Before you proceed with this tutorial, be sure to cover these prerequisites:
 
 * An Azure Machine Learning workspace. For more information about workspace creation, see [Quickstart: Create workspace resources](./quickstart-create-resources.md).
 
-* On your user account, the Owner or Contributor role for the resource group where the feature store is created.
+* On your user account, the Owner role for the resource group where the feature store is created.
 
 If you choose to use a new resource group for this tutorial, you can easily delete all the resources by deleting the resource group.
 
@@ -59,18 +59,25 @@ This tutorial uses an Azure Machine Learning Spark notebook for development.
 :::image type="content" source="media/tutorial-get-started-with-feature-store/clone-featurestore-example-notebooks.png" lightbox="media/tutorial-get-started-with-feature-store/clone-featurestore-example-notebooks.png" alt-text="Screenshot that shows selection of the sample directory in Azure Machine Learning studio.":::
 
-1. The **Select target directory** panel opens. Select the user directory (in this case, **testUser**), and then select **Clone**.
+1. The **Select target directory** panel opens. Select the **Users** directory, then select _your user name_, and finally select **Clone**.
 
 :::image type="content" source="media/tutorial-get-started-with-feature-store/select-target-directory.png" lightbox="media/tutorial-get-started-with-feature-store/select-target-directory.png" alt-text="Screenshot showing selection of the target directory location in Azure Machine Learning studio for the sample resource.":::
 
 1. To configure the notebook environment, you must upload the *conda.yml* file:
 
    1. Select **Notebooks** on the left pane, and then select the **Files** tab.
-   1. Browse to the *env* directory (select **Users** > **testUser** > **featurestore_sample** > **project** > **env**), and then select the *conda.yml* file. In this path, *testUser* is the user directory.
+   1. Browse to the *env* directory (select **Users** > **your_user_name** > **featurestore_sample** > **project** > **env**), and then select the *conda.yml* file.
    1. Select **Download**.
 
 :::image type="content" source="media/tutorial-get-started-with-feature-store/download-conda-file.png" lightbox="media/tutorial-get-started-with-feature-store/download-conda-file.png" alt-text="Screenshot that shows selection of the Conda YAML file in Azure Machine Learning studio.":::
 
+1. Select **Serverless Spark Compute** in the top navigation **Compute** dropdown. This operation might take one to two minutes. Wait for the status bar at the top to display **Configure session**.
+1. Select **Configure session** in the top status bar.
+1. Select **Python packages**.
+1. Select **Upload conda files**.
+1. Select the `conda.yml` file you downloaded to your local device.
+1. (Optional) Increase the session time-out (idle time in minutes) to reduce the serverless Spark cluster startup time.
+
 1. In the Azure Machine Learning environment, open the notebook, and then select **Configure session**.
 
 :::image type="content" source="media/tutorial-get-started-with-feature-store/open-configure-session.png" lightbox="media/tutorial-get-started-with-feature-store/open-configure-session.png" alt-text="Screenshot that shows selections for configuring a session for a notebook.":::
@@ -104,7 +111,7 @@ Not applicable.
 ### [SDK and CLI track](#tab/SDK-and-CLI-track)
 
-1. Install the Azure Machine Learning extension.
+1. Install the Azure Machine Learning CLI extension.
 
 [!notebook-python[] (~/azureml-examples-main/sdk/python/featurestore_sample/notebooks/sdk_and_cli/1. Develop a feature set and register with managed feature store.ipynb?name=install-ml-ext-cli)]
 
@@ -331,7 +338,7 @@ As a best practice, entities help enforce use of the same join key definition ac
 1. Initialize the feature store CRUD client.
 
-   As explained earlier in this tutorial, `MLClient` is used for creating, reading, updating, and deleting a feature store asset. The notebook code cell sample shown here searches for the feature store that you created in an earlier step. Here, you can't reuse the same `ml_client` value that you used earlier in this tutorial, because it is scoped at the resource group level. Proper scoping is a prerequisite for feature store creation.
+   As explained earlier in this tutorial, `MLClient` is used for creating, reading, updating, and deleting a feature store asset. The notebook code cell sample shown here searches for the feature store that you created in an earlier step. Here, you can't reuse the same `ml_client` value that you used earlier in this tutorial, because it's scoped at the resource group level. Proper scoping is a prerequisite for feature store creation.
 
    In this code sample, the client is scoped at feature store level.
 
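The paragraph in this hunk describes a feature-store-scoped `MLClient`. A minimal sketch of that initialization, assuming placeholder identifiers (the actual values come from earlier steps in the tutorial), looks roughly like this:

```python
from azure.ai.ml import MLClient
from azure.ai.ml.identity import AzureMLOnBehalfOfCredential

# Placeholder identifiers; substitute the values used earlier in the tutorial.
featurestore_subscription_id = "<SUBSCRIPTION_ID>"
featurestore_resource_group_name = "<RESOURCE_GROUP>"
featurestore_name = "<FEATURESTORE_NAME>"

# Unlike the resource-group-scoped ml_client used to create the feature store,
# this client is scoped to the feature store itself (passed as the workspace name).
fs_client = MLClient(
    AzureMLOnBehalfOfCredential(),
    featurestore_subscription_id,
    featurestore_resource_group_name,
    featurestore_name,
)
```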
@@ -433,10 +440,26 @@ The Storage Blob Data Reader role must be assigned to your user account on the o
 ### [SDK track](#tab/SDK-track)
 
+#### Set spark.sql.shuffle.partitions in the yaml file according to the feature data size
+
+The spark configuration `spark.sql.shuffle.partitions` is an optional parameter that can affect the number of parquet files generated (per day) when the feature set is materialized into the offline store. The default value of this parameter is 200. As a best practice, avoid generating many small parquet files. If offline feature retrieval becomes slow after feature set materialization, go to the corresponding folder in the offline store to check whether the issue involves too many small parquet files (per day), and adjust the value of this parameter accordingly.
+
+> [!NOTE]
+> The sample data used in this notebook is small. Therefore, this parameter is set to 1 in the
+> featureset_asset_offline_enabled.yaml file.
+
 [!notebook-python[] (~/azureml-examples-main/sdk/python/featurestore_sample/notebooks/sdk_only/1. Develop a feature set and register with managed feature store.ipynb?name=enable-offline-mat-txns-fset)]
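The text added above sets `spark.sql.shuffle.partitions` through the YAML asset file. For the SDK track, the same Spark setting can also be supplied through the feature set's materialization settings. This is a hedged sketch, not the notebook's exact cell; it assumes the `MaterializationSettings` and `MaterializationComputeResource` entities from `azure.ai.ml.entities`, a feature set asset named `transactions_fset_config`, and placeholder compute sizing:

```python
from azure.ai.ml.entities import (
    MaterializationComputeResource,
    MaterializationSettings,
)

# Enable offline materialization with an explicit Spark configuration.
# The sample data is small, so a single shuffle partition avoids producing
# many tiny parquet files per day in the offline store.
transactions_fset_config.materialization_settings = MaterializationSettings(
    offline_enabled=True,
    resource=MaterializationComputeResource(instance_type="standard_e8s_v3"),
    spark_configuration={
        "spark.driver.cores": 4,
        "spark.driver.memory": "36g",
        "spark.executor.cores": 4,
        "spark.executor.memory": "36g",
        "spark.executor.instances": 2,
        "spark.sql.shuffle.partitions": 1,
    },
    schedule=None,
)

fs_poller = fs_client.feature_sets.begin_create_or_update(transactions_fset_config)
print(fs_poller.result())
```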
 
 ### [SDK and CLI track](#tab/SDK-and-CLI-track)
 
+#### Set spark.sql.shuffle.partitions in the yaml file according to the feature data size
+
+The spark configuration `spark.sql.shuffle.partitions` is an optional parameter that can affect the number of parquet files generated (per day) when the feature set is materialized into the offline store. The default value of this parameter is 200. As a best practice, avoid generating many small parquet files. If offline feature retrieval becomes slow after feature set materialization, go to the corresponding folder in the offline store to check whether the issue involves too many small parquet files (per day), and adjust the value of this parameter accordingly.
+
+> [!NOTE]
+> The sample data used in this notebook is small. Therefore, this parameter is set to 1 in the
+> featureset_asset_offline_enabled.yaml file.
+
 [!notebook-python[] (~/azureml-examples-main/sdk/python/featurestore_sample/notebooks/sdk_and_cli/1. Develop a feature set and register with managed feature store.ipynb?name=enable-offline-mat-txns-fset-cli)]
 
---
@@ -503,7 +526,7 @@ You can explore feature materialization status for a feature set in the **Materi
 - The data can have a maximum of 2,000 *data intervals*. If your data contains more than 2,000 *data intervals*, create a new feature set version.
 - You can provide a list of more than one data status (for example, `["None", "Incomplete"]`) in a single backfill job.
 - During backfill, a new materialization job is submitted for each *data interval* that falls within the defined feature window.
-- If a materialization job is pending, or it is running for a *data interval* that hasn't yet been backfilled, a new job isn't submitted for that *data interval*.
+- If a materialization job is pending, or that job is running for a *data interval* that hasn't yet been backfilled, a new job isn't submitted for that *data interval*.
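The backfill behavior in these bullets corresponds to a backfill request along the following lines. This sketch assumes the feature-store-scoped `fs_client` from earlier in the tutorial, placeholder window bounds, and that the data-status list is passed in the form shown in the bullet's example:

```python
from datetime import datetime

# Placeholder feature window bounds; use the window that matches your data.
st = datetime(2023, 1, 1)
et = datetime(2023, 6, 30)

# Request backfill for every data interval in the window whose status is
# "None" or "Incomplete"; pending or running intervals are skipped.
backfill_poller = fs_client.feature_sets.begin_backfill(
    name="transactions",
    version="1",
    feature_window_start_time=st,
    feature_window_end_time=et,
    data_status=["None", "Incomplete"],
)
backfill_response = backfill_poller.result()
print(backfill_response)
```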
articles/machine-learning/tutorial-online-materialization-inference.md (6 additions, 6 deletions)
@@ -9,7 +9,7 @@ ms.subservice: core
 ms.topic: tutorial
 author: ynpandey
 ms.author: yogipandey
-ms.date: 10/27/2023
+ms.date: 11/28/2023
 ms.reviewer: franksolomon
 ms.custom: sdkv2, ignite-2023
 #Customer intent: As a professional data scientist, I want to know how to build and deploy a model with Azure Machine Learning by using Python in a Jupyter Notebook.
@@ -46,14 +46,14 @@ You don't need to explicitly install these resources for this tutorial, because
 You can create a new notebook and execute the instructions in this tutorial step by step. You can also open and run the existing notebook *featurestore_sample/notebooks/sdk_only/4. Enable online store and run online inference.ipynb*. Keep this tutorial open and refer to it for documentation links and more explanation.
 
-1. On the top menu, in the **Compute** dropdown list, select **Serverless Spark Compute** under **Azure Machine Learning Serverless Spark**.
+1. In the **Compute** dropdown list in the top nav, select **Serverless Spark Compute**.
 
 2. Configure the session:
 
-   1. Download the *featurestore-sample/project/env/online.yml* file to your local machine.
-   2. When the toolbar displays **Configure session**, select it.
-   3. On the **Python packages** tab, select **Upload Conda file**.
-   4. Upload the *online.yml* file in the same way as described in [uploading the *conda.yml* file in the first tutorial](./tutorial-get-started-with-feature-store.md#prepare-the-notebook-environment).
+   1. Download the *azureml-examples/sdk/python/featurestore-sample/project/env/online.yml* file to your local machine.
+   2. In **Configure session** in the top nav, select **Python packages**.
+   3. Select **Upload Conda file**.
+   4. Upload the *online.yml* file from your local machine, following the same steps as [uploading the *conda.yml* file in the first tutorial](./tutorial-get-started-with-feature-store.md#prepare-the-notebook-environment).
    5. Optionally, increase the session time-out (idle time) to avoid frequent prerequisite reruns.
 
 2. This code cell starts the Spark session. It needs about 10 minutes to install all dependencies and start the Spark session.
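As that last line notes, the first cell you run is what starts the serverless Spark session; any small cell is enough to trigger it, for example:

```python
# Running any cell on Serverless Spark Compute starts the session.
# The first run takes roughly 10 minutes while the packages from the
# uploaded conda environment (online.yml) are installed.
print("Starting the Spark session ...")
```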
0 commit comments