Commit 5aa495f

Merge pull request #223230 from fbsolo-ms1/tutorial-for-SK

Yogi P requested a file update . . .

2 parents b8f4de9 + d05f2b9 commit 5aa495f

1 file changed: articles/machine-learning/quickstart-spark-jobs.md (+67 additions, −66 deletions)
@@ -22,16 +22,6 @@ In this quickstart guide, you'll learn how to submit a Spark job using Azure Mac
 
 ## Prerequisites
 
-# [Studio UI](#tab/studio-ui)
-- An Azure subscription; if you don't have an Azure subscription, [create a free account](https://azure.microsoft.com/free) before you begin.
-- An Azure Machine Learning workspace. See [Create workspace resources](./quickstart-create-resources.md).
-- An Azure Data Lake Storage (ADLS) Gen 2 storage account. See [Create an Azure Data Lake Storage (ADLS) Gen 2 storage account](../storage/blobs/create-data-lake-storage-account.md).
-- To enable this feature:
-  1. Navigate to Azure Machine Learning studio UI.
-  2. Select **Manage preview features** (megaphone icon) among the icons on the top right side of the screen.
-  3. In **Managed preview feature** panel, toggle on **Run notebooks and jobs on managed Spark** feature.
-  :::image type="content" source="media/quickstart-spark-jobs/how-to-enable-managed-spark-preview.png" lightbox="media/quickstart-spark-jobs/how-to-enable-managed-spark-preview.png" alt-text="Expandable screenshot showing option for enabling Managed Spark preview.":::
-
 # [CLI](#tab/cli)
 [!INCLUDE [cli v2](../../includes/machine-learning-cli-v2.md)]
 - An Azure subscription; if you don't have an Azure subscription, [create a free account](https://azure.microsoft.com/free) before you begin.
@@ -60,6 +50,16 @@ In this quickstart guide, you'll learn how to submit a Spark job using Azure Mac
 > - [Visual Studio Code connected to an Azure Machine Learning compute instance](./how-to-set-up-vs-code-remote.md?tabs=studio).
 > - your local computer that has [the Azure Machine Learning SDK for Python](/python/api/overview/azure/ml/installv2) installed.
 
+# [Studio UI](#tab/studio-ui)
+- An Azure subscription; if you don't have an Azure subscription, [create a free account](https://azure.microsoft.com/free) before you begin.
+- An Azure Machine Learning workspace. See [Create workspace resources](./quickstart-create-resources.md).
+- An Azure Data Lake Storage (ADLS) Gen 2 storage account. See [Create an Azure Data Lake Storage (ADLS) Gen 2 storage account](../storage/blobs/create-data-lake-storage-account.md).
+- To enable this feature:
+  1. Navigate to Azure Machine Learning studio UI.
+  2. Select **Manage preview features** (megaphone icon) among the icons on the top right side of the screen.
+  3. In **Managed preview feature** panel, toggle on **Run notebooks and jobs on managed Spark** feature.
+  :::image type="content" source="media/quickstart-spark-jobs/how-to-enable-managed-spark-preview.png" lightbox="media/quickstart-spark-jobs/how-to-enable-managed-spark-preview.png" alt-text="Expandable screenshot showing option for enabling Managed Spark preview.":::
+
 ---
 
 ## Add role assignments in Azure storage accounts
@@ -132,62 +132,6 @@ The above script takes two arguments `--titanic_data` and `--wrangled_data`, whi
 
 ## Submit a standalone Spark job
 
-# [Studio UI](#tab/studio-ui)
-First, upload the parameterized Python code `titanic.py` to the Azure Blob storage container for workspace default datastore `workspaceblobstore`. To submit a standalone Spark job using the Azure Machine Learning studio UI:
-
-:::image type="content" source="media/quickstart-spark-jobs/create-standalone-spark-job.png" lightbox="media/quickstart-spark-jobs/create-standalone-spark-job.png" alt-text="Expandable screenshot showing creation of a new Spark job in Azure Machine Learning studio UI.":::
-
-1. In the left pane, select **+ New**.
-2. Select **Spark job (preview)**.
-3. On the **Compute** screen:
-
-   :::image type="content" source="media/quickstart-spark-jobs/create-standalone-spark-job-compute.png" lightbox="media/quickstart-spark-jobs/create-standalone-spark-job-compute.png" alt-text="Expandable screenshot showing compute selection screen for a new Spark job in Azure Machine Learning studio UI.":::
-
-   1. Under **Select compute type**, select **Spark automatic compute (Preview)** for Managed (Automatic) Spark compute.
-   2. Select **Virtual machine size**. The following instance types are currently supported:
-      - `Standard_E4s_v3`
-      - `Standard_E8s_v3`
-      - `Standard_E16s_v3`
-      - `Standard_E32s_v3`
-      - `Standard_E64s_v3`
-   3. Select **Spark runtime version** as **Spark 3.2**.
-   4. Select **Next**.
-4. On the **Environment** screen, select **Next**.
-5. On the **Job settings** screen:
-   1. Provide a job **Name**, or use the job **Name** generated by default.
-   2. Select an **Experiment name** from the dropdown menu.
-   3. Under **Add tags**, provide **Name** and **Value**, then select **Add**. Adding tags is optional.
-   4. Under the **Code** section:
-      1. Select **Azure Machine Learning workspace default blob storage** from the **Choose code location** dropdown.
-      2. Under **Path to code file to upload**, select **Browse**.
-      3. In the pop-up screen titled **Path selection**, select the path of code file `titanic.py` on the workspace default datastore `workspaceblobstore`.
-      4. Select **Save**.
-   5. Input `titanic.py` as the name of **Entry file** for the standalone job.
-   6. To add an input, select **+ Add input** under **Inputs** and:
-      1. Enter **Input name** as `titanic_data`. You'll refer to this name later in **Arguments**.
-      2. Select **Input type** as **Data**.
-      3. Select **Data type** as **File**.
-      4. Select **Data source** as **URI**.
-      5. Enter an Azure Data Lake Storage (ADLS) Gen 2 data URI for the `titanic.csv` file in the format `abfss://<FILE_SYSTEM_NAME>@<STORAGE_ACCOUNT_NAME>.dfs.core.windows.net/<PATH_TO_DATA>`.
-   7. To add an output, select **+ Add output** under **Outputs** and:
-      1. Enter **Output name** as `wrangled_data`. You'll refer to this name later in **Arguments**.
-      2. Select **Output type** as **Folder**.
-      3. For **Output URI destination**, enter an Azure Data Lake Storage (ADLS) Gen 2 folder URI in the format `abfss://<FILE_SYSTEM_NAME>@<STORAGE_ACCOUNT_NAME>.dfs.core.windows.net/<PATH_TO_DATA>`.
-   8. Enter **Arguments** as `--titanic_data ${{inputs.titanic_data}} --wrangled_data ${{outputs.wrangled_data}}`.
-6. Under the **Spark configurations** section:
-   1. For **Executor size**:
-      1. Enter the number of executor **Cores** as 2 and executor **Memory (GB)** as 2.
-      2. For **Dynamically allocated executors**, select **Disabled**.
-      3. Enter the number of **Executor instances** as 2.
-   2. For **Driver size**, enter the number of driver **Cores** as 1 and driver **Memory (GB)** as 2.
-7. Select **Next**.
-8. On the **Review** screen:
-   1. Review the job specification before submitting it.
-   2. Select **Create** to submit the standalone Spark job.
-
-> [!NOTE]
-> A standalone job submitted from the Studio UI using an Azure Machine Learning Managed (Automatic) Spark compute defaults to user identity passthrough for data access.
-
 # [CLI](#tab/cli)
 [!INCLUDE [cli v2](../../includes/machine-learning-cli-v2.md)]
 This example YAML specification shows a standalone Spark job. It uses an Azure Machine Learning Managed (Automatic) Spark compute, user identity passthrough, and an input/output data URI in the format `abfss://<FILE_SYSTEM_NAME>@<STORAGE_ACCOUNT_NAME>.dfs.core.windows.net/<PATH_TO_DATA>`:
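The hunk header above notes that `titanic.py` takes two arguments, `--titanic_data` and `--wrangled_data`, which the job wires to its input and output URIs. The commit doesn't include the script body, so the following is only a minimal sketch of that argument interface; the parser details and sample URIs are assumptions, not code from this change:

```python
import argparse


def parse_args(argv=None):
    # titanic.py is documented to take two arguments carrying the input
    # data URI and the output folder URI (hypothetical parser sketch).
    parser = argparse.ArgumentParser(description="Wrangle Titanic data")
    parser.add_argument("--titanic_data", required=True,
                        help="abfss:// URI of the input titanic.csv file")
    parser.add_argument("--wrangled_data", required=True,
                        help="abfss:// URI of the output folder")
    return parser.parse_args(argv)


# Mirrors the documented Arguments string:
# --titanic_data ${{inputs.titanic_data}} --wrangled_data ${{outputs.wrangled_data}}
args = parse_args([
    "--titanic_data",
    "abfss://myfs@myaccount.dfs.core.windows.net/data/titanic.csv",
    "--wrangled_data",
    "abfss://myfs@myaccount.dfs.core.windows.net/data/wrangled",
])
print(args.titanic_data)
```

At job runtime, Azure Machine Learning substitutes the `${{inputs.*}}` and `${{outputs.*}}` placeholders with the resolved URIs before they reach the script.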
@@ -308,6 +252,63 @@ In the above code sample:
 - `Standard_E32S_V3`
 - `Standard_E64S_V3`
 
+# [Studio UI](#tab/studio-ui)
+First, upload the parameterized Python code `titanic.py` to the Azure Blob storage container for workspace default datastore `workspaceblobstore`. To submit a standalone Spark job using the Azure Machine Learning studio UI:
+
+:::image type="content" source="media/quickstart-spark-jobs/create-standalone-spark-job.png" lightbox="media/quickstart-spark-jobs/create-standalone-spark-job.png" alt-text="Expandable screenshot showing creation of a new Spark job in Azure Machine Learning studio UI.":::
+
+1. In the left pane, select **+ New**.
+2. Select **Spark job (preview)**.
+3. On the **Compute** screen:
+
+   :::image type="content" source="media/quickstart-spark-jobs/create-standalone-spark-job-compute.png" lightbox="media/quickstart-spark-jobs/create-standalone-spark-job-compute.png" alt-text="Expandable screenshot showing compute selection screen for a new Spark job in Azure Machine Learning studio UI.":::
+
+   1. Under **Select compute type**, select **Spark automatic compute (Preview)** for Managed (Automatic) Spark compute.
+   2. Select **Virtual machine size**. The following instance types are currently supported:
+      - `Standard_E4s_v3`
+      - `Standard_E8s_v3`
+      - `Standard_E16s_v3`
+      - `Standard_E32s_v3`
+      - `Standard_E64s_v3`
+   3. Select **Spark runtime version** as **Spark 3.2**.
+   4. Select **Next**.
+4. On the **Environment** screen, select **Next**.
+5. On the **Job settings** screen:
+   1. Provide a job **Name**, or use the job **Name** generated by default.
+   2. Select an **Experiment name** from the dropdown menu.
+   3. Under **Add tags**, provide **Name** and **Value**, then select **Add**. Adding tags is optional.
+   4. Under the **Code** section:
+      1. Select **Azure Machine Learning workspace default blob storage** from the **Choose code location** dropdown.
+      2. Under **Path to code file to upload**, select **Browse**.
+      3. In the pop-up screen titled **Path selection**, select the path of code file `titanic.py` on the workspace default datastore `workspaceblobstore`.
+      4. Select **Save**.
+   5. Input `titanic.py` as the name of **Entry file** for the standalone job.
+   6. To add an input, select **+ Add input** under **Inputs** and:
+      1. Enter **Input name** as `titanic_data`. You'll refer to this name later in **Arguments**.
+      2. Select **Input type** as **Data**.
+      3. Select **Data type** as **File**.
+      4. Select **Data source** as **URI**.
+      5. Enter an Azure Data Lake Storage (ADLS) Gen 2 data URI for the `titanic.csv` file in the format `abfss://<FILE_SYSTEM_NAME>@<STORAGE_ACCOUNT_NAME>.dfs.core.windows.net/<PATH_TO_DATA>`.
+   7. To add an output, select **+ Add output** under **Outputs** and:
+      1. Enter **Output name** as `wrangled_data`. You'll refer to this name later in **Arguments**.
+      2. Select **Output type** as **Folder**.
+      3. For **Output URI destination**, enter an Azure Data Lake Storage (ADLS) Gen 2 folder URI in the format `abfss://<FILE_SYSTEM_NAME>@<STORAGE_ACCOUNT_NAME>.dfs.core.windows.net/<PATH_TO_DATA>`.
+   8. Enter **Arguments** as `--titanic_data ${{inputs.titanic_data}} --wrangled_data ${{outputs.wrangled_data}}`.
+6. Under the **Spark configurations** section:
+   1. For **Executor size**:
+      1. Enter the number of executor **Cores** as 2 and executor **Memory (GB)** as 2.
+      2. For **Dynamically allocated executors**, select **Disabled**.
+      3. Enter the number of **Executor instances** as 2.
+   2. For **Driver size**, enter the number of driver **Cores** as 1 and driver **Memory (GB)** as 2.
+7. Select **Next**.
+8. On the **Review** screen:
+   1. Review the job specification before submitting it.
+   2. Select **Create** to submit the standalone Spark job.
+
+> [!NOTE]
+> A standalone job submitted from the Studio UI using an Azure Machine Learning Managed (Automatic) Spark compute defaults to user identity passthrough for data access.
+
 ---
 
 > [!TIP]

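Both the input and output steps in this commit take ADLS Gen 2 URIs in the format `abfss://<FILE_SYSTEM_NAME>@<STORAGE_ACCOUNT_NAME>.dfs.core.windows.net/<PATH_TO_DATA>`. As a quick sanity check on that shape, here is a small sketch that assembles such a URI; the helper name and sample values are hypothetical, not part of the article:

```python
def abfss_uri(file_system: str, storage_account: str, path: str) -> str:
    # Assembles an ADLS Gen 2 URI in the documented format:
    # abfss://<FILE_SYSTEM_NAME>@<STORAGE_ACCOUNT_NAME>.dfs.core.windows.net/<PATH_TO_DATA>
    # A leading slash on the path is stripped so the URI has exactly one
    # separator between the host and the data path.
    return (f"abfss://{file_system}@{storage_account}"
            f".dfs.core.windows.net/{path.lstrip('/')}")


print(abfss_uri("myfilesystem", "mystorageaccount", "data/titanic.csv"))
# → abfss://myfilesystem@mystorageaccount.dfs.core.windows.net/data/titanic.csv
```

Here `<FILE_SYSTEM_NAME>` is the storage container (file system) name and `<STORAGE_ACCOUNT_NAME>` is the ADLS Gen 2 account that the role assignments in the article grant access to.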