With the Azure Machine Learning CLI, we can attach and manage a Synapse Spark pool from the command line, using intuitive YAML syntax and commands.
To define an attached Synapse Spark pool using YAML syntax, the YAML file should cover these properties:
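
For illustration, these properties include at minimum a `name` for the attached compute, the compute `type`, and the `resource_id` of the Synapse Spark pool. This is a minimal sketch, assuming the `synapsespark` compute type; all names below are placeholders:

```yaml
# a minimal sketch of an attached Synapse Spark pool specification;
# all names below are placeholders
name: <ATTACHED_SPARK_POOL_NAME>
type: synapsespark
resource_id: /subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.Synapse/workspaces/<SYNAPSE_WORKSPACE_NAME>/bigDataPools/<SPARK_POOL_NAME>
```
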
The YAML files above can be used in the `az ml compute attach` command as the `--file` parameter, as shown in this example:

```azurecli
az ml compute attach --file <YAML_SPECIFICATION_FILE_NAME>.yaml --subscription <SUBSCRIPTION_ID> --resource-group <RESOURCE_GROUP> --workspace-name <AML_WORKSPACE_NAME>
```
This sample shows the expected output of the above command:
```azurecli
Class SynapseSparkCompute: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
```

To display details of an attached Synapse Spark pool, execute the `az ml compute show` command, as shown in this example:

```azurecli
az ml compute show --name <ATTACHED_SPARK_POOL_NAME> --subscription <SUBSCRIPTION_ID> --resource-group <RESOURCE_GROUP> --workspace-name <AML_WORKSPACE_NAME>
```
This sample shows the expected output of the above command:
```azurecli
<ATTACHED_SPARK_POOL_NAME>
```

To see a list of all computes, including the attached Synapse Spark pools in a workspace, execute the `az ml compute list` command, as shown in this example:

```azurecli
az ml compute list --subscription <SUBSCRIPTION_ID> --resource-group <RESOURCE_GROUP> --workspace-name <AML_WORKSPACE_NAME>
```
This sample shows the expected output of the above command:
```azurecli
[
```

Execute the `az ml compute update` command, with appropriate parameters, to update the managed identity used by an attached Synapse Spark pool. This example assigns a system-assigned managed identity:

```azurecli
az ml compute update --identity SystemAssigned --subscription <SUBSCRIPTION_ID> --resource-group <RESOURCE_GROUP> --workspace-name <AML_WORKSPACE_NAME> --name <ATTACHED_SPARK_POOL_NAME>
```
This sample shows the expected output of the above command:
```azurecli
Class SynapseSparkCompute: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
```

To use a user-assigned managed identity instead, pass the identity resource ID through the `--user-assigned-identities` parameter, as shown in this example:

```azurecli
az ml compute update --identity UserAssigned --user-assigned-identities /subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<AML_USER_MANAGED_ID> --subscription <SUBSCRIPTION_ID> --resource-group <RESOURCE_GROUP> --workspace-name <AML_WORKSPACE_NAME> --name <ATTACHED_SPARK_POOL_NAME>
```
This sample shows the expected output of the above command:
```azurecli
Class SynapseSparkCompute: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
```

We might want to detach an attached Synapse Spark pool, to clean up a workspace.

# [Studio UI](#tab/studio-ui)
The Azure Machine Learning studio UI also provides a way to detach an attached Synapse Spark pool. Follow these steps:

1. Open the **Details** page for the Synapse Spark pool, in the Azure Machine Learning studio.
An attached Synapse Spark pool can be detached by executing the `az ml compute detach` command, with the name of the pool passed through the `--name` parameter, as shown here:

```azurecli
az ml compute detach --name <ATTACHED_SPARK_POOL_NAME> --subscription <SUBSCRIPTION_ID> --resource-group <RESOURCE_GROUP> --workspace-name <AML_WORKSPACE_NAME>
```
This sample shows the expected output of the above command:
```azurecli
Are you sure you want to perform this operation? (y/n): y
```

We will use an `MLClient.compute.begin_delete()` function call. Pass the `name` of the attached Synapse Spark pool, along with the action `Detach`, to the function. This code snippet detaches a Synapse Spark pool from an Azure Machine Learning workspace:
```python
# import required libraries
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# connect to the Azure Machine Learning workspace;
# fill in the placeholder values for your workspace
ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<SUBSCRIPTION_ID>",
    resource_group_name="<RESOURCE_GROUP>",
    workspace_name="<AML_WORKSPACE_NAME>",
)

# detach the Synapse Spark pool from the workspace
ml_client.compute.begin_delete(
    name="<ATTACHED_SPARK_POOL_NAME>",
    action="Detach",
)
```
Some user scenarios may require access to a Synapse Spark pool during an Azure Machine Learning job execution.

## Next steps
- [Interactive Data Wrangling with Apache Spark in Azure Machine Learning (preview)](./interactive-data-wrangling-with-apache-spark-azure-ml.md)
- [Submit Spark jobs in Azure Machine Learning (preview)](./how-to-submit-spark-jobs.md)
Azure Machine Learning provides the ability to submit standalone machine learning jobs, or to create a [machine learning pipeline](./concept-ml-pipelines.md) comprising multiple steps in a machine learning workflow. Azure Machine Learning supports creation of a standalone Spark job, and creation of a reusable Spark component that can be used in Azure Machine Learning pipelines. In this article, you'll learn how to submit Spark jobs using:
- Azure Machine Learning studio UI
- Azure Machine Learning CLI
- Azure Machine Learning SDK
## Prerequisites
- An Azure subscription; if you don't have an Azure subscription, [create a free account](https://azure.microsoft.com/free) before you begin
- An Azure Machine Learning workspace. See [Create workspace resources](./quickstart-create-resources.md)
- [An attached Synapse Spark pool in the Azure Machine Learning workspace](./how-to-manage-synapse-spark-pool.md)
- [Configure your development environment](./how-to-configure-environment.md), or [create an Azure Machine Learning compute instance](./concept-compute-instance.md#create)
- [Install the Azure Machine Learning SDK for Python](/python/api/overview/azure/ml/installv2)

Spark jobs can use either user identity passthrough or a managed identity to access data and other resources. The mechanisms for accessing resources differ between an attached Synapse Spark pool and Managed (Automatic) Spark compute.

> To ensure successful execution of a Spark job, assign the **Contributor** and **Storage Blob Data Contributor** roles, on the Azure storage account used for data input and output, to the identity that the Spark job uses.
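
For example, assuming the identity's object ID and the names of the storage account, resource group, and subscription are known, these role assignments can be created with the Azure CLI (all values shown are placeholders):

```azurecli
az role assignment create --role "Storage Blob Data Contributor" --assignee-object-id <IDENTITY_OBJECT_ID> --scope /subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.Storage/storageAccounts/<STORAGE_ACCOUNT_NAME>

az role assignment create --role "Contributor" --assignee-object-id <IDENTITY_OBJECT_ID> --scope /subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.Storage/storageAccounts/<STORAGE_ACCOUNT_NAME>
```
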
## Submit a standalone Spark job
Once a Python script is developed through [interactive data wrangling](./interactive-data-wrangling-with-apache-spark-azure-ml.md), and parameterized as needed, it can be used to submit a batch job that processes a larger volume of data. A simple data wrangling batch job can be submitted as a standalone Spark job.
A Spark job requires a Python script that takes arguments; such a script can be created by modifying the Python code developed from [interactive data wrangling](./interactive-data-wrangling-with-apache-spark-azure-ml.md). A sample Python script is shown here.
```python
# a sketch of a parameterized data wrangling script; adapt the
# transformations to your own data
import argparse

from pyspark.sql import SparkSession

# parse the input and output data paths passed as command-line arguments
parser = argparse.ArgumentParser()
parser.add_argument("--input_data", required=True)
parser.add_argument("--output_data", required=True)
args = parser.parse_args()

spark = SparkSession.builder.getOrCreate()

# read the input data, drop rows with missing values, and write the result
df = spark.read.csv(args.input_data, header=True, inferSchema=True)
df = df.dropna()
df.write.mode("overwrite").parquet(args.output_data)
```

A standalone Spark job can be defined as a YAML specification file, which can be used in the `az ml job create` command, with the `--file` parameter, to submit the job. Define these properties in the YAML file:

- If dynamic allocation of executors is enabled, define this property:
- `spark.dynamicAllocation.maxExecutors` - the maximum number of Spark executor instances, for dynamic allocation.
- If dynamic allocation of executors is disabled, define this property:
- `spark.executor.instances` - the number of Spark executor instances.
- `environment` - an [Azure Machine Learning environment](./reference-yaml-environment.md) to run the job.
- `args` - the command-line arguments that should be passed to the job entry point Python script or class. See the YAML specification file provided below for an example.
- `compute` - this property defines the name of an attached Synapse Spark pool, as shown in this example:
```yaml
compute: <ATTACHED_SPARK_POOL_NAME>
```

To submit a standalone Spark job using the Azure Machine Learning studio UI:

1. Select **Create** to submit the standalone Spark job.
## Spark component in a pipeline job
A Spark component allows the flexibility to use the same component in multiple [Azure Machine Learning pipelines](./concept-ml-pipelines.md) as a pipeline step.
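
This is a rough sketch of what a Spark component specification might look like, assuming the same Spark properties described for the standalone job YAML; all names and values here are illustrative placeholders, not a definitive schema:

```yaml
# illustrative sketch of a Spark component YAML specification;
# all names and values are placeholders
name: spark_wrangling_component
type: spark
version: 1
code: ./src
entry:
  file: wrangle.py
inputs:
  input_data:
    type: uri_file
outputs:
  output_data:
    type: uri_folder
args: --input_data ${{inputs.input_data}} --output_data ${{outputs.output_data}}
conf:
  spark.driver.cores: 1
  spark.driver.memory: 2g
  spark.executor.cores: 2
  spark.executor.memory: 2g
  spark.executor.instances: 2
```
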
# [Azure CLI](#tab/cli)
A Spark component defined in a YAML specification file can be used in an Azure Machine Learning pipeline job. See the [pipeline job YAML schema](./reference-yaml-job-pipeline.md) to learn more about the YAML syntax that defines a pipeline job, and for an example of a pipeline job that uses a Spark component.

articles/machine-learning/interactive-data-wrangling-with-apache-spark-azure-ml.md

- [Code samples for interactive data wrangling with Apache Spark in Azure Machine Learning](https://github.com/Azure/azureml-examples/tree/main/sdk/python/data-wrangling)
- [Optimize Apache Spark jobs in Azure Synapse Analytics](../synapse-analytics/spark/apache-spark-performance.md)
- [What are Azure Machine Learning pipelines?](./concept-ml-pipelines.md)
- [Submit Spark jobs in Azure Machine Learning (preview)](./how-to-submit-spark-jobs.md)