Commit bb25bde

Merge pull request #217435 from fbsolo-ms1/spark-docs-minor-updates
Minor changes to the Spark-related files . . .
2 parents: 30c0fe0 + 2910aaf

4 files changed: +31 additions, −22 deletions

articles/machine-learning/how-to-manage-synapse-spark-pool.md

Lines changed: 12 additions & 12 deletions

@@ -66,7 +66,7 @@ The **Attach Synapse Spark pool (preview)** panel will open on the right side of
 
 [!INCLUDE [cli v2](../../includes/machine-learning-cli-v2.md)]
 
-The Azure Machine Learning CLI provides the ability to attach and manage a Synapse Spark pool from the command line interface, using intuitive YAML syntax and commands.
+With the Azure Machine Learning CLI, we can attach and manage a Synapse Spark pool from the command line interface, using intuitive YAML syntax and commands.
 
 To define an attached Synapse Spark pool using YAML syntax, the YAML file should cover these properties:
 
@@ -129,7 +129,7 @@ The YAML files above can be used in the `az ml compute attach` command as the `-
 az ml compute attach --file <YAML_SPECIFICATION_FILE_NAME>.yaml --subscription <SUBSCRIPTION_ID> --resource-group <RESOURCE_GROUP> --workspace-name <AML_WORKSPACE_NAME>
 ```
 
-This shows the expected output of the above command:
+This sample shows the expected output of the above command:
 
 ```azurecli
 Class SynapseSparkCompute: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
@@ -174,7 +174,7 @@ To display details of an attached Synapse Spark pool, execute the `az ml compute
 az ml compute show --name <ATTACHED_SPARK_POOL_NAME> --subscription <SUBSCRIPTION_ID> --resource-group <RESOURCE_GROUP> --workspace-name <AML_WORKSPACE_NAME>
 ```
 
-This shows the expected output of the above command:
+This sample shows the expected output of the above command:
 
 ```azurecli
 <ATTACHED_SPARK_POOL_NAME>
@@ -209,7 +209,7 @@ To see a list of all computes, including the attached Synapse Spark pools in a w
 az ml compute list --subscription <SUBSCRIPTION_ID> --resource-group <RESOURCE_GROUP> --workspace-name <AML_WORKSPACE_NAME>
 ```
 
-This shows the expected output of the above command:
+This sample shows the expected output of the above command:
 
 ```azurecli
 [
@@ -417,7 +417,7 @@ Execute the `az ml compute update` command, with appropriate parameters, to upda
 az ml compute update --identity SystemAssigned --subscription <SUBSCRIPTION_ID> --resource-group <RESOURCE_GROUP> --workspace-name <AML_WORKSPACE_NAME> --name <ATTACHED_SPARK_POOL_NAME>
 ```
 
-This shows the expected output of the above command:
+This sample shows the expected output of the above command:
 
 ```azurecli
 Class SynapseSparkCompute: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
@@ -460,7 +460,7 @@ az ml compute update --identity UserAssigned --user-assigned-identities /subscri
 
 ```
 
-This shows the expected output of the above command:
+This sample shows the expected output of the above command:
 
 ```azurecli
 Class SynapseSparkCompute: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
@@ -582,7 +582,7 @@ We might want to detach an attached Synapse Spark pool, to clean up a workspace.
 
 # [Studio UI](#tab/studio-ui)
 
-The Azure Machine Learning studio UI also provides a way to detach an attached Synapse Spark pool. To do this:
+The Azure Machine Learning studio UI also provides a way to detach an attached Synapse Spark pool. Follow these steps to do this:
 
 1. Open the **Details** page for the Synapse Spark pool, in the Azure Machine Learning studio.
 
@@ -592,15 +592,15 @@ The Azure Machine Learning studio UI also provides a way to detach an attached S
 
 [!INCLUDE [cli v2](../../includes/machine-learning-cli-v2.md)]
 
-An attached Synapse Spark pool can be detached by executing the `az ml compute detach` command with name of the pool passed using `--name` parameter as following:
+An attached Synapse Spark pool can be detached by executing the `az ml compute detach` command with name of the pool passed using `--name` parameter as shown here:
 
 ```azurecli
 
 az ml compute detach --name <ATTACHED_SPARK_POOL_NAME> --subscription <SUBSCRIPTION_ID> --resource-group <RESOURCE_GROUP> --workspace-name <AML_WORKSPACE_NAME>
 
 ```
 
-This shows the expected output of the above command:
+This sample shows the expected output of the above command:
 
 ```azurecli
 Are you sure you want to perform this operation? (y/n): y
@@ -611,7 +611,7 @@ Are you sure you want to perform this operation? (y/n): y
 
 [!INCLUDE [sdk v2](../../includes/machine-learning-sdk-v2.md)]
 
-An `MLClient.compute.begin_delete()` function call will do this for us. Pass the `name` of the attached Synapse Spark pool, along with the action `Detach`, to the function. This code snippet detaches a Synapse Spark pool from an Azure Machine Learning workspace:
+We will use an `MLClient.compute.begin_delete()` function call. Pass the `name` of the attached Synapse Spark pool, along with the action `Detach`, to the function. This code snippet detaches a Synapse Spark pool from an Azure Machine Learning workspace:
 
 ```python
 # import required libraries
@@ -640,6 +640,6 @@ Some user scenarios may require access to a Synapse Spark Pool, during an Azure
 
 ## Next steps
 
-- [Interactive Data Wrangling with Apache Spark in Azure Machine Learning (preview)](/interactive-data-wrangling-with-apache-spark-azure-ml.md)
+- [Interactive Data Wrangling with Apache Spark in Azure Machine Learning (preview)](./interactive-data-wrangling-with-apache-spark-azure-ml.md)
 
-- [Submit Spark jobs in Azure Machine Learning (preview)](/how-to-submit-spark-jobs.md)
+- [Submit Spark jobs in Azure Machine Learning (preview)](./how-to-submit-spark-jobs.md)
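The hunks above pass a YAML specification file to `az ml compute attach` but don't show its contents. As a hedged illustration only: this sketch follows the Azure ML attached-compute YAML convention the article describes (`name`, `type`, `resource_id`); the `type: synapsespark` value and all `<PLACEHOLDER>` segments are assumptions, not taken from this commit.

```yaml
# Hypothetical attach specification for a Synapse Spark pool.
# All <PLACEHOLDER> values must be replaced with real resource names.
name: <ATTACHED_SPARK_POOL_NAME>
type: synapsespark
resource_id: /subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.Synapse/workspaces/<SYNAPSE_WORKSPACE_NAME>/bigDataPools/<SPARK_POOL_NAME>
```

A file like this would then be passed as the `--file` parameter of the `az ml compute attach` command shown in the hunk at line 129 above.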

articles/machine-learning/how-to-submit-spark-jobs.md

Lines changed: 8 additions & 8 deletions

@@ -16,21 +16,21 @@ ms.custom: template-how-to
 
 [!INCLUDE [preview disclaimer](../../includes/machine-learning-preview-generic-disclaimer.md)]
 
-Azure Machine Learning provides the ability to submit standalone machine learning jobs or creating a [machine learning pipeline](/concept-ml-pipelines.md) comprising multiple steps in a machine learning workflow. Azure Machine Learning supports creation of a standalone Spark job, and creation of a reusable Spark component that can be used in Azure Machine Learning pipelines. In this article you will learn how to submit Spark jobs using:
+Azure Machine Learning provides the ability to submit standalone machine learning jobs or creating a [machine learning pipeline](./concept-ml-pipelines.md) comprising multiple steps in a machine learning workflow. Azure Machine Learning supports creation of a standalone Spark job, and creation of a reusable Spark component that can be used in Azure Machine Learning pipelines. In this article you will learn how to submit Spark jobs using:
 - Azure Machine Learning studio UI
 - Azure Machine Learning CLI
 - Azure Machine Learning SDK
 
 ## Prerequisites
 - An Azure subscription; if you don't have an Azure subscription, [create a free account](https://azure.microsoft.com/free) before you begin
 - An Azure Machine Learning workspace. See [Create workspace resources](./quickstart-create-resources.md)
-- [An attached Synapse Spark pool in the Azure Machine Learning workspace](/how-to-manage-synapse-spark-pool.md).
+- [An attached Synapse Spark pool in the Azure Machine Learning workspace](./how-to-manage-synapse-spark-pool.md).
 - [Configure your development environment](./how-to-configure-environment.md), or [create an Azure Machine Learning compute instance](./concept-compute-instance.md#create)
 - [Install the Azure Machine Learning SDK for Python](/python/api/overview/azure/ml/installv2)
 - [Install Azure Machine Learning CLI](./how-to-configure-cli.md?tabs=public)
 
 ## Ensuring resource access for Spark jobs
-Spark jobs can use either user identity passthrough or a managed identity to access data and other resource. Different mechanisms for accessing resources while using attached Synapse Spark pool and Managed (Automatic) Spark compute are summarized in the following table.
+Spark jobs can use either user identity passthrough or a managed identity to access data and other resource. Different mechanisms for accessing resources while using attached Synapse Spark pool and Managed (Automatic) Spark compute are summarized in the following table.
 
 |Spark pool|Supported identities|Default identity|
 | ---------- | -------------------- | ---------------- |
@@ -66,9 +66,9 @@ armclient PATCH https://management.azure.com/subscriptions/<SUBSCRIPTION_ID>/res
 > To ensure successful execution of spark job, the identity being used for the Spark job should be assigned **Contributor** and **Storage Blob Data Contributor** roles on the Azure storage account used for data input and output.
 
 ## Submit a standalone Spark job
-Once a Python script is developed by [interactive data wrangling](/interactive-data-wrangling-with-apache-spark-azure-ml.md), it can be used for submitting a batch job to process a larger volume of data after making necessary changes for parameterization of the Python script. A simple data wrangling batch job can be submitted as a standalone Spark job.
+Once a Python script is developed by [interactive data wrangling](./interactive-data-wrangling-with-apache-spark-azure-ml.md), it can be used for submitting a batch job to process a larger volume of data after making necessary changes for parameterization of the Python script. A simple data wrangling batch job can be submitted as a standalone Spark job.
 
-A Spark job requires a Python script that takes arguments, which can be developed by modifying the Python code developed from [interactive data wrangling](/interactive-data-wrangling-with-apache-spark-azure-ml.md). A sample Python script is shown here.
+A Spark job requires a Python script that takes arguments, which can be developed by modifying the Python code developed from [interactive data wrangling](./interactive-data-wrangling-with-apache-spark-azure-ml.md). A sample Python script is shown here.
 
 ```python
 
@@ -126,7 +126,7 @@ A standalone Spark job can be defined as a YAML specification file, which can be
 - `spark.dynamicAllocation.maxExecutors` - the maximum number of Spark executors instances, for dynamic allocation.
 - If dynamic allocation of executors is disabled, define this property:
   - `spark.executor.instances` - the number of Spark executor instances.
-- `environment` - an [Azure Machine Learning environment](/reference-yaml-environment) to run the job.
+- `environment` - an [Azure Machine Learning environment](./reference-yaml-environment.md) to run the job.
 - `args` - the command line arguments that should be passed to the job entry point Python script or class. See the YAML specification file provided below for an example.
 - `compute` - this property defines the name of an attached Synapse Spark pool, as shown in this example:
 ```yaml
@@ -429,7 +429,7 @@ To submit a standalone Spark job using the Azure Machine Learning studio UI:
 1. Select **Create** to submit the standalone Spark job.
 
 ## Spark component in a pipeline job
-A Spark component allows the flexibility to use the same component in multiple [Azure Machine Learning pipelines](/concept-ml-pipelines) as a pipeline step.
+A Spark component allows the flexibility to use the same component in multiple [Azure Machine Learning pipelines](./concept-ml-pipelines.md) as a pipeline step.
 
 # [Azure CLI](#tab/cli)
 
@@ -501,7 +501,7 @@ conf:
 
 ```
 
-The Spark component defined in the above YAML specification file can be used in an Azure Machine Learning pipeline job. See [pipeline job YAML schema](/reference-yaml-job-pipeline.md) to learn more about the YAML syntax that defines a pipeline job. This is an example YAML specification file for a pipeline job, with a Spark component:
+The Spark component defined in the above YAML specification file can be used in an Azure Machine Learning pipeline job. See [pipeline job YAML schema](./reference-yaml-job-pipeline.md) to learn more about the YAML syntax that defines a pipeline job. This is an example YAML specification file for a pipeline job, with a Spark component:
 
 ```yaml
 
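The hunk at line 126 above lists the properties a standalone Spark job YAML specification should define (`spark.executor.instances` or dynamic allocation, `environment`, `args`, `compute`). As a hedged sketch only, those properties might fit together like this; the `type`, `code`, and `entry` fields and every `<PLACEHOLDER>` are assumptions for illustration, not content from this commit.

```yaml
# Hypothetical standalone Spark job specification, assembled from the
# property list in the diff above; placeholder values are assumptions.
type: spark
code: ./src
entry:
  file: wrangle.py          # hypothetical entry-point script name
conf:
  spark.executor.cores: 2
  spark.executor.memory: 2g
  spark.executor.instances: 2   # fixed count; dynamic allocation disabled
environment: azureml:<ENVIRONMENT_NAME>@latest
args: --input <INPUT_PATH> --output <OUTPUT_PATH>
compute: <ATTACHED_SPARK_POOL_NAME>
```

If dynamic allocation were enabled instead, `spark.executor.instances` would be replaced by the `spark.dynamicAllocation.*` settings the hunk describes.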

articles/machine-learning/interactive-data-wrangling-with-apache-spark-azure-ml.md

Lines changed: 2 additions & 2 deletions

@@ -389,5 +389,5 @@ df.head()
 
 - [Code samples for interactive data wrangling with Apache Spark in Azure Machine Learning](https://github.com/Azure/azureml-examples/tree/main/sdk/python/data-wrangling)
 - [Optimize Apache Spark jobs in Azure Synapse Analytics](../synapse-analytics/spark/apache-spark-performance.md)
-- [What are Azure Machine Learning pipelines?](/concept-ml-pipelines.md)
-- [Submit Spark jobs in Azure Machine Learning (preview)](/how-to-submit-spark-jobs.md)
+- [What are Azure Machine Learning pipelines?](./concept-ml-pipelines.md)
+- [Submit Spark jobs in Azure Machine Learning (preview)](./how-to-submit-spark-jobs.md)
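Most hunks in this commit change root-relative links (`/x.md`) to document-relative links (`./x.md`). The difference is standard URL-reference resolution, which this snippet demonstrates with Python's `urllib.parse.urljoin`; the base URL is a hypothetical rendered-page address chosen for illustration, not taken from the commit.

```python
from urllib.parse import urljoin

# Hypothetical base URL for one of the rendered articles in this commit.
base = "https://learn.microsoft.com/azure/machine-learning/interactive-data-wrangling-with-apache-spark-azure-ml"

# A root-relative reference resolves against the site root, escaping the
# docset directory -- which is why the old links were broken:
print(urljoin(base, "/how-to-submit-spark-jobs.md"))
# prints https://learn.microsoft.com/how-to-submit-spark-jobs.md

# A document-relative reference resolves against the current article's
# directory, staying alongside the other machine-learning articles:
print(urljoin(base, "./how-to-submit-spark-jobs.md"))
# prints https://learn.microsoft.com/azure/machine-learning/how-to-submit-spark-jobs.md
```

The same resolution rules apply however the docs site maps `.md` sources to published URLs, so the `./` form is the safe choice for links between files in the same folder.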

articles/machine-learning/toc.yml

Lines changed: 9 additions & 0 deletions

@@ -317,6 +317,9 @@
 - name: Manage compute resources
   displayName: compute target, dsvm, Data Science Virtual Machine, local, cluster, ACI, container instance, Databricks, data lake, lake, HDI, HDInsight, low priority, managed identity
   href: how-to-create-attach-compute-studio.md
+- name: Attach and Manage a Synapse Spark pool
+  displayName: Attach and Manage a Synapse Spark pool
+  href: how-to-manage-synapse-spark-pool.md
 - name: AKS and Azure Arc-enabled Kubernetes
   items:
   - name: What is Kubernetes compute target
@@ -408,6 +411,12 @@
 - name: Read & write data in jobs
   displayName: Read & write data in jobs
   href: how-to-read-write-data-v2.md
+- name: Submit Spark jobs in Azure Machine Learning
+  displayName: Submit Spark jobs in Azure Machine Learning
+  href: how-to-submit-spark-jobs.md
+- name: Interactive Data Wrangling with Apache Spark
+  displayName: Interactive Data Wrangling with Apache Spark
+  href: interactive-data-wrangling-with-apache-spark-azure-ml.md
 - name: Data administration
   displayName: Data administration
   href: how-to-administrate-data-authentication.md
