With the Azure Machine Learning CLI, we can attach and manage a Synapse Spark pool from the command line, using intuitive YAML syntax and commands.
To define an attached Synapse Spark pool using YAML syntax, the YAML file should cover these properties:
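
For illustration, these properties include at minimum a `name` for the attached compute, the compute `type`, and the `resource_id` of the Synapse Spark pool. This is a minimal sketch, assuming the `synapsespark` compute type; all names below are placeholders:

```yaml
# a minimal sketch of an attached Synapse Spark pool specification;
# all names below are placeholders
name: <ATTACHED_SPARK_POOL_NAME>
type: synapsespark
resource_id: /subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.Synapse/workspaces/<SYNAPSE_WORKSPACE_NAME>/bigDataPools/<SPARK_POOL_NAME>
```
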
The YAML files above can be used in the `az ml compute attach` command as the `--file` parameter, as shown in this example:

```azurecli
az ml compute attach --file <YAML_SPECIFICATION_FILE_NAME>.yaml --subscription <SUBSCRIPTION_ID> --resource-group <RESOURCE_GROUP> --workspace-name <AML_WORKSPACE_NAME>
```
This sample shows the expected output of the above command:
```azurecli
Class SynapseSparkCompute: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
```

To display details of an attached Synapse Spark pool, execute the `az ml compute show` command, as shown in this example:

```azurecli
az ml compute show --name <ATTACHED_SPARK_POOL_NAME> --subscription <SUBSCRIPTION_ID> --resource-group <RESOURCE_GROUP> --workspace-name <AML_WORKSPACE_NAME>
```
This sample shows the expected output of the above command:
```azurecli
<ATTACHED_SPARK_POOL_NAME>
```

To see a list of all computes, including the attached Synapse Spark pools in a workspace, execute the `az ml compute list` command, as shown in this example:

```azurecli
az ml compute list --subscription <SUBSCRIPTION_ID> --resource-group <RESOURCE_GROUP> --workspace-name <AML_WORKSPACE_NAME>
```
This sample shows the expected output of the above command:
```azurecli
[
```

Execute the `az ml compute update` command, with appropriate parameters, to update the managed identity used by an attached Synapse Spark pool. This example assigns a system-assigned managed identity:

```azurecli
az ml compute update --identity SystemAssigned --subscription <SUBSCRIPTION_ID> --resource-group <RESOURCE_GROUP> --workspace-name <AML_WORKSPACE_NAME> --name <ATTACHED_SPARK_POOL_NAME>
```
This sample shows the expected output of the above command:
```azurecli
Class SynapseSparkCompute: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
```

To use a user-assigned managed identity instead, pass the identity resource ID through the `--user-assigned-identities` parameter, as shown in this example:

```azurecli
az ml compute update --identity UserAssigned --user-assigned-identities /subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<AML_USER_MANAGED_ID> --subscription <SUBSCRIPTION_ID> --resource-group <RESOURCE_GROUP> --workspace-name <AML_WORKSPACE_NAME> --name <ATTACHED_SPARK_POOL_NAME>
```
This sample shows the expected output of the above command:
```azurecli
Class SynapseSparkCompute: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
```

We might want to detach an attached Synapse Spark pool, to clean up a workspace.

# [Studio UI](#tab/studio-ui)
The Azure Machine Learning studio UI also provides a way to detach an attached Synapse Spark pool. Follow these steps:

1. Open the **Details** page for the Synapse Spark pool, in the Azure Machine Learning studio.
An attached Synapse Spark pool can be detached by executing the `az ml compute detach` command, with the name of the pool passed through the `--name` parameter, as shown here:

```azurecli
az ml compute detach --name <ATTACHED_SPARK_POOL_NAME> --subscription <SUBSCRIPTION_ID> --resource-group <RESOURCE_GROUP> --workspace-name <AML_WORKSPACE_NAME>
```
This sample shows the expected output of the above command:
```azurecli
Are you sure you want to perform this operation? (y/n): y
```

We will use an `MLClient.compute.begin_delete()` function call. Pass the `name` of the attached Synapse Spark pool, along with the action `Detach`, to the function. This code snippet detaches a Synapse Spark pool from an Azure Machine Learning workspace:
```python
# import required libraries
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# connect to the Azure Machine Learning workspace;
# fill in the placeholder values for your workspace
ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<SUBSCRIPTION_ID>",
    resource_group_name="<RESOURCE_GROUP>",
    workspace_name="<AML_WORKSPACE_NAME>",
)

# detach the Synapse Spark pool from the workspace
ml_client.compute.begin_delete(
    name="<ATTACHED_SPARK_POOL_NAME>",
    action="Detach",
)
```
Some user scenarios may require access to a Synapse Spark pool during an Azure Machine Learning job execution.

## Next steps
- [Interactive Data Wrangling with Apache Spark in Azure Machine Learning (preview)](./interactive-data-wrangling-with-apache-spark-azure-ml.md)
- [Submit Spark jobs in Azure Machine Learning (preview)](./how-to-submit-spark-jobs.md)
Azure Machine Learning provides the ability to submit standalone machine learning jobs, or to create a [machine learning pipeline](./concept-ml-pipelines.md) comprising multiple steps in a machine learning workflow. Azure Machine Learning supports creation of a standalone Spark job, and creation of a reusable Spark component that can be used in Azure Machine Learning pipelines. In this article, you'll learn how to submit Spark jobs using:
- Azure Machine Learning studio UI
- Azure Machine Learning CLI
- Azure Machine Learning SDK
## Prerequisites
- An Azure subscription; if you don't have an Azure subscription, [create a free account](https://azure.microsoft.com/free) before you begin
- An Azure Machine Learning workspace. See [Create workspace resources](./quickstart-create-resources.md)
- [An attached Synapse Spark pool in the Azure Machine Learning workspace](./how-to-manage-synapse-spark-pool.md)
- [Configure your development environment](./how-to-configure-environment.md), or [create an Azure Machine Learning compute instance](./concept-compute-instance.md#create)
- [Install the Azure Machine Learning SDK for Python](/python/api/overview/azure/ml/installv2)

Spark jobs can use either user identity passthrough or a managed identity to access data and other resources. The mechanisms for accessing resources differ between an attached Synapse Spark pool and Managed (Automatic) Spark compute.

> To ensure successful execution of a Spark job, assign the **Contributor** and **Storage Blob Data Contributor** roles, on the Azure storage account used for data input and output, to the identity that the Spark job uses.
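
For example, assuming the identity's object ID and the names of the storage account, resource group, and subscription are known, these role assignments can be created with the Azure CLI (all values shown are placeholders):

```azurecli
az role assignment create --role "Storage Blob Data Contributor" --assignee-object-id <IDENTITY_OBJECT_ID> --scope /subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.Storage/storageAccounts/<STORAGE_ACCOUNT_NAME>

az role assignment create --role "Contributor" --assignee-object-id <IDENTITY_OBJECT_ID> --scope /subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.Storage/storageAccounts/<STORAGE_ACCOUNT_NAME>
```
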
## Submit a standalone Spark job
Once a Python script is developed through [interactive data wrangling](./interactive-data-wrangling-with-apache-spark-azure-ml.md), and parameterized as needed, it can be used to submit a batch job that processes a larger volume of data. A simple data wrangling batch job can be submitted as a standalone Spark job.
A Spark job requires a Python script that takes arguments; such a script can be created by modifying the Python code developed from [interactive data wrangling](./interactive-data-wrangling-with-apache-spark-azure-ml.md). A sample Python script is shown here.
```python
# a sketch of a parameterized data wrangling script; adapt the
# transformations to your own data
import argparse

from pyspark.sql import SparkSession

# parse the input and output data paths passed as command-line arguments
parser = argparse.ArgumentParser()
parser.add_argument("--input_data", required=True)
parser.add_argument("--output_data", required=True)
args = parser.parse_args()

spark = SparkSession.builder.getOrCreate()

# read the input data, drop rows with missing values, and write the result
df = spark.read.csv(args.input_data, header=True, inferSchema=True)
df = df.dropna()
df.write.mode("overwrite").parquet(args.output_data)
```

A standalone Spark job can be defined as a YAML specification file, which can be used in the `az ml job create` command, with the `--file` parameter, to submit the job. Define these properties in the YAML file:

- If dynamic allocation of executors is enabled, define this property:
- `spark.dynamicAllocation.maxExecutors` - the maximum number of Spark executor instances, for dynamic allocation.
- If dynamic allocation of executors is disabled, define this property:
- `spark.executor.instances` - the number of Spark executor instances.
- `environment` - an [Azure Machine Learning environment](./reference-yaml-environment.md) to run the job.
- `args` - the command-line arguments that should be passed to the job entry point Python script or class. See the YAML specification file provided below for an example.
- `compute` - this property defines the name of an attached Synapse Spark pool, as shown in this example:
```yaml
compute: <ATTACHED_SPARK_POOL_NAME>
```

To submit a standalone Spark job using the Azure Machine Learning studio UI:

1. Select **Create** to submit the standalone Spark job.
## Spark component in a pipeline job
A Spark component allows the flexibility to use the same component in multiple [Azure Machine Learning pipelines](./concept-ml-pipelines.md) as a pipeline step.
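
This is a rough sketch of what a Spark component specification might look like, assuming the same Spark properties described for the standalone job YAML; all names and values here are illustrative placeholders, not a definitive schema:

```yaml
# illustrative sketch of a Spark component YAML specification;
# all names and values are placeholders
name: spark_wrangling_component
type: spark
version: 1
code: ./src
entry:
  file: wrangle.py
inputs:
  input_data:
    type: uri_file
outputs:
  output_data:
    type: uri_folder
args: --input_data ${{inputs.input_data}} --output_data ${{outputs.output_data}}
conf:
  spark.driver.cores: 1
  spark.driver.memory: 2g
  spark.executor.cores: 2
  spark.executor.memory: 2g
  spark.executor.instances: 2
```
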
# [Azure CLI](#tab/cli)
A Spark component defined in a YAML specification file can be used in an Azure Machine Learning pipeline job. See the [pipeline job YAML schema](./reference-yaml-job-pipeline.md) to learn more about the YAML syntax that defines a pipeline job, and for an example of a pipeline job that uses a Spark component.

articles/machine-learning/interactive-data-wrangling-with-apache-spark-azure-ml.md

- [Code samples for interactive data wrangling with Apache Spark in Azure Machine Learning](https://github.com/Azure/azureml-examples/tree/main/sdk/python/data-wrangling)
- [Optimize Apache Spark jobs in Azure Synapse Analytics](../synapse-analytics/spark/apache-spark-performance.md)
- [What are Azure Machine Learning pipelines?](./concept-ml-pipelines.md)
- [Submit Spark jobs in Azure Machine Learning (preview)](./how-to-submit-spark-jobs.md)