articles/machine-learning/apache-spark-environment-configuration.md
2 additions & 0 deletions
@@ -112,6 +112,7 @@ Once the user identity has the appropriate roles assigned, data in the Azure sto
> If an [attached Synapse Spark pool](./how-to-manage-synapse-spark-pool.md) points to a Synapse Spark pool in an Azure Synapse workspace that has a managed virtual network associated with it, [a managed private endpoint to storage account should be configured](../synapse-analytics/security/connect-to-a-secure-storage-account.md) to ensure data access.
## Ensuring resource access for Spark jobs
Spark jobs can use either a managed identity or user identity passthrough to access data and other resources. The following table summarizes the different mechanisms for resource access while using Azure Machine Learning Managed (Automatic) Spark compute and attached Synapse Spark pool.
@@ -122,6 +123,7 @@ Spark jobs can use either a managed identity or user identity passthrough to acc
If the CLI or SDK code defines an option to use managed identity, Azure Machine Learning Managed (Automatic) Spark compute relies on a user-assigned managed identity attached to the workspace. You can attach a user-assigned managed identity to an existing Azure Machine Learning workspace using Azure Machine Learning CLI v2, or with `ARMClient`.
## Next steps
- [Apache Spark in Azure Machine Learning (preview)](./apache-spark-azure-ml-concepts.md)
- [Attach and manage a Synapse Spark pool in Azure Machine Learning (preview)](./how-to-manage-synapse-spark-pool.md)
- [Interactive Data Wrangling with Apache Spark in Azure Machine Learning (preview)](./interactive-data-wrangling-with-apache-spark-azure-ml.md)
articles/machine-learning/how-to-submit-spark-jobs.md
9 additions & 2 deletions
@@ -58,6 +58,7 @@ These prerequisites cover the submission of a Spark job from Azure Machine Learn
> To learn more about resource access while using Azure Machine Learning Managed (Automatic) Spark compute, and attached Synapse Spark pool, see [Ensuring resource access for Spark jobs](apache-spark-environment-configuration.md#ensuring-resource-access-for-spark-jobs).
### Attach a user-assigned managed identity using CLI v2
1. Create a YAML file that defines the user-assigned managed identity that should be attached to the workspace:
```yaml
identity:
```
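The `identity` block above is cut off at the hunk boundary. A sketch of how the full file presumably reads, following the CLI v2 workspace-update flow this section describes (tenant ID, subscription ID, resource group, and identity name are placeholders):

```yaml
identity:
  type: system_assigned,user_assigned
  tenant_id: <TENANT_ID>
  user_assigned_identities:
    '/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<AML_USER_MANAGED_ID>': {}
```

The workspace would then be updated with this file, presumably via `az ml workspace update --resource-group <RESOURCE_GROUP> --name <WORKSPACE_NAME> --file <FILE_NAME>.yaml` (command shape assumed from CLI v2 conventions).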
### Attach a user-assigned managed identity using `ARMClient`
1. Install [ARMClient](https://github.com/projectkudu/ARMClient), a simple command line tool that invokes the Azure Resource Manager API.
1. Create a JSON file that defines the user-assigned managed identity that should be attached to the workspace:
   The JSON body is cut off at the hunk boundary; a sketch of the expected shape (region, subscription ID, resource group, and identity name are placeholders):

```json
{
  "properties": {},
  "location": "<AZURE_REGION>",
  "identity": {
    "type": "SystemAssigned,UserAssigned",
    "userAssignedIdentities": {
      "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<AML_USER_MANAGED_ID>": {}
    }
  }
}
```
@@ -139,6 +141,7 @@ The above script takes two arguments `--titanic_data` and `--wrangled_data`, whi
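A minimal sketch of an entry script with that two-argument contract (the real script's wrangling logic isn't shown in this diff, and the URIs below are placeholders):

```python
import argparse

def parse_args(argv=None):
    # The Spark job passes input and output URIs through these two named
    # arguments (wired up via the `args` property of the job specification).
    parser = argparse.ArgumentParser(description="Titanic data wrangling job")
    parser.add_argument("--titanic_data", required=True, help="input data URI")
    parser.add_argument("--wrangled_data", required=True, help="output folder URI")
    return parser.parse_args(argv)

# Example: the job's `args` line expands to a command line like this one.
args = parse_args(["--titanic_data", "abfss://data/titanic.csv",
                   "--wrangled_data", "abfss://data/wrangled/"])
```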
A standalone Spark job can be defined as a YAML specification file, which can then be passed to the `az ml job create` command with the `--file` parameter. Define these properties in the YAML file as follows:
### YAML properties in the Spark job specification
- `type` - set to `spark`.
- `code` - defines the location of the folder that contains source code and scripts for this job.
- `entry` - defines the entry point for the job. It should cover one of these properties:
@@ -216,8 +219,9 @@ To create a job, a standalone Spark job can be defined as a YAML specification f
```yaml
    mode: direct
```
- `identity` - this optional property defines the identity used to submit the job. Its value can be `user_identity` or `managed`. If the YAML specification defines no identity, the Spark job uses the default identity.
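The required properties above lend themselves to a quick sanity check before `az ml job create` is invoked. A hypothetical Python helper (illustrative only, not part of the Azure Machine Learning CLI or SDK):

```python
# Minimal pre-submission check for a standalone Spark job spec, based on the
# property list above. The spec is assumed to be loaded into a plain dict.
REQUIRED_KEYS = {"type", "code", "entry"}

def validate_spark_spec(spec: dict) -> list:
    """Return a list of problems found in a standalone Spark job spec."""
    problems = [f"missing required property: {key}"
                for key in sorted(REQUIRED_KEYS - spec.keys())]
    if "type" in spec and spec["type"] != "spark":
        problems.append("`type` must be set to `spark`")
    identity = spec.get("identity", {}).get("type")
    if identity is not None and identity not in ("user_identity", "managed"):
        problems.append("`identity.type` must be `user_identity` or `managed`")
    return problems

spec = {"type": "spark", "code": "./src", "entry": {"file": "titanic.py"},
        "identity": {"type": "managed"}}
```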
### Standalone Spark job
This example YAML specification shows a standalone Spark job. It uses an Azure Machine Learning Managed (Automatic) Spark compute:
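The example specification itself isn't reproduced in this diff view; a sketch consistent with the properties listed above (the schema URL, storage paths, and instance type are placeholder assumptions):

```yaml
$schema: http://azureml/sdk-2-0/SparkJob.json
type: spark

code: ./src
entry:
  file: titanic.py

conf:
  spark.driver.cores: 1
  spark.driver.memory: 2g
  spark.executor.cores: 2
  spark.executor.memory: 2g
  spark.executor.instances: 2

inputs:
  titanic_data:
    type: uri_file
    path: abfss://<FILE_SYSTEM>@<STORAGE_ACCOUNT>.dfs.core.windows.net/data/titanic.csv
    mode: direct

outputs:
  wrangled_data:
    type: uri_folder
    path: abfss://<FILE_SYSTEM>@<STORAGE_ACCOUNT>.dfs.core.windows.net/data/wrangled/
    mode: direct

args: >-
  --titanic_data ${{inputs.titanic_data}} --wrangled_data ${{outputs.wrangled_data}}

identity:
  type: managed

resources:
  instance_type: standard_e4s_v3
  runtime_version: "3.2"
```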
### Submit a standalone Spark job from Azure Machine Learning studio UI
To submit a standalone Spark job using the Azure Machine Learning studio UI:
:::image type="content" source="media/how-to-submit-spark-jobs/create_standalone_spark_job.png" alt-text="Screenshot showing creation of a new Spark job in Azure Machine Learning studio UI.":::
@@ -479,6 +484,7 @@ To submit a standalone Spark job using the Azure Machine Learning studio UI:
---
## Spark component in a pipeline job
A Spark component offers the flexibility to use the same component in multiple [Azure Machine Learning pipelines](./concept-ml-pipelines.md), as a pipeline step.
# [Azure CLI](#tab/cli)
@@ -688,5 +694,6 @@ This functionality isn't available in the Studio UI. The Studio UI doesn't suppo
---
## Next steps
- [Code samples for Spark jobs using Azure Machine Learning CLI](https://github.com/Azure/azureml-examples/tree/main/cli/jobs/spark)
- [Code samples for Spark jobs using Azure Machine Learning Python SDK](https://github.com/Azure/azureml-examples/tree/main/sdk/python/jobs/spark)