
Commit 1ee0f42

Move content between files and add a relevant URL.
1 parent 29daa95 commit 1ee0f42

File tree

2 files changed: +26 −23 lines


articles/machine-learning/apache-spark-environment-configuration.md

Lines changed: 10 additions & 0 deletions
@@ -111,6 +111,16 @@ Once the user identity has the appropriate roles assigned, data in the Azure sto
 > [!NOTE]
 > If an [attached Synapse Spark pool](./how-to-manage-synapse-spark-pool.md) points to a Synapse Spark pool in an Azure Synapse workspace that has a managed virtual network associated with it, [a managed private endpoint to storage account should be configured](../synapse-analytics/security/connect-to-a-secure-storage-account.md) to ensure data access.
 
+## Ensuring resource access for Spark jobs
+Spark jobs can use either a managed identity or user identity passthrough to access data and other resources. The following table summarizes the different mechanisms for resource access while using Azure Machine Learning Managed (Automatic) Spark compute and an attached Synapse Spark pool.
+
+|Spark pool|Supported identities|Default identity|
+| ---------- | -------------------- | ---------------- |
+|Managed (Automatic) Spark compute|User identity and managed identity|User identity|
+|Attached Synapse Spark pool|User identity and managed identity|Managed identity - compute identity of the attached Synapse Spark pool|
+
+If the CLI or SDK code defines an option to use managed identity, Azure Machine Learning Managed (Automatic) Spark compute relies on a user-assigned managed identity attached to the workspace. You can attach a user-assigned managed identity to an existing Azure Machine Learning workspace using Azure Machine Learning CLI v2, or with `ARMClient`.
+
 ## Next steps
 - [Apache Spark in Azure Machine Learning (preview)](./apache-spark-azure-ml-concepts.md)
 - [Attach and manage a Synapse Spark pool in Azure Machine Learning (preview)](./how-to-manage-synapse-spark-pool.md)
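The attachment flow that the added paragraph above describes can be sketched as a YAML file passed to Azure Machine Learning CLI v2. The file name, tenant ID, and identity resource ID below are illustrative placeholders assumed for this sketch, not values from this commit:

```yaml
# uai-attach.yml - sketch of a workspace identity update (placeholder values)
identity:
  type: system_assigned,user_assigned
  tenant_id: <TENANT_ID>
  user_assigned_identities:
    '/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<UAI_NAME>': {}
```

The file would then be applied with a command along the lines of `az ml workspace update --resource-group <RESOURCE_GROUP> --name <WORKSPACE_NAME> --file uai-attach.yml`.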

articles/machine-learning/how-to-submit-spark-jobs.md

Lines changed: 16 additions & 23 deletions
@@ -8,20 +8,20 @@ ms.reviewer: franksolomon
 ms.service: machine-learning
 ms.subservice: mldata
 ms.topic: how-to
-ms.date: 01/10/2023
+ms.date: 03/08/2023
 ms.custom: template-how-to
 ---
 
 # Submit Spark jobs in Azure Machine Learning (preview)
 
 [!INCLUDE [preview disclaimer](../../includes/machine-learning-preview-generic-disclaimer.md)]
 
-Azure Machine Learning supports submission of standalone machine learning jobs, and creation of [machine learning pipelines](./concept-ml-pipelines.md), that involve multiple machine learning workflow steps. Azure Machine Learning handles both standalone Spark job creation, and creation of reusable Spark components that Azure Machine Learning pipelines can use. In this article, you'll learn how to submit Spark jobs using:
-- Azure Machine Learning Studio UI
+Azure Machine Learning supports submission of standalone machine learning jobs and creation of [machine learning pipelines](./concept-ml-pipelines.md) that involve multiple machine learning workflow steps. Azure Machine Learning handles both standalone Spark job creation, and creation of reusable Spark components that Azure Machine Learning pipelines can use. In this article, you'll learn how to submit Spark jobs using:
+- Azure Machine Learning studio UI
 - Azure Machine Learning CLI
 - Azure Machine Learning SDK
 
-See [this resource](./apache-spark-azure-ml-concepts.md) for more information about **Apache Spark in Azure Machine Learning** concepts.
+For more information about **Apache Spark in Azure Machine Learning** concepts, see [this resource](./apache-spark-azure-ml-concepts.md).
 
 ## Prerequisites

@@ -42,27 +42,20 @@ See [this resource](./apache-spark-azure-ml-concepts.md) for more information ab
 - [(Optional): An attached Synapse Spark pool in the Azure Machine Learning workspace](./how-to-manage-synapse-spark-pool.md).
 
 # [Studio UI](#tab/ui)
-These prerequisites cover the submission of a Spark job from Azure Machine Learning Studio UI:
+These prerequisites cover the submission of a Spark job from Azure Machine Learning studio UI:
 - An Azure subscription; if you don't have an Azure subscription, [create a free account](https://azure.microsoft.com/free) before you begin.
 - An Azure Machine Learning workspace. See [Create workspace resources](./quickstart-create-resources.md).
 - To enable this feature:
-  1. Navigate to Azure Machine Learning Studio UI.
+  1. Navigate to Azure Machine Learning studio UI.
   2. Select **Manage preview features** (megaphone icon) from the icons on the top right side of the screen.
   3. In the **Managed preview feature** panel, toggle on the **Run notebooks and jobs on managed Spark** feature.
   :::image type="content" source="media/interactive-data-wrangling-with-apache-spark-azure-ml/how_to_enable_managed_spark_preview.png" alt-text="Screenshot showing option for enabling Managed Spark preview.":::
 - [(Optional): An attached Synapse Spark pool in the Azure Machine Learning workspace](./how-to-manage-synapse-spark-pool.md).
 
 ---
 
-## Ensuring resource access for Spark jobs
-Spark jobs can use either user identity passthrough, or a managed identity, to access data and other resources. The following table summarizes the different mechanisms for resource access while using Azure Machine Learning Managed (Automatic) Spark compute and attached Synapse Spark pool.
-
-|Spark pool|Supported identities|Default identity|
-| ---------- | -------------------- | ---------------- |
-|Managed (Automatic) Spark compute|User identity and managed identity|User identity|
-|Attached Synapse Spark pool|User identity and managed identity|Managed identity - compute identity of the attached Synapse Spark pool|
-
-If the CLI or SDK code defines an option to use managed identity, Azure Machine Learning Managed (Automatic) Spark compute uses user-assigned managed identity attached to the workspace. You can attach a user-assigned managed identity to an existing Azure Machine Learning workspace using Azure Machine Learning CLI v2, or with `ARMClient`.
+> [!NOTE]
+> To learn more about resource access while using Azure Machine Learning Managed (Automatic) Spark compute and an attached Synapse Spark pool, see [Ensuring resource access for Spark jobs](apache-spark-environment-configuration.md#ensuring-resource-access-for-spark-jobs).
 
 ### Attach user assigned managed identity using CLI v2
 1. Create a YAML file that defines the user-assigned managed identity that should be attached to the workspace:
@@ -222,7 +215,7 @@ To create a job, a standalone Spark job can be defined as a YAML specification f
     path: azureml://datastores/workspaceblobstore/paths/data/wrangled/
     mode: direct
 ```
-- `identity` - this optional property defines the identity used to submit this job. It can have `user_identity` and `managed` values. If no identity is defined in the YAML specification, the default identity will be used.
+- `identity` - this optional property defines the identity used to submit this job. It can have `user_identity` and `managed` values. If no identity is defined in the YAML specification, the Spark job will use the default identity.
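As a sketch of that optional property, an `identity` section in the job's YAML specification could look like the following (shape assumed from the two documented values; not taken from this commit):

```yaml
# Optional identity section of a Spark job YAML specification (sketch)
identity:
  type: user_identity   # or: managed
```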
 
 ### Standalone Spark job
 This example YAML specification shows a standalone Spark job. It uses an Azure Machine Learning Managed (Automatic) Spark compute:
@@ -304,7 +297,7 @@ To create a standalone Spark job, use the `azure.ai.ml.spark` function, with the
 - `dynamic_allocation_max_executors` - the maximum number of Spark executor instances for dynamic allocation.
 - If dynamic allocation of executors is disabled, then define these parameters:
   - `executor_instances` - the number of Spark executor instances.
-- `environment` - the Azure Machine Learning environment that will run the job. This parameter should pass:
+- `environment` - the Azure Machine Learning environment that runs the job. This parameter should pass:
   - an object of `azure.ai.ml.entities.Environment`, or an Azure Machine Learning environment name (string).
 - `args` - the command line arguments that should be passed to the job entry point Python script or class. See the sample code provided here for an example.
 - `resources` - the resources to be used by an Azure Machine Learning Managed (Automatic) Spark compute. This parameter should pass a dictionary with:
@@ -336,7 +329,7 @@ To create a standalone Spark job, use the `azure.ai.ml.spark` function, with the
   - `azure.ai.ml.entities.UserIdentityConfiguration`
   or
   - `azure.ai.ml.entities.ManagedIdentityConfiguration`
-  for user identity and managed identity respectively. If no identity is defined, the default identity will be used.
+  for user identity and managed identity respectively. If no identity is defined, the Spark job will use the default identity.
 
 You can submit a standalone Spark job from:
 - an Azure Machine Learning Notebook connected to an Azure Machine Learning compute instance.
@@ -399,16 +392,16 @@ ml_client.jobs.stream(returned_spark_job.name)
 
 # [Studio UI](#tab/ui)
 
-### Submit a standalone Spark job from Azure Machine Learning Studio UI
-To submit a standalone Spark job using the Azure Machine Learning Studio UI:
+### Submit a standalone Spark job from Azure Machine Learning studio UI
+To submit a standalone Spark job using the Azure Machine Learning studio UI:
 
-:::image type="content" source="media/how-to-submit-spark-jobs/create_standalone_spark_job.png" alt-text="Screenshot showing creation of a new Spark job in Azure Machine Learning Studio UI.":::
+:::image type="content" source="media/how-to-submit-spark-jobs/create_standalone_spark_job.png" alt-text="Screenshot showing creation of a new Spark job in Azure Machine Learning studio UI.":::
 
 - In the left pane, select **+ New**.
 - Select **Spark job (preview)**.
 - On the **Compute** screen:
 
-:::image type="content" source="media/how-to-submit-spark-jobs/create_standalone_spark_job_compute.png" alt-text="Screenshot showing compute selection screen for a new Spark job in Azure Machine Learning Studio UI.":::
+:::image type="content" source="media/how-to-submit-spark-jobs/create_standalone_spark_job_compute.png" alt-text="Screenshot showing compute selection screen for a new Spark job in Azure Machine Learning studio UI.":::
 
 1. Under **Select compute type**, select **Spark automatic compute (Preview)** for Managed (Automatic) Spark compute, or **Attached compute** for an attached Synapse Spark pool.
 1. If you selected **Spark automatic compute (Preview)**:
@@ -606,7 +599,7 @@ You can execute the above command from:
 To create an Azure Machine Learning pipeline with a Spark component, you should be familiar with creation of [Azure Machine Learning pipelines from components, using Python SDK](./tutorial-pipeline-python-sdk.md#create-the-pipeline-from-components). A Spark component is created using the `azure.ai.ml.spark` function. The function parameters are defined almost the same way as for the [standalone Spark job](#standalone-spark-job-using-python-sdk). These parameters are defined differently for the Spark component:
 
 - `name` - the name of the Spark component.
-- `display_name` - the name of the Spark component that will display in the UI and elsewhere.
+- `display_name` - the name of the Spark component displayed in the UI and elsewhere.
 - `inputs` - this parameter is similar to the `inputs` parameter described for the [standalone Spark job](#standalone-spark-job-using-python-sdk), except that the `azure.ai.ml.Input` class is instantiated without the `path` parameter.
 - `outputs` - this parameter is similar to the `outputs` parameter described for the [standalone Spark job](#standalone-spark-job-using-python-sdk), except that the `azure.ai.ml.Output` class is instantiated without the `path` parameter.