articles/machine-learning/how-to-manage-synapse-spark-pool.md
36 additions & 35 deletions
Original file line number
Diff line number
Diff line change
@@ -1,14 +1,14 @@
1
1
---
2
2
title: Attach and manage a Synapse Spark pool in Azure Machine Learning
3
3
titleSuffix: Azure Machine Learning
4
-
description: Learn how to attach and manage Spark pools with Azure Synapse
4
+
description: Learn how to attach and manage Spark pools with Azure Synapse.
5
5
author: ynpandey
6
6
ms.author: yogipandey
7
7
ms.reviewer: franksolomon
8
8
ms.service: machine-learning
9
9
ms.subservice: mldata
10
10
ms.topic: how-to
11
-
ms.date: 05/22/2023
11
+
ms.date: 04/12/2024
12
12
ms.custom: template-how-to, devx-track-azurecli
13
13
---
14
14
@@ -50,22 +50,22 @@ In this article, you'll learn how to attach a [Synapse Spark Pool](../synapse-an
50
50
---
51
51
52
52
## Attach a Synapse Spark pool in Azure Machine Learning
53
-
Azure Machine Learning provides multiple options for attaching and managing a Synapse Spark pool.
53
+
Azure Machine Learning offers different ways to attach and manage a Synapse Spark pool.
54
54
55
55
# [Studio UI](#tab/studio-ui)
56
56
57
-
To attach a Synapse Spark Pool using the Studio Compute tab:
57
+
To attach a Synapse Spark Pool with the Studio Compute tab:
58
58
59
-
:::image type="content" source="media/how-to-manage-synapse-spark-pool/synapse_compute_synapse_spark_pool.png" alt-text="Screenshot showing creation of a new Synapse Spark Pool.":::
59
+
:::image type="content" source="media/how-to-manage-synapse-spark-pool/synapse_compute_synapse_spark_pool.png" alt-text="Screenshot showing creation of a new Synapse Spark Pool." lightbox= "media/how-to-manage-synapse-spark-pool/synapse_compute_synapse_spark_pool.png":::
60
60
61
61
1. In the **Manage** section of the left pane, select **Compute**.
62
62
1. Select **Attached computes**.
63
63
1. On the **Attached computes** screen, select **New**, to see the options for attaching different types of computes.
64
-
2. Select **Synapse Spark pool**.
64
+
1. Select **Synapse Spark pool**.
65
65
66
-
The **Attach Synapse Spark pool** panel will open on the right side of the screen. In this panel:
66
+
The **Attach Synapse Spark pool** panel opens on the right side of the screen. In this panel:
67
67
68
-
1. Enter a **Name**, which refers to the attached Synapse Spark Pool inside the Azure Machine Learning.
68
+
1. Enter a **Name**, which refers to the attached Synapse Spark Pool inside the Azure Machine Learning resource.
69
69
70
70
2. Select an Azure **Subscription** from the dropdown menu.
71
71
@@ -83,19 +83,19 @@ The **Attach Synapse Spark pool** panel will open on the right side of the scree
With the Azure Machine Learning CLI, we can attach and manage a Synapse Spark pool from the command line interface, using intuitive YAML syntax and commands.
86
+
With the Azure Machine Learning CLI, we can use intuitive YAML syntax and commands from the command line interface, to attach and manage a Synapse Spark pool.
87
87
88
-
To define an attached Synapse Spark pool using YAML syntax, the YAML file should cover these properties:
88
+
To define an attached Synapse Spark pool using YAML syntax, the YAML file should cover these properties:
89
89
90
90
-`name` – name of the attached Synapse Spark pool.
91
91
92
92
-`type` – set this property to `synapsespark`.
93
93
94
94
-`resource_id` – this property should provide the resource ID value of the Synapse Spark pool created in the Azure Synapse Analytics workspace. The Azure resource ID includes
95
95
96
-
- Azure Subscription ID,
96
+
- Azure Subscription ID,
97
97
98
-
- resource Group Name,
98
+
- resource Group Name,
99
99
100
100
- Azure Synapse Analytics Workspace Name, and
101
101
@@ -125,7 +125,7 @@ To define an attached Synapse Spark pool using YAML syntax, the YAML file should
125
125
type: system_assigned
126
126
```
127
127
128
-
- For the `identity` type `user_assigned`, you should also provide a list of `user_assigned_identities` values. Each user-assigned identity should be declared as an element of the list, by using the `resource_id` value of the user-assigned identity. The first user-assigned identity in the list will be used for submitting a job by default.
128
+
- For the `identity` type `user_assigned`, you should also provide a list of `user_assigned_identities` values. Each user-assigned identity should be declared as an element of the list, by using the `resource_id` value of the user-assigned identity. The first user-assigned identity in the list is used to submit a job by default.
129
129
130
130
```YAML
131
131
name: <ATTACHED_SPARK_POOL_NAME>
@@ -149,7 +149,7 @@ az ml compute attach --file <YAML_SPECIFICATION_FILE_NAME>.yaml --subscription <
149
149
This sample shows the expected output of the above command:
150
150
151
151
```azurecli
152
-
Class SynapseSparkCompute: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
152
+
Class SynapseSparkCompute: This is an experimental class, and may change at any time. Please visit https://aka.ms/azuremlexperimental for more information.
153
153
154
154
{
155
155
"auto_pause_settings": {
@@ -184,7 +184,7 @@ If the attached Synapse Spark pool, with the name specified in the YAML specific
184
184
185
185
values through YAML specification file.
186
186
187
-
To display details of an attached Synapse Spark pool, execute the `az ml compute show` command. Pass the name of the attached Synapse Spark pool with the `--name` parameter, as shown:
187
+
To display details of an attached Synapse Spark pool, execute the `az ml compute show` command. Pass the name of the attached Synapse Spark pool with the `--name` parameter, as shown:
188
188
189
189
```azurecli
190
190
az ml compute show --name <ATTACHED_SPARK_POOL_NAME> --subscription <SUBSCRIPTION_ID> --resource-group <RESOURCE_GROUP> --workspace-name <AML_WORKSPACE_NAME>
@@ -219,7 +219,7 @@ This sample shows the expected output of the above command:
219
219
}
220
220
```
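When the pool details are needed inside a script rather than on screen, the same `az ml compute show` call can be wrapped in Python. This is a minimal sketch, not part of the article: the function names `show_command` and `show_pool` are hypothetical, the `--output json` flag is added explicitly, and the code assumes the CLI writes the JSON document to standard output and any warnings elsewhere.

```python
import json
import subprocess
from typing import Callable, List


def show_command(pool_name: str, subscription_id: str,
                 resource_group: str, workspace_name: str) -> List[str]:
    """Build the argument list for `az ml compute show` (hypothetical helper)."""
    return [
        "az", "ml", "compute", "show",
        "--name", pool_name,
        "--subscription", subscription_id,
        "--resource-group", resource_group,
        "--workspace-name", workspace_name,
        "--output", "json",  # request JSON explicitly so it can be parsed
    ]


def show_pool(pool_name: str, subscription_id: str, resource_group: str,
              workspace_name: str, run: Callable = subprocess.run) -> dict:
    """Run the command and return the parsed JSON description of the pool.

    `run` is injectable so the function can be exercised without the CLI.
    """
    cmd = show_command(pool_name, subscription_id, resource_group, workspace_name)
    result = run(cmd, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)
```

Passing the arguments as a list avoids shell quoting issues, and `check=True` raises an exception if the CLI exits with an error instead of returning unparsable output.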
221
221
222
-
To see a list of all computes, including the attached Synapse Spark pools in a workspace, use the `az ml compute list` command. Use the name parameter to pass the name of the workspace, as shown:
222
+
To see a list of all computes, including the attached Synapse Spark pools in a workspace, use the `az ml compute list` command. Use the name parameter to pass the name of the workspace, as shown:
223
223
224
224
```azurecli
225
225
az ml compute list --subscription <SUBSCRIPTION_ID> --resource-group <RESOURCE_GROUP> --workspace-name <AML_WORKSPACE_NAME>
@@ -267,9 +267,9 @@ This sample shows the expected output of the above command:
267
267
268
268
Azure Machine Learning Python SDK provides convenient functions for attaching and managing Synapse Spark pool, using Python code in Azure Machine Learning Notebooks.
269
269
270
-
To attach a Synapse Compute using Python SDK, first create an instance of [azure.ai.ml.MLClient class](/python/api/azure-ai-ml/azure.ai.ml.mlclient). This provides convenient functions for interaction with Azure Machine Learning services. The following code sample uses `azure.identity.DefaultAzureCredential` for connecting to a workspace in resource group of a specified Azure subscription. In the following code sample, define the `SynapseSparkCompute` with the parameters:
271
-
- `name`- user-defined name of the new attached Synapse Spark pool.
272
-
- `resource_id`- resource ID of the Synapse Spark pool created earlier in the Azure Synapse Analytics workspace.
270
+
To attach a Synapse Compute using Python SDK, first create an instance of [azure.ai.ml.MLClient class](/python/api/azure-ai-ml/azure.ai.ml.mlclient). This provides convenient functions for interaction with Azure Machine Learning services. The following code sample uses `azure.identity.DefaultAzureCredential` to connect to a workspace in the resource group of a specified Azure subscription. In the following code sample, define the `SynapseSparkCompute` with these parameters:
271
+
- `name`- user-defined name of the new attached Synapse Spark pool.
272
+
- `resource_id`- resource ID of the Synapse Spark pool created earlier in the Azure Synapse Analytics workspace
273
273
274
274
An [azure.ai.ml.MLClient.begin_create_or_update()](/python/api/azure-ai-ml/azure.ai.ml.mlclient#azure-ai-ml-mlclient-begin-create-or-update) function call attaches the defined Synapse Spark pool to the Azure Machine Learning workspace.
To attach a Synapse Spark pool that uses system-assigned identity, pass [IdentityConfiguration](/python/api/azure-ai-ml/azure.ai.ml.entities.identityconfiguration), with type set to `SystemAssigned`, as the `identity` parameter of the `SynapseSparkCompute` class. This code snippet attaches a Synapse Spark pool that uses system-assigned identity.
296
+
To attach a Synapse Spark pool that uses system-assigned identity, pass [IdentityConfiguration](/python/api/azure-ai-ml/azure.ai.ml.entities.identityconfiguration), with type set to `SystemAssigned`, as the `identity` parameter of the `SynapseSparkCompute` class. This code snippet attaches a Synapse Spark pool that uses system-assigned identity:
A Synapse Spark pool can also use a user-assigned identity. For a user-assigned identity, you can pass a managed identity definition, using the [IdentityConfiguration](/python/api/azure-ai-ml/azure.ai.ml.entities.identityconfiguration) class, as the `identity` parameter of the `SynapseSparkCompute` class. For the managed identity definition used in this way, set the `type` to `UserAssigned`. In addition, pass a `user_assigned_identities` parameter. The parameter `user_assigned_identities` is a list of objects of the UserAssignedIdentity class. The `resource_id`of the user-assigned identity populates each `UserAssignedIdentity` class object. This code snippet attaches a Synapse Spark pool that uses a user-assigned identity:
322
+
A Synapse Spark pool can also use a user-assigned identity. For a user-assigned identity, you can pass a managed identity definition, using the [IdentityConfiguration](/python/api/azure-ai-ml/azure.ai.ml.entities.identityconfiguration) class, as the `identity` parameter of the `SynapseSparkCompute` class. For the managed identity definition used in this way, set the `type` to `UserAssigned`. In addition, pass a `user_assigned_identities` parameter. The parameter `user_assigned_identities` is a list of objects of the UserAssignedIdentity class. The `resource_id`of the user-assigned identity populates each `UserAssignedIdentity` class object. This code snippet attaches a Synapse Spark pool that uses a user-assigned identity:
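As a rough sketch of the pattern described above (not necessarily the article's own sample), the following code builds an `IdentityConfiguration` from a list of user-assigned identity resource IDs and attaches the pool. The helper names `identity_spec` and `attach_pool` are hypothetical. The Azure imports are deferred into `attach_pool` so that the pure helper works without the SDK installed, and the constructor arguments are assumed to match the `azure-ai-ml` classes named in the text.

```python
from typing import List, Optional


def identity_spec(user_assigned_ids: Optional[List[str]] = None) -> dict:
    """Describe the identity as plain data (hypothetical helper).

    No resource IDs means system-assigned; otherwise user-assigned,
    with the IDs kept in the order given.
    """
    if not user_assigned_ids:
        return {"type": "SystemAssigned", "resource_ids": []}
    return {"type": "UserAssigned", "resource_ids": list(user_assigned_ids)}


def attach_pool(subscription_id: str, resource_group: str, workspace_name: str,
                pool_name: str, pool_resource_id: str,
                user_assigned_ids: Optional[List[str]] = None):
    """Attach a Synapse Spark pool with the identity described by identity_spec."""
    # Deferred imports: these need the azure-ai-ml and azure-identity packages.
    from azure.ai.ml import MLClient
    from azure.ai.ml.entities import (
        IdentityConfiguration,
        SynapseSparkCompute,
        UserAssignedIdentity,
    )
    from azure.identity import DefaultAzureCredential

    spec = identity_spec(user_assigned_ids)
    identity = IdentityConfiguration(
        type=spec["type"],
        # One UserAssignedIdentity per resource ID; None for system-assigned.
        user_assigned_identities=[
            UserAssignedIdentity(resource_id=rid) for rid in spec["resource_ids"]
        ] or None,
    )
    compute = SynapseSparkCompute(
        name=pool_name, resource_id=pool_resource_id, identity=identity
    )
    ml_client = MLClient(
        DefaultAzureCredential(), subscription_id, resource_group, workspace_name
    )
    # Returns a poller for the long-running attach operation.
    return ml_client.begin_create_or_update(compute)
```

Calling `attach_pool(...)` without `user_assigned_ids` falls back to a system-assigned identity, so the one function covers both cases.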
> The `azure.ai.ml.MLClient.begin_create_or_update()` function attaches a new Synapse Spark pool, if a pool with the specified name does not already exist in the workspace. However, if a Synapse Spark pool with that specified name is already attached to the workspace, a call to the `azure.ai.ml.MLClient.begin_create_or_update()` function will update the existing attached pool with the new identity or identities.
361
361
362
362
---
363
363
364
364
## Add role assignments in Azure Synapse Analytics
365
365
366
-
To ensure that the attached Synapse Spark Pool works properly, assign the [Administrator Role](../synapse-analytics/security/synapse-workspace-synapse-rbac.md#roles) to it, from the Azure Synapse Analytics studio UI. The following steps show how to do it:
366
+
To ensure that the attached Synapse Spark Pool works properly, assign the [Administrator Role](../synapse-analytics/security/synapse-workspace-synapse-rbac.md#roles) to it, from the Azure Synapse Analytics studio UI. These steps show how to do it:
367
367
368
368
1. Open your **Synapse Workspace** in Azure portal.
369
369
370
370
1. In the left pane, select **Overview**.
371
371
372
-
:::image type="content" source="media/how-to-manage-synapse-spark-pool/synapse-workspace-open-synapse-studio.png" alt-text="Screenshot showing Open Synapse Studio.":::
372
+
:::image type="content" source="media/how-to-manage-synapse-spark-pool/synapse-workspace-open-synapse-studio.png" alt-text="Screenshot showing Open Synapse Studio." lightbox= "media/how-to-manage-synapse-spark-pool/synapse-workspace-open-synapse-studio.png":::
373
+
373
374
1. Select **Open Synapse Studio**.
374
375
375
376
1. In the Azure Synapse Analytics studio, select **Manage** in the left pane.
@@ -392,17 +393,17 @@ To ensure that the attached Synapse Spark Pool works properly, assign the [Admin
392
393
393
394
1. Select **Apply**.
394
395
395
-
:::image type="content" source="media/how-to-manage-synapse-spark-pool/workspace-add-role-assignment.png" alt-text="Screenshot showing Add Role Assignment.":::
396
+
:::image type="content" source="media/how-to-manage-synapse-spark-pool/workspace-add-role-assignment.png" alt-text="Screenshot showing Add Role Assignment." lightbox= "media/how-to-manage-synapse-spark-pool/workspace-add-role-assignment.png":::
396
397
397
398
## Update the Synapse Spark Pool
398
399
399
400
# [Studio UI](#tab/studio-ui)
400
401
401
-
You can manage the attached Synapse Spark pool from the Azure Machine Learning studio UI. Spark pool management functionality includes associated managed identity updates for an attached Synapse Spark pool. You can assign a system-assigned or a user-assigned identity while updating a Synapse Spark pool. You should [create a user-assigned managed identity](../active-directory/managed-identities-azure-resources/how-manage-user-assigned-managed-identities.md#create-a-user-assigned-managed-identity) in Azure portal, before assigning it to a Synapse Spark pool.
402
+
You can manage the attached Synapse Spark pool from the Azure Machine Learning studio UI. Spark pool management functionality includes associated managed identity updates for an attached Synapse Spark pool. You can assign a system-assigned or a user-assigned identity while updating a Synapse Spark pool. You should [create a user-assigned managed identity](../active-directory/managed-identities-azure-resources/how-manage-user-assigned-managed-identities.md#create-a-user-assigned-managed-identity) in Azure portal, before you assign it to a Synapse Spark pool.
402
403
403
404
To update managed identity for the attached Synapse Spark pool:
1. Open the **Details** page for the Synapse Spark pool in the Azure Machine Learning studio.
408
409
@@ -417,12 +418,12 @@ To update managed identity for the attached Synapse Spark pool:
417
418
1. To assign a user-assigned managed identity:
418
419
1. Select **User-assigned** as the **Identity type**.
419
420
1. Select an Azure **Subscription** from the dropdown menu.
420
-
1. Type the first few letters of the name of user-assigned managed identity in the box showing text **Search by name**. A list with matching user-assigned managed identity names appears. Select the user-assigned managed identity you want from the list. You can select multiple user-assigned managed identities, and assign them to the attached Synapse Spark pool.
421
+
1. Type the first few letters of the name of user-assigned managed identity in the box that shows the text **Search by name**. A list with matching user-assigned managed identity names appears. Select the user-assigned managed identity you want from the list. You can select multiple user-assigned managed identities, and assign them to the attached Synapse Spark pool.
Execute the `az ml compute update` command, with appropriate parameters, to update the identity associated with an attached Synapse Spark pool. To assign a system-assigned identity, set the `--identity` parameter in the command to `SystemAssigned`, as shown:
426
+
To update the identity associated with an attached Synapse Spark pool, execute the `az ml compute update` command with appropriate parameters. To assign a system-assigned identity, set the `--identity` parameter in the command to `SystemAssigned`, as shown:
426
427
427
428
```azurecli
428
429
az ml compute update --identity SystemAssigned --subscription <SUBSCRIPTION_ID> --resource-group <RESOURCE_GROUP> --workspace-name <AML_WORKSPACE_NAME> --name <ATTACHED_SPARK_POOL_NAME>
@@ -462,7 +463,7 @@ Class SynapseSparkCompute: This is an experimental class, and may change at any
462
463
}
463
464
```
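For automation, the system-assigned update command shown above can be assembled programmatically. This is a minimal sketch under stated assumptions: `update_identity_command` is a hypothetical helper, the placeholder values are illustrative, and the code only builds and prints the command; it doesn't run it.

```python
import shlex
from typing import List


def update_identity_command(pool_name: str, subscription_id: str,
                            resource_group: str, workspace_name: str,
                            identity: str = "SystemAssigned") -> List[str]:
    """Build the argument list for `az ml compute update` (hypothetical helper)."""
    return [
        "az", "ml", "compute", "update",
        "--identity", identity,
        "--subscription", subscription_id,
        "--resource-group", resource_group,
        "--workspace-name", workspace_name,
        "--name", pool_name,
    ]


# Illustrative placeholder values; replace them with real names and IDs.
cmd = update_identity_command("my-pool", "my-sub", "my-rg", "my-ws")
print(shlex.join(cmd))
```

The resulting list can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems with values that contain spaces.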
464
465
465
-
To assign a user-assigned identity, set the parameter `--identity` in the command to `UserAssigned`. Additionally, you should pass the resource ID, for the user-assigned identity, using the `--user-assigned-identities` parameter as shown:
466
+
To assign a user-assigned identity, set the parameter `--identity` in the command to `UserAssigned`. Additionally, you should use the `--user-assigned-identities` parameter to pass the resource ID for the user-assigned identity, as shown:
466
467
467
468
```azurecli
468
469
az ml compute update --identity UserAssigned --user-assigned-identities /subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<AML_USER_MANAGED_ID> --subscription <SUBSCRIPTION_ID> --resource-group <RESOURCE_GROUP> --workspace-name <AML_WORKSPACE_NAME> --name <ATTACHED_SPARK_POOL_NAME>
@@ -585,7 +586,7 @@ We might want to detach an attached Synapse Spark pool, to clean up a workspace.
585
586
586
587
# [Studio UI](#tab/studio-ui)
587
588
588
-
The Azure Machine Learning studio UI also provides a way to detach an attached Synapse Spark pool. Follow these steps to do this:
589
+
The Azure Machine Learning studio UI also provides a way to detach an attached Synapse Spark pool. To do this, follow these steps:
589
590
590
591
1. Open the **Details** page for the Synapse Spark pool, in the Azure Machine Learning studio.
591
592
@@ -595,15 +596,15 @@ The Azure Machine Learning studio UI also provides a way to detach an attached S
An attached Synapse Spark pool can be detached by executing the `az ml compute detach` command with name of the pool passed using `--name` parameter as shown here:
599
+
An attached Synapse Spark pool can be detached by executing the `az ml compute detach` command with the name of the pool passed, using the `--name` parameter, as shown here:
599
600
600
601
```azurecli
601
602
az ml compute detach --name <ATTACHED_SPARK_POOL_NAME> --subscription <SUBSCRIPTION_ID> --resource-group <RESOURCE_GROUP> --workspace-name <AML_WORKSPACE_NAME>
602
603
```
603
604
604
605
This sample shows the expected output of the above command:
605
606
606
-
```azurecli
607
+
```azurecli
607
608
Are you sure you want to perform this operation? (y/n): y
## Serverless Spark compute in Azure Machine Learning
636
637
637
-
Some user scenarios may require access to a serverless Spark compute, during an Azure Machine Learning job submission, without a need to attach a Spark pool. The Azure Synapse Analytics integration with Azure Machine Learning also provides a serverless Spark compute experience. This allows access to a Spark compute in a job, without a need to attach the compute to a workspace first. [Learn more about the serverless Spark compute experience](interactive-data-wrangling-with-apache-spark-azure-ml.md).
638
+
Some user scenarios might require access to a serverless Spark compute resource, during an Azure Machine Learning job submission, without a need to attach a Spark pool. The Azure Synapse Analytics integration with Azure Machine Learning also provides a serverless Spark compute experience. This allows access to a Spark compute in a job, without a need to attach the compute to a workspace first. [Learn more about the serverless Spark compute experience](interactive-data-wrangling-with-apache-spark-azure-ml.md).
638
639
639
640
## Next steps
640
641
641
642
- [Interactive Data Wrangling with Apache Spark in Azure Machine Learning](./interactive-data-wrangling-with-apache-spark-azure-ml.md)
642
643
643
-
- [Submit Spark jobs in Azure Machine Learning](./how-to-submit-spark-jobs.md)
644
+
- [Submit Spark jobs in Azure Machine Learning](./how-to-submit-spark-jobs.md)