Commit ac57c1c

Merge pull request #272037 from fbsolo-ms1/update-data-science-virtual-machine-files
Freshness update for how-to-manage-synapse-spark-pool.md . . .
2 parents e7f5b69 + 67a6599 commit ac57c1c

1 file changed: 36 additions, 35 deletions

articles/machine-learning/how-to-manage-synapse-spark-pool.md

Lines changed: 36 additions & 35 deletions
Original file line number | Diff line number | Diff line change
@@ -1,14 +1,14 @@
11
---
22
title: Attach and manage a Synapse Spark pool in Azure Machine Learning
33
titleSuffix: Azure Machine Learning
4-
description: Learn how to attach and manage Spark pools with Azure Synapse
4+
description: Learn how to attach and manage Spark pools with Azure Synapse.
55
author: ynpandey
66
ms.author: yogipandey
77
ms.reviewer: franksolomon
88
ms.service: machine-learning
99
ms.subservice: mldata
1010
ms.topic: how-to
11-
ms.date: 05/22/2023
11+
ms.date: 04/12/2024
1212
ms.custom: template-how-to, devx-track-azurecli
1313
---
1414

@@ -50,22 +50,22 @@ In this article, you'll learn how to attach a [Synapse Spark Pool](../synapse-an
5050
---
5151

5252
## Attach a Synapse Spark pool in Azure Machine Learning
53-
Azure Machine Learning provides multiple options for attaching and managing a Synapse Spark pool.
53+
Azure Machine Learning offers different ways to attach and manage a Synapse Spark pool.
5454

5555
# [Studio UI](#tab/studio-ui)
5656

57-
To attach a Synapse Spark Pool using the Studio Compute tab:
57+
To attach a Synapse Spark Pool with the Studio Compute tab:
5858

59-
:::image type="content" source="media/how-to-manage-synapse-spark-pool/synapse_compute_synapse_spark_pool.png" alt-text="Screenshot showing creation of a new Synapse Spark Pool.":::
59+
:::image type="content" source="media/how-to-manage-synapse-spark-pool/synapse_compute_synapse_spark_pool.png" alt-text="Screenshot showing creation of a new Synapse Spark Pool." lightbox= "media/how-to-manage-synapse-spark-pool/synapse_compute_synapse_spark_pool.png":::
6060

6161
1. In the **Manage** section of the left pane, select **Compute**.
6262
1. Select **Attached computes**.
6363
1. On the **Attached computes** screen, select **New**, to see the options for attaching different types of computes.
64-
2. Select **Synapse Spark pool**.
64+
1. Select **Synapse Spark pool**.
6565

66-
The **Attach Synapse Spark pool** panel will open on the right side of the screen. In this panel:
66+
The **Attach Synapse Spark pool** panel opens on the right side of the screen. In this panel:
6767

68-
1. Enter a **Name**, which refers to the attached Synapse Spark Pool inside the Azure Machine Learning.
68+
1. Enter a **Name**, which refers to the attached Synapse Spark Pool inside the Azure Machine Learning resource.
6969

7070
2. Select an Azure **Subscription** from the dropdown menu.
7171

@@ -83,19 +83,19 @@ The **Attach Synapse Spark pool** panel will open on the right side of the scree
8383

8484
[!INCLUDE [cli v2](includes/machine-learning-cli-v2.md)]
8585

86-
With the Azure Machine Learning CLI, we can attach and manage a Synapse Spark pool from the command line interface, using intuitive YAML syntax and commands.
86+
With the Azure Machine Learning CLI, we can use intuitive YAML syntax and commands from the command line interface, to attach and manage a Synapse Spark pool.
8787

88-
To define an attached Synapse Spark pool using YAML syntax, the YAML file should cover these properties:
88+
To define an attached Synapse Spark pool using YAML syntax, the YAML file should cover these properties:
8989

9090
- `name` – name of the attached Synapse Spark pool.
9191

9292
- `type` – set this property to `synapsespark`.
9393

9494
- `resource_id` – this property should provide the resource ID value of the Synapse Spark pool created in the Azure Synapse Analytics workspace. The Azure resource ID includes
9595

96-
- Azure Subscription ID,
96+
- Azure Subscription ID,
9797

98-
- resource Group Name,
98+
- resource Group Name,
9999

100100
- Azure Synapse Analytics Workspace Name, and
101101
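
As a rough illustration of how the `resource_id` components listed above fit together, the following sketch assembles a Spark pool resource ID in Python. The helper name `synapse_pool_resource_id` and the sample values are hypothetical. The list is cut off in this diff, so the Spark pool name as the final component, and the `Microsoft.Synapse/workspaces/.../bigDataPools/...` path layout, are assumptions based on the usual Azure resource ID format. Compare the result with the resource ID shown for your pool in the Azure portal before you use it.

```python
def synapse_pool_resource_id(subscription_id: str, resource_group: str,
                             workspace_name: str, pool_name: str) -> str:
    """Join the components into a single Azure resource ID string."""
    parts = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
        "pool_name": pool_name,
    }
    # Reject empty values and embedded slashes, which would corrupt the path.
    for label, value in parts.items():
        if not value or "/" in value:
            raise ValueError(f"invalid {label}: {value!r}")
    # Assumed layout; verify it against the ID shown in the Azure portal.
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Synapse/workspaces/{workspace_name}"
        f"/bigDataPools/{pool_name}"
    )


# Hypothetical sample values, for illustration only.
print(synapse_pool_resource_id("my-sub-id", "my-rg", "my-synapse-ws", "my-pool"))
```

The returned string is the kind of value the `resource_id` property of the YAML specification file expects.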

@@ -125,7 +125,7 @@ To define an attached Synapse Spark pool using YAML syntax, the YAML file should
125125
type: system_assigned
126126
```
127127

128-
- For the `identity` type `user_assigned`, you should also provide a list of `user_assigned_identities` values. Each user-assigned identity should be declared as an element of the list, by using the `resource_id` value of the user-assigned identity. The first user-assigned identity in the list will be used for submitting a job by default.
128+
- For the `identity` type `user_assigned`, you should also provide a list of `user_assigned_identities` values. Each user-assigned identity should be declared as an element of the list, by using the `resource_id` value of the user-assigned identity. The first user-assigned identity in the list is used to submit a job by default.
129129

130130
```YAML
131131
name: <ATTACHED_SPARK_POOL_NAME>
@@ -149,7 +149,7 @@ az ml compute attach --file <YAML_SPECIFICATION_FILE_NAME>.yaml --subscription <
149149
This sample shows the expected output of the above command:
150150

151151
```azurecli
152-
Class SynapseSparkCompute: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
152+
Class SynapseSparkCompute: This is an experimental class, and may change at any time. Please visit https://aka.ms/azuremlexperimental for more information.
153153
154154
{
155155
"auto_pause_settings": {
@@ -184,7 +184,7 @@ If the attached Synapse Spark pool, with the name specified in the YAML specific
184184

185185
values through YAML specification file.
186186

187-
To display details of an attached Synapse Spark pool, execute the `az ml compute show` command. Pass the name of the attached Synapse Spark pool with the `--name` parameter, as shown:
187+
To display details of an attached Synapse Spark pool, execute the `az ml compute show` command. Pass the name of the attached Synapse Spark pool with the `--name` parameter, as shown:
188188

189189
```azurecli
190190
az ml compute show --name <ATTACHED_SPARK_POOL_NAME> --subscription <SUBSCRIPTION_ID> --resource-group <RESOURCE_GROUP> --workspace-name <AML_WORKSPACE_NAME>
@@ -219,7 +219,7 @@ This sample shows the expected output of the above command:
219219
}
220220
```
221221

222-
To see a list of all computes, including the attached Synapse Spark pools in a workspace, use the `az ml compute list` command. Use the name parameter to pass the name of the workspace, as shown:
222+
To see a list of all computes, including the attached Synapse Spark pools in a workspace, use the `az ml compute list` command. Use the name parameter to pass the name of the workspace, as shown:
223223

224224
```azurecli
225225
az ml compute list --subscription <SUBSCRIPTION_ID> --resource-group <RESOURCE_GROUP> --workspace-name <AML_WORKSPACE_NAME>
@@ -267,9 +267,9 @@ This sample shows the expected output of the above command:
267267

268268
Azure Machine Learning Python SDK provides convenient functions for attaching and managing Synapse Spark pool, using Python code in Azure Machine Learning Notebooks.
269269

270-
To attach a Synapse Compute using Python SDK, first create an instance of [azure.ai.ml.MLClient class](/python/api/azure-ai-ml/azure.ai.ml.mlclient). This provides convenient functions for interaction with Azure Machine Learning services. The following code sample uses `azure.identity.DefaultAzureCredential` for connecting to a workspace in resource group of a specified Azure subscription. In the following code sample, define the `SynapseSparkCompute` with the parameters:
271-
- `name` - user-defined name of the new attached Synapse Spark pool.
272-
- `resource_id` - resource ID of the Synapse Spark pool created earlier in the Azure Synapse Analytics workspace.
270+
To attach a Synapse Compute using Python SDK, first create an instance of [azure.ai.ml.MLClient class](/python/api/azure-ai-ml/azure.ai.ml.mlclient). This provides convenient functions for interaction with Azure Machine Learning services. The following code sample uses `azure.identity.DefaultAzureCredential` to connect to a workspace in the resource group of a specified Azure subscription. In the following code sample, define the `SynapseSparkCompute` with these parameters:
271+
- `name` - user-defined name of the new attached Synapse Spark pool.
272+
- `resource_id` - resource ID of the Synapse Spark pool created earlier in the Azure Synapse Analytics workspace
273273

274274
An [azure.ai.ml.MLClient.begin_create_or_update()](/python/api/azure-ai-ml/azure.ai.ml.mlclient#azure-ai-ml-mlclient-begin-create-or-update) function call attaches the defined Synapse Spark pool to the Azure Machine Learning workspace.
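
The full sample for this step isn't visible in this diff, so the following is only a minimal sketch of the flow the paragraph above describes. It assumes the `azure-ai-ml` and `azure-identity` packages are installed and that valid Azure credentials are available. The wrapper function `attach_synapse_pool` and its parameter names are hypothetical, and the `.result()` call assumes that `begin_create_or_update()` returns a poller for a long-running operation.

```python
def attach_synapse_pool(subscription_id, resource_group, workspace_name,
                        synapse_name, synapse_resource):
    """Attach an existing Synapse Spark pool to an Azure Machine Learning workspace."""
    # Imported inside the function so that this module still loads on machines
    # where the Azure SDK packages are not installed.
    from azure.ai.ml import MLClient
    from azure.ai.ml.entities import SynapseSparkCompute
    from azure.identity import DefaultAzureCredential

    # Connect to the workspace in the given subscription and resource group.
    ml_client = MLClient(
        DefaultAzureCredential(), subscription_id, resource_group, workspace_name
    )

    # Define the attached compute by its user-defined name and by the
    # resource ID of the Spark pool in the Synapse workspace.
    synapse_comp = SynapseSparkCompute(name=synapse_name, resource_id=synapse_resource)

    # Start the attach operation and wait for it to finish
    # (assumed poller behavior; verify it for your SDK version).
    return ml_client.begin_create_or_update(synapse_comp).result()
```

A call would pass the subscription ID, resource group, and workspace name of the Azure Machine Learning workspace, followed by the name to give the attached pool and the resource ID of the Spark pool.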
275275

@@ -293,7 +293,7 @@ synapse_comp = SynapseSparkCompute(name=synapse_name, resource_id=synapse_resour
293293
ml_client.begin_create_or_update(synapse_comp)
294294
```
295295

296-
To attach a Synapse Spark pool that uses system-assigned identity, pass [IdentityConfiguration](/python/api/azure-ai-ml/azure.ai.ml.entities.identityconfiguration), with type set to `SystemAssigned`, as the `identity` parameter of the `SynapseSparkCompute` class. This code snippet attaches a Synapse Spark pool that uses system-assigned identity.
296+
To attach a Synapse Spark pool that uses system-assigned identity, pass [IdentityConfiguration](/python/api/azure-ai-ml/azure.ai.ml.entities.identityconfiguration), with type set to `SystemAssigned`, as the `identity` parameter of the `SynapseSparkCompute` class. This code snippet attaches a Synapse Spark pool that uses system-assigned identity:
297297

298298
```python
299299
# import required libraries
@@ -319,7 +319,7 @@ synapse_comp = SynapseSparkCompute(
319319
ml_client.begin_create_or_update(synapse_comp)
320320
```
321321

322-
A Synapse Spark pool can also use a user-assigned identity. For a user-assigned identity, you can pass a managed identity definition, using the [IdentityConfiguration](/python/api/azure-ai-ml/azure.ai.ml.entities.identityconfiguration) class, as the `identity` parameter of the `SynapseSparkCompute` class. For the managed identity definition used in this way, set the `type` to `UserAssigned`. In addition, pass a `user_assigned_identities` parameter. The parameter `user_assigned_identities` is a list of objects of the UserAssignedIdentity class. The `resource_id`of the user-assigned identity populates each `UserAssignedIdentity` class object. This code snippet attaches a Synapse Spark pool that uses a user-assigned identity:
322+
A Synapse Spark pool can also use a user-assigned identity. For a user-assigned identity, you can pass a managed identity definition, using the [IdentityConfiguration](/python/api/azure-ai-ml/azure.ai.ml.entities.identityconfiguration) class, as the `identity` parameter of the `SynapseSparkCompute` class. For the managed identity definition used in this way, set the `type` to `UserAssigned`. In addition, pass a `user_assigned_identities` parameter. The parameter `user_assigned_identities` is a list of objects of the UserAssignedIdentity class. The `resource_id` of the user-assigned identity populates each `UserAssignedIdentity` class object. This code snippet attaches a Synapse Spark pool that uses a user-assigned identity:
323323

324324
```python
325325
# import required libraries
@@ -356,20 +356,21 @@ synapse_comp = SynapseSparkCompute(
356356
ml_client.begin_create_or_update(synapse_comp)
357357
```
358358

359-
> [!NOTE]
359+
> [!NOTE]
360360
> The `azure.ai.ml.MLClient.begin_create_or_update()` function attaches a new Synapse Spark pool, if a pool with the specified name does not already exist in the workspace. However, if a Synapse Spark pool with that specified name is already attached to the workspace, a call to the `azure.ai.ml.MLClient.begin_create_or_update()` function will update the existing attached pool with the new identity or identities.
361361

362362
---
363363

364364
## Add role assignments in Azure Synapse Analytics
365365

366-
To ensure that the attached Synapse Spark Pool works properly, assign the [Administrator Role](../synapse-analytics/security/synapse-workspace-synapse-rbac.md#roles) to it, from the Azure Synapse Analytics studio UI. The following steps show how to do it:
366+
To ensure that the attached Synapse Spark Pool works properly, assign the [Administrator Role](../synapse-analytics/security/synapse-workspace-synapse-rbac.md#roles) to it, from the Azure Synapse Analytics studio UI. These steps show how to do it:
367367

368368
1. Open your **Synapse Workspace** in Azure portal.
369369

370370
1. In the left pane, select **Overview**.
371371

372-
:::image type="content" source="media/how-to-manage-synapse-spark-pool/synapse-workspace-open-synapse-studio.png" alt-text="Screenshot showing Open Synapse Studio.":::
372+
:::image type="content" source="media/how-to-manage-synapse-spark-pool/synapse-workspace-open-synapse-studio.png" alt-text="Screenshot showing Open Synapse Studio." lightbox= "media/how-to-manage-synapse-spark-pool/synapse-workspace-open-synapse-studio.png":::
373+
373374
1. Select **Open Synapse Studio**.
374375

375376
1. In the Azure Synapse Analytics studio, select **Manage** in the left pane.
@@ -392,17 +393,17 @@ To ensure that the attached Synapse Spark Pool works properly, assign the [Admin
392393

393394
1. Select **Apply**.
394395

395-
:::image type="content" source="media/how-to-manage-synapse-spark-pool/workspace-add-role-assignment.png" alt-text="Screenshot showing Add Role Assignment.":::
396+
:::image type="content" source="media/how-to-manage-synapse-spark-pool/workspace-add-role-assignment.png" alt-text="Screenshot showing Add Role Assignment." lightbox= "media/how-to-manage-synapse-spark-pool/workspace-add-role-assignment.png":::
396397

397398
## Update the Synapse Spark Pool
398399

399400
# [Studio UI](#tab/studio-ui)
400401

401-
You can manage the attached Synapse Spark pool from the Azure Machine Learning studio UI. Spark pool management functionality includes associated managed identity updates for an attached Synapse Spark pool. You can assign a system-assigned or a user-assigned identity while updating a Synapse Spark pool. You should [create a user-assigned managed identity](../active-directory/managed-identities-azure-resources/how-manage-user-assigned-managed-identities.md#create-a-user-assigned-managed-identity) in Azure portal, before assigning it to a Synapse Spark pool.
402+
You can manage the attached Synapse Spark pool from the Azure Machine Learning studio UI. Spark pool management functionality includes associated managed identity updates for an attached Synapse Spark pool. You can assign a system-assigned or a user-assigned identity while updating a Synapse Spark pool. You should [create a user-assigned managed identity](../active-directory/managed-identities-azure-resources/how-manage-user-assigned-managed-identities.md#create-a-user-assigned-managed-identity) in Azure portal, before you assign it to a Synapse Spark pool.
402403

403404
To update managed identity for the attached Synapse Spark pool:
404405

405-
:::image type="content" source="media/how-to-manage-synapse-spark-pool/synapse_compute_update_managed_identity.png" alt-text="Screenshot showing Synapse Spark Pool managed identity update.":::
406+
:::image type="content" source="media/how-to-manage-synapse-spark-pool/synapse_compute_update_managed_identity.png" alt-text="Screenshot showing Synapse Spark Pool managed identity update." lightbox= "media/how-to-manage-synapse-spark-pool/synapse_compute_update_managed_identity.png":::
406407

407408
1. Open the **Details** page for the Synapse Spark pool in the Azure Machine Learning studio.
408409

@@ -417,12 +418,12 @@ To update managed identity for the attached Synapse Spark pool:
417418
1. To assign a user-assigned managed identity:
418419
1. Select **User-assigned** as the **Identity type**.
419420
1. Select an Azure **Subscription** from the dropdown menu.
420-
1. Type the first few letters of the name of user-assigned managed identity in the box showing text **Search by name**. A list with matching user-assigned managed identity names appears. Select the user-assigned managed identity you want from the list. You can select multiple user-assigned managed identities, and assign them to the attached Synapse Spark pool.
421+
1. Type the first few letters of the name of user-assigned managed identity in the box that shows the text **Search by name**. A list with matching user-assigned managed identity names appears. Select the user-assigned managed identity you want from the list. You can select multiple user-assigned managed identities, and assign them to the attached Synapse Spark pool.
421422
1. Select **Update**.
422423

423424
# [CLI](#tab/cli)
424425
[!INCLUDE [cli v2](includes/machine-learning-cli-v2.md)]
425-
Execute the `az ml compute update` command, with appropriate parameters, to update the identity associated with an attached Synapse Spark pool. To assign a system-assigned identity, set the `--identity` parameter in the command to `SystemAssigned`, as shown:
426+
To update the identity associated with an attached Synapse Spark pool, execute the `az ml compute update` command with appropriate parameters. To assign a system-assigned identity, set the `--identity` parameter in the command to `SystemAssigned`, as shown:
426427

427428
```azurecli
428429
az ml compute update --identity SystemAssigned --subscription <SUBSCRIPTION_ID> --resource-group <RESOURCE_GROUP> --workspace-name <AML_WORKSPACE_NAME> --name <ATTACHED_SPARK_POOL_NAME>
@@ -462,7 +463,7 @@ Class SynapseSparkCompute: This is an experimental class, and may change at any
462463
}
463464
```
464465

465-
To assign a user-assigned identity, set the parameter `--identity` in the command to `UserAssigned`. Additionally, you should pass the resource ID, for the user-assigned identity, using the `--user-assigned-identities` parameter as shown:
466+
To assign a user-assigned identity, set the parameter `--identity` in the command to `UserAssigned`. Additionally, you should use the `--user-assigned-identities` parameter to pass the resource ID for the user-assigned identity, as shown:
466467

467468
```azurecli
468469
az ml compute update --identity UserAssigned --user-assigned-identities /subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<AML_USER_MANAGED_ID> --subscription <SUBSCRIPTION_ID> --resource-group <RESOURCE_GROUP> --workspace-name <AML_WORKSPACE_NAME> --name <ATTACHED_SPARK_POOL_NAME>
@@ -585,7 +586,7 @@ We might want to detach an attached Synapse Spark pool, to clean up a workspace.
585586

586587
# [Studio UI](#tab/studio-ui)
587588

588-
The Azure Machine Learning studio UI also provides a way to detach an attached Synapse Spark pool. Follow these steps to do this:
589+
The Azure Machine Learning studio UI also provides a way to detach an attached Synapse Spark pool. To do this, follow these steps:
589590

590591
1. Open the **Details** page for the Synapse Spark pool, in the Azure Machine Learning studio.
591592

@@ -595,15 +596,15 @@ The Azure Machine Learning studio UI also provides a way to detach an attached S
595596

596597
[!INCLUDE [cli v2](includes/machine-learning-cli-v2.md)]
597598

598-
An attached Synapse Spark pool can be detached by executing the `az ml compute detach` command with name of the pool passed using `--name` parameter as shown here:
599+
An attached Synapse Spark pool can be detached by executing the `az ml compute detach` command with the name of the pool passed, using the `--name` parameter, as shown here:
599600

600601
```azurecli
601602
az ml compute detach --name <ATTACHED_SPARK_POOL_NAME> --subscription <SUBSCRIPTION_ID> --resource-group <RESOURCE_GROUP> --workspace-name <AML_WORKSPACE_NAME>
602603
```
603604

604605
This sample shows the expected output of the above command:
605606

606-
```azurecli
607+
```azurecli
607608
Are you sure you want to perform this operation? (y/n): y
608609
```
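
The `az ml compute` commands in this article share the same `--subscription`, `--resource-group`, and `--workspace-name` parameters. When you script these commands, a small helper can assemble the command line. The following sketch only builds the command string and doesn't run it. The helper name `az_ml_compute` and the sample values are hypothetical.

```python
import shlex


def az_ml_compute(verb, subscription_id, resource_group, workspace_name,
                  name=None, extra_args=()):
    """Return an `az ml compute <verb>` command line as a single string."""
    argv = ["az", "ml", "compute", verb]
    # Commands such as show, update, and detach take --name; list does not.
    if name is not None:
        argv += ["--name", name]
    # Verb-specific arguments, for example ["--identity", "SystemAssigned"].
    argv += list(extra_args)
    argv += [
        "--subscription", subscription_id,
        "--resource-group", resource_group,
        "--workspace-name", workspace_name,
    ]
    # shlex.join quotes any argument that contains shell-special characters.
    return shlex.join(argv)


# Hypothetical sample values, for illustration only.
print(az_ml_compute("detach", "my-sub-id", "my-rg", "my-aml-ws", name="my-pool"))
```

Keeping the arguments in a list until the final join means the same list could be passed to `subprocess.run` without going through a shell.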
609610

@@ -634,10 +635,10 @@ ml_client.compute.begin_delete(name=synapse_name, action="Detach")
634635

635636
## Serverless Spark compute in Azure Machine Learning
636637

637-
Some user scenarios may require access to a serverless Spark compute, during an Azure Machine Learning job submission, without a need to attach a Spark pool. The Azure Synapse Analytics integration with Azure Machine Learning also provides a serverless Spark compute experience. This allows access to a Spark compute in a job, without a need to attach the compute to a workspace first. [Learn more about the serverless Spark compute experience](interactive-data-wrangling-with-apache-spark-azure-ml.md).
638+
Some user scenarios might require access to a serverless Spark compute resource, during an Azure Machine Learning job submission, without a need to attach a Spark pool. The Azure Synapse Analytics integration with Azure Machine Learning also provides a serverless Spark compute experience. This allows access to a Spark compute in a job, without a need to attach the compute to a workspace first. [Learn more about the serverless Spark compute experience](interactive-data-wrangling-with-apache-spark-azure-ml.md).
638639

639640
## Next steps
640641

641642
- [Interactive Data Wrangling with Apache Spark in Azure Machine Learning](./interactive-data-wrangling-with-apache-spark-azure-ml.md)
642643

643-
- [Submit Spark jobs in Azure Machine Learning](./how-to-submit-spark-jobs.md)
644+
- [Submit Spark jobs in Azure Machine Learning](./how-to-submit-spark-jobs.md)
