#Customer intent: As a full-stack machine learning pro, I want to use Apache Spark in Azure Machine Learning.
 ---

-# Apache Spark in Azure Machine Learning (preview)
+# Apache Spark in Azure Machine Learning

-Azure Machine Learning integration with Azure Synapse Analytics (preview) provides easy access to distributed computation resources through the Apache Spark framework. This integration offers these Apache Spark computing experiences:
+Azure Machine Learning integration with Azure Synapse Analytics provides easy access to distributed computation resources through the Apache Spark framework. This integration offers these Apache Spark computing experiences:
With the Apache Spark framework, Azure Machine Learning serverless Spark compute is the easiest way to accomplish distributed computing tasks in the Azure Machine Learning environment. Azure Machine Learning offers a fully managed, serverless, on-demand Apache Spark compute cluster. Its users can avoid the need to create an Azure Synapse workspace and a Synapse Spark pool.
@@ -118,8 +118,8 @@ To access data and other resources, a Spark job can use either a user identity p
 ## Next steps

--[Attach and manage a Synapse Spark pool in Azure Machine Learning (preview)](./how-to-manage-synapse-spark-pool.md)
--[Interactive data wrangling with Apache Spark in Azure Machine Learning (preview)](./interactive-data-wrangling-with-apache-spark-azure-ml.md)
--[Submit Spark jobs in Azure Machine Learning (preview)](./how-to-submit-spark-jobs.md)
+-[Attach and manage a Synapse Spark pool in Azure Machine Learning](./how-to-manage-synapse-spark-pool.md)
+-[Interactive data wrangling with Apache Spark in Azure Machine Learning](./interactive-data-wrangling-with-apache-spark-azure-ml.md)
+-[Submit Spark jobs in Azure Machine Learning](./how-to-submit-spark-jobs.md)
 -[Code samples for Spark jobs using the Azure Machine Learning CLI](https://github.com/Azure/azureml-examples/tree/main/cli/jobs/spark)
 -[Code samples for Spark jobs using the Azure Machine Learning Python SDK](https://github.com/Azure/azureml-examples/tree/main/sdk/python/jobs/spark)
To handle interactive Azure Machine Learning notebook data wrangling, Azure Machine Learning integration with Azure Synapse Analytics provides easy access to the Apache Spark framework. This access allows for Azure Machine Learning Notebook interactive data wrangling.
-To handle interactive Azure Machine Learning notebook data wrangling, Azure Machine Learning integration with Azure Synapse Analytics (preview) provides easy access to the Apache Spark framework. This access allows for Azure Machine Learning Notebook interactive data wrangling.
-
-In this quickstart guide, you learn how to perform interactive data wrangling using Azure Machine Learning Managed (Automatic) Synapse Spark compute, Azure Data Lake Storage (ADLS) Gen 2 storage account, and user identity passthrough.
+In this quickstart guide, you learn how to perform interactive data wrangling using Azure Machine Learning serverless Spark compute, Azure Data Lake Storage (ADLS) Gen 2 storage account, and user identity passthrough.

 ## Prerequisites
 - An Azure subscription; if you don't have an Azure subscription, [create a free account](https://azure.microsoft.com/free) before you begin.
 - An Azure Machine Learning workspace. See [Create workspace resources](./quickstart-create-resources.md).
 - An Azure Data Lake Storage (ADLS) Gen 2 storage account. See [Create an Azure Data Lake Storage (ADLS) Gen 2 storage account](../storage/blobs/create-data-lake-storage-account.md).
-- To enable this feature:
-1. Navigate to the Azure Machine Learning studio UI
-2. In the icon section at the top right of the screen, select **Manage preview features** (megaphone icon)
-3. In the **Managed preview feature** panel, toggle the **Run notebooks and jobs on managed Spark** feature to **on**
-:::image type="content" source="./media/apache-spark-environment-configuration/how-to-enable-managed-spark-preview.png" lightbox="media/apache-spark-environment-configuration/how-to-enable-managed-spark-preview.png" alt-text="Screenshot showing the option to enable the Managed Spark preview.":::

 ## Store Azure storage account credentials as secrets in Azure Key Vault
@@ -113,20 +106,20 @@ Once the user identity has the appropriate roles assigned, data in the Azure sto
 ## Ensuring resource access for Spark jobs

-Spark jobs can use either a managed identity or user identity passthrough to access data and other resources. The following table summarizes the different mechanisms for resource access while using Azure Machine Learning serverless Spark compute (preview) and attached Synapse Spark pool.
+To access data and other resources, Spark jobs can use either a managed identity or user identity passthrough. The following table summarizes the different mechanisms for resource access while using Azure Machine Learning serverless Spark compute and attached Synapse Spark pool.
-|Serverless Spark compute (preview)|User identity and managed identity|User identity|
+|Serverless Spark compute|User identity and managed identity|User identity|
 |Attached Synapse Spark pool|User identity and managed identity|Managed identity - compute identity of the attached Synapse Spark pool|

-If the CLI or SDK code defines an option to use managed identity, Azure Machine Learning serverless Spark compute (preview) relies on a user-assigned managed identity attached to the workspace. You can attach a user-assigned managed identity to an existing Azure Machine Learning workspace using Azure Machine Learning CLI v2, or with `ARMClient`.
+If the CLI or SDK code defines an option to use managed identity, Azure Machine Learning serverless Spark compute relies on a user-assigned managed identity attached to the workspace. You can attach a user-assigned managed identity to an existing Azure Machine Learning workspace using Azure Machine Learning CLI v2, or with `ARMClient`.

 ## Next steps

--[Apache Spark in Azure Machine Learning (preview)](./apache-spark-azure-ml-concepts.md)
--[Attach and manage a Synapse Spark pool in Azure Machine Learning (preview)](./how-to-manage-synapse-spark-pool.md)
--[Interactive Data Wrangling with Apache Spark in Azure Machine Learning (preview)](./interactive-data-wrangling-with-apache-spark-azure-ml.md)
--[Submit Spark jobs in Azure Machine Learning (preview)](./how-to-submit-spark-jobs.md)
+-[Apache Spark in Azure Machine Learning](./apache-spark-azure-ml-concepts.md)
+-[Attach and manage a Synapse Spark pool in Azure Machine Learning](./how-to-manage-synapse-spark-pool.md)
+-[Interactive Data Wrangling with Apache Spark in Azure Machine Learning](./interactive-data-wrangling-with-apache-spark-azure-ml.md)
+-[Submit Spark jobs in Azure Machine Learning](./how-to-submit-spark-jobs.md)
 -[Code samples for Spark jobs using Azure Machine Learning CLI](https://github.com/Azure/azureml-examples/tree/main/cli/jobs/spark)
--[Code samples for Spark jobs using Azure Machine Learning Python SDK](https://github.com/Azure/azureml-examples/tree/main/sdk/python/jobs/spark)
+-[Code samples for Spark jobs using Azure Machine Learning Python SDK](https://github.com/Azure/azureml-examples/tree/main/sdk/python/jobs/spark)
-In this article, you will learn how to attach a [Synapse Spark Pool](../synapse-analytics/spark/apache-spark-concepts.md#spark-pools) in Azure Machine Learning. You can attach a Synapse Spark Pool in Azure Machine Learning in one of these ways:
+In this article, you'll learn how to attach a [Synapse Spark Pool](../synapse-analytics/spark/apache-spark-concepts.md#spark-pools) in Azure Machine Learning. You can attach a Synapse Spark Pool in Azure Machine Learning in one of these ways:

 - Using Azure Machine Learning studio UI
 - Using Azure Machine Learning CLI
@@ -28,11 +26,6 @@ In this article, you will learn how to attach a [Synapse Spark Pool](../synapse-
 - An Azure Machine Learning workspace. See [Create workspace resources](./quickstart-create-resources.md).
 -[Create an Azure Synapse Analytics workspace in Azure portal](../synapse-analytics/quickstart-create-workspace.md).
 -[Create an Apache Spark pool using the Azure portal](../synapse-analytics/quickstart-create-apache-spark-pool-portal.md).
-- To enable this feature:
-1. Navigate to Azure Machine Learning studio UI.
-2. Select **Manage preview features** (megaphone icon) among the icons on the top right side of the screen.
-3. In **Managed preview feature** panel, toggle on **Run notebooks and jobs on managed Spark** feature.
@@ -59,30 +52,30 @@ Azure Machine Learning provides multiple options for attaching and managing a Sy
 # [Studio UI](#tab/studio-ui)

-To attach a Synapse Spark Pool using the Studio Compute tab:
+To attach a Synapse Spark Pool using the Studio Compute tab:

 :::image type="content" source="media/how-to-manage-synapse-spark-pool/synapse_compute_synapse_spark_pool.png" alt-text="Screenshot showing creation of a new Synapse Spark Pool.":::

 1. In the **Manage** section of the left pane, select **Compute**.
 1. Select **Attached computes**.
 1. On the **Attached computes** screen, select **New**, to see the options for attaching different types of computes.
-1. Select **Synapse Spark pool (preview)**.
+2. Select **Synapse Spark pool**.

-The **Attach Synapse Spark pool (preview)** panel will open on the right side of the screen. In this panel:
+The **Attach Synapse Spark pool** panel will open on the right side of the screen. In this panel:

-1. Enter a **Name**, which will refer to the attached Synapse Spark Pool inside the Azure Machine Learning.
+1. Enter a **Name**, which refers to the attached Synapse Spark Pool inside the Azure Machine Learning.

-1. Select an Azure **Subscription** from the dropdown menu.
+2. Select an Azure **Subscription** from the dropdown menu.

-1. Select a **Synapse workspace** from the dropdown menu.
+3. Select a **Synapse workspace** from the dropdown menu.

-1. Select a **Spark Pool** from the dropdown menu.
+4. Select a **Spark Pool** from the dropdown menu.

-1. Toggle the **Assign a managed identity** option, to enable it.
+5. Toggle the **Assign a managed identity** option, to enable it.

-1. Select a managed **Identity type** to use with this attached Synapse Spark Pool.
+6. Select a managed **Identity type** to use with this attached Synapse Spark Pool.

-1. Select **Update**, to complete the Synapse Spark Pool attach process.
+7. Select **Update**, to complete the Synapse Spark Pool attach process.

 # [CLI](#tab/cli)
@@ -181,7 +174,7 @@ Class SynapseSparkCompute: This is an experimental class, and may change at any
}
```
-If the attached Synapse Spark pool, with the name specified in the YAML specification file, already exists in the workspace, then `az ml compute attach` command execution will update the existing pool with the information provided in the YAML specification file. You can update the
+If the attached Synapse Spark pool, with the name specified in the YAML specification file, already exists in the workspace, then `az ml compute attach` command execution updates the existing pool with the information provided in the YAML specification file. You can update the

 - identity type
 - user assigned identities
@@ -270,7 +263,7 @@ This sample shows the expected output of the above command:
-Azure Machine Learning Python SDK (preview) provides convenient functions for attaching and managing Synapse Spark pool, using Python code in Azure Machine Learning Notebooks.
+Azure Machine Learning Python SDK provides convenient functions for attaching and managing Synapse Spark pool, using Python code in Azure Machine Learning Notebooks.

 To attach a Synapse Compute using Python SDK, first create an instance of [azure.ai.ml.MLClient class](/python/api/azure-ai-ml/azure.ai.ml.mlclient). This provides convenient functions for interaction with Azure Machine Learning services. The following code sample uses `azure.identity.DefaultAzureCredential` for connecting to a workspace in resource group of a specified Azure subscription. In the following code sample, define the `SynapseSparkCompute` with the parameters:
 - `name`- user-defined name of the new attached Synapse Spark pool.
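
The code sample that this paragraph refers to falls outside the diff context shown here. As a rough sketch of the attach step, and not the article's own sample: the helper names `synapse_pool_resource_id` and `attach_synapse_pool` are hypothetical, the `resource_id` parameter and the ARM resource ID layout are assumptions based on the `azure-ai-ml` package, and every argument value is a placeholder you would supply.

```python
def synapse_pool_resource_id(subscription_id, resource_group, synapse_workspace, spark_pool):
    """Build the ARM resource ID of a Synapse Spark pool (assumed layout)."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Synapse/workspaces/{synapse_workspace}"
        f"/bigDataPools/{spark_pool}"
    )


def attach_synapse_pool(ml_client, name, resource_id):
    """Attach an existing Synapse Spark pool to the workspace behind ml_client."""
    # Imported lazily so the helper above can be used without the SDK installed.
    from azure.ai.ml.entities import SynapseSparkCompute

    # `name` is the user-defined name of the attached pool, as described above.
    synapse_compute = SynapseSparkCompute(name=name, resource_id=resource_id)
    # begin_create_or_update starts a long-running operation and returns a poller.
    return ml_client.begin_create_or_update(synapse_compute)
```

Here `ml_client` would be an `azure.ai.ml.MLClient` created with `azure.identity.DefaultAzureCredential`, the subscription ID, the resource group, and the workspace name, as the paragraph above describes.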
@@ -393,7 +386,7 @@ To ensure that the attached Synapse Spark Pool works properly, assign the [Admin
 1. In **Role** dropdown menu, select **Synapse Administrator**.

-1. In the **Select user** search box, start typing the name of your Azure Machine Learning Workspace. It will show you a list of attached Synapse Spark pools. Select your desired Synapse Spark pool from the list.
+1. In the **Select user** search box, start typing the name of your Azure Machine Learning Workspace. It shows you a list of attached Synapse Spark pools. Select your desired Synapse Spark pool from the list.

 1. Select **Apply**.
@@ -422,7 +415,7 @@ To update managed identity for the attached Synapse Spark pool:
 1. To assign a user-assigned managed identity:
 1. Select **User-assigned** as the **Identity type**.
 1. Select an Azure **Subscription** from the dropdown menu.
-1. Type the first few letters of the name of user-assigned managed identity in the box showing text **Search by name**. A list with matching user-assigned managed identity names will appear. Select the user-assigned managed identity you want from the list. You can select multiple user-assigned managed identities, and assign them to the attached Synapse Spark pool.
+1. Type the first few letters of the name of user-assigned managed identity in the box showing text **Search by name**. A list with matching user-assigned managed identity names appears. Select the user-assigned managed identity you want from the list. You can select multiple user-assigned managed identities, and assign them to the attached Synapse Spark pool.
 1. Select **Update**.

 # [CLI](#tab/cli)
@@ -616,7 +609,7 @@ Are you sure you want to perform this operation? (y/n): y
-We will use an `MLClient.compute.begin_delete()` function call. Pass the `name` of the attached Synapse Spark pool, along with the action `Detach`, to the function. This code snippet detaches a Synapse Spark pool from an Azure Machine Learning workspace:
+We'll use an `MLClient.compute.begin_delete()` function call. Pass the `name` of the attached Synapse Spark pool, along with the action `Detach`, to the function. This code snippet detaches a Synapse Spark pool from an Azure Machine Learning workspace:
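
The snippet itself is not part of the diff context. A minimal sketch of the call described above might look like the following. The wrapper name `detach_synapse_pool` is hypothetical, and `ml_client` is assumed to be an already authenticated `azure.ai.ml.MLClient`.

```python
def detach_synapse_pool(ml_client, pool_name):
    """Detach an attached Synapse Spark pool from the workspace behind ml_client."""
    # Pass the pool name together with the action "Detach", as the text above describes.
    poller = ml_client.compute.begin_delete(name=pool_name, action="Detach")
    # The call starts a long-running operation; the caller can wait on the returned poller.
    return poller
```

Returning the poller, rather than blocking inside the helper, leaves the decision of whether to wait for completion to the calling code.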
-## Managed Synapse Spark Pool in Azure Machine Learning
+## Serverless Spark compute in Azure Machine Learning

-Some user scenarios may require access to a Synapse Spark Pool, during an Azure Machine Learning job submission, without a need to attach a Spark pool. The Azure Synapse Analytics integration with Azure Machine Learning (preview) also provides a serverless Spark compute (preview) experience that allows access to a Spark pool in a job, without a need to attach the compute to a workspace first. [Learn more about the serverless Spark compute (preview) experience](interactive-data-wrangling-with-apache-spark-azure-ml.md).
+Some user scenarios may require access to a serverless Spark compute, during an Azure Machine Learning job submission, without a need to attach a Spark pool. The Azure Synapse Analytics integration with Azure Machine Learning also provides a serverless Spark compute experience. This allows access to a Spark compute in a job, without a need to attach the compute to a workspace first. [Learn more about the serverless Spark compute experience](interactive-data-wrangling-with-apache-spark-azure-ml.md).

 ## Next steps

-- [Interactive Data Wrangling with Apache Spark in Azure Machine Learning (preview)](./interactive-data-wrangling-with-apache-spark-azure-ml.md)
+- [Interactive Data Wrangling with Apache Spark in Azure Machine Learning](./interactive-data-wrangling-with-apache-spark-azure-ml.md)

-- [Submit Spark jobs in Azure Machine Learning (preview)](./how-to-submit-spark-jobs.md)
+- [Submit Spark jobs in Azure Machine Learning](./how-to-submit-spark-jobs.md)