Commit 2a06a6b

Freshness update for apache-spark-environment-configuration.md . . .
1 parent 8a27b4a commit 2a06a6b

1 file changed: +42 -39 lines changed

@@ -1,58 +1,58 @@
 ---
 title: Apache Spark - environment configuration
 titleSuffix: Azure Machine Learning
-description: Learn how to configure your Apache Spark environment for interactive data wrangling
+description: Learn how to configure your Apache Spark environment for interactive data wrangling.
 author: ynpandey
 ms.author: yogipandey
 ms.reviewer: franksolomon
 ms.service: machine-learning
 ms.subservice: mldata
 ms.topic: how-to
-ms.date: 05/22/2023
+ms.date: 04/19/2024
 #Customer intent: As a Full Stack ML Pro, I want to perform interactive data wrangling in Azure Machine Learning with Apache Spark.
 ---
 
 # Quickstart: Interactive Data Wrangling with Apache Spark in Azure Machine Learning
 
 To handle interactive Azure Machine Learning notebook data wrangling, Azure Machine Learning integration with Azure Synapse Analytics provides easy access to the Apache Spark framework. This access allows for Azure Machine Learning Notebook interactive data wrangling.
 
-In this quickstart guide, you learn how to perform interactive data wrangling using Azure Machine Learning serverless Spark compute, Azure Data Lake Storage (ADLS) Gen 2 storage account, and user identity passthrough.
+In this quickstart guide, you learn how to perform interactive data wrangling with Azure Machine Learning serverless Spark compute, Azure Data Lake Storage (ADLS) Gen 2 storage account, and user identity passthrough.
 
 ## Prerequisites
-- An Azure subscription; if you don't have an Azure subscription, [create a free account](https://azure.microsoft.com/free) before you begin.
-- An Azure Machine Learning workspace. See [Create workspace resources](./quickstart-create-resources.md).
-- An Azure Data Lake Storage (ADLS) Gen 2 storage account. See [Create an Azure Data Lake Storage (ADLS) Gen 2 storage account](../storage/blobs/create-data-lake-storage-account.md).
+- An Azure subscription; if you don't have an Azure subscription, [create a free account](https://azure.microsoft.com/free) before you start.
+- An Azure Machine Learning workspace. Visit [Create workspace resources](./quickstart-create-resources.md).
+- An Azure Data Lake Storage (ADLS) Gen 2 storage account. Visit [Create an Azure Data Lake Storage (ADLS) Gen 2 storage account](../storage/blobs/create-data-lake-storage-account.md).
 
 ## Store Azure storage account credentials as secrets in Azure Key Vault
 
-To store Azure storage account credentials as secrets in the Azure Key Vault using the Azure portal user interface:
+To store Azure storage account credentials as secrets in the Azure Key Vault, with the Azure portal user interface:
 
-1. Navigate to your Azure Key Vault in the Azure portal.
-1. Select **Secrets** from the left panel.
-1. Select **+ Generate/Import**.
+1. Navigate to your Azure Key Vault in the Azure portal
+1. Select **Secrets** from the left panel
+1. Select **+ Generate/Import**
 
-:::image type="content" source="media/apache-spark-environment-configuration/azure-key-vault-secrets-generate-import.png" alt-text="Screenshot showing the Azure Key Vault Secrets Generate Or Import tab.":::
+:::image type="content" source="media/apache-spark-environment-configuration/azure-key-vault-secrets-generate-import.png" alt-text="Screenshot that shows the Azure Key Vault Secrets Generate Or Import tab.":::
 
-1. At the **Create a secret** screen, enter a **Name** for the secret you want to create.
-1. Navigate to Azure Blob Storage Account, in the Azure portal, as seen in this image:
+1. At the **Create a secret** screen, enter a **Name** for the secret you want to create
+1. Navigate to Azure Blob Storage Account, in the Azure portal, as shown in this image:
 
-:::image type="content" source="media/apache-spark-environment-configuration/storage-account-access-keys.png" alt-text="Screenshot showing the Azure access key and connection string values screen.":::
-1. Select **Access keys** from the Azure Blob Storage Account page left panel.
-1. Select **Show** next to **Key 1**, and then **Copy to clipboard** to get the storage account access key.
+:::image type="content" source="media/apache-spark-environment-configuration/storage-account-access-keys.png" alt-text="Screenshot that shows the Azure access key and connection string values screen.":::
+1. Select **Access keys** from the Azure Blob Storage Account page left panel
+1. Select **Show** next to **Key 1**, and then **Copy to clipboard** to get the storage account access key
 > [!Note]
-> Select appropriate options to copy
+> Select the appropriate options to copy
 > - Azure Blob storage container shared access signature (SAS) tokens
 > - Azure Data Lake Storage (ADLS) Gen 2 storage account service principal credentials
 > - tenant ID
 > - client ID and
 > - secret
 >
-> on the respective user interfaces while creating Azure Key Vault secrets for them.
-1. Navigate back to the **Create a secret** screen.
-1. In the **Secret value** textbox, enter the access key credential for the Azure storage account, which was copied to the clipboard in the earlier step.
-1. Select **Create**.
+> on the respective user interfaces while you create the Azure Key Vault secrets for them
+1. Navigate back to the **Create a secret** screen
+1. In the **Secret value** textbox, enter the access key credential for the Azure storage account, which was copied to the clipboard in the earlier step
+1. Select **Create**
 
-:::image type="content" source="media/apache-spark-environment-configuration/create-a-secret.png" alt-text="Screenshot showing the Azure secret creation screen.":::
+:::image type="content" source="media/apache-spark-environment-configuration/create-a-secret.png" alt-text="Screenshot that shows the Azure secret creation screen.":::
 
 > [!TIP]
 > [Azure CLI](../key-vault/secrets/quick-create-cli.md) and [Azure Key Vault secret client library for Python](../key-vault/secrets/quick-create-python.md#sign-in-to-azure) can also create Azure Key Vault secrets.
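Review note: the TIP at the end of this hunk says the Azure Key Vault secret client library for Python can create the same secret that the portal steps create. A minimal sketch of that path follows, assuming the `azure-identity` and `azure-keyvault-secrets` packages are installed. The helper names `build_secret_client` and `store_access_key`, the vault URL, and the secret name are hypothetical and are not part of the article.

```python
"""Sketch: store a storage account access key as an Azure Key Vault secret."""


def build_secret_client(vault_url: str):
    """Create a Key Vault SecretClient for the given vault URL.

    Assumes azure-identity and azure-keyvault-secrets are installed. The
    imports are deferred so store_access_key can be tried without them.
    """
    from azure.identity import DefaultAzureCredential
    from azure.keyvault.secrets import SecretClient

    return SecretClient(vault_url=vault_url, credential=DefaultAzureCredential())


def store_access_key(client, secret_name: str, access_key: str) -> None:
    """Mirror the portal steps: name the secret, supply the key as its value, create it."""
    if not secret_name or not access_key:
        raise ValueError("secret_name and access_key must both be non-empty")
    # set_secret creates the secret, or adds a new version if the name already exists.
    client.set_secret(secret_name, access_key)


# Hypothetical usage (placeholders, not real values):
#   client = build_secret_client("https://<your-key-vault-name>.vault.azure.net")
#   store_access_key(client, "storage-account-access-key", "<storage-account-access-key>")
```

Passing the client in as a parameter keeps the credential handling separate from the write itself, so the helper can be checked against a stand-in object before it touches a real vault.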
@@ -61,59 +61,62 @@ To store Azure storage account credentials as secrets in the Azure Key Vault usi
 
 We must ensure that the input and output data paths are accessible before we start interactive data wrangling. First, for
 
-- the user identity of the Notebooks session logged-in user or
+- the user identity of the Notebooks session logged-in user
+
+or
+
 - a service principal
 
-assign **Reader** and **Storage Blob Data Reader** roles to the user identity of the logged-in user. However, in certain scenarios, we might want to write the wrangled data back to the Azure storage account. The **Reader** and **Storage Blob Data Reader** roles provide read-only access to the user identity or service principal. To enable read and write access, assign **Contributor** and **Storage Blob Data Contributor** roles to the user identity or service principal. To assign appropriate roles to the user identity:
+assign **Reader** and **Storage Blob Data Reader** roles to the user identity of the logged-in user. However, in certain scenarios, we might want to write the wrangled data back to the Azure storage account. The **Reader** and **Storage Blob Data Reader** roles provide read-only access to the user identity or service principal. To enable read and write access, assign **Contributor** and **Storage Blob Data Contributor** roles to the user identity or service principal. To assign appropriate roles to the user identity:
 
-1. Open the [Microsoft Azure portal](https://portal.azure.com).
-1. Search and select the **Storage accounts** service.
+1. Open the [Microsoft Azure portal](https://portal.azure.com)
+1. Search and select the **Storage accounts** service
 
-:::image type="content" source="media/apache-spark-environment-configuration/find-storage-accounts-service.png" lightbox="media/apache-spark-environment-configuration/find-storage-accounts-service.png" alt-text="Expandable screenshot showing Storage accounts service search and selection, in Microsoft Azure portal.":::
+:::image type="content" source="media/apache-spark-environment-configuration/find-storage-accounts-service.png" lightbox="media/apache-spark-environment-configuration/find-storage-accounts-service.png" alt-text="Expandable screenshot that shows Storage accounts service search and selection in Microsoft Azure portal.":::
 
-1. On the **Storage accounts** page, select the Azure Data Lake Storage (ADLS) Gen 2 storage account from the list. A page showing the storage account **Overview** will open.
+1. On the **Storage accounts** page, select the Azure Data Lake Storage (ADLS) Gen 2 storage account from the list. A page showing the storage account **Overview** opens
 
-:::image type="content" source="media/apache-spark-environment-configuration/storage-accounts-list.png" lightbox="media/apache-spark-environment-configuration/storage-accounts-list.png" alt-text="Expandable screenshot showing selection of the Azure Data Lake Storage (ADLS) Gen 2 storage account Storage account.":::
+:::image type="content" source="media/apache-spark-environment-configuration/storage-accounts-list.png" lightbox="media/apache-spark-environment-configuration/storage-accounts-list.png" alt-text="Expandable screenshot that shows selection of the Azure Data Lake Storage (ADLS) Gen 2 storage account Storage account.":::
 
 1. Select **Access Control (IAM)** from the left panel
 1. Select **Add role assignment**
 
-:::image type="content" source="media/apache-spark-environment-configuration/storage-account-add-role-assignment.png" lightbox="media/apache-spark-environment-configuration/storage-account-add-role-assignment.png" alt-text="Screenshot showing the Azure access keys screen.":::
+:::image type="content" source="media/apache-spark-environment-configuration/storage-account-add-role-assignment.png" lightbox="media/apache-spark-environment-configuration/storage-account-add-role-assignment.png" alt-text="Screenshot that shows the Azure access keys screen.":::
 
 1. Find and select role **Storage Blob Data Contributor**
 1. Select **Next**
 
-:::image type="content" source="media/apache-spark-environment-configuration/add-role-assignment-choose-role.png" lightbox="media/apache-spark-environment-configuration/add-role-assignment-choose-role.png" alt-text="Screenshot showing the Azure add role assignment screen.":::
+:::image type="content" source="media/apache-spark-environment-configuration/add-role-assignment-choose-role.png" lightbox="media/apache-spark-environment-configuration/add-role-assignment-choose-role.png" alt-text="Screenshot that shows the Azure add role assignment screen.":::
 
-1. Select **User, group, or service principal**.
-1. Select **+ Select members**.
+1. Select **User, group, or service principal**
+1. Select **+ Select members**
 1. Search for the user identity below **Select**
 1. Select the user identity from the list, so that it shows under **Selected members**
 1. Select the appropriate user identity
 1. Select **Next**
 
-:::image type="content" source="media/apache-spark-environment-configuration/add-role-assignment-choose-members.png" lightbox="media/apache-spark-environment-configuration/add-role-assignment-choose-members.png" alt-text="Screenshot showing the Azure add role assignment screen Members tab.":::
+:::image type="content" source="media/apache-spark-environment-configuration/add-role-assignment-choose-members.png" lightbox="media/apache-spark-environment-configuration/add-role-assignment-choose-members.png" alt-text="Screenshot that shows the Azure add role assignment screen Members tab.":::
 
 1. Select **Review + Assign**
 
 :::image type="content" source="media/apache-spark-environment-configuration/add-role-assignment-review-and-assign.png" lightbox="media/apache-spark-environment-configuration/add-role-assignment-review-and-assign.png" alt-text="Screenshot showing the Azure add role assignment screen review and assign tab.":::
-1. Repeat steps 2-13 for **Contributor** role assignment.
+1. Repeat steps 2-13 for **Contributor** role assignment
 
 Once the user identity has the appropriate roles assigned, data in the Azure storage account should become accessible.
 
 > [!NOTE]
-> If an [attached Synapse Spark pool](./how-to-manage-synapse-spark-pool.md) points to a Synapse Spark pool in an Azure Synapse workspace that has a managed virtual network associated with it, [a managed private endpoint to storage account should be configured](../synapse-analytics/security/connect-to-a-secure-storage-account.md) to ensure data access.
+> If an [attached Synapse Spark pool](./how-to-manage-synapse-spark-pool.md) points to a Synapse Spark pool, in an Azure Synapse workspace, that has a managed virtual network associated with it, [you should configure a managed private endpoint to a storage account](../synapse-analytics/security/connect-to-a-secure-storage-account.md) to ensure data access.
 
 ## Ensuring resource access for Spark jobs
 
-To access data and other resources, Spark jobs can use either a managed identity or user identity passthrough. The following table summarizes the different mechanisms for resource access while using Azure Machine Learning serverless Spark compute and attached Synapse Spark pool.
+To access data and other resources, Spark jobs can use either a managed identity or user identity passthrough. The following table summarizes the different mechanisms for resource access while you use Azure Machine Learning serverless Spark compute and attached Synapse Spark pool.
 
 |Spark pool|Supported identities|Default identity|
 | ---------- | -------------------- | ---------------- |
 |Serverless Spark compute|User identity, user-assigned managed identity attached to the workspace|User identity|
 |Attached Synapse Spark pool|User identity, user-assigned managed identity attached to the attached Synapse Spark pool, system-assigned managed identity of the attached Synapse Spark pool|System-assigned managed identity of the attached Synapse Spark pool|
 
-If the CLI or SDK code defines an option to use managed identity, Azure Machine Learning serverless Spark compute relies on a user-assigned managed identity attached to the workspace. You can attach a user-assigned managed identity to an existing Azure Machine Learning workspace using Azure Machine Learning CLI v2, or with `ARMClient`.
+If the CLI or SDK code defines an option to use managed identity, Azure Machine Learning serverless Spark compute relies on a user-assigned managed identity attached to the workspace. You can attach a user-assigned managed identity to an existing Azure Machine Learning workspace with Azure Machine Learning CLI v2, or with `ARMClient`.
 
 ## Next steps
 
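Review note: this hunk carries two lookups that are easy to misread in prose: which role pair grants read-only versus read and write access, and which identity each Spark pool type uses by default. The sketch below encodes both as plain Python tables, with values copied from the article text and its table. The names `Access`, `roles_for`, and `default_identity` are hypothetical and only serve the illustration; nothing here calls Azure.

```python
from enum import Enum
from typing import Tuple


class Access(Enum):
    """Access level needed on the storage account."""
    READ = "read"
    READ_WRITE = "read_write"


# Role pairs named in the article: read-only versus read and write.
ROLES = {
    Access.READ: ("Reader", "Storage Blob Data Reader"),
    Access.READ_WRITE: ("Contributor", "Storage Blob Data Contributor"),
}

# Default identity per Spark pool type, copied from the article's table.
DEFAULT_IDENTITY = {
    "Serverless Spark compute": "User identity",
    "Attached Synapse Spark pool": (
        "System-assigned managed identity of the attached Synapse Spark pool"
    ),
}


def roles_for(access: Access) -> Tuple[str, str]:
    """Return the two roles to assign for the requested access level."""
    return ROLES[access]


def default_identity(spark_pool: str) -> str:
    """Return the default identity for a Spark pool type; KeyError if unknown."""
    return DEFAULT_IDENTITY[spark_pool]


print(roles_for(Access.READ_WRITE))
print(default_identity("Serverless Spark compute"))
```

Writing wrangled data back to the storage account corresponds to `Access.READ_WRITE`, which is the case the portal steps in this hunk walk through.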
@@ -122,4 +125,4 @@ If the CLI or SDK code defines an option to use managed identity, Azure Machine
 - [Interactive Data Wrangling with Apache Spark in Azure Machine Learning](./interactive-data-wrangling-with-apache-spark-azure-ml.md)
 - [Submit Spark jobs in Azure Machine Learning](./how-to-submit-spark-jobs.md)
 - [Code samples for Spark jobs using Azure Machine Learning CLI](https://github.com/Azure/azureml-examples/tree/main/cli/jobs/spark)
-- [Code samples for Spark jobs using Azure Machine Learning Python SDK](https://github.com/Azure/azureml-examples/tree/main/sdk/python/jobs/spark)
+- [Code samples for Spark jobs using Azure Machine Learning Python SDK](https://github.com/Azure/azureml-examples/tree/main/sdk/python/jobs/spark)
