Commit 545ca59

Merge pull request #229629 from fbsolo-ms1/tutorial-for-SK
Edit apache-spark-azure-ml-concepts.md & the TOC . . .
2 parents 26b607e + 95c908d commit 545ca59

File tree

2 files changed: +11 −14 lines


articles/machine-learning/apache-spark-azure-ml-concepts.md

Lines changed: 9 additions & 12 deletions
@@ -9,7 +9,7 @@ ms.topic: conceptual
 ms.author: franksolomon
 author: ynpandey
 ms.reviewer: franksolomon
-ms.date: 02/10/2023
+ms.date: 03/06/2023
 ms.custom: cliv2, sdkv2
 #Customer intent: As a full-stack machine learning pro, I want to use Apache Spark in Azure Machine Learning.
 ---
@@ -27,7 +27,7 @@ Azure Machine Learning integration with Azure Synapse Analytics (preview) provid
 
 With the Apache Spark framework, Azure Machine Learning Managed (Automatic) Spark compute is the easiest way to accomplish distributed computing tasks in the Azure Machine Learning environment. Azure Machine Learning offers a fully managed, serverless, on-demand Apache Spark compute cluster. Its users can avoid the need to create an Azure Synapse workspace and a Synapse Spark pool.
 
-Users can define resources, including instance type and Apache Spark runtime version. They can then use those resources to access Managed (Automatic) Spark compute in Azure Machine Learning notebooks for:
+Users can define resources, including instance type and the Apache Spark runtime version. They can then use those resources to access Managed (Automatic) Spark compute, in Azure Machine Learning notebooks, for:
 
 - [Interactive Spark code development](./interactive-data-wrangling-with-apache-spark-azure-ml.md)
 - [Spark batch job submissions](./how-to-submit-spark-jobs.md)
@@ -60,7 +60,7 @@ As of January 2023, creation of a Managed (Automatic) Spark compute, inside a vi
 
 ### Inactivity periods and tear-down mechanism
 
-At first launch, Managed (Automatic) Spark compute (*cold start*) resource might need three to five minutes to start the Spark session itself. The automated Managed (Automatic) Spark compute provisioning, backed by Azure Synapse, causes this delay. After the Managed (Automatic) Spark compute is provisioned, and an Apache Spark session starts, subsequent code executions (*warm start*) won't experience this delay.
+At first launch, a Managed (Automatic) Spark compute (*cold start*) resource might need three to five minutes to start the Spark session itself. The automated Managed (Automatic) Spark compute provisioning, backed by Azure Synapse, causes this delay. After the Managed (Automatic) Spark compute is provisioned, and an Apache Spark session starts, subsequent code executions (*warm start*) won't experience this delay.
 
 The Spark session configuration offers an option that defines a session timeout (in minutes). The Spark session will end after an inactivity period that exceeds the user-defined timeout. If another Spark session doesn't start in the following ten minutes, resources provisioned for the Managed (Automatic) Spark compute will be torn down.
 
@@ -70,9 +70,9 @@ After the Managed (Automatic) Spark compute resource tear-down happens, submissi
 
 > [!NOTE]
 > For a session-level conda package:
-> - *Cold start* time will need about ten to fifteen minutes.
-> - *Warm start* time using same conda package will need about one minute.
-> - *Warm start* with a different conda package will also need about ten to fifteen minutes.
+> - the *Cold start* will need about ten to fifteen minutes.
+> - the *Warm start*, using same conda package, will need about one minute.
+> - the *Warm start*, with a different conda package, will also need about ten to fifteen minutes.
 
 ## Attached Synapse Spark pool
 
@@ -90,13 +90,13 @@ The Spark session configuration for an attached Synapse Spark pool also offers a
 
 ## Defining Spark cluster size
 
-You can define Spark cluster size with three parameter values in Azure Machine Learning Spark jobs:
+In Azure Machine Learning Spark jobs, you can define Spark cluster size with three parameter values:
 
 - Number of executors
 - Executor cores
 - Executor memory
 
-You should consider an Azure Machine Learning Apache Spark executor as an equivalent of Azure Spark worker nodes. An example can explain these parameters. Let's say that you defined the number of executors as 6 (equivalent to six worker nodes), executor cores as 4, and executor memory as 28 GB. Your Spark job then has access to a cluster with 24 cores and 168 GB of memory.
+You should consider an Azure Machine Learning Apache Spark executor as equivalent to Azure Spark worker nodes. An example can explain these parameters. Let's say that you defined the number of executors as 6 (equivalent to six worker nodes), executor cores as 4, and executor memory as 28 GB. Your Spark job then has access to a cluster with 24 cores and 168 GB of memory.
 
 ## Ensuring resource access for Spark jobs
 
@@ -110,15 +110,12 @@ To access data and other resources, a Spark job can use either a user identity p
 [This article](./how-to-submit-spark-jobs.md#ensuring-resource-access-for-spark-jobs) describes resource access for Spark jobs. In a notebook session, both the Managed (Automatic) Spark compute and the attached Synapse Spark pool use user identity passthrough for data access during [interactive data wrangling](./interactive-data-wrangling-with-apache-spark-azure-ml.md).
 
 > [!NOTE]
-> - To ensure successful Spark job execution, assign **Contributor** and **Storage Blob Data Contributor** roles (on the Azure storage account used for data input and output) to the identity that's used for submitting the Spark job.
+> - To ensure successful Spark job execution, assign **Contributor** and **Storage Blob Data Contributor** roles (on the Azure storage account used for data input and output) to the identity that will be used for the Spark job submission.
 > - If an [attached Synapse Spark pool](./how-to-manage-synapse-spark-pool.md) points to a Synapse Spark pool in an Azure Synapse workspace, and that workspace has an associated managed virtual network, [configure a managed private endpoint to a storage account](../synapse-analytics/security/connect-to-a-secure-storage-account.md). This configuration will help ensure data access.
 > - Both Managed (Automatic) Spark compute and attached Synapse Spark pool do not work in a notebook created in a private link enabled workspace.
 
-[This quickstart](./quickstart-spark-data-wrangling.md) describes how to start using Managed (Automatic) Spark compute in Azure Machine Learning.
-
 ## Next steps
 
-- [Quickstart: Submit Apache Spark jobs in Azure Machine Learning (preview)](./quickstart-spark-jobs.md)
 - [Attach and manage a Synapse Spark pool in Azure Machine Learning (preview)](./how-to-manage-synapse-spark-pool.md)
 - [Interactive data wrangling with Apache Spark in Azure Machine Learning (preview)](./interactive-data-wrangling-with-apache-spark-azure-ml.md)
 - [Submit Spark jobs in Azure Machine Learning (preview)](./how-to-submit-spark-jobs.md)
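As an aside on the "Defining Spark cluster size" paragraph in the diff above, its worked example (6 executors, 4 cores and 28 GB each) can be verified with a minimal Python sketch. The variable names here are illustrative only and are not part of any Azure Machine Learning API:

```python
# Illustrative sketch of the cluster-size arithmetic described in the article.
num_executors = 6        # equivalent to six worker nodes
executor_cores = 4       # cores per executor
executor_memory_gb = 28  # memory per executor, in GB

# Total capacity available to the Spark job is per-executor resources
# multiplied by the executor count.
total_cores = num_executors * executor_cores          # 24 cores
total_memory_gb = num_executors * executor_memory_gb  # 168 GB

print(f"Cluster capacity: {total_cores} cores, {total_memory_gb} GB memory")
```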

articles/machine-learning/toc.yml

Lines changed: 2 additions & 2 deletions
@@ -70,8 +70,6 @@
 href: quickstart-spark-jobs.md
 - name: Run Jupyter notebooks
 href: quickstart-run-notebooks.md
-- name: Apache Spark in Azure Machine Learning (preview)
-href: apache-spark-azure-ml-concepts.md
 - name: Tutorials
 expanded: true
 items:
@@ -123,6 +121,8 @@
 href: concept-v2.md
 - name: Work with Data
 items:
+- name: Apache Spark in Azure Machine Learning (preview)
+href: apache-spark-azure-ml-concepts.md
 - name: Data concepts in Azure Machine Learning
 href: concept-data.md
 - name: Sourcing human data responsibly
