Commit e447be7

Merge pull request #203596 from jingyanjingyan/notebook (Notebook)
2 parents d8c990f + f6ba558

File tree

2 files changed: +73 −48 lines


articles/synapse-analytics/spark/microsoft-spark-utilities.md

Lines changed: 69 additions & 44 deletions
@@ -1,14 +1,14 @@
 ---
 title: Introduction to Microsoft Spark utilities
 description: "Tutorial: MSSparkutils in Azure Synapse Analytics notebooks"
-author: ruixinxu
-services: synapse-analytics
-ms.service: synapse-analytics
+author: ruixinxu
+services: synapse-analytics
+ms.service: synapse-analytics
 ms.topic: reference
 ms.subservice: spark
 ms.date: 09/10/2020
 ms.author: ruxu
-ms.reviewer: 
+ms.reviewer:
 zone_pivot_groups: programming-languages-spark-all-minus-sql
 ms.custom: subject-rbac-steps
 ---
@@ -19,9 +19,9 @@ Microsoft Spark Utilities (MSSparkUtils) is a builtin package to help you easily
 
 ## Prerequisites
 
-### Configure access to Azure Data Lake Storage Gen2
+### Configure access to Azure Data Lake Storage Gen2
 
-Synapse notebooks use Azure Active Directory (Azure AD) pass-through to access the ADLS Gen2 accounts. You need to be a **Storage Blob Data Contributor** to access the ADLS Gen2 account (or folder).
+Synapse notebooks use Azure Active Directory (Azure AD) pass-through to access the ADLS Gen2 accounts. You need to be a **Storage Blob Data Contributor** to access the ADLS Gen2 account (or folder).
 
 Synapse pipelines use the workspace's Managed Service Identity (MSI) to access the storage accounts. To use MSSparkUtils in your pipeline activities, your workspace identity needs to be **Storage Blob Data Contributor** to access the ADLS Gen2 account (or folder).
 
@@ -30,7 +30,7 @@ Follow these steps to make sure your Azure AD and workspace MSI have access to t
 1. Select the **Access control (IAM)** from the left panel.
 1. Select **Add** > **Add role assignment** to open the Add role assignment page.
 1. Assign the following role. For detailed steps, see [Assign Azure roles using the Azure portal](../../role-based-access-control/role-assignments-portal.md).
-
+
 | Setting | Value |
 | --- | --- |
 | Role | Storage Blob Data Contributor |
@@ -41,14 +41,14 @@ Follow these steps to make sure your Azure AD and workspace MSI have access to t
 > The managed identity name is also the workspace name.
 
 ![Add role assignment page in Azure portal.](../../../includes/role-based-access-control/media/add-role-assignment-page.png)
-
+
 1. Select **Save**.
 
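For scripted setups, the same role assignment can be made with the Azure CLI instead of the portal. The sketch below only composes and prints the command for review; the assignee, subscription, resource group, and account names are hypothetical placeholders:

```python
# Compose an Azure CLI role assignment equivalent to the portal steps above.
# All values below are placeholders; substitute your own.
assignee = "myworkspacename"  # the workspace MSI has the same name as the workspace
role = "Storage Blob Data Contributor"
scope = ("/subscriptions/<sub-id>/resourceGroups/<rg>"
         "/providers/Microsoft.Storage/storageAccounts/<account>")

cmd = ["az", "role", "assignment", "create",
       "--assignee", assignee,
       "--role", role,
       "--scope", scope]

# Print the command instead of running it, so it can be reviewed first.
print(" ".join(cmd))
```

Once the placeholders are filled in, run the printed command with the Azure CLI, or pass `cmd` to `subprocess.run`.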
 You can access data on ADLS Gen2 with Synapse Spark via the following URL:
 
 `abfss://<container_name>@<storage_account_name>.dfs.core.windows.net/<path>`
 
-### Configure access to Azure Blob Storage
+### Configure access to Azure Blob Storage
 
 Synapse uses a [**shared access signature (SAS)**](../../storage/common/storage-sas-overview.md) to access Azure Blob Storage. To avoid exposing SAS keys in the code, we recommend creating a new linked service in the Synapse workspace to the Azure Blob Storage account you want to access.
 
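Both storage URL patterns can be composed programmatically. These are small illustrative helpers, not part of MSSparkUtils; the `wasbs` Blob Storage pattern is stated here from the usual Azure convention and should be verified against the full article:

```python
# Illustrative helpers (not part of MSSparkUtils) that compose the storage URLs.
def adls_gen2_url(container: str, account: str, path: str) -> str:
    # ADLS Gen2 pattern shown above:
    # abfss://<container_name>@<storage_account_name>.dfs.core.windows.net/<path>
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path}"

def blob_url(container: str, account: str, path: str) -> str:
    # Blob Storage pattern (assumed from the usual Azure convention):
    # wasbs://<container_name>@<storage_account_name>.blob.core.windows.net/<path>
    return f"wasbs://{container}@{account}.blob.core.windows.net/{path}"

print(adls_gen2_url("data", "mystorageacct", "raw/events.parquet"))
# abfss://data@mystorageacct.dfs.core.windows.net/raw/events.parquet
```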
@@ -60,7 +60,7 @@ Follow these steps to add a new linked service for an Azure Blob Storage account
 4. Select **Continue**.
 5. Select the Azure Blob Storage account to access and configure the linked service name. We suggest using **Account key** for the **Authentication method**.
 6. Select **Test connection** to validate the settings are correct.
-7. Select **Create** first and click **Publish all** to save your changes.
+7. Select **Create** first and click **Publish all** to save your changes.
 
 You can access data on Azure Blob Storage with Synapse Spark via the following URL:
 
@@ -126,29 +126,29 @@ Console.WriteLine(wasbs_path);
 
 ```
 
-::: zone-end
-
+::: zone-end
+
 ### Configure access to Azure Key Vault
 
-You can add an Azure Key Vault as a linked service to manage your credentials in Synapse.
+You can add an Azure Key Vault as a linked service to manage your credentials in Synapse.
 Follow these steps to add an Azure Key Vault as a Synapse linked service:
 1. Open the [Azure Synapse Studio](https://web.azuresynapse.net/).
 2. Select **Manage** from the left panel and select **Linked services** under the **External connections**.
 3. Search **Azure Key Vault** in the **New linked Service** panel on the right.
 4. Select the Azure Key Vault account to access and configure the linked service name.
 5. Select **Test connection** to validate the settings are correct.
-6. Select **Create** first and click **Publish all** to save your change.
+6. Select **Create** first and click **Publish all** to save your changes.
 
 Synapse notebooks use Azure Active Directory (Azure AD) pass-through to access Azure Key Vault. Synapse pipelines use the workspace identity (MSI) to access Azure Key Vault. To make sure your code works both in a notebook and in a Synapse pipeline, we recommend granting secret access permission to both your Azure AD account and the workspace identity.
 
 Follow these steps to grant secret access to your workspace identity:
-1. Open the [Azure portal](https://portal.azure.com/) and the Azure Key Vault you want to access.
+1. Open the [Azure portal](https://portal.azure.com/) and the Azure Key Vault you want to access.
 2. Select the **Access policies** from the left panel.
-3. Select **Add Access Policy**:
+3. Select **Add Access Policy**:
     - Choose **Key, Secret, & Certificate Management** as the config template.
-    - Select **your Azure AD account** and **your workspace identity** (same as your workspace name) in the select principal or make sure it is already assigned.
+    - Select **your Azure AD account** and **your workspace identity** (same as your workspace name) in **Select principal**, or make sure it's already assigned.
 4. Select **Select** and **Add**.
-5. Select the **Save** button to commit changes.
+5. Select the **Save** button to commit the changes.
 
 ## File system utilities
 
@@ -429,29 +429,29 @@ Removes a file or a directory.
 :::zone pivot = "programming-language-python"
 
 ```python
-mssparkutils.fs.rm('file path', True) # Set the last parameter as True to remove all files and directories recursively
+mssparkutils.fs.rm('file path', True) # Set the last parameter to True to remove all files and directories recursively
 ```
 ::: zone-end
 
 :::zone pivot = "programming-language-scala"
 
 ```scala
-mssparkutils.fs.rm("file path", true) // Set the last parameter as True to remove all files and directories recursively
+mssparkutils.fs.rm("file path", true) // Set the last parameter to true to remove all files and directories recursively
 ```
 
 ::: zone-end
 
 :::zone pivot = "programming-language-csharp"
 
 ```csharp
-FS.Rm("file path", true) // Set the last parameter as True to remove all files and directories recursively
+FS.Rm("file path", true) // Set the last parameter to true to remove all files and directories recursively
 ```
 
 ::: zone-end
 
 
 
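The recursive flag in the `rm` examples above behaves like a local recursive delete. A plain-Python analogy using only the standard library (an illustrative stand-in, not the MSSparkUtils API):

```python
import os
import shutil
import tempfile

def rm(path: str, recurse: bool = False) -> None:
    # Analogy for mssparkutils.fs.rm(path, recurse):
    # removing a directory tree requires recurse=True.
    if os.path.isdir(path):
        if not recurse:
            raise ValueError("pass recurse=True to remove a directory")
        shutil.rmtree(path)
    else:
        os.remove(path)

# Build a small directory tree in a temp location, then remove it recursively.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "sub"))
open(os.path.join(root, "sub", "a.txt"), "w").close()

rm(root, True)               # removes the whole tree
print(os.path.exists(root))  # False
```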
-## Notebook utilities
+## Notebook utilities
 
 :::zone pivot = "programming-language-csharp"
 
@@ -461,7 +461,7 @@ Not supported.
 
 :::zone pivot = "programming-language-python"
 
-You can use the MSSparkUtils Notebook Utilities to run a notebook or exit a notebook with a value.
+You can use the MSSparkUtils Notebook Utilities to run a notebook or exit a notebook with a value.
 Run the following command to get an overview of the available methods:
 
 ```python
@@ -478,7 +478,7 @@ run(path: String, timeoutSeconds: int, arguments: Map): String -> This method ru
 ```
 
 ### Reference a notebook
-Reference a notebook and returns its exit value. You can run nesting function calls in a notebook interactively or in a pipeline. The notebook being referenced will run on the Spark pool of which notebook calls this function.
+References a notebook and returns its exit value. You can run nested function calls in a notebook interactively or in a pipeline. The referenced notebook runs on the Spark pool of the notebook that calls this function.
 
 ```python
 
@@ -497,13 +497,13 @@ After the run finished, you will see a snapshot link named '**View notebook run:
 ![Screenshot of a snap link python](./media/microsoft-spark-utilities/spark-utilities-run-notebook-snap-link-sample-python.png)
 
 ### Exit a notebook
-Exits a notebook with a value. You can run nesting function calls in a notebook interactively or in a pipeline.
+Exits a notebook with a value. You can run nested function calls in a notebook interactively or in a pipeline.
 
 - When you call an `exit()` function in a notebook interactively, Azure Synapse throws an exception, skips running subsequent cells, and keeps the Spark session alive.
 
-- When you orchestrate a notebook that calls an `exit()` function in a Synapse pipeline, Azure Synapse will return an exit value, complete the pipeline run, and stop the Spark session.
+- When you orchestrate a notebook that calls an `exit()` function in a Synapse pipeline, Azure Synapse returns the exit value, completes the pipeline run, and stops the Spark session.
 
-- When you call an `exit()` function in a notebook being referenced, Azure Synapse will stop the further execution in the notebook being referenced, and continue to run next cells in the notebook that call the `run()` function. For example: Notebook1 has three cells and calls an `exit()` function in the second cell. Notebook2 has five cells and calls `run(notebook1)` in the third cell. When you run Notebook2, Notebook1 will be stopped at the second cell when hitting the `exit()` function. Notebook2 will continue to run its fourth cell and fifth cell.
+- When you call an `exit()` function in a referenced notebook, Azure Synapse stops further execution of the referenced notebook and continues to run the next cells in the notebook that calls the `run()` function. For example: Notebook1 has three cells and calls an `exit()` function in the second cell. Notebook2 has five cells and calls `run(notebook1)` in the third cell. When you run Notebook2, Notebook1 stops at the second cell when it hits the `exit()` function, and Notebook2 continues to run its fourth and fifth cells.
 
 
 ```python
@@ -512,9 +512,9 @@ mssparkutils.notebook.exit("value string")
 
 For example:
 
-**Sample1** notebook locates under **folder/** with following two cells:
+The **Sample1** notebook is located under **folder/** and has the following two cells:
 - cell 1 defines an **input** parameter with the default value set to 10.
-- cell 2 exits the notebook with **input** as exit value.
+- cell 2 exits the notebook with **input** as the exit value.
 
 ![Screenshot of a sample notebook](./media/microsoft-spark-utilities/spark-utilities-run-notebook-sample.png)
 
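The `run`/`exit` semantics above can be sketched with a plain-Python stand-in. Everything below is a hypothetical mock for illustration, not the real `mssparkutils.notebook` API:

```python
# Hypothetical mock of mssparkutils.notebook run/exit semantics (illustration only).
class NotebookExit(Exception):
    """Raised by exit_notebook to abort the remaining cells, carrying the exit value."""
    def __init__(self, value):
        self.value = value

def exit_notebook(value):
    # Like mssparkutils.notebook.exit(value): stops the notebook at this cell.
    raise NotebookExit(value)

def run_notebook(cells, arguments=None):
    # Like mssparkutils.notebook.run(path, timeout, arguments):
    # runs the cells and returns the referenced notebook's exit value.
    env = dict(arguments or {})
    try:
        for cell in cells:
            cell(env)
    except NotebookExit as e:
        return e.value
    return None

log = []
sample1 = [
    lambda env: env.setdefault("input", 10),   # cell 1: parameter, default 10
    lambda env: exit_notebook(env["input"]),   # cell 2: exit with the input value
    lambda env: log.append("never runs"),      # any later cell is skipped
]

# Passing {"input": 20} overrides the default, like notebook parameters do.
exit_val = run_notebook(sample1, {"input": 20})
print(exit_val)  # 20
print(log)       # [] -- cells after exit() did not run; the caller keeps going
```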
@@ -548,7 +548,7 @@ Sample1 run success with input is 20
 
 :::zone pivot = "programming-language-scala"
 
-You can use the MSSparkUtils Notebook Utilities to run a notebook or exit a notebook with a value.
+You can use the MSSparkUtils Notebook Utilities to run a notebook or exit a notebook with a value.
 Run the following command to get an overview of the available methods:
 
 ```scala
@@ -565,7 +565,7 @@ run(path: String, timeoutSeconds: int, arguments: Map): String -> This method ru
 ```
 
 ### Reference a notebook
-Reference a notebook and returns its exit value. You can run nesting function calls in a notebook interactively or in a pipeline. The notebook being referenced will run on the Spark pool of which notebook calls this function.
+References a notebook and returns its exit value. You can run nested function calls in a notebook interactively or in a pipeline. The referenced notebook runs on the Spark pool of the notebook that calls this function.
 
 ```scala
 
@@ -585,13 +585,13 @@ After the run finished, you will see a snapshot link named '**View notebook run:
 
 
 ### Exit a notebook
-Exits a notebook with a value. You can run nesting function calls in a notebook interactively or in a pipeline.
+Exits a notebook with a value. You can run nested function calls in a notebook interactively or in a pipeline.
 
 - When you call an `exit()` function in a notebook interactively, Azure Synapse throws an exception, skips running subsequent cells, and keeps the Spark session alive.
 
-- When you orchestrate a notebook that calls an `exit()` function in a Synapse pipeline, Azure Synapse will return an exit value, complete the pipeline run, and stop the Spark session.
+- When you orchestrate a notebook that calls an `exit()` function in a Synapse pipeline, Azure Synapse returns the exit value, completes the pipeline run, and stops the Spark session.
 
-- When you call an `exit()` function in a notebook being referenced, Azure Synapse will stop the further execution in the notebook being referenced, and continue to run next cells in the notebook that call the `run()` function. For example: Notebook1 has three cells and calls an `exit()` function in the second cell. Notebook2 has five cells and calls `run(notebook1)` in the third cell. When you run Notebook2, Notebook1 will be stopped at the second cell when hitting the `exit()` function. Notebook2 will continue to run its fourth cell and fifth cell.
+- When you call an `exit()` function in a referenced notebook, Azure Synapse stops further execution of the referenced notebook and continues to run the next cells in the notebook that calls the `run()` function. For example: Notebook1 has three cells and calls an `exit()` function in the second cell. Notebook2 has five cells and calls `run(notebook1)` in the third cell. When you run Notebook2, Notebook1 stops at the second cell when it hits the `exit()` function, and Notebook2 continues to run its fourth and fifth cells.
 
 
 ```scala
@@ -600,9 +600,9 @@ mssparkutils.notebook.exit("value string")
 
 For example:
 
-**Sample1** notebook locates under **mssparkutils/folder/** with following two cells:
+The **Sample1** notebook is located under **mssparkutils/folder/** and has the following two cells:
 - cell 1 defines an **input** parameter with the default value set to 10.
-- cell 2 exits the notebook with **input** as exit value.
+- cell 2 exits the notebook with **input** as the exit value.
 
 ![Screenshot of a sample notebook](./media/microsoft-spark-utilities/spark-utilities-run-notebook-sample.png)
 
@@ -640,7 +640,7 @@ Sample1 run success with input is 20
 
 ## Credentials utilities
 
-You can use the MSSparkUtils Credentials Utilities to get the access tokens of linked services and manage secrets in Azure Key Vault.
+You can use the MSSparkUtils Credentials Utilities to get the access tokens of linked services and manage secrets in Azure Key Vault.
 
 Run the following command to get an overview of the available methods:
 
@@ -681,7 +681,7 @@ putSecret(akvName, secretName, secretValue): puts AKV secret for a given akvName
 
 ### Get token
 
-Returns Azure AD token for a given audience, name (optional). The table below list all the available audience types:
+Returns the Azure AD token for a given audience and name (optional). The table below lists all the available audience types:
 
 |Audience Type|Audience key|
 |--|--|
@@ -748,7 +748,7 @@ Credentials.IsValidToken("your token")
 
 ### Get connection string or credentials for linked service
 
-Returns connection string or credentials for linked service.
+Returns the connection string or credentials for a linked service.
 
 :::zone pivot = "programming-language-python"
 
@@ -804,7 +804,7 @@ Credentials.GetSecret("azure key vault name","secret name","linked service name"
 
 ### Get secret using user credentials
 
-Returns Azure Key Vault secret for a given Azure Key Vault name, secret name, and linked service name using user credentials.
+Returns the Azure Key Vault secret for a given Azure Key Vault name, secret name, and linked service name using user credentials.
 
 :::zone pivot = "programming-language-python"
 
@@ -873,7 +873,7 @@ Puts Azure Key Vault secret for a given Azure Key Vault name, secret name, and l
 
 ### Put secret using user credentials
 
-Puts Azure Key Vault secret for a given Azure Key Vault name, secret name, and linked service name using user credentials.
+Puts an Azure Key Vault secret for a given Azure Key Vault name, secret name, and linked service name using user credentials.
 
 ```python
 mssparkutils.credentials.putSecret('azure key vault name','secret name','secret value')
@@ -884,7 +884,7 @@ mssparkutils.credentials.putSecret('azure key vault name','secret name','secret
 
 ### Put secret using user credentials
 
-Puts Azure Key Vault secret for a given Azure Key Vault name, secret name, and linked service name using user credentials.
+Puts an Azure Key Vault secret for a given Azure Key Vault name, secret name, and linked service name using user credentials.
 
 ```scala
 mssparkutils.credentials.putSecret("azure key vault name","secret name","secret value")
@@ -901,7 +901,7 @@ mssparkutils.credentials.putSecret("azure key vault name","secret name","secret
 ::: zone-end -->
 
 
-## Environment utilities
+## Environment utilities
 
 Run the following commands to get an overview of the available methods:
 
@@ -1125,11 +1125,36 @@ mssparkutils.runtime.context
 ```
 ::: zone-end
 
+## Session management
+
+### Stop an interactive session
+
+Sometimes it's more convenient to stop an interactive session from code than to click the stop button manually. For such cases, we provide the `mssparkutils.session.stop()` API to stop the interactive session via code. It's available for Scala and Python.
+
+:::zone pivot = "programming-language-python"
+
+```python
+mssparkutils.session.stop()
+```
+::: zone-end
+
+:::zone pivot = "programming-language-scala"
+
+```scala
+mssparkutils.session.stop()
+```
+::: zone-end
+
+The `mssparkutils.session.stop()` API stops the current interactive session asynchronously in the background. It stops the Spark session and releases the resources occupied by the session so that they're available to other sessions in the same pool.
+
+> [!NOTE]
+> We don't recommend calling language built-in APIs like `sys.exit` in Scala or `sys.exit()` in Python in your code, because such APIs
+> kill the interpreter process, leaving the Spark session alive and its resources unreleased.
+
 ## Next steps
 
 - [Check out Synapse sample notebooks](https://github.com/Azure-Samples/Synapse/tree/master/Notebooks)
 - [Quickstart: Create an Apache Spark pool in Azure Synapse Analytics using web tools](../quickstart-apache-spark-notebook.md)
 - [What is Apache Spark in Azure Synapse Analytics](apache-spark-overview.md)
 - [Azure Synapse Analytics](../index.yml)
 - [How to use file mount/unmount API in Synapse](./synapse-file-mount-api.md)
-
articles/zone-pivot-groups.yml

Lines changed: 4 additions & 4 deletions
@@ -715,12 +715,12 @@ groups:
   title: Programming languages
   prompt: Choose a language
   pivots:
-  - id: programming-language-csharp
-    title: C#
-  - id: programming-language-scala
-    title: Scala
   - id: programming-language-python
     title: Python
+  - id: programming-language-scala
+    title: Scala
+  - id: programming-language-csharp
+    title: C#
 - id: aml-control-methods
   # Owner: gopalv
   title: Azure Machine Learning control plane