## Prerequisites

### Configure access to Azure Data Lake Storage Gen2

Synapse notebooks use Azure Active Directory (Azure AD) pass-through to access ADLS Gen2 accounts. You need to be a **Storage Blob Data Contributor** to access the ADLS Gen2 account (or folder).

Synapse pipelines use the workspace's Managed Service Identity (MSI) to access the storage accounts. To use MSSparkUtils in your pipeline activities, your workspace identity needs to be a **Storage Blob Data Contributor** on the ADLS Gen2 account (or folder).
Follow these steps to make sure your Azure AD account and workspace MSI have access to the ADLS Gen2 account:

1. Select **Access control (IAM)** from the left panel.
1. Select **Add** > **Add role assignment** to open the Add role assignment page.
1. Assign the following role. For detailed steps, see [Assign Azure roles using the Azure portal](../../role-based-access-control/role-assignments-portal.md).

    | Setting | Value |
    | --- | --- |
    | Role | Storage Blob Data Contributor |

    > [!NOTE]
    > The managed identity name is also the workspace name.

1. Select **Save**.

You can access data on ADLS Gen2 with Synapse Spark via the following URL:
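The URL itself is elided in this excerpt. The standard ABFS path format for ADLS Gen2 is sketched below; the container, account, and path names are placeholders:

```
abfss://<container_name>@<storage_account_name>.dfs.core.windows.net/<path>
```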
Synapse uses [**shared access signature (SAS)**](../../storage/common/storage-sas-overview.md) to access Azure Blob Storage. To avoid exposing SAS keys in your code, we recommend creating a new linked service in your Synapse workspace to the Azure Blob Storage account you want to access.

Follow these steps to add a new linked service for an Azure Blob Storage account:

1. Open the [Azure Synapse Studio](https://web.azuresynapse.net/).
2. Select **Manage** from the left panel and select **Linked services** under **External connections**.
3. Search for **Azure Blob Storage** in the **New linked service** panel on the right.
4. Select **Continue**.
5. Select the Azure Blob Storage account to access and configure the linked service name. We suggest using **Account key** as the **Authentication method**.
6. Select **Test connection** to validate that the settings are correct.
7. Select **Create**, and then select **Publish all** to save your changes.

You can access data on Azure Blob Storage with Synapse Spark via the following URL:
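The URL format is elided in this excerpt. The standard WASBS path format for Azure Blob Storage, with placeholder names, is:

```
wasbs://<container_name>@<storage_account_name>.blob.core.windows.net/<path>
```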
You can add an Azure Key Vault as a linked service to manage your credentials in Synapse.

Follow these steps to add an Azure Key Vault as a Synapse linked service:

1. Open the [Azure Synapse Studio](https://web.azuresynapse.net/).
2. Select **Manage** from the left panel and select **Linked services** under **External connections**.
3. Search for **Azure Key Vault** in the **New linked service** panel on the right.
4. Select the Azure Key Vault account to access and configure the linked service name.
5. Select **Test connection** to validate that the settings are correct.
6. Select **Create**, and then select **Publish all** to save your change.

Synapse notebooks use Azure Active Directory (Azure AD) pass-through to access Azure Key Vault. Synapse pipelines use the workspace identity (MSI) to access Azure Key Vault. To make sure your code works both in a notebook and in a Synapse pipeline, we recommend granting secret access permission to both your Azure AD account and the workspace identity.

Follow these steps to grant secret access to your workspace identity:

1. Open the [Azure portal](https://portal.azure.com/) and the Azure Key Vault you want to access.
2. Select **Access policies** from the left panel.
3. Select **Add Access Policy**:
    - Choose **Key, Secret, & Certificate Management** as the configuration template.
    - Select **your Azure AD account** and **your workspace identity** (the same as your workspace name) in the select principal, or make sure they're already assigned.
4. Select **Select** and **Add**.
5. Select **Save** to commit the changes.
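With the access policy in place, you can read secrets through the credentials utilities. A minimal sketch, runnable only inside a Synapse Spark session; the Key Vault name, secret name, and linked service name are placeholders:

```python
# Read a secret via Azure AD pass-through (works interactively in a notebook):
account_key = mssparkutils.credentials.getSecret('<key-vault-name>', '<secret-name>')

# Read a secret through an Azure Key Vault linked service
# (works in pipelines, where the workspace MSI is used):
account_key = mssparkutils.credentials.getSecret('<key-vault-name>', '<secret-name>', '<linked-service-name>')
```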
## File system utilities

### Remove file or directory

Removes a file or a directory.

:::zone pivot = "programming-language-python"

```python
mssparkutils.fs.rm('file path', True) # Set the last parameter to True to remove all files and directories recursively
```

::: zone-end

:::zone pivot = "programming-language-scala"

```scala
mssparkutils.fs.rm("file path", true) // Set the last parameter to true to remove all files and directories recursively
```

::: zone-end

:::zone pivot = "programming-language-csharp"

```csharp
FS.Rm("file path", true) // Set the last parameter to true to remove all files and directories recursively
```

::: zone-end

## Notebook utilities

:::zone pivot = "programming-language-csharp"

Not supported.

::: zone-end

:::zone pivot = "programming-language-python"

You can use the MSSparkUtils notebook utilities to run a notebook or exit a notebook with a value.

Run the following command to get an overview of the available methods:
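The overview command is elided in this excerpt; it's the utilities' built-in help method (runnable only inside a Synapse notebook):

```python
# Prints a summary of the available notebook utility methods
mssparkutils.notebook.help()
```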
### Reference a notebook

References a notebook and returns its exit value. You can run nested function calls in a notebook interactively or in a pipeline. The referenced notebook runs on the Spark pool of the notebook that calls this function.

```python
mssparkutils.notebook.run("notebook path", 90)  # notebook path and a timeout in seconds; an optional parameter map can be passed as a third argument
```

After the run finishes, you'll see a snapshot link named '**View notebook run:**' in the cell output; select it to see the snapshot for that specific run.

### Exit a notebook

Exits a notebook with a value. You can run nested function calls in a notebook interactively or in a pipeline.

- When you call an `exit()` function in a notebook interactively, Azure Synapse throws an exception, skips running subsequent cells, and keeps the Spark session alive.

- When you orchestrate a notebook that calls an `exit()` function in a Synapse pipeline, Azure Synapse returns the exit value, completes the pipeline run, and stops the Spark session.

- When you call an `exit()` function in a referenced notebook, Azure Synapse stops further execution in the referenced notebook and continues to run the next cells in the notebook that calls the `run()` function. For example, Notebook1 has three cells and calls an `exit()` function in the second cell. Notebook2 has five cells and calls `run(notebook1)` in the third cell. When you run Notebook2, Notebook1 stops at the second cell when it hits the `exit()` function, and Notebook2 continues to run its fourth and fifth cells.
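The interplay of `run()` and `exit()` described above can be sketched as follows; the notebook names and exit value are placeholders, and the code runs only inside a Synapse notebook:

```python
# In the referenced notebook (Notebook1), second cell:
mssparkutils.notebook.exit("my exit value")   # stops Notebook1 here when it's referenced

# In the calling notebook (Notebook2), third cell:
exit_val = mssparkutils.notebook.run("Notebook1", 90)
print(exit_val)   # the value passed to exit() in Notebook1
```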
### Reference a notebook

References a notebook and returns its exit value. You can run nested function calls in a notebook interactively or in a pipeline. The referenced notebook runs on the Spark pool of the notebook that calls this function.

```scala
mssparkutils.notebook.run("notebook path", 90)  // notebook path and a timeout in seconds; an optional parameter map can be passed as a third argument
```

After the run finishes, you'll see a snapshot link named '**View notebook run:**' in the cell output; select it to see the snapshot for that specific run.

### Exit a notebook

Exits a notebook with a value. You can run nested function calls in a notebook interactively or in a pipeline.

- When you call an `exit()` function in a notebook interactively, Azure Synapse throws an exception, skips running subsequent cells, and keeps the Spark session alive.

- When you orchestrate a notebook that calls an `exit()` function in a Synapse pipeline, Azure Synapse returns the exit value, completes the pipeline run, and stops the Spark session.

- When you call an `exit()` function in a referenced notebook, Azure Synapse stops further execution in the referenced notebook and continues to run the next cells in the notebook that calls the `run()` function. For example, Notebook1 has three cells and calls an `exit()` function in the second cell. Notebook2 has five cells and calls `run(notebook1)` in the third cell. When you run Notebook2, Notebook1 stops at the second cell when it hits the `exit()` function, and Notebook2 continues to run its fourth and fifth cells.
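The same `run()`/`exit()` interplay in Scala, sketched with placeholder notebook names and exit value (runnable only inside a Synapse notebook):

```scala
// In the referenced notebook (Notebook1), second cell:
mssparkutils.notebook.exit("my exit value")   // stops Notebook1 here when it's referenced

// In the calling notebook (Notebook2), third cell:
val exitVal = mssparkutils.notebook.run("Notebook1", 90)
println(exitVal)   // the value passed to exit() in Notebook1
```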
## Stop an interactive session

Instead of manually clicking the stop button, it's sometimes more convenient to stop an interactive session by calling an API in your code. For such cases, the `mssparkutils.session.stop()` API stops the interactive session via code; it's available for Scala and Python.

:::zone pivot = "programming-language-python"

```python
mssparkutils.session.stop()
```

::: zone-end

:::zone pivot = "programming-language-scala"

```scala
mssparkutils.session.stop()
```

::: zone-end

The `mssparkutils.session.stop()` API stops the current interactive session asynchronously in the background. It stops the Spark session and releases the resources occupied by the session so that they're available to other sessions in the same pool.

> [!NOTE]
> We don't recommend calling language built-in APIs such as `sys.exit` in Scala or `sys.exit()` in Python in your code, because they just kill the interpreter process, leaving the Spark session alive and its resources unreleased.

## Next steps

- [Check out Synapse sample notebooks](https://github.com/Azure-Samples/Synapse/tree/master/Notebooks)
- [Quickstart: Create an Apache Spark pool in Azure Synapse Analytics using web tools](../quickstart-apache-spark-notebook.md)
- [What is Apache Spark in Azure Synapse Analytics](apache-spark-overview.md)
- [Azure Synapse Analytics](../index.yml)
- [How to use file mount/unmount API in Synapse](./synapse-file-mount-api.md)