articles/synapse-analytics/spark/apache-spark-azure-portal-add-libraries.md
Lines changed: 18 additions & 14 deletions
@@ -4,7 +4,7 @@ description: Learn how to add and manage libraries used by Apache Spark in Azure
 author: shuaijunye
 ms.service: synapse-analytics
 ms.topic: how-to
-ms.date: 11/03/2022
+ms.date: 02/20/2023
 ms.author: shuaijunye
 ms.subservice: spark
 ms.custom: kr2b-contr-experiment
@@ -25,19 +25,22 @@ To make third-party or locally built code available to your applications, instal
 
 ## Overview of package levels
 
-There are three levels of packages installed on Azure Synapse Analytics:
+There are three levels of packages installed on Azure Synapse Analytics:
 
-- **Default**: Default packages include a full Anaconda installation, plus extra commonly used libraries. For a full list of libraries, see [Apache Spark version support](apache-spark-version-support.md).
+- **Default**: Default packages include a full Anaconda installation, plus extra commonly used libraries. For a full list of libraries, see [Apache Spark version support](apache-spark-version-support.md).
 
-When a Spark instance starts, these libraries are included automatically. You can add more packages at the other levels.
-- **Spark pool**: All running artifacts can use packages at the Spark pool level. For example, you can attach notebook and Spark job definitions to corresponding Spark pools.
+When a Spark instance starts, these libraries are included automatically. You can add more packages at the other levels.
+- **Spark pool**: All running artifacts can use packages at the Spark pool level. For example, you can attach notebook and Spark job definitions to corresponding Spark pools.
 
 You can upload custom libraries and a specific version of an open-source library that you want to use in your Azure Synapse Analytics workspace. The workspace packages can be installed in your Spark pools.
 - **Session**: A session-level installation creates an environment for a specific notebook session. The change of session-level libraries isn't persisted between sessions.
 
 > [!NOTE]
-> Pool-level library management can take time, depending on the size of the packages and the complexity of required dependencies. We recommend the session-level installation for experimental and quick iterative scenarios.
-
+>
+> - Pool-level library management can take time, depending on the size of the packages and the complexity of required dependencies. We recommend the session-level installation for experimental and quick iterative scenarios.
+> - The pool-level library management will produce a stable dependency for running your Notebooks and Spark job definitions. Installing the library to your Spark pool is highly recommended for the pipeline runs.
+> - Session level library management can help you with fast iteration or dealing with the frequent changes of library. However, the stability of session level installation is not promised. Also, in-line commands like %pip and %conda are disabled in pipeline run. Managing library in Notebook session is recommended during the developing phase.
+
 ## Manage workspace packages
 
 When your team develops custom applications or models, you might develop various code artifacts like *.whl*, *.jar*, or *tar.gz* files to package your code.
@@ -52,17 +55,16 @@ In some cases, you might want to standardize the packages that are used on an Ap
 
 By using the pool management capabilities of Azure Synapse Analytics, you can configure the default set of libraries to install on a serverless Apache Spark pool. These libraries are installed on top of the [base runtime](./apache-spark-version-support.md).
 
-Currently, pool management is supported only for Python. For Python, Azure Synapse Spark pools use Conda to install and manage Python package dependencies.
-
-When you're specifying pool-level libraries, you can now provide a *requirements.txt* or *environment.yml* file. This environment configuration file is used every time a Spark instance is created from that Spark pool.
+For Python libraries, Azure Synapse Spark pools use Conda to install and manage Python package dependencies. You can specify the pool-level Python libraries by providing a *requirements.txt* or *environment.yml* file. This environment configuration file is used every time a Spark instance is created from that Spark pool. You can also attach the workspace packages to your pools.
 
 To learn more about these capabilities, see [Manage Spark pool packages](./apache-spark-manage-pool-packages.md).
 
 > [!IMPORTANT]
+>
 > - If the package that you're installing is large or takes a long time to install, it might affect the Spark instance's startup time.
 > - Altering the PySpark, Python, Scala/Java, .NET, or Spark version is not supported.
 
-## Manage dependencies for DEP-enabled Azure Synapse Spark pools
+###Manage dependencies for DEP-enabled Azure Synapse Spark pools
 
 > [!NOTE]
 > Installing packages from a public repo is not supported within [DEP-enabled workspaces](../security/workspace-data-exfiltration-protection.md). Instead, upload all your dependencies as workspace libraries and install them to your Spark pool.
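The rewritten paragraph in the `@@ -52,17 +55,16 @@` hunk says pool-level Python libraries can be supplied as either a *requirements.txt* or an *environment.yml* file. To show how the two formats relate, here is a minimal sketch, assuming simple one-requirement-per-line input, that turns requirements.txt-style pins into a Conda environment.yml whose packages are installed through a `pip:` section. The helper name `requirements_to_environment_yml` and the default environment name are hypothetical; which environment.yml fields the Synapse service actually honors should be checked in the pool package documentation.

```python
import re


def requirements_to_environment_yml(requirements_text: str, env_name: str = "synapse-pool-env") -> str:
    """Render requirements.txt-style lines as a Conda environment.yml string.

    Hypothetical helper: blank lines and '#' comments are skipped, and every
    remaining requirement is placed, unchanged, under a pip section.
    """
    pins = []
    for raw in requirements_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and full-line comments
        # Drop an inline comment (whitespace followed by '#') after the requirement.
        line = re.split(r"\s#", line, maxsplit=1)[0].strip()
        pins.append(line)

    lines = [
        f"name: {env_name}",
        "dependencies:",
        "  - pip",    # list pip itself so Conda can process the pip section
        "  - pip:",
    ]
    lines.extend(f"    - {pin}" for pin in pins)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    reqs = "numpy==1.21.0\n# plotting\nmatplotlib==3.5.1  # pinned for the team\n"
    print(requirements_to_environment_yml(reqs))
```

Keeping the pins under `pip:` preserves the exact version strings from requirements.txt; moving them into plain Conda dependencies would instead resolve them from Conda channels, which is a different design choice.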
@@ -81,7 +83,7 @@ If you're having trouble identifying required dependencies, follow these steps:
 source activate synapse-env
 ```
 
-1. Run the following script to identify the required dependencies.
+2. Run the following script to identify the required dependencies.
 The script can be used to pass your *requirements.txt* file, which has all the packages and versions that you intend to install in the Spark 3.1 or Spark 3.2 pool. It will print the names of the *new* wheel files/dependencies for your input library requirements.
 
 ```python
@@ -91,6 +93,7 @@ The script can be used to pass your *requirements.txt* file, which has all the p
 > This script will list only the dependencies that are not already present in the Spark pool by default.
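The step renumbered in the `@@ -81,7 +83,7 @@` hunk runs a script that prints only the dependencies not already present in the Spark pool. The sketch below is *not* that script. It is a simplified, hypothetical check (`missing_requirements` and `installed_versions` are illustrative names) that compares top-level requirement lines against the distributions installed in the current Python environment, using the standard-library `importlib.metadata` module. Unlike pip, it does not resolve transitive dependencies, so treat its output only as a first pass.

```python
import re
from importlib import metadata
from typing import Dict, Iterable, List, Optional

# Leading project name of a requirement line, e.g. "pandas" in "pandas[excel]==1.3.5".
_NAME_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")
# Exact pin after '==', e.g. "1.3.5".
_PIN_RE = re.compile(r"==\s*([^\s;,]+)")


def _normalize(name: str) -> str:
    """Normalize a distribution name as PEP 503 does (case-insensitive; '-', '_', '.' equivalent)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def installed_versions() -> Dict[str, str]:
    """Map normalized distribution names to versions for the running interpreter."""
    versions: Dict[str, str] = {}
    for dist in metadata.distributions():
        try:
            name = dist.metadata["Name"]
        except KeyError:
            name = None
        if name:  # skip distributions with broken metadata
            versions[_normalize(name)] = dist.version
    return versions


def missing_requirements(lines: Iterable[str], installed: Optional[Dict[str, str]] = None) -> List[str]:
    """Return requirement lines whose package is absent, or present at a different '==' pin.

    Non-'==' specifiers (for example '>=') are checked by package name only;
    option lines such as '-r other.txt' are ignored.
    """
    if installed is None:
        installed = installed_versions()
    missing = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = _NAME_RE.match(line)
        if match is None:
            continue  # e.g. '-r other.txt' or '--index-url ...'
        have = installed.get(_normalize(match.group(0)))
        pin = _PIN_RE.search(line)
        if have is None or (pin is not None and have != pin.group(1)):
            missing.append(line)
    return missing
```

Against a real file, something like `missing_requirements(open("requirements.txt").read().splitlines())` would report the top-level lines that the current environment does not already satisfy.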
@@ -102,13 +105,14 @@ Session-scoped packages allow users to define package dependencies at the start
 
 To learn more about how to manage session-scoped packages, see the following articles:
 
-- [Python session packages](./apache-spark-manage-session-packages.md#session-scoped-python-packages): At the start of a session, provide a Conda *environment.yml* file to install more Python packages from popular repositories.
+- [Python session packages](./apache-spark-manage-session-packages.md#session-scoped-python-packages): At the start of a session, provide a Conda *environment.yml* file to install more Python packages from popular repositories. Or you can use %pip and %conda commands to manage libraries in the Notebook code cells.
 
 - [Scala/Java session packages](./apache-spark-manage-session-packages.md#session-scoped-java-or-scala-packages): At the start of your session, provide a list of *.jar* files to install by using `%%configure`.
 
 - [R session packages](./apache-spark-manage-session-packages.md#session-scoped-r-packages-preview): Within your session, you can install packages across all nodes within your Spark pool by using `install.packages` or `devtools`.
 
-## Manage your packages outside the Azure Synapse Analytics UI
+
+## Automate the library management process through Azure PowerShell cmdlets and REST APIs
 
 If your team wants to manage libraries without visiting the package management UIs, you have the option to manage the workspace packages and pool-level package updates through Azure PowerShell cmdlets or REST APIs for Azure Synapse Analytics.
articles/synapse-analytics/spark/apache-spark-manage-packages-outside-UI.md
Lines changed: 9 additions & 5 deletions
@@ -4,12 +4,12 @@ description: Learn how to manage packages using Azure PowerShell cmdlets or REST
 author: shuaijunye
 ms.service: synapse-analytics
 ms.topic: conceptual
-ms.date: 07/07/2022
+ms.date: 02/23/2023
 ms.author: shuaijunye
 ms.subservice: spark
 ---
 
-# Manage packages outside Synapse Analytics Studio UIs
+# Automate the library management process through Azure PowerShell cmdlets and REST APIs
 
 You may want to manage your libraries for your serverless Apache Spark pools without going into the Synapse Analytics UI pages. For example, you may find that:
 
@@ -21,6 +21,7 @@ In this article, we'll provide a general guide to help you managing libraries th
 ## Manage packages through Azure PowerShell cmdlets
 
 ### Add new libraries
+
 1. [New-AzSynapseWorkspacePackage](/powershell/module/az.synapse/new-azsynapseworkspacepackage) command can be used to **upload new libraries to workspace**.
 
 ```powershell
@@ -42,29 +43,31 @@ In this article, we'll provide a general guide to help you managing libraries th
 ```
 
 ### Remove libraries
+
 1. In order to **remove a installed package** from your Spark pool, please refer to the command combination of [Get-AzSynapseWorkspacePackage](/powershell/module/az.synapse/get-azsynapseworkspacepackage) and [Update-AzSynapseSparkPool](/powershell/module/az.synapse/update-azsynapsesparkpool).
 2. You can also retrieve a Spark pool and **remove all attached workspace libraries** from the pool by calling [Get-AzSynapseSparkPool](/powershell/module/az.synapse/get-azsynapsesparkpool) and [Update-AzSynapseSparkPool](/powershell/module/az.synapse/update-azsynapsesparkpool) commands.
+2. You can also retrieve a Spark pool and **remove all attached workspace libraries** from the pool by calling [Get-AzSynapseSparkPool](/powershell/module/az.synapse/get-azsynapsesparkpool) and [Update-AzSynapseSparkPool](/powershell/module/az.synapse/update-azsynapsesparkpool) commands.
 For more Azure PowerShell cmdlets capabilities, please refer to [Azure PowerShell cmdlets for Azure Synapse Analytics](/powershell/module/az.synapse).
 
-
 ## Manage packages through REST APIs
 
 ### Manage the workspace packages
-With the ability of REST APIs, you can add/delete packages or list all uploaded files of your workspace. See the full supported APIs, please refer to [Overview of workspace library APIs](/rest/api/synapse/data-plane/library).
 
+With the ability of REST APIs, you can add/delete packages or list all uploaded files of your workspace. See the full supported APIs, please refer to [Overview of workspace library APIs](/rest/api/synapse/data-plane/library).
 
 ### Manage the Spark pool packages
+
 You can leverage the [Spark pool REST API](/rest/api/synapse/big-data-pools/create-or-update) to attach or remove your custom or open source libraries to your Spark pools.
 
 1. For custom libraries, please specify the list of custom files as the **customLibraries** property in request body.
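The `@@ -42,29 +43,31 @@` hunk ends by saying that custom libraries are attached by listing them in the **customLibraries** property of the Spark pool request body. The sketch below only builds that JSON fragment and sends nothing. The per-library field names (`name`, `path`, `containerName`, `type`), the example path and container values, and the `CustomLibrary` / `build_custom_libraries_body` names are assumptions for illustration; confirm them against the Spark pool REST API reference linked in the hunk. A real create-or-update request also carries the rest of the pool definition (for example, its location), which this fragment omits.

```python
import json
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class CustomLibrary:
    """One uploaded workspace package to attach to a Spark pool (field names are assumptions)."""
    name: str            # file name of the uploaded package
    path: str            # storage path of the uploaded file (assumed format)
    container_name: str  # storage container that holds the file (assumed)
    kind: str            # package type, e.g. "whl" or "jar"; serialized as "type"


def build_custom_libraries_body(libraries: List[CustomLibrary]) -> Dict[str, dict]:
    """Build a request-body fragment that lists libraries under properties.customLibraries."""
    return {
        "properties": {
            "customLibraries": [
                {
                    "name": lib.name,
                    "path": lib.path,
                    "containerName": lib.container_name,
                    "type": lib.kind,
                }
                for lib in libraries
            ]
        }
    }


if __name__ == "__main__":
    body = build_custom_libraries_body([
        CustomLibrary(
            name="mylib-0.1-py3-none-any.whl",
            path="myworkspace/libraries/mylib-0.1-py3-none-any.whl",  # hypothetical path
            container_name="prep",                                   # hypothetical container
            kind="whl",
        )
    ])
    print(json.dumps(body, indent=2))
```

Keeping the library list in a small typed structure makes it easy to diff the intended set of attached packages before serializing it into the request.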
@@ -91,5 +94,6 @@ You can leverage the [Spark pool REST API](/rest/api/synapse/big-data-pools/crea
 ```
 
 ## Next steps
+
 - View the default libraries: [Apache Spark version support](apache-spark-version-support.md)
 - Manage Spark pool level packages through Synapse Studio portal: [Python package management on Notebook Session](./apache-spark-manage-session-packages.md#session-scoped-python-packages)