
Commit d5e6057

Merge pull request #204217 from shuaijunye/UpdateForLMDocs
Documentation updates
2 parents 71c048c + c320ea1 commit d5e6057

13 files changed: +362 additions, -333 deletions

articles/machine-learning/how-to-data-prep-synapse-spark-pool.md

Lines changed: 2 additions & 2 deletions
@@ -96,7 +96,7 @@ env.register(workspace=ws)
 To begin data preparation with the Apache Spark pool and your custom environment, specify the Apache Spark pool name and which environment to use during the Apache Spark session. Furthermore, you can provide your subscription ID, the machine learning workspace resource group, and the name of the machine learning workspace.

 >[!IMPORTANT]
-> Make sure to [Allow session level packages](../synapse-analytics/spark/apache-spark-manage-python-packages.md#session-scoped-packages) is enabled in the linked Synapse workspace.
+> Make sure [Allow session level packages](../synapse-analytics/spark/apache-spark-manage-session-packages.md#session-scoped-python-packages) is enabled in the linked Synapse workspace.
 >
 >![enable session level packages](media/how-to-data-prep-synapse-spark-pool/enable-session-level-package.png)
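
For illustration, the values named in the paragraph above correspond to ordinary `azureml-core` lookups along the lines of the sketch below. The placeholder IDs and the environment name `myenv` are assumptions, not values from the article.

```python
from azureml.core import Workspace, Environment

# Placeholders for the values the paragraph above says you provide.
subscription_id = "<subscription-id>"
resource_group = "<machine-learning-resource-group>"
workspace_name = "<machine-learning-workspace-name>"

ws = Workspace.get(name=workspace_name,
                   subscription_id=subscription_id,
                   resource_group=resource_group)

# The custom environment registered earlier with env.register(workspace=ws).
env = Environment.get(workspace=ws, name="myenv")
```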
@@ -301,7 +301,7 @@ script_run_config = ScriptRunConfig(source_directory = './code',
                                     run_config = run_config)
 ```

-For more infomation about `run_config.spark.configuration` and general Spark configuration, see [SparkConfiguration Class](/python/api/azureml-core/azureml.core.runconfig.sparkconfiguration) and [Apache Spark's configuration documentation](https://spark.apache.org/docs/latest/configuration.html).
+For more information about `run_config.spark.configuration` and general Spark configuration, see [SparkConfiguration Class](/python/api/azureml-core/azureml.core.runconfig.sparkconfiguration) and [Apache Spark's configuration documentation](https://spark.apache.org/docs/latest/configuration.html).

 Once your `ScriptRunConfig` object is set up, you can submit the run.

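Putting the pieces in this hunk together, a run configured and submitted with `run_config.spark.configuration` might look like the sketch below. It assumes `azureml-core`; the script name, experiment name, compute target name, and the two Spark settings are illustrative placeholders rather than values from the article.

```python
from azureml.core import Workspace, Experiment, ScriptRunConfig
from azureml.core.runconfig import RunConfiguration

ws = Workspace.from_config()

# Run configuration targeting the attached Synapse Spark pool from earlier steps.
run_config = RunConfiguration(framework="pyspark")
run_config.target = "<attached-synapse-compute-name>"

# Example Spark settings passed through run_config.spark.configuration.
run_config.spark.configuration["spark.driver.memory"] = "2g"
run_config.spark.configuration["spark.executor.instances"] = "2"

script_run_config = ScriptRunConfig(source_directory="./code",
                                    script="dataprep.py",
                                    run_config=run_config)

# Once the ScriptRunConfig object is set up, submit the run.
experiment = Experiment(workspace=ws, name="synapse-spark-dataprep")
run = experiment.submit(config=script_run_config)
run.wait_for_completion(show_output=True)
```
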
articles/synapse-analytics/.openpublishing.redirection.synapse-analytics.json

Lines changed: 10 additions & 0 deletions
@@ -694,6 +694,16 @@
     "source_path_from_root": "/articles/sql-data-warehouse/what-is-a-data-warehouse-unit-dwu-cdwu.md",
     "redirect_url": "/azure/synapse-analytics/sql-data-warehouse/what-is-a-data-warehouse-unit-dwu-cdwu",
     "redirect_document_id": true
+   },
+   {
+     "source_path_from_root": "/articles/synapse-analytics/spark/apache-spark-manage-python-packages.md",
+     "redirect_url": "/azure/synapse-analytics/spark/apache-spark-azure-portal-add-libraries",
+     "redirect_document_id": true
+   },
+   {
+     "source_path_from_root": "/articles/synapse-analytics/spark/apache-spark-manage-scala-packages.md",
+     "redirect_url": "/azure/synapse-analytics/spark/apache-spark-azure-portal-add-libraries",
+     "redirect_document_id": false
    }
  ]
}

articles/synapse-analytics/spark/apache-spark-azure-portal-add-libraries.md

Lines changed: 25 additions & 11 deletions
@@ -4,9 +4,8 @@ description: Learn how to add and manage libraries used by Apache Spark in Azure
 author: shuaijunye
 ms.service: synapse-analytics
 ms.topic: how-to
-ms.date: 06/08/2022
+ms.date: 07/07/2022
 ms.author: shuaijunye
-ms.reviewer: sngun
 ms.subservice: spark
 ms.custom: kr2b-contr-experiment
 ---
@@ -23,23 +22,29 @@ You might need to update your serverless Apache Spark pool environment for vario
 - Your team has built a custom package that you need available in your Apache Spark pool.

 To make third party or locally built code available to your applications, install a library onto one of your serverless Apache Spark pools or notebook session.

+> [!IMPORTANT]
+>
+> - There are three levels of package installation on Azure Synapse Analytics: the default level, the Spark pool level, and the session level.
+> - Apache Spark in Azure Synapse Analytics has a full Anaconda install plus extra libraries as the default level installation, which is fully managed by Synapse. Spark pool level packages can be used by all running artifacts, such as notebooks and Spark job definitions, that attach the corresponding Spark pool. A session level installation creates an environment for the specific notebook session; changes to session level libraries aren't persisted between sessions.
+> - You can upload custom libraries and specific versions of open-source libraries that you would like to use in your Azure Synapse Analytics workspace. The workspace packages can then be installed in your Spark pools.
+> - Note that pool level library management can take a certain amount of time, depending on the size of the packages and the complexity of the required dependencies. Session level installation is recommended for experimental and quick iterative scenarios.

 ## Default Installation

-Apache Spark in Azure Synapse Analytics has a full Anaconda install plus extra libraries. The full libraries list can be found at [Apache Spark version support](apache-spark-version-support.md).
+Default packages include a full Anaconda install plus extra commonly used libraries. The full libraries list can be found at [Apache Spark version support](apache-spark-version-support.md).

 When a Spark instance starts, these libraries are included automatically. More packages can be added at the Spark pool level or session level.

 ## Workspace packages

 When your team develops custom applications or models, you might develop various code artifacts like *.whl* or *.jar* files to package your code.

-In Synapse, workspace packages can be custom or private *.whl* or *.jar* files. You can upload these packages to your workspace and later assign them to a specific Spark pool. Once assigned, these workspace packages are installed automatically on all Spark pool sessions.
+In Synapse, workspace packages can be custom or private *.whl* or *.jar* files. You can upload these packages to your workspace and later assign them to a specific serverless Apache Spark pool. Once assigned, these workspace packages are installed automatically on all Spark pool sessions.

-To learn more about how to manage workspace libraries, see the following articles:
+To learn more about how to manage workspace libraries, see the following article:

-- [Python workspace packages: ](./apache-spark-manage-python-packages.md#install-wheel-files) Upload Python *.whl* files as a workspace package and later add these packages to specific serverless Apache Spark pools.
-- [Scala/Java workspace packages: ](./apache-spark-manage-scala-packages.md#workspace-packages) Upload Scala and Java *.jar* files as a workspace package and later add these packages to specific serverless Apache Spark pools.
+- [Manage workspace packages](./apache-spark-manage-workspace-packages.md)

 ## Pool packages

@@ -49,7 +54,7 @@ Using the Azure Synapse Analytics pool management capabilities, you can configur

 Currently, pool management is only supported for Python. For Python, Synapse Spark pools use Conda to install and manage Python package dependencies. When specifying your pool-level libraries, you can now provide a *requirements.txt* or an *environment.yml* file. This environment configuration file is used every time a Spark instance is created from that Spark pool.

-To learn more about these capabilities, see [Python pool management](./apache-spark-manage-python-packages.md#pool-libraries).
+To learn more about these capabilities, see [Manage Spark pool packages](./apache-spark-manage-pool-packages.md).

 > [!IMPORTANT]
 >
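
As an aside on the *requirements.txt* option mentioned in this hunk: the file is a pip freeze-style list, one package per line. The package names and versions below are examples only, not values from the article.

```
numpy
matplotlib==3.5.1
seaborn
```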
@@ -65,9 +70,18 @@ Session-scoped packages allow users to define package dependencies at the start

 To learn more about how to manage session-scoped packages, see the following articles:

-- [Python session packages: ](./apache-spark-manage-python-packages.md) At the start of a session, provide a Conda *environment.yml* to install more Python packages from popular repositories.
-- [Scala/Java session packages: ](./apache-spark-manage-scala-packages.md) At the start of your session, provide a list of *.jar* files to install using `%%configure`.
+- [Python session packages: ](./apache-spark-manage-session-packages.md#session-scoped-python-packages) At the start of a session, provide a Conda *environment.yml* to install more Python packages from popular repositories.
+- [Scala/Java session packages: ](./apache-spark-manage-session-packages.md#session-scoped-java-or-scala-packages) At the start of your session, provide a list of *.jar* files to install using `%%configure`.

-## Next steps
+## Manage your packages outside Synapse Analytics UI

+If your team wants to manage libraries without visiting the package management UIs, you can manage the workspace packages and pool level package updates through Azure PowerShell cmdlets or REST APIs for Synapse Analytics.
+
+To learn more about Azure PowerShell cmdlets and package management REST APIs, see the following articles:
+
+- Azure PowerShell cmdlets for Synapse Analytics: [Manage your Spark pool libraries through Azure PowerShell cmdlets](apache-spark-manage-packages-outside-ui.md#manage-packages-through-azure-powershell-cmdlets)
+- Package management REST APIs: [Manage your Spark pool libraries through REST APIs](apache-spark-manage-packages-outside-ui.md#manage-packages-through-rest-apis)
+
+## Next steps
 - View the default libraries: [Apache Spark version support](apache-spark-version-support.md)
+- Troubleshoot library installation errors: [Troubleshoot library errors](apache-spark-troubleshoot-library-errors.md)
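
Returning to the session-scoped Scala/Java bullet earlier in this hunk: in practice that means a `%%configure` cell at the very start of the notebook session. The cell below is a sketch only; the storage path is a placeholder, and the exact JSON schema accepted by `%%configure` should be confirmed in the session-packages article.

```
%%configure -f
{
    "conf": {
        "spark.jars": "abfss://<container>@<storage-account>.dfs.core.windows.net/libraries/my-library.jar"
    }
}
```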

articles/synapse-analytics/spark/apache-spark-custom-conda-channel.md

Lines changed: 5 additions & 7 deletions
@@ -1,12 +1,11 @@
 ---
 title: Create custom Conda channel for package management
 description: Learn how to create a custom Conda channel for package management
-author: midesa
+author: shuaijunye
 ms.service: synapse-analytics
 ms.topic: conceptual
-ms.date: 08/11/2021
-ms.author: midesa
-ms.reviewer: sngun
+ms.date: 07/07/2022
+ms.author: shuaijunye
 ms.subservice: spark
 ---

@@ -125,9 +124,8 @@ conda env create --file sample.yml
 source activate env
 conda list
 ```
-Now that you've verified your custom channel, you can use the [Python pool management](./apache-spark-manage-python-packages.md) process to update the libraries on your Apache Spark pool.
+Now that you've verified your custom channel, you can use the [Python pool management](./apache-spark-manage-pool-packages.md#manage-packages-from-synapse-studio-or-azure-portal) process to update the libraries on your Apache Spark pool.

 ## Next steps
 - View the default libraries: [Apache Spark version support](apache-spark-version-support.md)
-- Manage Python packages: [Python package management](./apache-spark-manage-python-packages.md)
+- Manage Session level Python packages: [Python package management on Notebook Session](./apache-spark-manage-session-packages.md#session-scoped-python-packages)
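
For reference, the *sample.yml* used in the verification commands above is an ordinary Conda environment file. In the sketch below the channel entry is a hypothetical placeholder for wherever the custom channel is hosted, and the dependencies are examples only.

```yaml
name: env
channels:
  - <path-or-URL-of-your-custom-conda-channel>   # hypothetical placeholder
  - conda-forge
dependencies:
  - numpy
  - pandas
```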

articles/synapse-analytics/spark/apache-spark-manage-packages-outside-ui.md (new file)

Lines changed: 96 additions & 0 deletions
@@ -0,0 +1,96 @@
---
title: Manage packages outside Synapse Analytics Studio UIs
description: Learn how to manage packages using Azure PowerShell cmdlets or REST APIs
author: shuaijunye
ms.service: synapse-analytics
ms.topic: conceptual
ms.date: 07/07/2022
ms.author: shuaijunye
ms.subservice: spark
---

# Manage packages outside Synapse Analytics Studio UIs

You may want to manage libraries for your serverless Apache Spark pools without going into the Synapse Analytics UI pages. For example, you may find that:

- You develop a custom package and want to upload it to your workspace and use it in your Spark pool, and you want to finish the steps with your local tools without visiting the package management UIs.
- You're updating your packages through a CI/CD process.

In this article, we provide a general guide to help you manage libraries through Azure PowerShell cmdlets or REST APIs.

## Manage packages through Azure PowerShell cmdlets

### Add new libraries
1. The [New-AzSynapseWorkspacePackage](https://docs.microsoft.com/powershell/module/az.synapse/new-azsynapseworkspacepackage) command can be used to **upload new libraries to the workspace**.

```powershell
New-AzSynapseWorkspacePackage -WorkspaceName ContosoWorkspace -Package ".\ContosoPackage.whl"
```

2. The combination of the [New-AzSynapseWorkspacePackage](https://docs.microsoft.com/powershell/module/az.synapse/new-azsynapseworkspacepackage) and [Update-AzSynapseSparkPool](https://docs.microsoft.com/powershell/module/az.synapse/update-azsynapsesparkpool) commands can be used to **upload new libraries to the workspace** and **attach the library to a Spark pool**.

```powershell
$package = New-AzSynapseWorkspacePackage -WorkspaceName ContosoWorkspace -Package ".\ContosoPackage.whl"
Update-AzSynapseSparkPool -WorkspaceName ContosoWorkspace -Name ContosoSparkPool -PackageAction Add -Package $package
```

3. If you want to attach an **existing workspace library** to your Spark pool, use the combination of the [Get-AzSynapseWorkspacePackage](https://docs.microsoft.com/powershell/module/az.synapse/get-azsynapseworkspacepackage) and [Update-AzSynapseSparkPool](https://docs.microsoft.com/powershell/module/az.synapse/update-azsynapsesparkpool) commands.

```powershell
$packages = Get-AzSynapseWorkspacePackage -WorkspaceName ContosoWorkspace
Update-AzSynapseSparkPool -WorkspaceName ContosoWorkspace -Name ContosoSparkPool -PackageAction Add -Package $packages
```

### Remove libraries
1. To **remove an installed package** from your Spark pool, use the combination of the [Get-AzSynapseWorkspacePackage](https://docs.microsoft.com/powershell/module/az.synapse/get-azsynapseworkspacepackage) and [Update-AzSynapseSparkPool](https://docs.microsoft.com/powershell/module/az.synapse/update-azsynapsesparkpool) commands.

```powershell
$package = Get-AzSynapseWorkspacePackage -WorkspaceName ContosoWorkspace -Name ContosoPackage
Update-AzSynapseSparkPool -WorkspaceName ContosoWorkspace -Name ContosoSparkPool -PackageAction Remove -Package $package
```

2. You can also retrieve a Spark pool and **remove all attached workspace libraries** from the pool by calling the [Get-AzSynapseSparkPool](https://docs.microsoft.com/powershell/module/az.synapse/get-azsynapsesparkpool) and [Update-AzSynapseSparkPool](https://docs.microsoft.com/powershell/module/az.synapse/update-azsynapsesparkpool) commands.

```powershell
$pool = Get-AzSynapseSparkPool -ResourceGroupName ContosoResourceGroup -WorkspaceName ContosoWorkspace -Name ContosoSparkPool
$pool | Update-AzSynapseSparkPool -PackageAction Remove -Package $pool.WorkspacePackages
```

For more Azure PowerShell cmdlet capabilities, see [Azure PowerShell cmdlets for Azure Synapse Analytics](https://docs.microsoft.com/powershell/module/az.synapse).

## Manage packages through REST APIs

### Manage the workspace packages
With the REST APIs, you can add or delete packages, and list all the files uploaded to your workspace. For the full set of supported APIs, see the [overview of workspace library APIs](https://docs.microsoft.com/rest/api/synapse/data-plane/library).
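
For orientation, listing the uploaded workspace packages is a single data-plane call along these lines. This is a sketch only; confirm the exact path and `api-version` against the linked API reference, and note that the bearer token must be acquired for the Synapse resource.

```http
GET https://<workspace-name>.dev.azuresynapse.net/libraries?api-version=2020-12-01
Authorization: Bearer <access-token>
```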

### Manage the Spark pool packages
You can use the [Spark pool REST API](https://docs.microsoft.com/rest/api/synapse/big-data-pools/create-or-update) to attach custom or open-source libraries to your Spark pools, or to remove them.

1. For custom libraries, specify the list of custom files as the **customLibraries** property in the request body.

```json
"customLibraries": [
    {
        "name": "samplejartestfile.jar",
        "path": "<workspace-name>/libraries/<jar-name>.jar",
        "containerName": "prep",
        "uploadedTimestamp": "1970-01-01T00:00:00Z",
        "type": "jar"
    }
]
```

2. You can also update your Spark pool libraries by specifying the **libraryRequirements** property in the request body.

```json
"libraryRequirements": {
    "content": "",
    "filename": "requirements.txt"
}
```
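
For context, both properties above go in the body of the Spark pool create-or-update call, whose management-plane URI has roughly the shape sketched below. The `api-version` value is an assumption; check the linked reference for the current one.

```http
PUT https://management.azure.com/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Synapse/workspaces/<workspace-name>/bigDataPools/<pool-name>?api-version=2021-06-01
```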

## Next steps
- View the default libraries: [Apache Spark version support](apache-spark-version-support.md)
- Manage session level Python packages: [Python package management on Notebook Session](./apache-spark-manage-session-packages.md#session-scoped-python-packages)
