
Commit 61dbbe0

Merge pull request #218183 from shuaijunye/updatFoeDEP
Update for DEP
2 parents: 859ab5c + 4c16752

1 file changed: 33 additions, 4 deletions


articles/synapse-analytics/spark/apache-spark-azure-portal-add-libraries.md

```diff
@@ -46,9 +46,6 @@ To learn more about how to manage workspace libraries, see the following article

 - [Manage workspace packages](./apache-spark-manage-workspace-packages.md)

-> [!NOTE]
-> If you enabled [Data exfiltration protection](../security/workspace-data-exfiltration-protection.md), you should upload all your dependencies as workspace libraries.
-
 ## Pool packages

 In some cases, you might want to standardize the packages that are used on an Apache Spark pool. This standardization can be useful if the same packages are commonly installed by multiple people on your team.
```
````diff
@@ -63,7 +60,39 @@ To learn more about these capabilities, see [Manage Spark pool packages](./apach
 >
 > - If the package you are installing is large or takes a long time to install, this fact affects the Spark instance start-up time.
 > - Altering the PySpark, Python, Scala/Java, .NET, or Spark version is not supported.
-> - Installing packages from PyPI is not supported within DEP-enabled workspaces.
+
+### Manage dependencies for DEP-enabled Synapse Spark pools
+
+> [!NOTE]
+>
+> - Installing packages from public repositories is not supported within [DEP-enabled workspaces](../security/workspace-data-exfiltration-protection.md). Instead, upload all your dependencies as workspace libraries and install them to your Spark pool.
+
+Follow the steps below if you have trouble identifying the required dependencies:
+
+- **Step 1: Run the following script to set up a local Python environment that matches the Synapse Spark environment**
+
+  The setup script requires [Synapse-Python38-CPU.yml](https://github.com/Azure-Samples/Synapse/blob/main/Spark/Python/Synapse-Python38-CPU.yml), which lists the libraries shipped in the default Python environment in Synapse Spark.
+
+  ```bash
+  # one-time Synapse Python setup
+  wget https://raw.githubusercontent.com/Azure-Samples/Synapse/main/Spark/Python/Synapse-Python38-CPU.yml
+  sudo bash Miniforge3-Linux-x86_64.sh -b -p /usr/lib/miniforge3
+  export PATH="/usr/lib/miniforge3/bin:$PATH"
+  sudo apt-get -yq install gcc g++
+  conda env create -n synapse-env -f Synapse-Python38-CPU.yml
+  source activate synapse-env
+  ```
+
+- **Step 2: Run the following script to identify the required dependencies**
+
+  The snippet below takes a requirements file listing all the packages and versions you intend to install in a Spark 3.1 or Spark 3.2 pool. It prints the names of the *new* wheel files and dependencies needed for your input library requirements, that is, only the dependencies that are not already present in the Spark pool by default.
+
+  ```bash
+  # command to list out wheels needed for your input libraries
+  # this command will list out only *new* dependencies that are
+  # not already part of the built-in synapse environment
+  pip install -r <input-user-req.txt> > pip_output.txt
+  cat pip_output.txt | grep "Using cached *"
+  ```

 ## Session-scoped packages
````
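For larger requirements files, the grep step from the diff above can also be scripted. The sketch below is a hypothetical helper, not part of the commit; it assumes pip's resolver prints `Using cached <file>` lines (typical of recent pip versions) and extracts the wheel names from a saved `pip_output.txt`-style log:

```python
# Sketch: extract the "Using cached <wheel>" entries from a pip install log,
# mirroring the `grep "Using cached *"` step shown in the diff above.
# Assumption: the log format matches what `pip install` prints while
# downloading packages; verify against your pip version.
import re


def new_wheels(pip_output: str) -> list[str]:
    """Return the cached wheel/sdist file names pip reported downloading."""
    pattern = re.compile(r"Using cached (\S+)")
    return [m.group(1) for m in pattern.finditer(pip_output)]


if __name__ == "__main__":
    sample = (
        "Collecting somepkg\n"
        "  Using cached somepkg-1.0-py3-none-any.whl (10 kB)\n"
        "Requirement already satisfied: numpy\n"
    )
    print(new_wheels(sample))  # -> ['somepkg-1.0-py3-none-any.whl']
```

Lines for packages already present in the pool ("Requirement already satisfied") are ignored, matching the intent of listing only *new* dependencies.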
