
Commit b3f6464

Merge pull request #259623 from v-akarnase/patch-11
Update apache-spark-manage-dependencies.md
2 parents: 3d6676d + fa23000

1 file changed (+7 −6 lines)


articles/hdinsight/spark/apache-spark-manage-dependencies.md

Lines changed: 7 additions & 6 deletions
@@ -6,7 +6,7 @@ ms.author: apsinhar
ms.service: hdinsight
ms.custom: hdinsightactive, ignite-2022, devx-track-python
ms.topic: how-to
-ms.date: 11/23/2023
+ms.date: 11/28/2023
#Customer intent: As a developer for Apache Spark and Apache Spark in Azure HDInsight, I want to learn how to manage my Spark application dependencies and install packages on my HDInsight cluster.
---

@@ -29,7 +29,7 @@ When a Spark session starts in Jupyter Notebook on Spark kernel for Scala, you c
* [Maven Repository](https://search.maven.org/), or community-contributed packages at [Spark Packages](https://spark-packages.org/).
* Jar files stored on your cluster's primary storage.

-You'll use the `%%configure` magic to configure the notebook to use an external package. In notebooks that use external packages, make sure you call the `%%configure` magic in the first code cell. This ensures that the kernel is configured to use the package before the session starts.
+You can use the `%%configure` magic to configure the notebook to use an external package. In notebooks that use external packages, make sure you call the `%%configure` magic in the first code cell. This ensures that the kernel is configured to use the package before the session starts.

>
>[!IMPORTANT]
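
As a minimal sketch of such a first cell (the Maven coordinates are a placeholder; `%%configure` passes this JSON to the session, and `spark.jars.packages` is the standard Spark property for pulling Maven packages before the session starts):

```
%%configure
{ "conf": { "spark.jars.packages": "groupId:artifactId:version" } }
```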
@@ -71,7 +71,7 @@ import com.microsoft.azure.cosmosdb.spark._
## Jar libs for cluster
In some cases, you may want to configure the jar dependencies at cluster level so that every application can be set up with same dependencies by default. The approach is to add your jar paths to Spark driver and executor class path.

-1. Run below sample script actions to copy jar files from primary storage `wasb://[email protected]/libs/*` to cluster local file system `/usr/libs/sparklibs`. The step is needed as linux uses `:` to separate class path list, but HDInsight only support storage paths with scheme like `wasb://`. The remote storage path won't work correctly if you directly add it to class path.
+1. Run sample script actions to copy jar files from primary storage `wasb://[email protected]/libs/*` to cluster local file system `/usr/libs/sparklibs`. The step is needed as linux uses `:` to separate class path list, but HDInsight only support storage paths with scheme like `wasb://`. The remote storage path won't work correctly if you directly add it to class path.

```bash
sudo mkdir -p /usr/libs/sparklibs
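# A minimal sketch of how this script action can continue (assumptions: the same
# /usr/libs/sparklibs folder as above; <container> and <account> are placeholders
# for your own primary storage container and storage account):
sudo hadoop fs -copyToLocal wasb://<container>@<account>.blob.core.windows.net/libs/*.* /usr/libs/sparklibs
# Spark class path entries are colon-separated local paths, so the copied jars can
# then be referenced as /usr/libs/sparklibs/* in the driver and executor class path settings.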
@@ -98,16 +98,17 @@ HDInsight cluster has built-in jar dependencies, and updates for these jar versi

## Python packages for one Spark job
### Use Jupyter Notebook
-HDInsight Jupyter Notebook PySpark kernel doesn't support installing Python packages from PyPi or Anaconda package repository directly. If you have `.zip`, `.egg`, or `.py` dependencies, and want to reference them for one Spark session, follow below steps:

-1. Run below sample script actions to copy `.zip`, `.egg` or `.py` files from primary storage `wasb://[email protected]/libs/*` to cluster local file system `/usr/libs/pylibs`. The step is needed as linux uses `:` to separate search path list, but HDInsight only support storage paths with scheme like `wasb://`. The remote storage path won't work correctly when you use `sys.path.insert`.
+HDInsight Jupyter Notebook PySpark kernel doesn't support installing Python packages from PyPi or Anaconda package repository directly. If you have `.zip`, `.egg`, or `.py` dependencies, and want to reference them for one Spark session, follow steps:
+
+1. Run sample script actions to copy `.zip`, `.egg` or `.py` files from primary storage `wasb://[email protected]/libs/*` to cluster local file system `/usr/libs/pylibs`. The step is needed as linux uses `:` to separate search path list, but HDInsight only support storage paths with scheme like `wasb://`. The remote storage path won't work correctly when you use `sys.path.insert`.

```bash
sudo mkdir -p /usr/libs/pylibs
sudo hadoop fs -copyToLocal wasb://[email protected]/libs/*.* /usr/libs/pylibs
```

-2. In your notebook, run below code in a code cell with PySpark kernel:
+2. In your notebook, run following code in a code cell with PySpark kernel:

```python
import sys
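# A minimal sketch of how this cell typically continues (assumption: the same
# /usr/libs/pylibs folder used by the script action above): put the local folder
# at the front of the module search path so the copied .py/.zip/.egg files can be
# imported in this Spark session.
sys.path.insert(0, "/usr/libs/pylibs")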
