Commit 1f44e0c

Update apache-spark-python-package-installation.md
1 parent 3799daf commit 1f44e0c

1 file changed: +12 -12 lines changed

articles/hdinsight/spark/apache-spark-python-package-installation.md

Lines changed: 12 additions & 12 deletions
@@ -33,7 +33,7 @@ There are two types of open-source components that are available in the HDInsigh
 
 ## Understand default Python installation
 
-HDInsight Spark clusters have Anaconda installed. There are two Python installations in the cluster, Anaconda Python 2.7 and Python 3.5. The table below shows the default Python settings for Spark, Livy, and Jupyter.
+HDInsight Spark clusters have Anaconda installed. There are two Python installations in the cluster, Anaconda Python 2.7 and Python 3.5. The following table shows the default Python settings for Spark, Livy, and Jupyter.
 
 |Setting |Python 2.7|Python 3.5|
 |----|----|----|
@@ -42,7 +42,7 @@ HDInsight Spark clusters have Anaconda installed. There are two Python installat
 |Livy version|Default set to 2.7|Can change config to 3.5|
 |Jupyter|PySpark kernel|PySpark3 kernel|
 
-For the Spark 3.1.2 version, the Apache PySpark kernel is removed and a new Python 3.8 environment is installed under `/usr/bin/miniforge/envs/py38/bin` which is used by the PySpark3 kernel. The `PYSPARK_PYTHON` and `PYSPARK3_PYTHON` environment variables are updated with the following:
+For the Spark 3.1.2 version, the Apache PySpark kernel is removed and a new Python 3.8 environment is installed under `/usr/bin/miniforge/envs/py38/bin`, which is used by the PySpark3 kernel. The `PYSPARK_PYTHON` and `PYSPARK3_PYTHON` environment variables are updated with the following:
 
 ```bash
 export PYSPARK_PYTHON=${PYSPARK_PYTHON:-/usr/bin/miniforge/envs/py38/bin/python}
@@ -51,9 +51,9 @@ export PYSPARK3_PYTHON=${PYSPARK_PYTHON:-/usr/bin/miniforge/envs/py38/bin/python
 
 ## Safely install external Python packages
 
-HDInsight cluster depends on the built-in Python environment, both Python 2.7 and Python 3.5. Directly installing custom packages in those default built-in environments may cause unexpected library version changes. And break the cluster further. To safely install custom external Python packages for your Spark applications, follow below steps.
+HDInsight clusters depend on the built-in Python environments, both Python 2.7 and Python 3.5. Directly installing custom packages in those default built-in environments may cause unexpected library version changes and break the cluster further. To safely install custom external Python packages for your Spark applications, follow these steps.
 
-1. Create Python virtual environment using conda. A virtual environment provides an isolated space for your projects without breaking others. When creating the Python virtual environment, you can specify Python version that you want to use. You still need to create virtual environment even though you would like to use Python 2.7 and 3.5. This requirement is to make sure the cluster's default environment not getting broke. Run script actions on your cluster for all nodes with below script to create a Python virtual environment.
+1. Create a Python virtual environment by using conda. A virtual environment provides an isolated space for your projects without breaking others. When you create the virtual environment, you can specify the Python version that you want to use. You still need to create a virtual environment even if you want to use Python 2.7 or 3.5. This requirement ensures that the cluster's default environment doesn't break. Run script actions on your cluster for all nodes with the following script to create a Python virtual environment.
 
 - `--prefix` specifies a path where a conda virtual environment lives. There are several configs that need to be changed further based on the path specified here. In this example, we use the py35new, as the cluster has an existing virtual environment called py35 already.
 - `python=` specifies the Python version for the virtual environment. In this example, we use version 3.5, the same version as the cluster built in one. You can also use other Python versions to create the virtual environment.
@@ -63,11 +63,11 @@ HDInsight cluster depends on the built-in Python environment, both Python 2.7 an
 sudo /usr/bin/anaconda/bin/conda create --prefix /usr/bin/anaconda/envs/py35new python=3.5 anaconda=4.3 --yes
 ```
 
-2. Install external Python packages in the created virtual environment if needed. Run script actions on your cluster for all nodes with below script to install external Python packages. You need to have sudo privilege here to write files to the virtual environment folder.
+2. Install external Python packages in the created virtual environment if needed. Run script actions on your cluster for all nodes with the following script to install external Python packages. You need sudo privileges here to write files to the virtual environment folder.
 
 Search the [package index](https://pypi.python.org/pypi) for the complete list of packages that are available. You can also get a list of available packages from other sources. For example, you can install packages made available through [conda-forge](https://conda-forge.org/feedstocks/).
 
-Use below command if you would like to install a library with its latest version:
+Use the following command if you want to install a library with its latest version:
 
 - Use conda channel:
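
The install command itself falls just past the end of this hunk. As a hedged sketch only (the package name `seaborn` is a stand-in chosen for illustration, not taken from this commit), a conda-channel install into the environment created in step 1 could look like this:

```bash
# Sketch: install one package into the py35new environment created in step 1.
# "seaborn" is a placeholder package name, not specified by this commit.
sudo /usr/bin/anaconda/bin/conda install --prefix /usr/bin/anaconda/envs/py35new seaborn --yes
```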
@@ -105,11 +105,11 @@ HDInsight cluster depends on the built-in Python environment, both Python 2.7 an
 
 3. Change Spark and Livy configs and point to the created virtual environment.
 
-1. Open Ambari UI, go to Spark2 page, Configs tab.
+1. Open the Ambari UI, and go to the Spark 2 page, Configs tab.
 
 :::image type="content" source="./media/apache-spark-python-package-installation/ambari-spark-and-livy-config.png" alt-text="Change Spark and Livy config through Ambari" border="true":::
 
-2. Expand Advanced livy2-env, add below statements at bottom. If you installed the virtual environment with a different prefix, change the path correspondingly.
+2. Expand Advanced livy2-env, and add the following statements at the bottom. If you installed the virtual environment with a different prefix, change the path correspondingly.
 
 ```bash
 export PYSPARK_PYTHON=/usr/bin/anaconda/envs/py35new/bin/python
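# Sketch only, not part of this commit: the remaining livy2-env statements fall
# outside this hunk. A typical shape also points the driver at the same environment.
export PYSPARK_DRIVER_PYTHON=/usr/bin/anaconda/envs/py35new/bin/python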
@@ -126,7 +126,7 @@ HDInsight cluster depends on the built-in Python environment, both Python 2.7 an
 
 :::image type="content" source="./media/apache-spark-python-package-installation/ambari-spark-config.png" alt-text="Change Spark config through Ambari" border="true":::
 
-4. Save the changes and restart affected services. These changes need a restart of Spark2 service. Ambari UI will prompt a required restart reminder, click Restart to restart all affected services.
+4. Save the changes and restart the affected services. These changes require a restart of the Spark 2 service. The Ambari UI prompts with a required restart reminder; click Restart to restart all affected services.
 
 :::image type="content" source="./media/apache-spark-python-package-installation/ambari-restart-services.png" alt-text="Restart services" border="true":::
 
@@ -139,7 +139,7 @@ HDInsight cluster depends on the built-in Python environment, both Python 2.7 an
 spark.conf.set("spark.yarn.appMasterEnv.PYSPARK_DRIVER_PYTHON", "/usr/bin/anaconda/envs/py35/bin/python")
 ```
 
-If you are using livy, add the following properties to the request body:
+If you are using `livy`, add the following properties to the request body:
 
 ```
 "conf" : {
@@ -148,13 +148,13 @@ HDInsight cluster depends on the built-in Python environment, both Python 2.7 an
 }
 ```
 
-4. If you would like to use the new created virtual environment on Jupyter. Change Jupyter configs and restart Jupyter. Run script actions on all header nodes with below statement to point Jupyter to the new created virtual environment. Make sure to modify the path to the prefix you specified for your virtual environment. After running this script action, restart Jupyter service through Ambari UI to make this change available.
+4. If you would like to use the newly created virtual environment on Jupyter, change the Jupyter configs and restart Jupyter. Run script actions on all header nodes with the following statement to point Jupyter to the newly created virtual environment. Make sure to modify the path to the prefix you specified for your virtual environment. After running this script action, restart the Jupyter service through the Ambari UI to make this change available.
 
 ```bash
 sudo sed -i '/python3_executable_path/c\ \"python3_executable_path\" : \"/usr/bin/anaconda/envs/py35new/bin/python3\"' /home/spark/.sparkmagic/config.json
 ```
 
-You could double confirm the Python environment in Jupyter Notebook by running below code:
+You can confirm the Python environment in Jupyter Notebook by running the following code:
 
 :::image type="content" source="./media/apache-spark-python-package-installation/check-python-version-in-jupyter.png" alt-text="Check Python version in Jupyter Notebook" border="true":::
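
The screenshot stands in for the check itself; a minimal sketch of such a notebook cell (assuming the session runs on the reconfigured kernel) could be:

```python
# Sketch: confirm which Python interpreter and version the notebook session uses.
import sys
print(sys.version)
print(sys.executable)
```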