## Understand default Python installation
HDInsight Spark clusters have Anaconda installed. There are two Python installations in the cluster, Anaconda Python 2.7 and Python 3.5. The following table shows the default Python settings for Spark, Livy, and Jupyter.

|Setting |Python 2.7|Python 3.5|
|----|----|----|
|Livy version|Default set to 2.7|Can change config to 3.5|
|Jupyter|PySpark kernel|PySpark3 kernel|

For the Spark 3.1.2 version, the Apache PySpark kernel is removed and a new Python 3.8 environment is installed under `/usr/bin/miniforge/envs/py38/bin`, which is used by the PySpark3 kernel. The `PYSPARK_PYTHON` and `PYSPARK3_PYTHON` environment variables are updated with the following:
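
A sketch of what these settings might look like, assuming both variables point at the Python 3.8 path above (the exact form used by the cluster isn't shown here):

```bash
# Illustrative only: point both variables at the Python 3.8 environment
export PYSPARK_PYTHON=/usr/bin/miniforge/envs/py38/bin/python
export PYSPARK3_PYTHON=/usr/bin/miniforge/envs/py38/bin/python
```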
The HDInsight cluster depends on the built-in Python environments, both Python 2.7 and Python 3.5. Directly installing custom packages in those default built-in environments may cause unexpected library version changes and can break the cluster. To safely install custom external Python packages for your Spark applications, follow these steps.

1. Create a Python virtual environment by using conda. A virtual environment provides an isolated space for your projects without breaking others. When you create the Python virtual environment, you can specify the Python version that you want to use. You still need to create a virtual environment even if you want to use Python 2.7 or 3.5. This requirement makes sure that the cluster's default environment doesn't get broken. Run script actions on your cluster for all nodes with the following script to create a Python virtual environment.
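
A minimal sketch of such a script action, assuming the default Anaconda location and the `py35new` prefix discussed below (adjust both paths for your cluster):

```bash
#!/bin/bash
# Create a conda virtual environment named py35new with Python 3.5 (illustrative paths)
sudo /usr/bin/anaconda/bin/conda create --prefix /usr/bin/anaconda/envs/py35new python=3.5 --yes
```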
`--prefix` specifies a path where the conda virtual environment lives. Several configs need to be changed further based on the path specified here. In this example, we use py35new, because the cluster already has an existing virtual environment called py35.

`python=` specifies the Python version for the virtual environment. In this example, we use version 3.5, the same version as the one built into the cluster. You can also use other Python versions to create the virtual environment.

2. Install external Python packages in the created virtual environment if needed. Run script actions on your cluster for all nodes with the following script to install external Python packages. You need sudo privilege here to write files to the virtual environment folder.

Search the [package index](https://pypi.python.org/pypi) for the complete list of packages that are available. You can also get a list of available packages from other sources. For example, you can install packages made available through [conda-forge](https://conda-forge.org/feedstocks/).

Use the following command if you want to install a library with its latest version:

- Use conda channel:
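
For example, a conda-based sketch (the package name `seaborn` and the environment path are placeholders; substitute your own):

```bash
# Install the latest version of a package from the conda-forge channel into the py35new environment
sudo /usr/bin/anaconda/bin/conda install --prefix /usr/bin/anaconda/envs/py35new -c conda-forge seaborn --yes
```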

3. Change the Spark and Livy configs and point them to the created virtual environment.

1. Open the Ambari UI, and go to the Spark 2 page, Configs tab.
:::image type="content" source="./media/apache-spark-python-package-installation/ambari-spark-and-livy-config.png" alt-text="Change Spark and Livy config through Ambari" border="true":::

2. Expand Advanced livy2-env, and add the following statements at the bottom. If you installed the virtual environment with a different prefix, change the path correspondingly.
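
A sketch of what these statements might look like, assuming the `py35new` prefix used earlier (the exact variables your cluster needs may differ):

```bash
# Point Livy sessions at the virtual environment's Python (illustrative path)
export PYSPARK_PYTHON=/usr/bin/anaconda/envs/py35new/bin/python
export PYSPARK_DRIVER_PYTHON=/usr/bin/anaconda/envs/py35new/bin/python
```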
:::image type="content" source="./media/apache-spark-python-package-installation/ambari-spark-config.png" alt-text="Change Spark config through Ambari" border="true":::

4. Save the changes and restart the affected services. These changes need a restart of the Spark 2 service. The Ambari UI prompts with a required restart reminder; click Restart to restart all affected services.

If you are using `livy`, add the following properties to the request body:

```
"conf" : {
    "spark.yarn.appMasterEnv.PYSPARK_PYTHON": "/usr/bin/anaconda/envs/py35new/bin/python",
    "spark.yarn.appMasterEnv.PYSPARK_DRIVER_PYTHON": "/usr/bin/anaconda/envs/py35new/bin/python"
}
```

4. If you would like to use the newly created virtual environment in Jupyter, change the Jupyter configs and restart Jupyter. Run script actions on all header nodes with the following statement to point Jupyter to the newly created virtual environment. Make sure to modify the path to the prefix you specified for your virtual environment. After running this script action, restart the Jupyter service through the Ambari UI to make this change available.

```bash
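# Replace the python3_executable_path entry in the sparkmagic config with the virtual environment's Python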
sudo sed -i '/python3_executable_path/c\ \"python3_executable_path\":\"/usr/bin/anaconda/envs/py35new/bin/python3\"' /home/spark/.sparkmagic/config.json
```

You can double-check the Python environment in Jupyter Notebook by running the following code:
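
For example, a quick check such as the following prints the interpreter version in use:

```python
import sys
print(sys.version)
```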
:::image type="content" source="./media/apache-spark-python-package-installation/check-python-version-in-jupyter.png" alt-text="Check Python version in Jupyter Notebook" border="true":::