articles/hdinsight/spark/apache-spark-jupyter-notebook-install-locally.md
author: hrasheed-msft
ms.author: hrasheed
ms.reviewer: jasonh
ms.service: hdinsight
ms.topic: conceptual
ms.custom: hdinsightactive
ms.date: 04/02/2020
---
# Install Jupyter notebook on your computer and connect to Apache Spark on HDInsight
In this article, you learn how to install Jupyter notebook with the custom PySpark (for Python) and Apache Spark (for Scala) kernels with Spark magic. You then connect the notebook to an HDInsight cluster.
There are four key steps involved in installing Jupyter and connecting to Apache Spark on HDInsight.
* Install the PySpark and Spark kernels with the Spark magic.
* Configure Spark magic to access Spark cluster on HDInsight.
For more information about custom kernels and Spark magic, see [Kernels available for Jupyter notebooks with Apache Spark Linux clusters on HDInsight](apache-spark-jupyter-notebook-kernels.md).
## Prerequisites
* An Apache Spark cluster on HDInsight. For instructions, see [Create Apache Spark clusters in Azure HDInsight](apache-spark-jupyter-spark-sql.md). The local notebook connects to the HDInsight cluster.
* Familiarity with using Jupyter Notebooks with Spark on HDInsight.
## Install Jupyter notebook on your computer
Install Python before you install Jupyter notebooks. The [Anaconda distribution](https://www.anaconda.com/download/) installs both Python and Jupyter Notebook.
Download the [Anaconda installer](https://www.anaconda.com/download/) for your platform and run the setup. While running the setup wizard, make sure you select the option to add Anaconda to your PATH variable. See also [Installing Jupyter using Anaconda](https://jupyter.readthedocs.io/en/latest/install.html).
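As a quick check that the setup succeeded, open a new command prompt and confirm that both of the following commands print a version number. If they don't, Anaconda wasn't added to your PATH variable.

```
python --version
jupyter --version
```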
```
pip show sparkmagic
```
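The output resembles the following; the version and path shown here are illustrative and will differ on your machine:

```
Name: sparkmagic
Version: 0.12.9
Location: c:\users\you\anaconda3\lib\site-packages
```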
Then change your working directory to the **location** identified with the above command.
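For example, if the `Location:` value reported by `pip show sparkmagic` were the hypothetical path shown above, you would switch to it with:

```
cd c:\users\you\anaconda3\lib\site-packages
```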
1. From your new working directory, enter one or more of the commands below to install the wanted kernel(s):
|Kernel | Command |
|---|---|
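As an illustration, the sparkmagic project documents kernel installation commands of the following form; check the sparkmagic README for the exact kernel names shipped with your installed version:

```
jupyter-kernelspec install sparkmagic/kernels/sparkkernel
jupyter-kernelspec install sparkmagic/kernels/pysparkkernel
```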
In this section, you configure the Spark magic that you installed earlier to connect to an Apache Spark cluster.

```
python
```
2. The Jupyter configuration information is typically stored in the user's home directory. Enter the following command to identify the home directory, and create a folder called **\.sparkmagic**. The full path will be displayed.
```python
import os
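# The remainder of this snippet is a sketch (an assumption, reconstructed from
# the step description above): build the \.sparkmagic path under the home
# directory, create the folder, and print the full path.
path = os.path.join(os.path.expanduser("~"), ".sparkmagic")
os.makedirs(path, exist_ok=True)  # create the folder if it doesn't exist yet
print(path)                       # the full path is displayed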
```

```
jupyter notebook
```
6. Verify that you can use the Spark magic available with the kernels. Complete the following steps.
a. Create a new notebook. From the right-hand corner, select **New**. You should see the default kernel **Python 2** or **Python 3** and the kernels you installed. The actual values may vary depending on your installation choices. Select **PySpark**.

> [!IMPORTANT]
> After selecting **New**, review your shell for any errors. If you see the error `TypeError: __init__() got an unexpected keyword argument 'io_loop'`, you may be experiencing a known issue with certain versions of Tornado. If so, stop the kernel and then downgrade your Tornado installation with the following command: `pip install tornado==4.5.3`.
b. Run the following code snippet.
```sql
%%sql
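-- A representative query (an assumption): HDInsight clusters ship with a
-- sample Hive table that is commonly used for this kind of smoke test.
SELECT * FROM hivesampletable LIMIT 5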
```
## Why should I install Jupyter on my computer?
Reasons to install Jupyter on your computer and then connect it to an Apache Spark cluster on HDInsight:
* Provides you the option to create your notebooks locally, test your application against a running cluster, and then upload the notebooks to the cluster. To upload the notebooks to the cluster, you can either upload them using the Jupyter notebook that is running on the cluster, or save them to the `/HdiNotebooks` folder in the storage account associated with the cluster. For more information on how notebooks are stored on the cluster, see [Where are Jupyter notebooks stored](apache-spark-jupyter-notebook-kernels.md#where-are-the-notebooks-stored)?
* With the notebooks available locally, you can connect to different Spark clusters based on your application requirement.
* You can use GitHub to implement a source control system and have version control for the notebooks. You can also have a collaborative environment where multiple users can work with the same notebook.
* You can work with notebooks locally without even having a cluster up. You only need a cluster to test your notebooks against, not to manually manage your notebooks or a development environment.
## Next steps
* [Overview: Apache Spark on Azure HDInsight](apache-spark-overview.md)
* [Apache Spark with BI: Analyze Apache Spark data using Power BI in HDInsight](apache-spark-use-bi-tools.md)
* [Apache Spark with Machine Learning: Use Spark in HDInsight for analyzing building temperature using HVAC data](apache-spark-ipython-notebook-machine-learning.md)