Commit 5d1d2ee

Merge pull request #110095 from dagiro/freshness32

freshness32

2 parents a83050b + 104bb3a

File tree

1 file changed: +16 -16 lines changed
articles/hdinsight/spark/apache-spark-jupyter-notebook-install-locally.md

Lines changed: 16 additions & 16 deletions
@@ -5,14 +5,14 @@ author: hrasheed-msft
 ms.author: hrasheed
 ms.reviewer: jasonh
 ms.service: hdinsight
-ms.custom: hdinsightactive
 ms.topic: conceptual
-ms.date: 11/07/2019
+ms.custom: hdinsightactive
+ms.date: 04/02/2020
 ---
 
 # Install Jupyter notebook on your computer and connect to Apache Spark on HDInsight
 
-In this article you learn how to install Jupyter notebook, with the custom PySpark (for Python) and Apache Spark (for Scala) kernels with Spark magic, and connect the notebook to an HDInsight cluster. There can be a number of reasons to install Jupyter on your local computer, and there can be some challenges as well. For more on this, see the section [Why should I install Jupyter on my computer](#why-should-i-install-jupyter-on-my-computer) at the end of this article.
+In this article, you learn how to install Jupyter notebook with the custom PySpark (for Python) and Apache Spark (for Scala) kernels with Spark magic. You then connect the notebook to an HDInsight cluster.
 
 There are four key steps involved in installing Jupyter and connecting to Apache Spark on HDInsight.
 
@@ -21,17 +21,17 @@ There are four key steps involved in installing Jupyter and connecting to Apache
 * Install the PySpark and Spark kernels with the Spark magic.
 * Configure Spark magic to access Spark cluster on HDInsight.
 
-For more information about the custom kernels and the Spark magic available for Jupyter notebooks with HDInsight cluster, see [Kernels available for Jupyter notebooks with Apache Spark Linux clusters on HDInsight](apache-spark-jupyter-notebook-kernels.md).
+For more information about custom kernels and Spark magic, see [Kernels available for Jupyter notebooks with Apache Spark Linux clusters on HDInsight](apache-spark-jupyter-notebook-kernels.md).
 
 ## Prerequisites
 
-* An Apache Spark cluster on HDInsight. For instructions, see [Create Apache Spark clusters in Azure HDInsight](apache-spark-jupyter-spark-sql.md). This is a prerequisite for connecting the Jupyter notebook to an HDInsight cluster once the notebook is installed.
+* An Apache Spark cluster on HDInsight. For instructions, see [Create Apache Spark clusters in Azure HDInsight](apache-spark-jupyter-spark-sql.md). The local notebook connects to the HDInsight cluster.
 
 * Familiarity with using Jupyter Notebooks with Spark on HDInsight.
 
 ## Install Jupyter notebook on your computer
 
-You must install Python before you can install Jupyter notebooks. The [Anaconda distribution](https://www.anaconda.com/download/) will install both, Python, and Jupyter Notebook.
+Install Python before you install Jupyter notebooks. The [Anaconda distribution](https://www.anaconda.com/download/) installs both Python and Jupyter Notebook.
 
 Download the [Anaconda installer](https://www.anaconda.com/download/) for your platform and run the setup. While running the setup wizard, make sure you select the option to add Anaconda to your PATH variable. See also, [Installing Jupyter using Anaconda](https://jupyter.readthedocs.io/en/latest/install.html).
 
@@ -58,9 +58,9 @@ Download the [Anaconda installer](https://www.anaconda.com/download/) for your p
 pip show sparkmagic
 ```
 
-Then change your working directory to the location identified with the above command.
+Then change your working directory to the **location** identified with the above command.
 
-1. From your new working directory, enter one or more of the commands below to install the desired kernel(s):
+1. From your new working directory, enter one or more of the commands below to install the wanted kernel(s):
 
 |Kernel | Command |
 |---|---|
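The table rows with the individual install commands fall outside this hunk. For orientation only, sparkmagic kernels are registered with `jupyter-kernelspec install`; a sketch along these lines, where the kernel paths are assumptions based on the sparkmagic package layout rather than part of this diff:

```
jupyter-kernelspec install sparkmagic/kernels/pysparkkernel
jupyter-kernelspec install sparkmagic/kernels/sparkkernel
```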
@@ -85,7 +85,7 @@ In this section, you configure the Spark magic that you installed earlier to con
 python
 ```
 
-2. The Jupyter configuration information is typically stored in the users home directory. Enter the following command to identify the home directory, and create a folder called **.sparkmagic**. The full path will be outputted.
+2. The Jupyter configuration information is typically stored in the user's home directory. Enter the following command to identify the home directory, and create a folder called **\.sparkmagic**. The full path will be output.
 
 ```python
 import os
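# The remainder of this snippet falls outside the hunk. A sketch of what
# the step above describes (assumed, not part of the diff): locate the
# home directory, create the .sparkmagic folder, and output its full path.
path = os.path.join(os.path.expanduser('~'), '.sparkmagic')
os.makedirs(path, exist_ok=True)
print(path)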
@@ -141,16 +141,16 @@ In this section, you configure the Spark magic that you installed earlier to con
 jupyter notebook
 ```
 
-6. Verify that you can use the Spark magic available with the kernels. Perform the following steps.
+6. Verify that you can use the Spark magic available with the kernels. Complete the following steps.
 
-a. Create a new notebook. From the right-hand corner, select **New**. You should see the default kernel **Python 2** or **Python 3** and the kernels you installed. The actual values may vary depending on your installation choices. Select **PySpark**.
+a. Create a new notebook. From the right-hand corner, select **New**. You should see the default kernel **Python 2** or **Python 3** and the kernels you installed. The actual values may vary depending on your installation choices. Select **PySpark**.
 
-![Available kernels in Jupyter notebook](./media/apache-spark-jupyter-notebook-install-locally/jupyter-kernels-notebook.png "Kernels in Jupyter notebook")
+![Available kernels in Jupyter notebook](./media/apache-spark-jupyter-notebook-install-locally/jupyter-kernels-notebook.png "Kernels in Jupyter notebook")
 
 > [!IMPORTANT]
 > After selecting **New**, review your shell for any errors. If you see the error `TypeError: __init__() got an unexpected keyword argument 'io_loop'` you may be experiencing a known issue with certain versions of Tornado. If so, stop the kernel and then downgrade your Tornado installation with the following command: `pip install tornado==4.5.3`.
 
-b. Run the following code snippet.
+b. Run the following code snippet.
 
 ```sql
 %%sql
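-- The query body falls outside the hunk. A plausible completion (assumed,
-- not part of the diff): HDInsight samples commonly query the default
-- hivesampletable that ships with every cluster.
SELECT * FROM hivesampletable LIMIT 4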
@@ -163,9 +163,9 @@ In this section, you configure the Spark magic that you installed earlier to con
 
 ## Why should I install Jupyter on my computer?
 
-There can be a number of reasons why you might want to install Jupyter on your computer and then connect it to an Apache Spark cluster on HDInsight.
+Reasons to install Jupyter on your computer and then connect it to an Apache Spark cluster on HDInsight:
 
-* Even though Jupyter notebooks are already available on the Spark cluster in Azure HDInsight, installing Jupyter on your computer provides you the option to create your notebooks locally, test your application against a running cluster, and then upload the notebooks to the cluster. To upload the notebooks to the cluster, you can either upload them using the Jupyter notebook that is running or the cluster, or save them to the /HdiNotebooks folder in the storage account associated with the cluster. For more information on how notebooks are stored on the cluster, see [Where are Jupyter notebooks stored](apache-spark-jupyter-notebook-kernels.md#where-are-the-notebooks-stored)?
+* Provides the option to create your notebooks locally, test your application against a running cluster, and then upload the notebooks to the cluster. To upload the notebooks to the cluster, you can either upload them using the Jupyter notebook that is running on the cluster, or save them to the `/HdiNotebooks` folder in the storage account associated with the cluster. For more information on how notebooks are stored on the cluster, see [Where are Jupyter notebooks stored](apache-spark-jupyter-notebook-kernels.md#where-are-the-notebooks-stored)?
 * With the notebooks available locally, you can connect to different Spark clusters based on your application requirement.
 * You can use GitHub to implement a source control system and have version control for the notebooks. You can also have a collaborative environment where multiple users can work with the same notebook.
 * You can work with notebooks locally without even having a cluster up. You only need a cluster to test your notebooks against, not to manually manage your notebooks or a development environment.
@@ -177,5 +177,5 @@ There can be a number of reasons why you might want to install Jupyter on your c
 ## Next steps
 
 * [Overview: Apache Spark on Azure HDInsight](apache-spark-overview.md)
-* [Apache Spark with BI: Perform interactive data analysis using Spark in HDInsight with BI tools](apache-spark-use-bi-tools.md)
+* [Apache Spark with BI: Analyze Apache Spark data using Power BI in HDInsight](apache-spark-use-bi-tools.md)
 * [Apache Spark with Machine Learning: Use Spark in HDInsight for analyzing building temperature using HVAC data](apache-spark-ipython-notebook-machine-learning.md)
