
Commit a5ada18

freshness40
1 parent 5a738ea commit a5ada18

File tree

1 file changed: +19 -18 lines changed


articles/hdinsight/spark/apache-spark-zeppelin-notebook.md

Lines changed: 19 additions & 18 deletions
@@ -7,17 +7,17 @@ ms.reviewer: jasonh
 ms.service: hdinsight
 ms.topic: conceptual
 ms.custom: hdinsightactive
-ms.date: 02/18/2020
+ms.date: 04/07/2020
 ---

 # Use Apache Zeppelin notebooks with Apache Spark cluster on Azure HDInsight

-HDInsight Spark clusters include [Apache Zeppelin](https://zeppelin.apache.org/) notebooks that you can use to run [Apache Spark](https://spark.apache.org/) jobs. In this article, you learn how to use the Zeppelin notebook on an HDInsight cluster.
+HDInsight Spark clusters include [Apache Zeppelin](https://zeppelin.apache.org/) notebooks. Use the notebooks to run [Apache Spark](https://spark.apache.org/) jobs. In this article, you learn how to use the Zeppelin notebook on an HDInsight cluster.

 ## Prerequisites

 * An Apache Spark cluster on HDInsight. For instructions, see [Create Apache Spark clusters in Azure HDInsight](apache-spark-jupyter-spark-sql.md).
-* The URI scheme for your cluster's primary storage. This would be `wasb://` for Azure Blob Storage, `abfs://` for Azure Data Lake Storage Gen2, or `adl://` for Azure Data Lake Storage Gen1. If secure transfer is enabled for Blob Storage, the URI would be `wasbs://`. For more information, see [Require secure transfer in Azure Storage](../../storage/common/storage-require-secure-transfer.md).
+* The URI scheme for your cluster's primary storage. The scheme would be `wasb://` for Azure Blob Storage, `abfs://` for Azure Data Lake Storage Gen2, or `adl://` for Azure Data Lake Storage Gen1. If secure transfer is enabled for Blob Storage, the URI would be `wasbs://`. For more information, see [Require secure transfer in Azure Storage](../../storage/common/storage-require-secure-transfer.md).

 ## Launch an Apache Zeppelin notebook
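For context, a fully qualified path under each of these schemes looks like the following sketch. The account, container, and file-system names are placeholders, not values from this commit:

```
wasbs://<container>@<account>.blob.core.windows.net/<path>
abfs://<file-system>@<account>.dfs.core.windows.net/<path>
adl://<account>.azuredatalakestore.net/<path>
```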

@@ -66,7 +66,7 @@ HDInsight Spark clusters include [Apache Zeppelin](https://zeppelin.apache.org/
 hvac.registerTempTable("hvac")
 ```

-Press **SHIFT + ENTER** or select the **Play** button for the paragraph to run the snippet. The status in the right corner of the paragraph should progress from READY, PENDING, RUNNING to FINISHED. The output shows up at the bottom of the same paragraph. The screenshot looks like the following:
+Press **SHIFT + ENTER** or select the **Play** button for the paragraph to run the snippet. The status in the right corner of the paragraph should progress from READY, PENDING, RUNNING to FINISHED. The output shows up at the bottom of the same paragraph. The screenshot looks like the following image:

 ![Create a temporary table from raw data](./media/apache-spark-zeppelin-notebook/hdinsight-zeppelin-load-data.png "Create a temporary table from raw data")
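The context lines above end with `hvac.registerTempTable("hvac")`, but the hunk omits how `hvac` is built. For readers without the full article open, here is a minimal sketch of the earlier part of that paragraph, assuming the HDInsight HVAC sample CSV and its column layout (neither is shown in this diff):

```scala
%livy2.spark
// Create an RDD from the HVAC sample CSV that ships with HDInsight clusters
// (the wasbs:/// path assumes default Blob storage; adjust the scheme to match your cluster).
val hvacText = sc.textFile("wasbs:///HdiSamples/HdiSamples/SensorSampleData/hvac/HVAC.csv")

// Define a schema for the columns used later in the queries.
case class Hvac(date: String, time: String, targettemp: Integer, actualtemp: Integer, buildingID: String)

// Skip the header row, map each CSV line onto the schema, and convert to a DataFrame.
val hvac = hvacText.map(s => s.split(",")).filter(s => s(0) != "Date").map(
  s => Hvac(s(0), s(1), s(2).toInt, s(3).toInt, s(6))
).toDF()

// Register the DataFrame as a temporary table so %sql paragraphs can query it.
hvac.registerTempTable("hvac")
```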

@@ -75,7 +75,7 @@ HDInsight Spark clusters include [Apache Zeppelin](https://zeppelin.apache.org/
 > [!NOTE]
 > The %spark2 interpreter is not supported in Zeppelin notebooks across all HDInsight versions, and the %sh interpreter will not be supported from HDInsight 4.0 onwards.

-5. You can now run Spark SQL statements on the `hvac` table. Paste the following query in a new paragraph. The query retrieves the building ID and the difference between the target and actual temperatures for each building on a given date. Press **SHIFT + ENTER**.
+5. You can now run Spark SQL statements on the `hvac` table. Paste the following query in a new paragraph. The query retrieves the building ID. It also retrieves the difference between the target and actual temperatures for each building on a given date. Press **SHIFT + ENTER**.

 ```sql
 %sql
@@ -84,7 +84,7 @@ HDInsight Spark clusters include [Apache Zeppelin](https://zeppelin.apache.org/

 The **%sql** statement at the beginning tells the notebook to use the Livy Scala interpreter.

-6. Select the **Bar Chart** icon to change the display. **settings**, which appears after you have selected **Bar Chart**, allows you to choose **Keys**, and **Values**. The following screenshot shows the output.
+6. Select the **Bar Chart** icon to change the display. **settings**, which appear after you've selected **Bar Chart**, allow you to choose **Keys** and **Values**. The following screenshot shows the output.

 ![Run a Spark SQL statement using the notebook1](./media/apache-spark-zeppelin-notebook/hdinsight-zeppelin-spark-query-1.png "Run a Spark SQL statement using the notebook1")
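The hunk boundary above cuts the query off right after the `%sql` magic. In the published article the paragraph is a single SELECT along these lines; the exact date literal is an assumption from the HVAC sample, not something this commit shows:

```sql
%sql
-- Building ID plus the gap between target and actual temperature, for one sample day.
select buildingID, (targettemp - actualtemp) as temp_diff, date
from hvac
where date = "6/1/13"
```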

@@ -108,7 +108,7 @@ HDInsight Spark clusters include [Apache Zeppelin](https://zeppelin.apache.org/

 ## How do I use external packages with the notebook?

-You can configure the Zeppelin notebook in Apache Spark cluster on HDInsight to use external, community-contributed packages that aren't included out-of-the-box in the cluster. You can search the [Maven repository](https://search.maven.org/) for the complete list of packages that are available. You can also get a list of available packages from other sources. For example, a complete list of community-contributed packages is available at [Spark Packages](https://spark-packages.org/).
+The Zeppelin notebook in an Apache Spark cluster on HDInsight can use external, community-contributed packages that aren't included in the cluster. Search the [Maven repository](https://search.maven.org/) for the complete list of packages that are available. You can also get a list of available packages from other sources. For example, a complete list of community-contributed packages is available at [Spark Packages](https://spark-packages.org/).

 In this article, you'll see how to use the [spark-csv](https://search.maven.org/#artifactdetails%7Ccom.databricks%7Cspark-csv_2.10%7C1.4.0%7Cjar) package with the Zeppelin notebook.
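The diff doesn't show the mechanism for loading the package. One common Zeppelin approach is the dynamic dependency loader; a minimal sketch, assuming the `%dep` interpreter is enabled on the cluster and the paragraph runs before the Spark context starts:

```scala
%dep
// Dependency loading must happen before the Spark interpreter starts;
// restart the interpreter first if a Spark paragraph has already run.
z.reset()

// Pull spark-csv (and its transitive dependencies) from Maven Central,
// using the same group:artifact:version coordinates as the link above.
z.load("com.databricks:spark-csv_2.10:1.4.0")
```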

@@ -144,21 +144,22 @@ The Zeppelin notebooks are saved to the cluster headnodes. So, if you delete the

 ![Download notebook](./media/apache-spark-zeppelin-notebook/zeppelin-download-notebook.png "Download the notebook")

-This saves the notebook as a JSON file in your download location.
+This action saves the notebook as a JSON file in your download location.

-## Use Shiro to Configure Access to Zeppelin Interpreters in Enterprise Security Package (ESP) Clusters
-As noted above, the `%sh` interpreter is not supported from HDInsight 4.0 onwards. Furthermore, since `%sh` interpreter introduces potential security issues, such as access keytabs using shell commands, it has been removed from HDInsight 3.6 ESP clusters as well. It means `%sh` interpreter is not available when clicking **Create new note** or in the Interpreter UI by default.
+## Use `Shiro` to Configure Access to Zeppelin Interpreters in Enterprise Security Package (ESP) Clusters

-Privileged domain users can utilize the `Shiro.ini` file to control access to the Interpreter UI. Thus, only these users can create new `%sh` interpreters and set permissions on each new `%sh` interpreter. To control access using the `shiro.ini` file, use the following steps:
+As noted above, the `%sh` interpreter isn't supported from HDInsight 4.0 onwards. Furthermore, since the `%sh` interpreter introduces potential security issues, such as accessing keytabs using shell commands, it has been removed from HDInsight 3.6 ESP clusters as well. It means the `%sh` interpreter isn't available by default when you select **Create new note** or in the Interpreter UI.

-1. Define a new role using an existing domain group name. In the following example, `adminGroupName` is a group of privileged users in AAD. Do not use special characters or white spaces in the group name. The characters after `=` give the permissions for this role. `*` means the group has full permissions.
+Privileged domain users can use the `Shiro.ini` file to control access to the Interpreter UI. Only these users can create new `%sh` interpreters and set permissions on each new `%sh` interpreter. To control access using the `shiro.ini` file, use the following steps:
+
+1. Define a new role using an existing domain group name. In the following example, `adminGroupName` is a group of privileged users in AAD. Don't use special characters or white spaces in the group name. The characters after `=` give the permissions for this role. `*` means the group has full permissions.

 ```
 [roles]
 adminGroupName = *
 ```

-2. Add the new role for access to Zeppelin interpreters. In the following example, all users in `adminGroupName` are given access to Zeppelin interpreters and are able to create new interpreters. You can put multiple roles between the brackets in `roles[]`, separated by commas. Then, users that have the necessary permissions, can access Zeppelin interpreters.
+2. Add the new role for access to Zeppelin interpreters. In the following example, all users in `adminGroupName` are given access to Zeppelin interpreters and can create new interpreters. You can put multiple roles between the brackets in `roles[]`, separated by commas. Then, users that have the necessary permissions can access Zeppelin interpreters.

 ```
 [urls]
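The `[urls]` section is truncated at the hunk boundary above. In Zeppelin's `shiro.ini` convention it maps URL patterns to the roles allowed to reach them; a sketch of the likely shape, reusing the role from step 1 (the exact pattern isn't part of this commit):

```
[urls]
/api/interpreter/** = authc, roles[adminGroupName]
```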
@@ -167,9 +168,9 @@ Privileged domain users can utilize the `Shiro.ini` file to control access to th

 ## Livy session management

-When you run the first code paragraph in your Zeppelin notebook, a new Livy session is created in your HDInsight Spark cluster. This session is shared across all Zeppelin notebooks that you subsequently create. If for some reason the Livy session is killed (cluster reboot, and so on), you won't be able to run jobs from the Zeppelin notebook.
+The first code paragraph in your Zeppelin notebook creates a new Livy session in your cluster. This session is shared across all Zeppelin notebooks that you later create. If the Livy session is killed for any reason, jobs won't run from the Zeppelin notebook.

-In such a case, you must perform the following steps before you can start running jobs from a Zeppelin notebook.
+In such a case, you must do the following steps before you can start running jobs from a Zeppelin notebook.

 1. Restart the Livy interpreter from the Zeppelin notebook. To do so, open interpreter settings by selecting the logged-in user name from the top-right corner, then select **Interpreter**.
@@ -179,7 +180,7 @@ In such a case, you must perform the following steps before you can start runnin

 ![Restart the Livy interpreter](./media/apache-spark-zeppelin-notebook/hdinsight-zeppelin-restart-interpreter.png "Restart the Zeppelin interpreter")

-3. Run a code cell from an existing Zeppelin notebook. This creates a new Livy session in the HDInsight cluster.
+3. Run a code cell from an existing Zeppelin notebook. This code creates a new Livy session in the HDInsight cluster.

 ## General information
@@ -201,7 +202,7 @@ To validate the service from a command line, SSH to the head node. Switch user t
 |---|---|
 |zeppelin-server|/usr/hdp/current/zeppelin-server/|
 |Server Logs|/var/log/zeppelin|
-|Configuration Interpreter, Shiro, site.xml, log4j|/usr/hdp/current/zeppelin-server/conf or /etc/zeppelin/conf|
+|Configuration Interpreter, `Shiro`, site.xml, log4j|/usr/hdp/current/zeppelin-server/conf or /etc/zeppelin/conf|
 |PID directory|/var/run/zeppelin|

 ### Enable debug logging
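Both hunk headers in this area reference validating the service from a command line. That step itself isn't part of the diff; a sketch of what it usually looks like, assuming Zeppelin's stock daemon script under the zeppelin-server path from the table above:

```
# On the head node, switch to the zeppelin service user...
sudo su zeppelin
# ...then ask the daemon script whether the server is running.
/usr/hdp/current/zeppelin-server/bin/zeppelin-daemon.sh status
```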
@@ -222,7 +223,7 @@ To validate the service from a command line, SSH to the head node. Switch user t

 ### Scenarios

-* [Apache Spark with BI: Perform interactive data analysis using Spark in HDInsight with BI tools](apache-spark-use-bi-tools.md)
+* [Apache Spark with BI: Interactive data analysis using Spark in HDInsight with BI tools](apache-spark-use-bi-tools.md)
 * [Apache Spark with Machine Learning: Use Spark in HDInsight for analyzing building temperature using HVAC data](apache-spark-ipython-notebook-machine-learning.md)
 * [Apache Spark with Machine Learning: Use Spark in HDInsight to predict food inspection results](apache-spark-machine-learning-mllib-ipython.md)
 * [Website log analysis using Apache Spark in HDInsight](apache-spark-custom-library-website-log-analysis.md)
