articles/hdinsight/spark/apache-spark-zeppelin-notebook.md
ms.reviewer: jasonh
ms.service: hdinsight
ms.topic: conceptual
ms.custom: hdinsightactive
ms.date: 04/07/2020
---
# Use Apache Zeppelin notebooks with Apache Spark cluster on Azure HDInsight

HDInsight Spark clusters include [Apache Zeppelin](https://zeppelin.apache.org/) notebooks. Use the notebooks to run [Apache Spark](https://spark.apache.org/) jobs. In this article, you learn how to use the Zeppelin notebook on an HDInsight cluster.
## Prerequisites

* An Apache Spark cluster on HDInsight. For instructions, see [Create Apache Spark clusters in Azure HDInsight](apache-spark-jupyter-spark-sql.md).
* The URI scheme for your cluster's primary storage. The scheme would be `wasb://` for Azure Blob Storage, `abfs://` for Azure Data Lake Storage Gen2, or `adl://` for Azure Data Lake Storage Gen1. If secure transfer is enabled for Blob Storage, the URI would be `wasbs://`. For more information, see [Require secure transfer in Azure Storage](../../storage/common/storage-require-secure-transfer.md).
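The scheme also determines the full storage endpoint in a path. As a minimal sketch (the account, container, and path names are hypothetical), this is how the pieces combine into a URI for each scheme:

```python
def primary_storage_uri(scheme: str, container: str, account: str, path: str) -> str:
    """Build a full primary-storage URI from its parts.

    Endpoint suffix depends on the scheme:
      wasb/wasbs -> blob.core.windows.net (Blob Storage)
      abfs       -> dfs.core.windows.net  (Data Lake Storage Gen2)
      adl        -> azuredatalakestore.net (Data Lake Storage Gen1, no container)
    """
    if scheme in ("wasb", "wasbs"):
        return f"{scheme}://{container}@{account}.blob.core.windows.net/{path}"
    if scheme == "abfs":
        return f"abfs://{container}@{account}.dfs.core.windows.net/{path}"
    if scheme == "adl":
        return f"adl://{account}.azuredatalakestore.net/{path}"
    raise ValueError(f"unknown scheme: {scheme}")

# Example with made-up names:
print(primary_storage_uri("wasbs", "mycontainer", "mystorage", "HdiSamples/data.csv"))
```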
## Launch an Apache Zeppelin notebook
   ```
   hvac.registerTempTable("hvac")
   ```

   Press **SHIFT+ENTER** or select the **Play** button for the paragraph to run the snippet. The status in the right corner of the paragraph should progress from READY, PENDING, RUNNING to FINISHED. The output shows up at the bottom of the same paragraph. The screenshot looks like the following image:

   
> [!NOTE]
> The `%spark2` interpreter isn't supported in Zeppelin notebooks across all HDInsight versions, and the `%sh` interpreter won't be supported from HDInsight 4.0 onwards.

5. You can now run Spark SQL statements on the `hvac` table. Paste the following query in a new paragraph. The query retrieves the building ID and the difference between the target and actual temperatures for each building on a given date. Press **SHIFT+ENTER**.
   ```sql
   %sql
   select buildingID, (targettemp - actualtemp) as temp_diff, date from hvac where date = "6/1/13"
   ```
   The **%sql** statement at the beginning tells the notebook to use the Livy Scala interpreter.

6. Select the **Bar Chart** icon to change the display. **settings**, which appears after you select **Bar Chart**, allows you to choose **Keys** and **Values**. The following screenshot shows the output.

   
## How do I use external packages with the notebook?

You can configure the Zeppelin notebook in an Apache Spark cluster on HDInsight to use external, community-contributed packages that aren't included in the cluster. Search the [Maven repository](https://search.maven.org/) for the complete list of packages that are available. You can also get a list of available packages from other sources. For example, a complete list of community-contributed packages is available at [Spark Packages](https://spark-packages.org/).
In this article, you'll see how to use the [spark-csv](https://search.maven.org/#artifactdetails%7Ccom.databricks%7Cspark-csv_2.10%7C1.4.0%7Cjar) package with the Zeppelin notebook.
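One common way to make such a package available is to add its Maven coordinate (`group:artifact:version`) as a setting on the Livy interpreter. A sketch, assuming the `livy.spark.jars.packages` property applies to your HDInsight version:

```
livy.spark.jars.packages    com.databricks:spark-csv_2.10:1.4.0
```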
   

   This action saves the notebook as a JSON file in your download location.
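The exported file is a regular Zeppelin note JSON. As a sketch, assuming the standard export layout (a top-level `paragraphs` array whose items carry a `text` field), you can list a note's paragraphs offline:

```python
import json

def list_paragraph_texts(note_path: str) -> list[str]:
    """Return the code/text of each paragraph in an exported Zeppelin note.

    Assumes the note JSON layout used by Zeppelin exports: a top-level
    "paragraphs" array whose items have an optional "text" field.
    """
    with open(note_path, encoding="utf-8") as f:
        note = json.load(f)
    return [p.get("text", "") for p in note.get("paragraphs", [])]

# Demo with a minimal, made-up note file:
sample = {"name": "demo", "paragraphs": [{"text": "%sql\nselect 1"}]}
with open("note.json", "w", encoding="utf-8") as f:
    json.dump(sample, f)
print(list_paragraph_texts("note.json"))
```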
## Use `Shiro` to Configure Access to Zeppelin Interpreters in Enterprise Security Package (ESP) Clusters

As noted above, the `%sh` interpreter isn't supported from HDInsight 4.0 onwards. Furthermore, since the `%sh` interpreter introduces potential security issues, such as access to keytabs using shell commands, it has been removed from HDInsight 3.6 ESP clusters as well. It means the `%sh` interpreter isn't available by default when you select **Create new note** or in the Interpreter UI.

Privileged domain users can use the `Shiro.ini` file to control access to the Interpreter UI. Only these users can create new `%sh` interpreters and set permissions on each new `%sh` interpreter. To control access using the `shiro.ini` file, use the following steps:

1. Define a new role using an existing domain group name. In the following example, `adminGroupName` is a group of privileged users in AAD. Don't use special characters or white spaces in the group name. The characters after `=` give the permissions for this role. `*` means the group has full permissions.

   ```
   [roles]
   adminGroupName = *
   ```

2. Add the new role for access to Zeppelin interpreters. In the following example, all users in `adminGroupName` are given access to Zeppelin interpreters and can create new interpreters. You can put multiple roles between the brackets in `roles[]`, separated by commas. Then, users that have the necessary permissions can access Zeppelin interpreters.
   ```
   [urls]
   /api/interpreter/** = authc, roles[adminGroupName]
   ```
## Livy session management

The first code paragraph in your Zeppelin notebook creates a new Livy session in your cluster. This session is shared across all Zeppelin notebooks that you later create. If the Livy session is killed for any reason (a cluster reboot, for example), jobs won't run from the Zeppelin notebook.

In such a case, you must do the following steps before you can start running jobs from a Zeppelin notebook.

1. Restart the Livy interpreter from the Zeppelin notebook. To do so, open interpreter settings by selecting the logged-in user name from the top-right corner, then select **Interpreter**.

   

3. Run a code cell from an existing Zeppelin notebook. This code creates a new Livy session in the HDInsight cluster.
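To inspect Livy sessions directly, you can call the cluster's Livy REST endpoint. The sketch below only builds the request URL and Basic-auth header (the cluster name and credentials are placeholders); actually issuing the request requires a live cluster:

```python
import base64

def livy_sessions_request(cluster_name: str, user: str, password: str) -> tuple[str, dict]:
    """Build the URL and headers for listing Livy sessions on an HDInsight cluster.

    Assumes the public Livy endpoint HDInsight exposes at
    https://<cluster>.azurehdinsight.net/livy, authenticated with the
    cluster login (Ambari) credentials via HTTP Basic auth.
    """
    url = f"https://{cluster_name}.azurehdinsight.net/livy/sessions"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return url, {"Authorization": f"Basic {token}"}

# Placeholder cluster name and credentials:
url, headers = livy_sessions_request("mycluster", "admin", "password")
print(url)
```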
## General information

|Configuration Interpreter, `Shiro`, site.xml, log4j|/usr/hdp/current/zeppelin-server/conf or /etc/zeppelin/conf|
|PID directory|/var/run/zeppelin|
### Enable debug logging

### Scenarios

* [Apache Spark with BI: Interactive data analysis using Spark in HDInsight with BI tools](apache-spark-use-bi-tools.md)
* [Apache Spark with Machine Learning: Use Spark in HDInsight for analyzing building temperature using HVAC data](apache-spark-ipython-notebook-machine-learning.md)
* [Apache Spark with Machine Learning: Use Spark in HDInsight to predict food inspection results](apache-spark-machine-learning-mllib-ipython.md)
* [Website log analysis using Apache Spark in HDInsight](apache-spark-custom-library-website-log-analysis.md)