
Commit 9e26407

Merge pull request #80666 from dagiro/mvc39
mvc39
2 parents eb879de + 267536e commit 9e26407

File tree: 5 files changed (+42, -73 lines)


articles/hdinsight/spark/apache-spark-intellij-tool-plugin.md

Lines changed: 42 additions & 73 deletions
@@ -1,31 +1,43 @@
 ---
-title: 'Azure Toolkit for IntelliJ: Create Spark applications for an HDInsight cluster '
-description: Use the Azure Toolkit for IntelliJ to develop Spark applications written in Scala, and submit them to an HDInsight Spark cluster.
+title: 'Tutorial - Azure Toolkit for IntelliJ: Create Spark applications for an HDInsight cluster'
+description: Tutorial - Use the Azure Toolkit for IntelliJ to develop Spark applications written in Scala, and submit them to an HDInsight Spark cluster.
 author: hrasheed-msft
 ms.reviewer: jasonh
 ms.service: hdinsight
 ms.custom: hdinsightactive
 ms.topic: tutorial
-ms.date: 02/15/2019
-ms.author: maxluk
+ms.date: 06/26/2019
+ms.author: hrasheed
 ---
+
 # Tutorial: Use Azure Toolkit for IntelliJ to create Apache Spark applications for an HDInsight cluster

-Use the Azure Toolkit for IntelliJ plug-in to develop [Apache Spark](https://spark.apache.org/) applications written in [Scala](https://www.scala-lang.org/), and then submit them to an HDInsight Spark cluster directly from the IntelliJ integrated development environment (IDE). You can use the plug-in in a few ways:
+This tutorial demonstrates how to use the Azure Toolkit for IntelliJ plug-in to develop Apache Spark applications written in [Scala](https://www.scala-lang.org/), and then submit them to an HDInsight Spark cluster directly from the IntelliJ integrated development environment (IDE). You can use the plug-in in a few ways:

 * Develop and submit a Scala Spark application on an HDInsight Spark cluster.
 * Access your Azure HDInsight Spark cluster resources.
 * Develop and run a Scala Spark application locally.

+In this tutorial, you learn how to:
+> [!div class="checklist"]
+> * Use the Azure Toolkit for IntelliJ plug-in
+> * Develop Apache Spark applications
+> * Submit application to Azure HDInsight cluster
+
 ## Prerequisites

 * An Apache Spark cluster on HDInsight. For instructions, see [Create Apache Spark clusters in Azure HDInsight](apache-spark-jupyter-spark-sql.md).
+
 * [Oracle Java Development kit](https://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html). This tutorial uses Java version 8.0.202.
+
 * IntelliJ IDEA. This article uses [IntelliJ IDEA Community ver. 2018.3.4](https://www.jetbrains.com/idea/download/).
+
 * Azure Toolkit for IntelliJ. See [Installing the Azure Toolkit for IntelliJ](https://docs.microsoft.com/java/azure/intellij/azure-toolkit-for-intellij-installation?view=azure-java-stable).
+
 * WINUTILS.EXE. See [Problems running Hadoop on Windows](https://wiki.apache.org/hadoop/WindowsProblems).

 ## Install Scala plugin for IntelliJ IDEA
+
 Perform the following steps to install the Scala plugin:

 1. Open IntelliJ IDEA.
@@ -40,7 +52,6 @@ Perform the following steps to install the Scala plugin:

 4. After the plugin installs successfully, you must restart the IDE.

-
 ## Create a Spark Scala application for an HDInsight Spark cluster

 1. Start IntelliJ IDEA, and select **Create New Project** to open the **New Project** window.
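The article walks through creating the project but does not include the application's source. The sketch below is a hypothetical illustration of the kind of core logic such an app (referred to as `myApp` in later steps) might contain — counting lines that match a marker. It uses plain Scala collections so it runs without Spark on the classpath; a real Spark application would build an RDD with `sc.textFile(...)` and apply the same `filter`/`count` operations to it.

```scala
// Hypothetical core logic for a tiny Spark-style app like the tutorial's myApp.
// A real Spark app would replace the local Seq with an RDD from sc.textFile(...).
object MyAppSketch {
  // Count lines containing the marker "ERROR" — the same filter/count
  // pattern you would run on an RDD in the submitted application.
  def countErrors(lines: Seq[String]): Long =
    lines.count(_.contains("ERROR")).toLong

  def main(args: Array[String]): Unit = {
    val sample = Seq("INFO start", "ERROR disk full", "INFO done", "ERROR timeout")
    println(s"error lines: ${countErrors(sample)}")
  }
}
```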
@@ -127,7 +138,6 @@ User can either [sign in to Azure subscription](#sign-in-to-your-azure-subscript

 ![The Azure Explorer link](./media/apache-spark-intellij-tool-plugin/explorer-rightclick-azure.png)

-
 3. In the **Azure Sign In** dialog box, choose **Device Login**, and then select **Sign in**.

 ![The Azure Sign In dialog box](./media/apache-spark-intellij-tool-plugin/view-explorer-2.png)
@@ -157,6 +167,7 @@ User can either [sign in to Azure subscription](#sign-in-to-your-azure-subscript
 ![An expanded cluster-name node](./media/apache-spark-intellij-tool-plugin/view-explorer-4.png)

 ### Link a cluster
+
 You can link an HDInsight cluster by using the Apache Ambari managed username. Similarly, for a domain-joined HDInsight cluster, you can link by using the domain and username, such as [email protected]. You can also link a Livy Service cluster.

 1. From the menu bar, navigate to **View** > **Tool Windows** > **Azure Explorer**.
@@ -177,7 +188,7 @@ You can link an HDInsight cluster by using the Apache Ambari managed username. S
 |User Name| Enter cluster user name, default is admin.|
 |Password| Enter password for user name.|

-![link HdInsight cluster dialog](./media/apache-spark-intellij-tool-plugin/link-hdinsight-cluster-dialog.png)
+![link HDInsight cluster dialog](./media/apache-spark-intellij-tool-plugin/link-hdinsight-cluster-dialog.png)

 * **Livy Service**

@@ -202,6 +213,7 @@ You can link an HDInsight cluster by using the Apache Ambari managed username. S
 ![unlinked cluster](./media/apache-spark-intellij-tool-plugin/unlink.png)

 ## Run a Spark Scala application on an HDInsight Spark cluster
+
 After creating a Scala application, you can submit it to the cluster.

 1. From Project, navigate to **myApp** > **src** > **main** > **scala** > **myApp**. Right-click **myApp**, and select **Submit Spark Application** (It will likely be located at the bottom of the list).
@@ -219,7 +231,7 @@
 |Main class name|The default value is the main class from the selected file. You can change the class by selecting the ellipsis(**...**) and choosing another class.|
 |Job configurations|You can change the default keys and/or values. For more information, see [Apache Livy REST API](https://livy.incubator.apache.org./docs/latest/rest-api.html).|
 |Command line arguments|You can enter arguments separated by space for the main class if needed.|
-|Referenced Jars and Referenced Files|You can enter the paths for the referenced Jars and files if any. You can also browse files in the Azure virtual file system which currently only supports ADLS Gen 2 cluster. For more information: [Apache Spark Configuration](https://spark.apache.org/docs/latest/configuration.html#runtime-environment). See also, [How to upload resources to cluster](https://docs.microsoft.com/azure/storage/blobs/storage-quickstart-blobs-storage-explorer).|
+|Referenced Jars and Referenced Files|You can enter the paths for the referenced Jars and files if any. You can also browse files in the Azure virtual file system, which currently only supports ADLS Gen 2 cluster. For more information: [Apache Spark Configuration](https://spark.apache.org/docs/latest/configuration.html#runtime-environment). See also, [How to upload resources to cluster](https://docs.microsoft.com/azure/storage/blobs/storage-quickstart-blobs-storage-explorer).|
 |Job Upload Storage|Expand to reveal additional options.|
 |Storage Type|Select **Use Azure Blob to upload** from the drop-down list.|
 |Storage Account|Enter your storage account.|
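The submission dialog above ultimately maps onto an Apache Livy batch submission. As a rough sketch of that mapping, the Scala snippet below renders the kind of JSON body a Livy `POST /batches` request carries (`file`, `className`, `args`, and `conf` are field names from the Livy REST API; the jar path, class name, and configuration values here are made-up placeholders, not values from this tutorial).

```scala
// Hedged sketch: render a Livy POST /batches JSON body by hand, mirroring
// the Spark Submission dialog fields. Paths and names are placeholders.
object LivyBatchPayload {
  private def q(s: String) = "\"" + s + "\""

  def render(file: String, className: String,
             args: Seq[String], conf: Map[String, String]): String = {
    val argsJson = args.map(q).mkString("[", ", ", "]")           // "Command line arguments"
    val confJson = conf.map { case (k, v) => s"${q(k)}: ${q(v)}" } // "Job configurations"
      .mkString("{", ", ", "}")
    s"""{"file": ${q(file)}, "className": ${q(className)}, "args": $argsJson, "conf": $confJson}"""
  }

  def main(args: Array[String]): Unit = {
    val body = render(
      file = "wasbs:///myApp/myApp.jar",           // placeholder artifact path
      className = "myApp",
      args = Seq("inputDir", "outputDir"),
      conf = Map("spark.executor.memory" -> "4g")  // example key/value pair
    )
    println(body)
  }
}
```

In practice the toolkit builds and sends this request for you; the sketch only shows how the dialog fields correspond to the request body.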
@@ -233,9 +245,11 @@
 ![The Spark Submission window](./media/apache-spark-intellij-tool-plugin/hdi-spark-app-result.png)

 ## Debug Apache Spark applications locally or remotely on an HDInsight cluster
+
 We also recommend another way of submitting the Spark application to the cluster. You can do so by setting the parameters in the **Run/Debug configurations** IDE. For more information, see [Debug Apache Spark applications locally or remotely on an HDInsight cluster with Azure Toolkit for IntelliJ through SSH](apache-spark-intellij-tool-debug-remotely-through-ssh.md).

 ## Access and manage HDInsight Spark clusters by using Azure Toolkit for IntelliJ
+
 You can perform various operations by using Azure Toolkit for IntelliJ. Most of the operations are initiated from **Azure Explorer**. From the menu bar, navigate to **View** > **Tool Windows** > **Azure Explorer**.

 ### Access the job view
@@ -272,16 +286,19 @@
 2. When you're prompted, enter the admin credentials for the cluster. You specified these credentials during the cluster setup process.

 ### Manage Azure subscriptions
+
 By default, Azure Toolkit for IntelliJ lists the Spark clusters from all your Azure subscriptions. If necessary, you can specify the subscriptions that you want to access.

 1. From Azure Explorer, right-click the **Azure** root node, and then select **Select Subscriptions**.

 2. From the **Select Subscriptions** window, clear the check boxes next to the subscriptions that you don't want to access, and then select **Close**.

 ## Spark Console
+
 You can run Spark Local Console(Scala) or run Spark Livy Interactive Session Console(Scala).

 ### Spark Local Console(Scala)
+
 Ensure you have satisfied the WINUTILS.EXE prerequisite.

 1. From the menu bar, navigate to **Run** > **Edit Configurations...**.
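Once a Spark console is open, you can evaluate expressions interactively. As a hedged illustration of the kind of transformations you might try, the snippet below uses plain Scala collections so it runs anywhere; in the actual console you would typically apply the same `filter`/`map` operations to an RDD created from the console's SparkContext (for example, `sc.parallelize(1 to 10)`).

```scala
// Plain-Scala stand-in for expressions you might evaluate in the Spark console.
object ConsoleSketch {
  val nums: List[Int] = (1 to 10).toList            // stand-in for sc.parallelize(1 to 10)
  val evens: List[Int] = nums.filter(_ % 2 == 0)    // transformation
  val sumOfSquares: Int = nums.map(n => n * n).sum  // action-style result

  def main(args: Array[String]): Unit =
    println(s"evens = $evens, sum of squares = $sumOfSquares")
}
```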
@@ -314,8 +331,8 @@ Ensure you have satisfied the WINUTILS.EXE prerequisite.

 ![Local Console Result](./media/apache-spark-intellij-tool-plugin/local-console-result.png)

-
 ### Spark Livy Interactive Session Console(Scala)
+
 It is only supported on IntelliJ 2018.2 and 2018.3.

 1. From the menu bar, navigate to **Run** > **Edit Configurations...**.
@@ -348,6 +365,7 @@ It is convenient for you to foresee the script result by sending some code to th
 ![Send Selection to Spark Console](./media/apache-spark-intellij-tool-plugin/send-selection-to-console.png)

 ## Reader-only role
+
 When users submit a job to a cluster with reader-only role permission, Ambari credentials are required.

 ### Link cluster from context menu
@@ -358,7 +376,7 @@ When users submit job to a cluster with reader-only role permission, Ambari cred

 ![HDInsight Spark clusters in Azure Explorer](./media/apache-spark-intellij-tool-plugin/view-explorer-15.png)

-3. Right click the cluster with reader-only role permission. Select **Link this cluster** from context menu to link cluster. Enter the Ambari username and Password.
+3. Right-click the cluster with reader-only role permission. Select **Link this cluster** from context menu to link cluster. Enter the Ambari username and Password.

 ![HDInsight Spark clusters in Azure Explorer](./media/apache-spark-intellij-tool-plugin/view-explorer-11.png)
@@ -368,8 +386,6 @@ When users submit job to a cluster with reader-only role permission, Ambari cred

 ![HDInsight Spark clusters in Azure Explorer](./media/apache-spark-intellij-tool-plugin/view-explorer-8.png)

-
-
 ### Link cluster by expanding Jobs node

 1. Click **Jobs** node, **Cluster Job Access Denied** window pops up.
@@ -382,7 +398,7 @@ When users submit job to a cluster with reader-only role permission, Ambari cred

 1. Create an HDInsight Configuration. Then select **Remotely Run in Cluster**.

-2. Select a cluster which has reader-only role permission for **Spark clusters(Linux only)**. Warning message shows out. You can Click **Link this cluster** to link cluster.
+2. Select a cluster that has reader-only role permission for **Spark clusters(Linux only)**. A warning message displays. You can click **Link this cluster** to link the cluster.

 ![HDInsight Spark clusters in Azure Explorer](./media/apache-spark-intellij-tool-plugin/create-config-1.png)

@@ -398,9 +414,7 @@ When users submit job to a cluster with reader-only role permission, Ambari cred

 ![HDInsight Spark clusters in Azure Explorer](./media/apache-spark-intellij-tool-plugin/view-explorer-13.png)

-
 ![HDInsight Spark clusters in Azure Explorer](./media/apache-spark-intellij-tool-plugin/view-explorer-12.png)
-

 ## Convert existing IntelliJ IDEA applications to use Azure Toolkit for IntelliJ

@@ -418,70 +432,25 @@ You can convert the existing Spark Scala applications that you created in Intell

 3. Save the changes. Your application should now be compatible with Azure Toolkit for IntelliJ. You can test it by right-clicking the project name in Project. The pop-up menu now has the option **Submit Spark Application to HDInsight**.

-## Troubleshooting
-
-### Error in local run: *Use a larger heap size*
-In Spark 1.6, if you're using a 32-bit Java SDK during local run, you might encounter the following errors:
-
-    Exception in thread "main" java.lang.IllegalArgumentException: System memory 259522560 must be at least 4.718592E8. Please use a larger heap size.
-    at org.apache.spark.memory.UnifiedMemoryManager$.getMaxMemory(UnifiedMemoryManager.scala:193)
-    at org.apache.spark.memory.UnifiedMemoryManager$.apply(UnifiedMemoryManager.scala:175)
-    at org.apache.spark.SparkEnv$.create(SparkEnv.scala:354)
-    at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:193)
-    at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:288)
-    at org.apache.spark.SparkContext.<init>(SparkContext.scala:457)
-    at LogQuery$.main(LogQuery.scala:53)
-    at LogQuery.main(LogQuery.scala)
-    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
-    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
-    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
-    at java.lang.reflect.Method.invoke(Method.java:606)
-    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
-
-These errors happen because the heap size is not large enough for Spark to run. Spark requires at least 471 MB. (For more information, see [SPARK-12081](https://issues.apache.org/jira/browse/SPARK-12081).) One simple solution is to use a 64-bit Java SDK. You can also change the JVM settings in IntelliJ by adding the following options:
-
-    -Xms128m -Xmx512m -XX:MaxPermSize=300m -ea
-
-![Adding options to the "VM options" box in IntelliJ](./media/apache-spark-intellij-tool-plugin/change-heap-size.png)
-
-## FAQ
-If the cluster is busy, you might get the error below.
-
-![Intellij get error when cluster busy](./media/apache-spark-intellij-tool-plugin/intellij-interactive-cluster-busy-upload.png)
+## Clean up resources

-![Intellij get error when cluster busy](./media/apache-spark-intellij-tool-plugin/intellij-interactive-cluster-busy-submit.png)
+If you're not going to continue to use this application, delete the cluster that you created with the following steps:

-## Known issues
+1. Sign in to the [Azure portal](https://portal.azure.com/).

-Currently, viewing Spark outputs directly is not supported.
+1. In the **Search** box at the top, type **HDInsight**.

-## <a name="seealso"></a>Next steps
+1. Select **HDInsight clusters** under **Services**.

-* [Overview: Apache Spark on Azure HDInsight](apache-spark-overview.md)
+1. In the list of HDInsight clusters that appears, select the **...** next to the cluster that you created for this tutorial.

-### Demo
-* Create Scala project (video): [Create Apache Spark Scala Applications](https://channel9.msdn.com/Series/AzureDataLake/Create-Spark-Applications-with-the-Azure-Toolkit-for-IntelliJ)
-* Remote debug (video): [Use Azure Toolkit for IntelliJ to debug Apache Spark applications remotely on HDInsight Cluster](https://channel9.msdn.com/Series/AzureDataLake/Debug-HDInsight-Spark-Applications-with-Azure-Toolkit-for-IntelliJ)
+1. Select **Delete**. Select **Yes**.

-### Scenarios
-* [Apache Spark with BI: Perform interactive data analysis by using Spark in HDInsight with BI tools](apache-spark-use-bi-tools.md)
-* [Apache Spark with Machine Learning: Use Spark in HDInsight to analyze building temperature using HVAC data](apache-spark-ipython-notebook-machine-learning.md)
-* [Apache Spark with Machine Learning: Use Spark in HDInsight to predict food inspection results](apache-spark-machine-learning-mllib-ipython.md)
-* [Website log analysis using Apache Spark in HDInsight](apache-spark-custom-library-website-log-analysis.md)
+![Delete an HDInsight cluster](./media/apache-spark-intellij-tool-plugin/hdinsight-azure-portal-delete-cluster.png "Delete HDInsight cluster")

-### Creating and running applications
-* [Create a standalone application using Scala](apache-spark-create-standalone-application.md)
-* [Run jobs remotely on an Apache Spark cluster using Apache Livy](apache-spark-livy-rest-interface.md)
+## Next steps

-### Tools and extensions
-* [Use Azure Toolkit for IntelliJ to debug Apache Spark applications remotely through VPN](apache-spark-intellij-tool-plugin-debug-jobs-remotely.md)
-* [Use Azure Toolkit for IntelliJ to debug Apache Spark applications remotely through SSH](apache-spark-intellij-tool-debug-remotely-through-ssh.md)
-* [Use HDInsight Tools in Azure Toolkit for Eclipse to create Apache Spark applications](apache-spark-eclipse-tool-plugin.md)
-* [Use Apache Zeppelin notebooks with an Apache Spark cluster on HDInsight](apache-spark-zeppelin-notebook.md)
-* [Kernels available for Jupyter notebook in Apache Spark cluster for HDInsight](apache-spark-jupyter-notebook-kernels.md)
-* [Use external packages with Jupyter notebooks](apache-spark-jupyter-notebook-use-external-packages.md)
-* [Install Jupyter on your computer and connect to an HDInsight Spark cluster](apache-spark-jupyter-notebook-install-locally.md)
+In this tutorial, you learned how to use the Azure Toolkit for IntelliJ plug-in to develop Apache Spark applications written in [Scala](https://www.scala-lang.org/), and then submitted them to an HDInsight Spark cluster directly from the IntelliJ integrated development environment (IDE). Advance to the next article to see how the data you registered in Apache Spark can be pulled into a BI analytics tool such as Power BI.

-### Managing resources
-* [Manage resources for the Apache Spark cluster in Azure HDInsight](apache-spark-resource-manager.md)
-* [Track and debug jobs running on an Apache Spark cluster in HDInsight](apache-spark-job-debugging.md)
+> [!div class="nextstepaction"]
+> [Analyze data using BI tools](apache-spark-use-bi-tools.md)
0 commit comments