Changed file: `articles/hdinsight/spark/apache-spark-intellij-tool-plugin.md` (42 additions, 73 deletions)
---
title: 'Tutorial - Azure Toolkit for IntelliJ: Create Spark applications for an HDInsight cluster'
description: Tutorial - Use the Azure Toolkit for IntelliJ to develop Spark applications written in Scala, and submit them to an HDInsight Spark cluster.
author: hrasheed-msft
ms.reviewer: jasonh
ms.service: hdinsight
ms.custom: hdinsightactive
ms.topic: tutorial
ms.date: 06/26/2019
ms.author: hrasheed
---
# Tutorial: Use Azure Toolkit for IntelliJ to create Apache Spark applications for an HDInsight cluster

This tutorial demonstrates how to use the Azure Toolkit for IntelliJ plug-in to develop Apache Spark applications written in [Scala](https://www.scala-lang.org/), and then submit them to an HDInsight Spark cluster directly from the IntelliJ integrated development environment (IDE). You can use the plug-in in a few ways:

* Develop and submit a Scala Spark application on an HDInsight Spark cluster.
* Access your Azure HDInsight Spark cluster resources.
* Develop and run a Scala Spark application locally.

In this tutorial, you learn how to:

> [!div class="checklist"]
> * Use the Azure Toolkit for IntelliJ plug-in
> * Develop Apache Spark applications
> * Submit an application to an Azure HDInsight cluster
## Prerequisites

* An Apache Spark cluster on HDInsight. For instructions, see [Create Apache Spark clusters in Azure HDInsight](apache-spark-jupyter-spark-sql.md).

* [Oracle Java Development Kit](https://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html). This tutorial uses Java version 8.0.202.

* IntelliJ IDEA. This article uses [IntelliJ IDEA Community ver. 2018.3.4](https://www.jetbrains.com/idea/download/).

* Azure Toolkit for IntelliJ. See [Installing the Azure Toolkit for IntelliJ](https://docs.microsoft.com/java/azure/intellij/azure-toolkit-for-intellij-installation?view=azure-java-stable).

* WINUTILS.EXE. See [Problems running Hadoop on Windows](https://wiki.apache.org/hadoop/WindowsProblems). A quick way to check this prerequisite is sketched below.
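
Spark's local mode on Windows typically locates WINUTILS.EXE through the `HADOOP_HOME` environment variable. Here is a minimal sketch for sanity-checking that setup before a local run; the folder layout shown is the common convention, not a required location:

```scala
import java.nio.file.{Files, Paths}

// Verify that HADOOP_HOME points at a directory containing bin\winutils.exe.
// The layout (e.g., C:\hadoop\bin\winutils.exe) reflects common WINUTILS
// setups; adjust the path for your machine.
object WinutilsCheck {
  def main(args: Array[String]): Unit = {
    sys.env.get("HADOOP_HOME") match {
      case Some(home) =>
        val winutils = Paths.get(home, "bin", "winutils.exe")
        if (Files.exists(winutils)) println(s"Found $winutils")
        else println(s"HADOOP_HOME is set to $home, but $winutils is missing.")
      case None =>
        println("HADOOP_HOME is not set; Spark local runs on Windows may fail.")
    }
  }
}
```
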
27
38
28
39
## Install Scala plugin for IntelliJ IDEA

Perform the following steps to install the Scala plugin:

1. Open IntelliJ IDEA.

4. After the plugin installs successfully, you must restart the IDE.

## Create a Spark Scala application for an HDInsight Spark cluster
1. Start IntelliJ IDEA, and select **Create New Project** to open the **New Project** window.
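
For reference, the application this tutorial builds is a small Spark job with a `main` method. A minimal sketch of such an application follows (the object name `myApp` matches the project name used later in this article; the word-count logic is illustrative, and only the `SparkSession` pattern is standard):

```scala
import org.apache.spark.sql.SparkSession

// A minimal Spark Scala application of the kind submitted in this tutorial.
object myApp {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("myApp")
      .getOrCreate()

    // Trivial job: count occurrences of each word in an in-memory dataset.
    import spark.implicits._
    val words = Seq("spark", "hdinsight", "spark", "scala").toDS()
    words.groupBy($"value").count().show()

    spark.stop()
  }
}
```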

3. In the **Azure Sign In** dialog box, choose **Device Login**, and then select **Sign in**.

You can link an HDInsight cluster by using the Apache Ambari managed username. Similarly, for a domain-joined HDInsight cluster, you can link by using the domain and username, such as [email protected]. You can also link a Livy Service cluster.

1. From the menu bar, navigate to **View** > **Tool Windows** > **Azure Explorer**.

|User Name|Enter the cluster user name. The default is admin.|

## Run a Spark Scala application on an HDInsight Spark cluster

After creating a Scala application, you can submit it to the cluster.

1. From Project, navigate to **myApp** > **src** > **main** > **scala** > **myApp**. Right-click **myApp**, and select **Submit Spark Application** (it will likely be located at the bottom of the list).

|Main class name|The default value is the main class from the selected file. You can change the class by selecting the ellipsis (**...**) and choosing another class.|
|Job configurations|You can change the default keys and/or values. For more information, see [Apache Livy REST API](https://livy.incubator.apache.org/docs/latest/rest-api.html).|
|Command line arguments|You can enter arguments separated by spaces for the main class, if needed (see the sketch after this table).|
|Referenced Jars and Referenced Files|You can enter the paths for the referenced Jars and files, if any. You can also browse files in the Azure virtual file system, which currently supports only ADLS Gen 2 clusters. For more information, see [Apache Spark Configuration](https://spark.apache.org/docs/latest/configuration.html#runtime-environment) and [How to upload resources to cluster](https://docs.microsoft.com/azure/storage/blobs/storage-quickstart-blobs-storage-explorer).|
|Job Upload Storage|Expand to reveal additional options.|
|Storage Type|Select **Use Azure Blob to upload** from the drop-down list.|
|Storage Account|Enter your storage account.|
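
The **Job configurations** and **Command line arguments** entries in the table above flow through Livy into the running application. As a sketch of how they surface in code: configuration keys map to Spark properties readable from the job's `SparkConf`, and the arguments arrive in `main`'s `args` array (the property name below is a standard Spark setting; the argument handling and fallback path are illustrative):

```scala
import org.apache.spark.sql.SparkSession

// Shows how submission-time settings surface inside the application:
// job configuration keys become Spark properties, and command-line
// arguments arrive space-separated in args.
object SubmitSettingsDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("SubmitSettingsDemo").getOrCreate()

    // Read back a property set via "Job configurations" (driver memory here).
    val conf = spark.sparkContext.getConf
    println(s"spark.driver.memory = ${conf.get("spark.driver.memory", "<default>")}")

    // "Command line arguments" from the submit dialog land here.
    val inputPath = if (args.nonEmpty) args(0) else "wasb:///example/data" // illustrative default
    println(s"Input path argument: $inputPath")

    spark.stop()
  }
}
```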

## Debug Apache Spark applications locally or remotely on an HDInsight cluster

We also recommend another way of submitting the Spark application to the cluster. You can do so by setting the parameters in the **Run/Debug configurations** IDE. For more information, see [Debug Apache Spark applications locally or remotely on an HDInsight cluster with Azure Toolkit for IntelliJ through SSH](apache-spark-intellij-tool-debug-remotely-through-ssh.md).
## Access and manage HDInsight Spark clusters by using Azure Toolkit for IntelliJ
You can perform various operations by using Azure Toolkit for IntelliJ. Most of the operations are initiated from **Azure Explorer**. From the menu bar, navigate to **View** > **Tool Windows** > **Azure Explorer**.
### Access the job view
2. When you're prompted, enter the admin credentials for the cluster. You specified these credentials during the cluster setup process.
### Manage Azure subscriptions
By default, Azure Toolkit for IntelliJ lists the Spark clusters from all your Azure subscriptions. If necessary, you can specify the subscriptions that you want to access.
1. From Azure Explorer, right-click the **Azure** root node, and then select **Select Subscriptions**.
2. From the **Select Subscriptions** window, clear the check boxes next to the subscriptions that you don't want to access, and then select **Close**.
## Spark Console

You can run the Spark Local Console (Scala) or the Spark Livy Interactive Session Console (Scala).

### Spark Local Console (Scala)

Ensure you have satisfied the WINUTILS.EXE prerequisite.
1. From the menu bar, navigate to **Run** > **Edit Configurations...**.
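
Once the local console starts, you can evaluate Spark expressions interactively. A hypothetical session might look like the following (output abbreviated; the `sc` and `spark` objects are assumed to be pre-created by the console, as is usual for Spark shells):

```scala
// Hypothetical input for the Spark local console; results appear REPL-style.
sc.appName                           // e.g., res0: String = ...
val nums = sc.parallelize(1 to 100)
nums.filter(_ % 2 == 0).count()      // res1: Long = 50
spark.range(5).show()                // prints a five-row DataFrame
```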

### Spark Livy Interactive Session Console (Scala)

It is only supported on IntelliJ 2018.2 and 2018.3.

1. From the menu bar, navigate to **Run** > **Edit Configurations...**.

It is convenient to foresee the script result by sending some code to the console.
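
For example, you might send a small fragment like the following to the interactive session to preview a transformation's result before adding it to the full application (illustrative code; the session, like the local console, is assumed to pre-create `spark`):

```scala
// Illustrative fragment to send to the Livy interactive session:
// previews an aggregation against a tiny in-memory dataset.
case class Reading(building: String, temp: Double)
val readings = Seq(Reading("B1", 68.0), Reading("B1", 74.5), Reading("B2", 71.0))
val df = spark.createDataFrame(readings)
df.groupBy("building").avg("temp").show()
```
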
## Reader-only role

When users submit a job to a cluster with reader-only role permission, Ambari credentials are required.

### Link cluster from context menu

3. Right-click the cluster with reader-only role permission. Select **Link this cluster** from the context menu to link the cluster. Enter the Ambari username and password.

1. Create an HDInsight Configuration. Then select **Remotely Run in Cluster**.

2. Select a cluster that has reader-only role permission for **Spark clusters (Linux only)**. A warning message appears. You can select **Link this cluster** to link the cluster.
## Convert existing IntelliJ IDEA applications to use Azure Toolkit for IntelliJ

You can convert the existing Spark Scala applications that you created in IntelliJ IDEA to be compatible with Azure Toolkit for IntelliJ.

3. Save the changes. Your application should now be compatible with Azure Toolkit for IntelliJ. You can test it by right-clicking the project name in Project. The pop-up menu now has the option **Submit Spark Application to HDInsight**.

## Clean up resources

If you're not going to continue to use this application, delete the cluster that you created with the following steps:

1. Sign in to the [Azure portal](https://portal.azure.com/).

1. In the **Search** box at the top, type **HDInsight**.

1. Select **HDInsight clusters** under **Services**.

1. In the list of HDInsight clusters that appears, select the **...** next to the cluster that you created for this tutorial.

1. Select **Delete**. Select **Yes**.

## Next steps

In this tutorial, you learned how to use the Azure Toolkit for IntelliJ plug-in to develop Apache Spark applications written in [Scala](https://www.scala-lang.org/), and submit them to an HDInsight Spark cluster directly from the IntelliJ integrated development environment (IDE). Advance to the next article to see how the data you registered in Apache Spark can be pulled into a BI analytics tool such as Power BI.

> [!div class="nextstepaction"]
> [Analyze data using BI tools](apache-spark-use-bi-tools.md)