description: Tutorial - Create a Spark application written in Scala with Apache Maven as the build system and an existing Maven archetype for Scala provided by IntelliJ IDEA.
author: hrasheed-msft
ms.author: hrasheed
ms.reviewer: jasonh
ms.service: hdinsight
ms.topic: tutorial
ms.custom: hdinsightactive,mvc
ms.date: 04/17/2020
#customer intent: As a developer new to Apache Spark and to Apache Spark in Azure HDInsight, I want to learn how to create a Scala Maven application for Spark in HDInsight using IntelliJ.
---
# Tutorial: Create a Scala Maven application for Apache Spark in HDInsight using IntelliJ
In this tutorial, you learn how to create an [Apache Spark](./apache-spark-overview.md) application written in [Scala](https://www.scala-lang.org/) using [Apache Maven](https://maven.apache.org/) with IntelliJ IDEA. The article uses Apache Maven as the build system and starts with an existing Maven archetype for Scala provided by IntelliJ IDEA. Creating a Scala application in IntelliJ IDEA involves the following steps:
* Use Maven as the build system.
* Update the Project Object Model (POM) file to resolve Spark module dependencies.
## Install Scala plugin for IntelliJ IDEA
Perform the following steps to install the Scala plugin:
1. Open IntelliJ IDEA.
2. On the welcome screen, navigate to **Configure** > **Plugins** to open the **Plugins** window.
3. Select **Install** for the Scala plugin that is featured in the new window.
4. After the plugin installs successfully, you must restart the IDE.
| Property | Description |
| ----- | ----- |
|Project name| Enter a name.|
|Project location| Enter the location to save your project.|
|Project SDK| This field will be blank on your first use of IDEA. Select **New...** and navigate to your JDK.|
|Spark Version|The creation wizard integrates the proper version for Spark SDK and Scala SDK. If the Spark cluster version is earlier than 2.0, select **Spark 1.x**. Otherwise, select **Spark 2.x**. This example uses **Spark 2.3.0 (Scala 2.11.8)**.|

4. Select the **Create from archetype** checkbox.
5. From the list of archetypes, select **`org.scala-tools.archetypes:scala-archetype-simple`**. This archetype creates the right directory structure and downloads the required default dependencies to write the Scala program.
6. Select **Next**.
7. Expand **Artifact Coordinates**. Provide relevant values for **GroupId** and **ArtifactId**. **Name** and **Location** will autopopulate. The following values are used in this tutorial:
- **GroupId:** com.microsoft.spark.example
- **ArtifactId:** SparkSimpleApp
8. Select **Next**.
11. Once the project has imported, from the left pane navigate to **SparkSimpleApp** > **src** > **test** > **scala** > **com** > **microsoft** > **spark** > **example**. Right-click **MySpec**, and then select **Delete...**. You don't need this file for the application. Select **OK** in the dialog box.
12. In later steps, you update **pom.xml** to define the dependencies for the Spark Scala application. For those dependencies to be downloaded and resolved automatically, you must configure Maven accordingly.
13. From the **File** menu, select **Settings** to open the **Settings** window.
17. From the left pane, navigate to **src** > **main** > **scala** > **com.microsoft.spark.example**, and then double-click **App** to open App.scala.
18. Replace the existing sample code with the following code and save the changes. This code reads the data from HVAC.csv (available on all HDInsight Spark clusters), retrieves the rows that have only one digit in the sixth column, and writes the output to **/HVACOut** under the default storage container for the cluster.
```scala
package com.microsoft.spark.example
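
import org.apache.spark.{SparkConf, SparkContext}

// NOTE: everything below the package line is a sketch reconstructed from the
// description in step 18; the object name, app name, sample-data path, and
// zero-based column index are assumptions, not the tutorial's verbatim code.
object SparkApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("SparkSimpleApp")
    val sc = new SparkContext(conf)

    // HVAC.csv ships with HDInsight Spark clusters; this sample path is assumed.
    val lines = sc.textFile("wasb:///HdiSamples/HdiSamples/SensorSampleData/hvac/HVAC.csv")

    // Keep only the rows whose sixth column (zero-based index 5) is one digit long.
    val filtered = lines.filter(line => line.split(",")(5).length == 1)

    // Write the output to /HVACOut under the cluster's default storage container.
    filtered.saveAsTextFile("wasb:///HVACOut")

    sc.stop()
  }
}
```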
Save the changes to **pom.xml**.
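The full dependency list comes from the preceding steps. As a hedged illustration only, a project built against the **Spark 2.3.0 (Scala 2.11.8)** combination selected earlier would typically declare the Spark core artifact along these lines; the version numbers are an assumption tied to that selection, not necessarily the tutorial's exact pom.xml contents:

```xml
<!-- Illustrative sketch: matches the Spark 2.3.0 / Scala 2.11 selection above. -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.3.0</version>
</dependency>
```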
22. Create the .jar file. IntelliJ IDEA enables creation of a JAR file as an artifact of a project. Perform the following steps:
1. From the **File** menu, select **Project Structure...**.
2. From the **Project Structure** window, navigate to **Artifacts** > **the plus symbol +** > **JAR** > **From modules with dependencies...**.
3. In the **Create JAR from Modules** window, select the folder icon in the **Main Class** text box.
4. In the **Select Main Class** window, select the class that appears by default and then select **OK**.
5. In the **Create JAR from Modules** window, ensure the **extract to the target JAR** option is selected, and then select **OK**. This setting creates a single JAR with all dependencies.

6. The **Output Layout** tab lists all the jars that are included as part of the Maven project. You can select and delete the ones on which the Scala application has no direct dependency. For the application you're creating here, you can remove all but the last one (**SparkSimpleApp compile output**). Select the jars to delete, and then select the negative symbol **-**.
Ensure the **Include in project build** checkbox is selected. This option ensures that the jar is created every time the project is built or updated. Select **Apply** and then **OK**.
7. To create the jar, navigate to **Build** > **Build Artifacts** > **Build**. The project will compile in about 30 seconds. The output jar is created under **\out\artifacts**.
To run the application on the cluster, you can use the following approaches:
* **Copy the application jar to the Azure Storage blob** associated with the cluster. You can use [AzCopy](../../storage/common/storage-use-azcopy.md), a command-line utility, to do so. There are many other clients you can use to upload data as well. You can find more about them at [Upload data for Apache Hadoop jobs in HDInsight](../hdinsight-upload-data.md).
* **Use Apache Livy to submit an application job remotely** to the Spark cluster. Spark clusters on HDInsight include Livy, which exposes REST endpoints to remotely submit Spark jobs. For more information, see [Submit Apache Spark jobs remotely using Apache Livy with Spark clusters on HDInsight](apache-spark-livy-rest-interface.md).