
Commit 85d5237

Merge pull request #111858 from dagiro/freshness_c7
freshness_c7
2 parents: 2af5254 + 7c2cd16

File tree

1 file changed (+21, -21 lines changed)


articles/hdinsight/spark/apache-spark-create-standalone-application.md

Lines changed: 21 additions & 21 deletions
@@ -1,20 +1,20 @@
  ---
  title: 'Tutorial: Scala Maven app for Spark & IntelliJ - Azure HDInsight'
- description: Tutorial - Create a Spark application written in Scala with Apache Maven as the build system and an existing Maven archetype for Scala provided by IntelliJ IDEA.
+ description: Tutorial - Create a Spark application written in Scala with Apache Maven as the build system, starting from an existing Maven archetype for Scala provided by IntelliJ IDEA.
  author: hrasheed-msft
  ms.author: hrasheed
  ms.reviewer: jasonh
  ms.service: hdinsight
  ms.topic: tutorial
  ms.custom: hdinsightactive,mvc
- ms.date: 02/28/2020
+ ms.date: 04/17/2020

  #customer intent: As a developer new to Apache Spark and to Apache Spark in Azure HDInsight, I want to learn how to create a Scala Maven application for Spark in HDInsight using IntelliJ.
  ---

  # Tutorial: Create a Scala Maven application for Apache Spark in HDInsight using IntelliJ

- In this tutorial, you learn how to create an [Apache Spark](./apache-spark-overview.md) application written in [Scala](https://www.scala-lang.org/) using [Apache Maven](https://maven.apache.org/) with IntelliJ IDEA. The article uses Apache Maven as the build system and starts with an existing Maven archetype for Scala provided by IntelliJ IDEA. Creating a Scala application in IntelliJ IDEA involves the following steps:
+ In this tutorial, you learn how to create an Apache Spark application written in Scala using Apache Maven with IntelliJ IDEA. The article uses Apache Maven as the build system, starting from an existing Maven archetype for Scala provided by IntelliJ IDEA. Creating a Scala application in IntelliJ IDEA involves the following steps:

  * Use Maven as the build system.
  * Update Project Object Model (POM) file to resolve Spark module dependencies.
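The POM update called out in the step above typically means adding Spark as a Maven dependency. A minimal sketch, assuming the Spark 2.3.0 / Scala 2.11 versions this tutorial uses later (the tutorial's full pom.xml contains more than this fragment):

```xml
<!-- Illustrative fragment only: the Spark core dependency for the
     Spark 2.3.0 / Scala 2.11 combination used in this tutorial. -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.11</artifactId>
  <version>2.3.0</version>
</dependency>
```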
@@ -40,17 +40,17 @@ In this tutorial, you learn how to:
  ## Install Scala plugin for IntelliJ IDEA

- Perform the following steps to install the Scala plugin:
+ Follow these steps to install the Scala plugin:

  1. Open IntelliJ IDEA.

  2. On the welcome screen, navigate to **Configure** > **Plugins** to open the **Plugins** window.

-    ![IntelliJ IDEA enable scala plugin](./media/apache-spark-create-standalone-application/enable-scala-plugin1.png)
+    ![`IntelliJ IDEA enable scala plugin`](./media/apache-spark-create-standalone-application/enable-scala-plugin1.png)

  3. Select **Install** for the Scala plugin that is featured in the new window.

-    ![IntelliJ IDEA install scala plugin](./media/apache-spark-create-standalone-application/install-scala-plugin.png)
+    ![`IntelliJ IDEA install scala plugin`](./media/apache-spark-create-standalone-application/install-scala-plugin.png)

  4. After the plugin installs successfully, you must restart the IDE.

@@ -75,8 +75,8 @@ Perform the following steps to install the Scala plugin:
  | Property | Description |
  | ----- | ----- |
  |Project name| Enter a name.|
- |Project location| Enter the desired location to save your project.|
- |Project SDK| This will be blank on your first use of IDEA. Select **New...** and navigate to your JDK.|
+ |Project location| Enter the location to save your project.|
+ |Project SDK| This field will be blank on your first use of IDEA. Select **New...** and navigate to your JDK.|
  |Spark Version|The creation wizard integrates the proper version for Spark SDK and Scala SDK. If the Spark cluster version is earlier than 2.0, select **Spark 1.x**. Otherwise, select **Spark 2.x**. This example uses **Spark 2.3.0 (Scala 2.11.8)**.|

  ![IntelliJ IDEA Selecting the Spark SDK](./media/apache-spark-create-standalone-application/hdi-scala-new-project.png)
@@ -93,18 +93,18 @@ Perform the following steps to install the Scala plugin:
  4. Select the **Create from archetype** checkbox.

- 5. From the list of archetypes, select **org.scala-tools.archetypes:scala-archetype-simple**. This archetype creates the right directory structure and downloads the required default dependencies to write a Scala program.
+ 5. From the list of archetypes, select **`org.scala-tools.archetypes:scala-archetype-simple`**. This archetype creates the right directory structure and downloads the required default dependencies to write a Scala program.

-    ![IntelliJ IDEA create Maven project](./media/apache-spark-create-standalone-application/intellij-project-create-maven.png)
+    ![`IntelliJ IDEA create Maven project`](./media/apache-spark-create-standalone-application/intellij-project-create-maven.png)

  6. Select **Next**.

- 7. Expand **Artifact Coordinates**. Provide relevant values for **GroupId** and **ArtifactId**. **Name** and **Location** will auto-populate. The following values are used in this tutorial:
+ 7. Expand **Artifact Coordinates**. Provide relevant values for **GroupId** and **ArtifactId**. **Name** and **Location** will autopopulate. The following values are used in this tutorial:

     - **GroupId:** com.microsoft.spark.example
     - **ArtifactId:** SparkSimpleApp

-    ![IntelliJ IDEA create Maven project](./media/apache-spark-create-standalone-application/intellij-artifact-coordinates.png)
+    ![`IntelliJ IDEA create Maven project`](./media/apache-spark-create-standalone-application/intellij-artifact-coordinates.png)

  8. Select **Next**.

@@ -114,7 +114,7 @@ Perform the following steps to install the Scala plugin:
  11. Once the project has imported, from the left pane navigate to **SparkSimpleApp** > **src** > **test** > **scala** > **com** > **microsoft** > **spark** > **example**. Right-click **MySpec**, and then select **Delete...**. You don't need this file for the application. Select **OK** in the dialog box.

- 12. In the subsequent steps, you update the **pom.xml** to define the dependencies for the Spark Scala application. For those dependencies to be downloaded and resolved automatically, you must configure Maven accordingly.
+ 12. In the later steps, you update the **pom.xml** to define the dependencies for the Spark Scala application. For those dependencies to be downloaded and resolved automatically, you must configure Maven.

  13. From the **File** menu, select **Settings** to open the **Settings** window.

@@ -128,7 +128,7 @@ Perform the following steps to install the Scala plugin:
  17. From the left pane, navigate to **src** > **main** > **scala** > **com.microsoft.spark.example**, and then double-click **App** to open App.scala.

- 18. Replace the existing sample code with the following code and save the changes. This code reads the data from the HVAC.csv (available on all HDInsight Spark clusters), retrieves the rows that only have one digit in the sixth column, and writes the output to **/HVACOut** under the default storage container for the cluster.
+ 18. Replace the existing sample code with the following code and save the changes. This code reads the data from HVAC.csv (available on all HDInsight Spark clusters), retrieves the rows that have only one digit in the sixth column, and writes the output to **/HVACOut** under the default storage container for the cluster.

        package com.microsoft.spark.example

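The filtering rule step 18 describes can be sketched in plain Scala. This is not the tutorial's App.scala (which applies the same predicate inside a SparkContext over HVAC.csv); the `keepRow` helper name and the sample rows are invented for illustration:

```scala
// Hypothetical helper illustrating the rule from step 18: keep only rows
// whose sixth CSV column is a single digit.
def keepRow(line: String): Boolean = {
  val cols = line.split(",")
  cols.length >= 6 && cols(5).length == 1 && cols(5).forall(_.isDigit)
}

// Invented sample rows in an HVAC.csv-like shape.
val sample = Seq(
  "6/1/13,0:00:01,66,58,13,20,4",  // sixth column "20": dropped
  "6/1/13,0:00:02,66,58,13,7,4"    // sixth column "7": kept
)
println(sample.count(keepRow))  // prints 1
```

In the real job, the same predicate is applied as a transformation on an RDD loaded from HVAC.csv, and the filtered rows are saved to **/HVACOut**.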
@@ -169,29 +169,29 @@ Perform the following steps to install the Scala plugin:
     Save changes to pom.xml.

- 22. Create the .jar file. IntelliJ IDEA enables creation of JAR as an artifact of a project. Perform the following steps.
+ 22. Create the .jar file. IntelliJ IDEA enables creation of JAR as an artifact of a project. Follow these steps.

     1. From the **File** menu, select **Project Structure...**.

     2. From the **Project Structure** window, navigate to **Artifacts** > **the plus symbol +** > **JAR** > **From modules with dependencies...**.

-        ![IntelliJ IDEA project structure add jar](./media/apache-spark-create-standalone-application/hdinsight-create-jar1.png)
+        ![`IntelliJ IDEA project structure add jar`](./media/apache-spark-create-standalone-application/hdinsight-create-jar1.png)

     3. In the **Create JAR from Modules** window, select the folder icon in the **Main Class** text box.

     4. In the **Select Main Class** window, select the class that appears by default and then select **OK**.

-        ![IntelliJ IDEA project structure select class](./media/apache-spark-create-standalone-application/hdinsight-create-jar2.png)
+        ![`IntelliJ IDEA project structure select class`](./media/apache-spark-create-standalone-application/hdinsight-create-jar2.png)

     5. In the **Create JAR from Modules** window, ensure the **extract to the target JAR** option is selected, and then select **OK**. This setting creates a single JAR with all dependencies.

        ![IntelliJ IDEA project structure jar from module](./media/apache-spark-create-standalone-application/hdinsight-create-jar3.png)

     6. The **Output Layout** tab lists all the jars that are included as part of the Maven project. You can select and delete the ones on which the Scala application has no direct dependency. For the application you're creating here, you can remove all but the last one (**SparkSimpleApp compile output**). Select the jars to delete and then select the negative symbol **-**.

-        ![IntelliJ IDEA project structure delete output](./media/apache-spark-create-standalone-application/hdi-delete-output-jars.png)
+        ![`IntelliJ IDEA project structure delete output`](./media/apache-spark-create-standalone-application/hdi-delete-output-jars.png)

-        Make sure the **Include in project build** checkbox is selected, which ensures that the jar is created every time the project is built or updated. Select **Apply** and then **OK**.
+        Make sure the **Include in project build** checkbox is selected. This option ensures that the jar is created every time the project is built or updated. Select **Apply** and then **OK**.

     7. To create the jar, navigate to **Build** > **Build Artifacts** > **Build**. The project will compile in about 30 seconds. The output jar is created under **\out\artifacts**.

@@ -201,7 +201,7 @@ Perform the following steps to install the Scala plugin:
  To run the application on the cluster, you can use the following approaches:

- * **Copy the application jar to the Azure Storage blob** associated with the cluster. You can use [**AzCopy**](../../storage/common/storage-use-azcopy.md), a command-line utility, to do so. There are many other clients as well that you can use to upload data. You can find more about them at [Upload data for Apache Hadoop jobs in HDInsight](../hdinsight-upload-data.md).
+ * **Copy the application jar to the Azure Storage blob** associated with the cluster. You can use **AzCopy**, a command-line utility, to do so. There are many other clients you can use to upload data. You can find more about them at [Upload data for Apache Hadoop jobs in HDInsight](../hdinsight-upload-data.md).

  * **Use Apache Livy to submit an application job remotely** to the Spark cluster. Spark clusters on HDInsight include Livy, which exposes REST endpoints for remotely submitting Spark jobs. For more information, see [Submit Apache Spark jobs remotely using Apache Livy with Spark clusters on HDInsight](apache-spark-livy-rest-interface.md).

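For the Livy route, a batch submission is a POST of a small JSON document to the cluster's `/livy/batches` endpoint. A hedged sketch of building that body for the jar produced in this tutorial (the storage path is an invented example; see the linked Livy article for the actual submission steps):

```scala
// Builds the JSON body Livy's batch endpoint expects: the jar location
// ("file") and the main class ("className"). Values below are examples only.
def livyBatchBody(jarPath: String, mainClass: String): String =
  s"""{"file": "$jarPath", "className": "$mainClass"}"""

val body = livyBatchBody(
  "wasbs:///example/jars/SparkSimpleApp.jar",
  "com.microsoft.spark.example.App")
println(body)
```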
@@ -219,7 +219,7 @@ If you're not going to continue to use this application, delete the cluster that
  1. Select **Delete**. Select **Yes**.

-    ![HDInsight azure portal delete cluster](./media/apache-spark-create-standalone-application/hdinsight-azure-portal-delete-cluster.png "Delete HDInsight cluster")
+    ![`HDInsight azure portal delete cluster`](./media/apache-spark-create-standalone-application/hdinsight-azure-portal-delete-cluster.png "Delete HDInsight cluster")

  ## Next step
