Commit 476cf61

add an emr serverless sample (#220)
1 parent 82de1ed commit 476cf61

5 files changed: 140 additions, 0 deletions
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
target/

emr-serverless-spark/README.md

Lines changed: 108 additions & 0 deletions
@@ -0,0 +1,108 @@
# Running EMR Serverless Jobs with Java

This sample runs a Java Spark job on EMR Serverless in LocalStack, using a simple Java "Hello World" application.

## Prerequisites

* LocalStack
* `aws` CLI & `awslocal` script
* Docker
* Java and Maven

## Installation

Before creating the EMR Serverless job, we need a JAR file containing the Java code. A prebuilt `java-demo-1.0.jar` is included in the current directory. Alternatively, you can build the JAR yourself:

```bash
cd hello-world
mvn package
```

Maven writes the JAR to `target/java-demo-1.0.jar` inside `hello-world`, so if you build it yourself, adjust the source path in the copy command below accordingly.

Next, create an S3 bucket to store the JAR file:

```bash
export S3_BUCKET=test
awslocal s3 mb s3://$S3_BUCKET
```

You can now copy the JAR file from your current directory to the S3 bucket:

```bash
awslocal s3 cp java-demo-1.0.jar s3://${S3_BUCKET}/code/java-spark/
```

## Creating the EMR Serverless Job

Specify the ARN of the IAM execution role that the EMR Serverless job will use:

```bash
export JOB_ROLE_ARN=arn:aws:iam::000000000000:role/emr-serverless-job-role
```

We can now create an EMR Serverless application. The `emr-6.9.0` release label runs Spark 3.3.0. Run the following command:

```bash
awslocal emr-serverless create-application \
    --type SPARK \
    --name serverless-java-demo \
    --release-label "emr-6.9.0" \
    --initial-capacity '{
        "DRIVER": {
            "workerCount": 1,
            "workerConfiguration": {
                "cpu": "4vCPU",
                "memory": "16GB"
            }
        },
        "EXECUTOR": {
            "workerCount": 3,
            "workerConfiguration": {
                "cpu": "4vCPU",
                "memory": "16GB"
            }
        }
    }'
```

Retrieve the application ID from the output of the command and export it as an environment variable:

```bash
export APPLICATION_ID='<application-id>'
```
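
Instead of copying the ID by hand, it can be captured when the application is created. The following is a minimal sketch, assuming the response exposes the ID under an `applicationId` key as the AWS API does, and relying on the AWS CLI's global `--query` and `--output` options. `create_application_id` is a hypothetical helper name, and it omits `--initial-capacity` for brevity. It creates a new application, so use it in place of the `create-application` command above, not in addition to it.

```bash
# Hypothetical helper: creates the application and prints only its ID.
# Assumption: the response contains an `applicationId` field.
create_application_id() {
    awslocal emr-serverless create-application \
        --type SPARK \
        --name "${1:-serverless-java-demo}" \
        --release-label "emr-6.9.0" \
        --query 'applicationId' \
        --output text
}
```

With the helper defined, `export APPLICATION_ID="$(create_application_id)"` would set the variable in one step.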

Start the EMR Serverless application:

```bash
awslocal emr-serverless start-application \
    --application-id $APPLICATION_ID
```
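
Starting an application may not be instantaneous, so it can help to wait until it reports the `STARTED` state before submitting a job. The following is a sketch under the assumption that `get-application` returns the state under `application.state`, as in the AWS API. `wait_for_application_started` and its attempt and delay parameters are hypothetical names chosen for illustration.

```bash
# Hypothetical helper: polls the application until its state is STARTED.
# Arguments: application ID, max attempts (default 30), delay in seconds (default 2).
wait_for_application_started() {
    local app_id="$1" attempts="${2:-30}" delay="${3:-2}" state="" i=0
    while [ "$i" -lt "$attempts" ]; do
        # Assumption: the state is exposed as `application.state`.
        state="$(awslocal emr-serverless get-application \
            --application-id "$app_id" \
            --query 'application.state' \
            --output text)"
        if [ "$state" = "STARTED" ]; then
            echo "Application $app_id is STARTED"
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    echo "Timed out waiting for $app_id; last state: $state" >&2
    return 1
}
```

A call such as `wait_for_application_started "$APPLICATION_ID"` returns a non-zero exit code on timeout, so it can gate the job submission in a script.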

## Running the EMR Serverless Job

You can now run the EMR Serverless job:

```bash
awslocal emr-serverless start-job-run \
    --application-id $APPLICATION_ID \
    --execution-role-arn $JOB_ROLE_ARN \
    --job-driver '{
        "sparkSubmit": {
            "entryPoint": "s3://'${S3_BUCKET}'/code/java-spark/java-demo-1.0.jar",
            "sparkSubmitParameters": "--class HelloWorld"
        }
    }' \
    --configuration-overrides '{
        "monitoringConfiguration": {
            "s3MonitoringConfiguration": {
                "logUri": "s3://'${S3_BUCKET}'/logs/"
            }
        }
    }'
```
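
The job runs asynchronously, so a script usually needs to poll for completion before reading the logs. The following is a sketch, assuming `get-job-run` reports the state under `jobRun.state` and that `SUCCESS`, `FAILED`, and `CANCELLED` are the terminal states, as in the AWS API. `wait_for_job_run` and the `JOB_RUN_ID` variable mentioned below are hypothetical names.

```bash
# Hypothetical helper: polls a job run until it reaches a terminal state.
# Arguments: application ID, job run ID, max attempts (default 60), delay in seconds (default 5).
# Prints the final state; returns 0 on SUCCESS, 1 on FAILED/CANCELLED, 2 on timeout.
wait_for_job_run() {
    local app_id="$1" job_run_id="$2" attempts="${3:-60}" delay="${4:-5}" state="" i=0
    while [ "$i" -lt "$attempts" ]; do
        # Assumption: the state is exposed as `jobRun.state`.
        state="$(awslocal emr-serverless get-job-run \
            --application-id "$app_id" \
            --job-run-id "$job_run_id" \
            --query 'jobRun.state' \
            --output text)"
        case "$state" in
            SUCCESS) echo "$state"; return 0 ;;
            FAILED|CANCELLED) echo "$state"; return 1 ;;
        esac
        sleep "$delay"
        i=$((i + 1))
    done
    echo "TIMEOUT"
    return 2
}
```

To use it, the job run ID from the `start-job-run` output (the `jobRunId` field in the AWS API) would be stored in `JOB_RUN_ID`, after which `wait_for_job_run "$APPLICATION_ID" "$JOB_RUN_ID"` blocks until the job finishes. Running `awslocal s3 ls "s3://${S3_BUCKET}/logs/" --recursive` afterwards lists whatever log objects were written.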

The Spark logs are written to the S3 location specified in the `logUri` parameter.

You can stop the EMR Serverless application with the following command:

```bash
awslocal emr-serverless stop-application \
    --application-id $APPLICATION_ID
```
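
Once the application has stopped, the demo resources can be removed as well. The following is a sketch, assuming `delete-application` and `s3 rb --force` behave in LocalStack as they do on AWS. `cleanup_demo` is a hypothetical helper name.

```bash
# Hypothetical helper: deletes the stopped application and the demo bucket.
# Arguments: application ID, bucket name.
cleanup_demo() {
    local app_id="$1" bucket="$2"
    # The application is expected to be stopped before it is deleted.
    awslocal emr-serverless delete-application --application-id "$app_id"
    # --force empties the bucket (JAR and logs) before removing it.
    awslocal s3 rb "s3://${bucket}" --force
}
```

It could be invoked as `cleanup_demo "$APPLICATION_ID" "$S3_BUCKET"`.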
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
<project>
  <groupId>aws.emr-serverless-samples</groupId>
  <artifactId>java-demo</artifactId>
  <modelVersion>4.0.0</modelVersion>
  <name>EMR Serverless Samples</name>
  <packaging>jar</packaging>
  <version>1.0</version>
  <dependencies>
    <dependency> <!-- Spark dependency -->
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_2.12</artifactId>
      <version>3.3.0</version>
      <scope>provided</scope>
    </dependency>
  </dependencies>
  <properties>
    <maven.compiler.source>1.8</maven.compiler.source>
    <maven.compiler.target>1.8</maven.compiler.target>
  </properties>
</project>
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
import org.apache.spark.sql.SparkSession;

public class HelloWorld {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("Simple Application").getOrCreate();

        System.out.println("Hello, from LocalStack's EMR Serverless implementation!");

        spark.stop();
    }
}
2.12 KB
Binary file not shown.
