title: Deploy Apache Spark on Google Axion processors

minutes_to_complete: 60

who_is_this_for: This introductory topic is for software developers interested in migrating their Apache Spark workloads from x86_64 platforms to Arm-based platforms, specifically on Google Axion-based C4A virtual machines.

learning_objectives:
- Start an Arm virtual machine on Google Cloud Platform (GCP) using the C4A Google Axion instance family with RHEL 9 as the base image
- Install and configure Apache Spark on Arm-based GCP C4A instances
- Validate Spark functionality through baseline testing
- Benchmark Apache Spark performance on Arm

prerequisites:
- A [Google Cloud Platform (GCP)](https://cloud.google.com/free?utm_source=google&hl=en) account with billing enabled
- Familiarity with distributed computing concepts and the [Apache Spark architecture](https://spark.apache.org/docs/latest/)

author: Pareena Verma
subjects: Performance and Architecture

File: content/learning-paths/servers-and-cloud-computing/spark-on-gcp/background.md

---
title: Getting started with Apache Spark on Google Axion C4A (Arm Neoverse-V2)

weight: 2

layout: "learningpathall"
---

## Google Axion C4A Arm instances in Google Cloud

Google Axion C4A is a family of Arm-based virtual machines built on Google’s custom Axion CPU, which is based on Arm Neoverse-V2 cores. Designed for high-performance and energy-efficient computing, these virtual machines offer strong performance for modern cloud workloads such as CI/CD pipelines, microservices, media processing, and general-purpose applications.

The C4A series provides a cost-effective alternative to x86 virtual machines while leveraging the scalability and performance benefits of the Arm architecture in Google Cloud.
To learn more about Google Axion, refer to the [Introducing Google Axion Processors, our new Arm-based CPUs](https://cloud.google.com/blog/products/compute/introducing-googles-new-arm-based-cpu) blog.

## Apache Spark for big data processing on Arm

Apache Spark is an open-source, distributed computing system designed for fast and general-purpose big data processing.

---
title: Apache Spark baseline testing on Google Axion C4A Arm VM
weight: 5

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## Validate Apache Spark installation with a baseline test

With Apache Spark installed successfully on your GCP C4A Arm-based virtual machine, you can now perform simple baseline testing to validate that Spark runs correctly and produces the expected output.

## Run a baseline test for Apache Spark on Arm

Use a text editor of your choice to create a simple Spark job file:

```console
nano ~/spark_baseline_test.scala
```

Copy the content below into `spark_baseline_test.scala`:

```scala
val data = Seq(1, 2, 3, 4, 5)
val distData = spark.sparkContext.parallelize(data)
val squared = distData.map(x => x * x).collect()
println("Squared values: " + squared.mkString(", "))
```

This Scala example shows how to create an RDD (Resilient Distributed Dataset), apply a transformation, and collect results. Here is a step-by-step breakdown of the code:

- **`val data = Seq(1, 2, 3, 4, 5)`**: creates a local Scala sequence of integers
- **`val distData = spark.sparkContext.parallelize(data)`**: converts the local sequence into a distributed RDD, so Spark can process it in parallel across CPU cores or cluster nodes
- **`val squared = distData.map(x => x * x).collect()`**: squares each element using `map`, then gathers the results back to the driver program with `collect`
- **`println("Squared values: " + squared.mkString(", "))`**: prints the squared values as a comma-separated list
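
For intuition, the same square-and-collect pipeline can be sketched with plain Scala collections, with no Spark involved; the RDD version performs exactly this computation, just partitioned across cores or nodes:

```scala
// Plain-Scala analogue of the Spark baseline job: square each element of a
// sequence, then print the results. In the Spark version, parallelize()
// distributes `data` across workers and collect() gathers the squared
// values back to the driver.
object SquareDemo {
  def main(args: Array[String]): Unit = {
    val data    = Seq(1, 2, 3, 4, 5)
    val squared = data.map(x => x * x)   // local stand-in for the RDD map + collect
    println("Squared values: " + squared.mkString(", "))
    // prints: Squared values: 1, 4, 9, 16, 25
  }
}
```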

## Run the Apache Spark baseline test in Spark shell

Run the test file in the interactive Spark shell:

```console
spark-shell < ~/spark_baseline_test.scala
```

Alternatively, you can start the Spark shell and then load the file from inside the shell:

```console
spark-shell
```

```scala
:load spark_baseline_test.scala
```

You should see output similar to:

```output
Squared values: 1, 4, 9, 16, 25
```

This confirms that Spark is running correctly in local mode with its driver, executor, and cluster manager.

File: content/learning-paths/servers-and-cloud-computing/spark-on-gcp/benchmarking.md

---
title: Apache Spark performance benchmarks on Arm64 and x86_64 in Google Cloud
weight: 6
### FIXED, DO NOT MODIFY
layout: learningpathall
---

## How to run Apache Spark benchmarks on Arm64 in GCP

Apache Spark includes internal micro-benchmarks to evaluate the performance of core components like SQL execution, aggregation, joins, and data source reads. These benchmarks are helpful for comparing performance on x86_64 vs Arm64 platforms.
Follow the steps outlined to run Spark’s built-in SQL benchmarks using the SBT-based framework.

This compiles Spark and its dependencies, enabling the benchmarks build profile.
This executes the `AggregateBenchmark`, which compares the performance of SQL aggregation operations (for example, SUM and STDDEV) with and without `WholeStageCodegen`. `WholeStageCodegen` is an optimization technique used by Spark SQL to improve query execution performance by generating Java bytecode for entire query stages instead of interpreting them step by step.
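
If you want a quick feel for this optimization outside the benchmark suite, a rough sketch you can paste into `spark-shell` is shown below. The `spark.sql.codegen.wholeStage` configuration key is standard Spark SQL; the timing helper and row count are illustrative assumptions, and the numbers will not match the benchmark's warmed-up, multi-iteration measurements:

```scala
// Rough spark-shell sketch (assumes the shell's built-in `spark` session):
// time one simple aggregation with whole-stage codegen on, then off.
def timeIt[T](label: String)(body: => T): T = {
  val start  = System.nanoTime()
  val result = body
  println(f"$label: ${(System.nanoTime() - start) / 1e6}%.1f ms")
  result
}

spark.conf.set("spark.sql.codegen.wholeStage", "true")
timeIt("sum(id), wholestage on") { spark.range(100000000L).selectExpr("sum(id)").collect() }

spark.conf.set("spark.sql.codegen.wholeStage", "false")
timeIt("sum(id), wholestage off") { spark.range(100000000L).selectExpr("sum(id)").collect() }
```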
## Example Apache Spark benchmark output (Arm64)
You should see output similar to:
```output
[info] Running benchmark: agg w/o group
[info] Running case: agg w/o group wholestage off
...
[success] Total time: 669 s (11:09), completed Jul 24, 2025, 5:41:24 AM
```

## Understanding Apache Spark benchmark metrics and results

- **Best Time (ms):** Fastest execution time observed (in milliseconds).
- **Avg Time (ms):** Average time across all iterations.
- **Per Row (ns):** Average time taken per row (in nanoseconds).
- **Relative:** Speed comparison; the baseline (1.0X) is the slower version.
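
The table's **Rate (M/s)** and **Per Row (ns)** columns are two views of the same measurement: at a rate of R million rows per second, each row costs 1000 / R nanoseconds. A small sanity check of that relationship (the example rates below are made up, not taken from the tables):

```scala
// Rate (millions of rows per second) vs Per Row (nanoseconds per row):
// 1 s = 1e9 ns and Rate is in 1e6 rows/s, so ns/row = 1e9 / (rate * 1e6) = 1000 / rate.
object RateCheck {
  def perRowNs(rateMillionsPerSec: Double): Double = 1000.0 / rateMillionsPerSec

  def main(args: Array[String]): Unit = {
    println(perRowNs(250.0)) // 4.0  -> 250 M rows/s is 4 ns per row
    println(perRowNs(50.0))  // 20.0 -> 50 M rows/s is 20 ns per row
  }
}
```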

## Apache Spark performance benchmark results on x86_64

The following benchmark results were collected by running the same benchmark on a `c3-standard-4` (4 vCPU, 2 core, 16 GB Memory) x86_64 virtual machine in GCP, running RHEL 9.
|**Benchmark Case**|**Sub-Case / Config**|**Best Time (ms)**|**Avg Time (ms)**|**Stdev (ms)**|**Rate (M/s)**|**Per Row (ns)**|**Relative**|

## Apache Spark performance benchmark results on Arm64

Results from the earlier run on the `c4a-standard-4` (4 vCPU, 16 GB memory) Arm64 VM in GCP (RHEL 9):
| Benchmark Case | Sub-Case / Config | Best Time (ms) | Avg Time (ms) | Stdev (ms) | Rate (M/s) | Per Row (ns) | Relative |
## Apache Spark performance benchmarking comparison on Arm64 and x86_64

When you compare the benchmarking results, you will notice that on the Google Axion C4A Arm-based instances:

- **Whole-stage code generation significantly boosts performance**, improving execution by up to **3×** (for example, `agg w/o group` improves from 2728 ms to 856 ms).
- **Aggregation with keys**, across row-based and non-hashmap variants, delivers ~1.7–5.4× speedups.
- **Arm-based Spark shows strong hash performance**: `murmur3` and `UnsafeRowhash` on Arm-based instances are ~3×–5× faster, with the aggregate hashmap ~6× faster; the `fast hash` path is roughly on par.

Overall, when whole-stage codegen and vectorized hashmap paths are used, you should see multi-fold speedups on the Google Axion C4A Arm-based instances.

---
title: How to create a Google Axion C4A Arm virtual machine on GCP
weight: 3
### FIXED, DO NOT MODIFY
layout: learningpathall
---

## How to create a Google Axion C4A Arm VM on Google Cloud

In this section, you learn how to provision a **Google Axion C4A Arm virtual machine** on Google Cloud Platform (GCP) using the **c4a-standard-4 (4 vCPUs, 16 GB memory)** machine type in the **Google Cloud Console**.

For background on GCP setup, see the Learning Path [Getting started with Google Cloud Platform](https://learn.arm.com/learning-paths/servers-and-cloud-computing/csp/google/).

### Create a Google Axion C4A Arm VM in Google Cloud Console

To create a virtual machine based on the C4A Arm architecture:
1. Navigate to the [Google Cloud Console](https://console.cloud.google.com/).
2. Go to **Compute Engine > VM Instances** and select **Create Instance**.
3. Under **Machine configuration**:
   - Enter details such as **Instance name**, **Region**, and **Zone**.
   - Set **Series** to `C4A`.
   - Select a machine type such as `c4a-standard-4`.

   

4. Under **OS and Storage**, select **Change**, then choose an Arm64-based OS image. For this Learning Path, use **Red Hat Enterprise Linux 9**. Ensure you select the **Arm image** variant. Click **Select**.
5. Under **Networking**, enable **Allow HTTP traffic**.
6. Click **Create** to launch the instance.