
Commit 7c2d62d

Deploy TensorFlow on Google Cloud C4A (Arm-based Axion VMs)
Signed-off-by: odidev <[email protected]>
1 parent c7cf14a commit 7c2d62d

File tree

8 files changed, +419 -0 lines changed
Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@
---
title: Deploy TensorFlow on Google Cloud C4A (Arm-based Axion VMs)

minutes_to_complete: 30

who_is_this_for: This learning path is intended for software developers deploying and optimizing TensorFlow workloads on Linux/Arm64 environments, specifically using Google Cloud C4A virtual machines powered by Axion processors.

learning_objectives:
- Provision an Arm-based SUSE SLES virtual machine on Google Cloud (C4A with Axion processors)
- Install TensorFlow on a SUSE Arm64 (C4A) instance
- Verify TensorFlow by running basic computation and model training tests on Arm64
- Benchmark TensorFlow using tf.keras to evaluate inference speed and model performance on Arm64 systems

prerequisites:
- A [Google Cloud Platform (GCP)](https://cloud.google.com/free) account with billing enabled
- Basic familiarity with [TensorFlow](https://www.tensorflow.org/)

author: Pareena Verma

##### Tags
skilllevels: Introductory
subjects: ML
cloud_service_providers: Google Cloud

armips:
- Neoverse

tools_software_languages:
- TensorFlow
- Python
- tf.keras

operatingsystems:
- Linux

# ================================================================================
# FIXED, DO NOT MODIFY
# ================================================================================
further_reading:
    - resource:
        title: Google Cloud documentation
        link: https://cloud.google.com/docs
        type: documentation

    - resource:
        title: TensorFlow documentation
        link: https://www.tensorflow.org/learn
        type: documentation

    - resource:
        title: Phoronix Test Suite (PTS) documentation
        link: https://www.phoronix-test-suite.com/
        type: documentation

weight: 1
layout: "learningpathall"
learning_path_main_page: "yes"
---
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
---
# ================================================================================
# FIXED, DO NOT MODIFY THIS FILE
# ================================================================================
weight: 21 # Set to always be larger than the content in this path to be at the end of the navigation.
title: "Next Steps" # Always the same, html page title.
layout: "learningpathall" # All files under learning paths have this same wrapper for Hugo processing.
---
Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
---
title: Getting started with TensorFlow on Google Axion C4A (Arm Neoverse-V2)

weight: 2

layout: "learningpathall"
---

## Google Axion C4A Arm instances in Google Cloud

Google Axion C4A is a family of Arm-based virtual machines built on Google’s custom Axion CPU, which is based on Arm Neoverse-V2 cores. Designed for high-performance and energy-efficient computing, these virtual machines offer strong performance for modern cloud workloads such as CI/CD pipelines, microservices, media processing, and general-purpose applications.

The C4A series provides a cost-effective alternative to x86 virtual machines while leveraging the scalability and performance benefits of the Arm architecture in Google Cloud.

To learn more about Google Axion, refer to the [Introducing Google Axion Processors, our new Arm-based CPUs](https://cloud.google.com/blog/products/compute/introducing-googles-new-arm-based-cpu) blog.
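Once you have a C4A instance running, you can quickly confirm that the VM really is an Arm64 (aarch64) machine. The short check below is a minimal sketch using only the Python standard library, so it works even before TensorFlow is installed:

```python
# Minimal sanity check: confirm the VM reports an Arm64 (aarch64) CPU.
# Uses only the Python standard library, so no TensorFlow installation is required yet.
import platform

print("CPU architecture:", platform.machine())   # expected to print 'aarch64' on a C4A (Axion) VM
```
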
## TensorFlow

[TensorFlow](https://www.tensorflow.org/) is an open-source machine learning and deep learning framework developed by Google. It helps developers and researchers build, train, and deploy AI models efficiently across CPUs, GPUs, and TPUs.

With support for neural networks, natural language processing (NLP), and computer vision, TensorFlow is widely used in both AI research and production. Its flexibility and scalability make it well suited to both cloud and edge environments.

To learn more, visit the [official TensorFlow website](https://www.tensorflow.org/).
Lines changed: 95 additions & 0 deletions
@@ -0,0 +1,95 @@
---
title: TensorFlow Baseline Testing on Google Axion C4A Arm Virtual Machine
weight: 5

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## TensorFlow Baseline Testing on GCP SUSE VMs
This section helps you check whether TensorFlow is properly installed and working on your **Google Axion C4A Arm64 VM**. You will run small tests to confirm that your CPU can perform TensorFlow operations correctly.

### Verify Installation
This command checks that TensorFlow is installed correctly and prints its version number.

```console
python -c "import tensorflow as tf; print(tf.__version__)"
```
### List Available Devices
This command shows which hardware devices TensorFlow can use, such as CPU or GPU. On most VMs, you’ll see only the CPU listed.

```console
python -c "import tensorflow as tf; print(tf.config.list_physical_devices())"
```

You should see an output similar to:
```output
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')]
```

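Because C4A instances are CPU-only, it can also be helpful to see how many threads TensorFlow will use for its intra-op and inter-op thread pools. The check below is an optional sketch; the values depend on your VM size:

```python
# Optional: inspect the CPU threading configuration TensorFlow will use.
# A reported value of 0 means TensorFlow picks a default based on the number of vCPUs.
import os
import tensorflow as tf

print("vCPUs visible to the OS:", os.cpu_count())
print("Intra-op parallelism threads:", tf.config.threading.get_intra_op_parallelism_threads())
print("Inter-op parallelism threads:", tf.config.threading.get_inter_op_parallelism_threads())
```
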
### Run a Simple Computation
This test multiplies two large matrices to confirm that TensorFlow computations run on your CPU and measures how long the operation takes.

```console
python -c "import tensorflow as tf; import time;
a = tf.random.uniform((1000,1000)); b = tf.random.uniform((1000,1000));
start = time.time(); c = tf.matmul(a,b); end = time.time();
print('Computation time:', end - start, 'seconds')"
```
- This gives you a quick measure of CPU speed on a basic matrix operation.
- Note the **computation time** as your baseline.

You should see an output similar to:
```output
Computation time: 0.008263111114501953 seconds
```
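A single timing can fluctuate from run to run. For a steadier baseline, you can average several matrix multiplications; the snippet below is a minimal sketch of that idea:

```python
# Average several matmul runs to get a more stable baseline timing.
import time
import tensorflow as tf

a = tf.random.uniform((1000, 1000))
b = tf.random.uniform((1000, 1000))

runs = 10
start = time.time()
for _ in range(runs):
    c = tf.matmul(a, b)
elapsed = time.time() - start

print(f"Average computation time over {runs} runs: {elapsed / runs:.6f} seconds")
```
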
### Test Neural Network Execution
Create a new file for testing a simple neural network:

```console
vi test_nn.py
```
This opens a new Python file where you’ll write a short TensorFlow test program.
Paste the code below into the `test_nn.py` file:

```python
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
import numpy as np

# Dummy data
x = np.random.rand(1000, 20)
y = np.random.rand(1000, 1)

# Define the model
model = Sequential([
    Dense(64, activation='relu', input_shape=(20,)),
    Dense(1)
])

# Compile the model
model.compile(optimizer='adam', loss='mse')

# Train for 1 epoch
model.fit(x, y, epochs=1, batch_size=32)
```
This script creates and trains a simple neural network using random data, just to make sure TensorFlow’s deep learning functions work properly.

**Run the Script**

Execute the script with Python:

```console
python test_nn.py
```

**Output**

TensorFlow will print training progress, like:
```output
32/32 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - loss: 0.1024
```

This confirms that TensorFlow is working properly on your Arm64 VM.
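As an optional follow-up, you can run a quick prediction with the trained model to confirm that inference also works end to end. This is a minimal sketch meant to be appended to `test_nn.py` after the training step:

```python
# Optional: run inference with the trained model on a few of the dummy samples.
predictions = model.predict(x[:5])
print("Predictions shape:", predictions.shape)   # expected: (5, 1)
print(predictions)
```
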
Lines changed: 131 additions & 0 deletions
@@ -0,0 +1,131 @@
---
title: TensorFlow Benchmarking
weight: 6

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## TensorFlow Benchmarking with tf.keras
This guide benchmarks multiple TensorFlow models (ResNet50, MobileNetV2, and InceptionV3) using dummy input data. It measures average inference time and throughput for each model running on the CPU.

`tf.keras` is **TensorFlow’s high-level API** for building, training, and benchmarking deep learning models. It provides access to **predefined architectures** such as **ResNet**, **MobileNet**, and **Inception**, making it easy to evaluate model performance on different hardware setups like **CPU**, **GPU**, or **TPU**.

### Activate your TensorFlow virtual environment
This step enables your isolated Python environment (`tf-venv`) where TensorFlow is installed. It ensures that all TensorFlow-related packages and dependencies run in a clean, controlled setup without affecting system-wide Python installations.

```console
source ~/tf-venv/bin/activate
python -c "import tensorflow as tf; print(tf.__version__)"
```
### Install required packages
Here, you install TensorFlow 2.20.0 and NumPy, the core libraries needed for model creation, computation, and benchmarking. NumPy supports efficient numerical operations, while TensorFlow handles deep learning workloads.

```console
pip install tensorflow==2.20.0 numpy
```

### Create a Python file named tf_cpu_benchmark.py
This step creates a Python script (`tf_cpu_benchmark.py`) that runs the TensorFlow model benchmarking tests.

```console
vi tf_cpu_benchmark.py
```

Paste the following code:
```python
import tensorflow as tf
import time

# List of models to benchmark
models = {
    "ResNet50": tf.keras.applications.ResNet50,
    "MobileNetV2": tf.keras.applications.MobileNetV2,
    "InceptionV3": tf.keras.applications.InceptionV3
}

batch_size = 32
num_runs = 50

for name, constructor in models.items():
    print(f"\nBenchmarking {name}...")
    # Create model without pretrained weights
    model = constructor(weights=None, input_shape=(224,224,3))
    # Generate dummy input
    dummy_input = tf.random.uniform([batch_size, 224, 224, 3])
    # Warm-up
    _ = model(dummy_input)
    # Benchmark
    start = time.time()
    for _ in range(num_runs):
        _ = model(dummy_input)
    end = time.time()
    avg_time = (end - start) / num_runs
    throughput = batch_size / avg_time
    print(f"{name} average inference time per batch: {avg_time:.4f} seconds")
    print(f"{name} throughput: {throughput:.2f} images/sec")
```
- **Import libraries** – Loads TensorFlow and `time` for model creation and timing.
- **Define models** – Lists three TensorFlow Keras models: **ResNet50**, **MobileNetV2**, and **InceptionV3**.
- **Set parameters** – Configures `batch_size = 32` and runs each model **50 times** for stable benchmarking.
- **Create model instances** – Initializes each model **without pretrained weights** for fair CPU testing.
- **Generate dummy input** – Creates random data shaped like real images **(224×224×3)** for inference.
- **Warm-up phase** – Runs one inference to **stabilize model graph and memory usage**.
- **Benchmark loop** – Measures total time for 50 runs and calculates the **average inference time per batch**.
- **Compute throughput** – Divides the batch size by the average batch time to get **images per second**.
- **Print results** – Displays the **average inference time and throughput** for each model.

### Run the benchmark
Execute the benchmarking script:

```console
python tf_cpu_benchmark.py
```

You should see an output similar to:
```output
Benchmarking ResNet50...
ResNet50 average inference time per batch: 1.2051 seconds
ResNet50 throughput: 26.55 images/sec

Benchmarking MobileNetV2...
MobileNetV2 average inference time per batch: 0.2909 seconds
MobileNetV2 throughput: 110.02 images/sec

Benchmarking InceptionV3...
InceptionV3 average inference time per batch: 0.8971 seconds
InceptionV3 throughput: 35.67 images/sec
```

### Benchmark Metrics Explanation

- **Average Inference Time per Batch (seconds):** Measures how long it takes to process one batch of input data. Lower values indicate faster inference performance.
- **Throughput (images/sec):** Indicates how many images the model can process per second, computed as batch size divided by the average batch time (see the quick check after this list). Higher throughput means better overall efficiency.
- **Model Type:** Refers to the neural network architecture used for testing (for example, ResNet50, MobileNetV2, or InceptionV3). Each model has different computational complexity.

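As a quick sanity check of the throughput arithmetic, you can reproduce the reported figure from the reported latency. The numbers below are taken from the sample ResNet50 output above:

```python
# Throughput is derived from batch size and average latency: throughput = batch_size / avg_time.
batch_size = 32
resnet50_avg_time = 1.2051                        # seconds per batch, from the sample output above

print(round(batch_size / resnet50_avg_time, 2))   # 26.55 images/sec
```
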
### Benchmark summary on x86_64
For comparison, the same benchmark was run on a `c4-standard-4` (4 vCPUs, 15 GB memory) x86_64 VM in GCP, running SUSE:

| **Model**        | **Average Inference Time per Batch (seconds)** | **Throughput (images/sec)** |
|------------------|-----------------------------------------------:|-----------------------------:|
| **ResNet50**     | 1.3690                                          | 23.37                         |
| **MobileNetV2**  | 0.4274                                          | 74.87                         |
| **InceptionV3**  | 0.8799                                          | 36.37                         |

### Benchmark summary on Arm64
Results from the earlier run on the `c4a-standard-4` (4 vCPUs, 16 GB memory) Arm64 VM in GCP, running SUSE:

| **Model**        | **Average Inference Time per Batch (seconds)** | **Throughput (images/sec)** |
|------------------|-----------------------------------------------:|-----------------------------:|
| **ResNet50**     | 1.2051                                          | 26.55                         |
| **MobileNetV2**  | 0.2909                                          | 110.02                        |
| **InceptionV3**  | 0.8971                                          | 35.67                         |

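To put the two tables side by side, you can compute the Arm64-to-x86_64 throughput ratio for each model. The sketch below simply reuses the values reported in the tables above:

```python
# Compare Arm64 (c4a-standard-4) and x86_64 (c4-standard-4) throughput from the tables above.
throughput = {
    # model:         (Arm64, x86_64) images/sec
    "ResNet50":      (26.55, 23.37),
    "MobileNetV2":   (110.02, 74.87),
    "InceptionV3":   (35.67, 36.37),
}

for model, (arm64, x86_64) in throughput.items():
    print(f"{model}: Arm64 delivers {arm64 / x86_64:.2f}x the x86_64 throughput")
```

With these numbers, the Arm64 VM is roughly 1.14x faster for ResNet50, about 1.47x faster for MobileNetV2, and close to parity (about 0.98x) for InceptionV3.
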
### TensorFlow benchmarking comparison on Arm64 and x86_64

- **Arm64 VMs show strong performance** for lightweight CNNs like **MobileNetV2**, achieving over **110 images/sec**, indicating excellent optimization for CPU-based inference.
- **Medium-depth models** like **InceptionV3** maintain a **balanced trade-off between accuracy and latency**, confirming consistent multi-core utilization on Arm.
- **Heavier architectures** such as **ResNet50** show longer inference times, as expected, but still deliver **stable throughput**, reflecting good floating-point efficiency.
- Compared to **x86_64**, **Arm64 provides energy-efficient yet competitive performance**, particularly for **mobile, quantized, or edge AI workloads**.
- **Overall**, Arm64 demonstrates that **TensorFlow workloads can run efficiently on cloud-native Arm processors**, making them a **cost-effective and power-efficient alternative** for AI inference and model prototyping.
Lines changed: 72 additions & 0 deletions
@@ -0,0 +1,72 @@
---
title: Install TensorFlow
weight: 4

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## TensorFlow Installation on GCP SUSE VM
TensorFlow is a widely used **open-source machine learning library** developed by Google, designed for building and deploying ML models efficiently. On Arm64 SUSE VMs, TensorFlow runs natively on the CPU, or on a GPU if one is available.

### System Preparation
Update the system and install Python 3 and pip:

```console
sudo zypper refresh
sudo zypper update -y
sudo zypper install -y python3 python3-pip python3-venv
```
This ensures your system is up to date and installs Python with the essential tools required for the TensorFlow setup.

**Verify Python version:**

Confirm that Python and pip are correctly installed and check their versions to ensure compatibility with TensorFlow requirements.

```console
python3 --version
pip3 --version
```

### Create a Virtual Environment (Recommended)
Set up an isolated Python environment (`tf-venv`) so that TensorFlow and its dependencies don’t interfere with system-wide packages or other projects.

```console
python3 -m venv tf-venv
source tf-venv/bin/activate
```
After activation, your shell prompt is prefixed with `(tf-venv)`, indicating that the environment is active.

### Upgrade pip
Upgrade pip to the latest version for smooth and reliable package installation.

```console
pip install --upgrade pip
```

### Install TensorFlow
Install the latest stable TensorFlow version for Arm64:

```console
pip install tensorflow==2.20.0
```

{{% notice Note %}}
TensorFlow 2.18.0 introduced compatibility with NumPy 2.0, incorporating its updated type promotion rules and improved numerical precision.
See the [TensorFlow 2.18 release announcement](https://blog.tensorflow.org/2024/10/whats-new-in-tensorflow-218.html) for details.

The [Arm Ecosystem Dashboard](https://developer.arm.com/ecosystem-dashboard/) recommends TensorFlow 2.18.0 as the minimum version for Arm platforms.
{{% /notice %}}

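If you want to confirm which NumPy version was installed alongside TensorFlow, a quick optional check (assuming the `tf-venv` environment is still active):

```python
# Optional: confirm the NumPy and TensorFlow versions installed in the virtual environment.
import numpy as np
import tensorflow as tf

print("NumPy version:", np.__version__)
print("TensorFlow version:", tf.__version__)
```
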
### Verify installation
Run a quick Python command to check that TensorFlow was installed successfully and print the installed version number for confirmation.

```console
python -c "import tensorflow as tf; print(tf.__version__)"
```

You should see an output similar to:
```output
2.20.0
```
TensorFlow installation is complete. You can now proceed to the baseline testing of TensorFlow in the next section.
