@@ -0,0 +1,58 @@
---
title: Deploy TensorFlow on Google Cloud C4A (Arm-based Axion VMs)

minutes_to_complete: 30

who_is_this_for: This learning path is intended for software developers deploying and optimizing TensorFlow workloads on Linux/Arm64 environments, specifically using Google Cloud C4A virtual machines powered by Axion processors.

learning_objectives:
- Provision an Arm-based SUSE SLES virtual machine on Google Cloud (C4A with Axion processors)
- Install TensorFlow on a SUSE Arm64 (C4A) instance
- Verify TensorFlow by running basic computation and model training tests on Arm64
- Benchmark TensorFlow using tf.keras to evaluate inference speed and model performance on Arm64 systems

prerequisites:
- A [Google Cloud Platform (GCP)](https://cloud.google.com/free) account with billing enabled
- Basic familiarity with [TensorFlow](https://www.tensorflow.org/)

author: Pareena Verma

##### Tags
skilllevels: Introductory
subjects: ML
cloud_service_providers: Google Cloud

armips:
- Neoverse

tools_software_languages:
- TensorFlow
- Python
- tf.keras

operatingsystems:
- Linux

# ================================================================================
# FIXED, DO NOT MODIFY
# ================================================================================
further_reading:
    - resource:
        title: Google Cloud documentation
        link: https://cloud.google.com/docs
        type: documentation

    - resource:
        title: TensorFlow documentation
        link: https://www.tensorflow.org/learn
        type: documentation

    - resource:
        title: Phoronix Test Suite (PTS) documentation
        link: https://www.phoronix-test-suite.com/
        type: documentation

weight: 1
layout: "learningpathall"
learning_path_main_page: "yes"
---
@@ -0,0 +1,8 @@
---
# ================================================================================
# FIXED, DO NOT MODIFY THIS FILE
# ================================================================================
weight: 21 # Set to always be larger than the content in this path to be at the end of the navigation.
title: "Next Steps" # Always the same, html page title.
layout: "learningpathall" # All files under learning paths have this same wrapper for Hugo processing.
---
@@ -0,0 +1,24 @@
---
title: Getting started with TensorFlow on Google Axion C4A (Arm Neoverse-V2)

weight: 2

layout: "learningpathall"
---

## Google Axion C4A Arm instances in Google Cloud

Google Axion C4A is a family of Arm-based virtual machines built on Google’s custom Axion CPU, which is based on Arm Neoverse-V2 cores. Designed for high-performance and energy-efficient computing, these virtual machines offer strong performance for modern cloud workloads such as CI/CD pipelines, microservices, media processing, and general-purpose applications.

The C4A series provides a cost-effective alternative to x86 virtual machines while leveraging the scalability and performance benefits of the Arm architecture in Google Cloud.

To learn more about Google Axion, refer to the [Introducing Google Axion Processors, our new Arm-based CPUs](https://cloud.google.com/blog/products/compute/introducing-googles-new-arm-based-cpu) blog.

## TensorFlow

[TensorFlow](https://www.tensorflow.org/) is an **open-source machine learning and deep learning framework** developed by **Google**. It helps developers and researchers **build, train, and deploy AI models** efficiently across **CPUs, GPUs, and TPUs**.

With support for **neural networks**, **natural language processing (NLP)**, and **computer vision**, TensorFlow is widely used for **AI research and production**.
Its **flexibility** and **scalability** make it ideal for both **cloud** and **edge environments**.

To learn more, visit the [official TensorFlow website](https://www.tensorflow.org/).
@@ -0,0 +1,95 @@
---
title: TensorFlow Baseline Testing on Google Axion C4A Arm Virtual Machine
weight: 5

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## TensorFlow Baseline Testing on GCP SUSE VMs
This section helps you check if TensorFlow is properly installed and working on your **Google Axion C4A Arm64 VM**. You will run small tests to confirm that your CPU can perform TensorFlow operations correctly.


### Verify Installation
This command checks if TensorFlow is installed correctly and prints its version number.

```console
python -c "import tensorflow as tf; print(tf.__version__)"
```
### List Available Devices
This command shows which hardware devices TensorFlow can use, such as CPU or GPU. On most VMs, only the CPU is listed.

```console
python -c "import tensorflow as tf; print(tf.config.list_physical_devices())"
```

You should see an output similar to:
```output
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')]
```

### Run a Simple Computation
This test multiplies two 1000×1000 matrices to confirm that TensorFlow computations run on your CPU and measures how long the multiplication takes.

```console
python -c "import tensorflow as tf; import time;
a = tf.random.uniform((1000,1000)); b = tf.random.uniform((1000,1000));
start = time.time(); c = tf.matmul(a,b); end = time.time();
print('Computation time:', end - start, 'seconds')"
```
- This confirms that basic tensor operations run and gives a rough indication of **CPU speed**.
- Note the **computation time** as your baseline.

You should see an output similar to:
```output
Computation time: 0.008263111114501953 seconds
```
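
A single timed call can include one-time start-up costs, so the number may vary between runs. The sketch below shows one way to get a steadier baseline: discard a warm-up call, repeat the measurement, and report the median. `time_callable` is a hypothetical helper name, and the pure-Python workload is only a stand-in so that the example runs without TensorFlow. On the VM you could pass `lambda: tf.matmul(a, b)` instead.

```python
import statistics
import time


def time_callable(fn, warmup=1, repeats=5):
    """Return the median and minimum wall-clock time of fn() in seconds."""
    for _ in range(warmup):
        fn()  # warm-up calls are not timed
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return {"median": statistics.median(samples), "min": min(samples)}


def stand_in_workload():
    # Placeholder for tf.matmul(a, b): a small pure-Python computation.
    return sum(i * i for i in range(100_000))


result = time_callable(stand_in_workload)
print(f"median: {result['median']:.6f} s, min: {result['min']:.6f} s")
```

The median is less sensitive to an occasional slow run than the mean, which makes it a reasonable choice for a baseline figure.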
### Test Neural Network Execution
Create a new file for testing a simple neural network:

```console
vi test_nn.py
```
This opens a new Python file where you’ll write a short TensorFlow test program.
Paste the code below into the `test_nn.py` file:

```python
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
import numpy as np

# Dummy data
x = np.random.rand(1000, 20)
y = np.random.rand(1000, 1)

# Define the model
model = Sequential([
    Dense(64, activation='relu', input_shape=(20,)),
    Dense(1)
])

# Compile the model
model.compile(optimizer='adam', loss='mse')

# Train for 1 epoch
model.fit(x, y, epochs=1, batch_size=32)
```
This script creates and trains a simple neural network on random data to confirm that TensorFlow’s training functions work properly.

**Run the Script**

Execute the script with Python:

```console
python test_nn.py
```

**Output**

TensorFlow will print training progress, like:
```output
32/32 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - loss: 0.1024
```

This confirms that TensorFlow is working properly on your Arm64 VM.
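
The `32/32` in the progress line is the number of batches processed in the epoch. A quick check of that arithmetic, using the sample count and batch size from `test_nn.py` (the helper name `steps_per_epoch` is only illustrative):

```python
import math


def steps_per_epoch(num_samples, batch_size):
    """Number of batches needed to cover the data once; the last batch may be smaller."""
    return math.ceil(num_samples / batch_size)


# 1000 samples with batch_size=32, as in test_nn.py
print(steps_per_epoch(1000, 32))  # 32
```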
@@ -0,0 +1,131 @@
---
title: TensorFlow Benchmarking
weight: 6

### FIXED, DO NOT MODIFY
layout: learningpathall
---


## TensorFlow Benchmarking with tf.keras
This guide benchmarks multiple TensorFlow models (ResNet50, MobileNetV2, and InceptionV3) using dummy input data. It measures average inference time and throughput for each model running on the CPU.

`tf.keras` is **TensorFlow’s high-level API** for building, training, and benchmarking deep learning models. It provides access to **predefined architectures** such as **ResNet**, **MobileNet**, and **Inception**, making it easy to evaluate model performance on different hardware setups like **CPU**, **GPU**, or **TPU**.

### Activate your TensorFlow virtual environment
This step enables your isolated Python environment (`tf-venv`) where TensorFlow is installed. It ensures that all TensorFlow-related packages and dependencies run in a clean, controlled setup without affecting system-wide Python installations.

```console
source ~/tf-venv/bin/activate
python -c "import tensorflow as tf; print(tf.__version__)"
```
### Install required packages
Here, you install TensorFlow 2.20.0 and NumPy, the core libraries needed for model creation, computation, and benchmarking. NumPy supports efficient numerical operations, while TensorFlow handles deep learning workloads.

```console
pip install tensorflow==2.20.0 numpy
```

### Create the benchmark script
Create a Python script named `tf_cpu_benchmark.py` that runs the TensorFlow model benchmarks:

```console
vi tf_cpu_benchmark.py
```

Paste the following code:
```python
import tensorflow as tf
import time

# List of models to benchmark
models = {
    "ResNet50": tf.keras.applications.ResNet50,
    "MobileNetV2": tf.keras.applications.MobileNetV2,
    "InceptionV3": tf.keras.applications.InceptionV3
}

batch_size = 32
num_runs = 50

for name, constructor in models.items():
    print(f"\nBenchmarking {name}...")
    # Create model without pretrained weights
    model = constructor(weights=None, input_shape=(224,224,3))
    # Generate dummy input
    dummy_input = tf.random.uniform([batch_size, 224, 224, 3])
    # Warm-up
    _ = model(dummy_input)
    # Benchmark
    start = time.time()
    for _ in range(num_runs):
        _ = model(dummy_input)
    end = time.time()
    avg_time = (end - start) / num_runs
    throughput = batch_size / avg_time
    print(f"{name} average inference time per batch: {avg_time:.4f} seconds")
    print(f"{name} throughput: {throughput:.2f} images/sec")
```
- **Import libraries** – Loads TensorFlow and `time` for model creation and timing.
- **Define models** – Lists three TensorFlow Keras models: **ResNet50**, **MobileNetV2**, and **InceptionV3**.
- **Set parameters** – Configures `batch_size = 32` and runs each model **50 times** for stable benchmarking.
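- **Create model instances** – Initializes each model **without pretrained weights** (`weights=None`), so no weight files are downloaded. Randomly initialized weights do not change the amount of computation per inference.
- **Generate dummy input** – Creates random data shaped like real images **(224×224×3)** for inference.
- **Warm-up phase** – Runs one inference before timing so that **one-time setup costs** are excluded from the measurement.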
- **Benchmark loop** – Measures total time for 50 runs and calculates **average inference time per batch**.
- **Compute throughput** – Calculates how many **images per second** the model can process.
- **Print results** – Displays **average inference time and throughput** for each model.

### Run the benchmark
Execute the benchmarking script:

```console
python tf_cpu_benchmark.py
```

You should see an output similar to:
```output
Benchmarking ResNet50...
ResNet50 average inference time per batch: 1.2051 seconds
ResNet50 throughput: 26.55 images/sec

Benchmarking MobileNetV2...
MobileNetV2 average inference time per batch: 0.2909 seconds
MobileNetV2 throughput: 110.02 images/sec

Benchmarking InceptionV3...
InceptionV3 average inference time per batch: 0.8971 seconds
InceptionV3 throughput: 35.67 images/sec
```

### Benchmark Metrics Explanation

- **Average Inference Time per Batch (seconds):** Measures how long it takes to process one batch of input data. Lower values indicate faster inference performance.
- **Throughput (images/sec):** Indicates how many images the model can process per second. Higher throughput means better overall efficiency.
- **Model Type:** Refers to the neural network architecture used for testing (e.g., ResNet50, MobileNetV2, InceptionV3). Each model has different computational complexity.
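
The two metrics are related by a simple formula: throughput equals batch size divided by average time per batch. The sketch below recomputes throughput from the rounded times in the sample output. Because the script divides by the unrounded time, the recomputed values can differ slightly in the last digits. The function name is illustrative.

```python
def throughput(batch_size, avg_time_per_batch):
    """Images processed per second for a given batch size and batch time."""
    return batch_size / avg_time_per_batch


batch_size = 32  # same value as in tf_cpu_benchmark.py

# Average batch times (seconds) from the sample output in this section
sample_times = {"ResNet50": 1.2051, "MobileNetV2": 0.2909, "InceptionV3": 0.8971}

for name, avg_time in sample_times.items():
    print(f"{name}: {throughput(batch_size, avg_time):.2f} images/sec")
```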

### Benchmark summary on x86_64
For comparison, the following results were collected by running the same benchmark on a `c4-standard-4` (4 vCPUs, 15 GB memory) x86_64 VM in GCP running SUSE:

| **Model** | **Average Inference Time per Batch (seconds)** | **Throughput (images/sec)** |
|------------------|-----------------------------------------------:|-----------------------------:|
| **ResNet50** | 1.3690 | 23.37 |
| **MobileNetV2** | 0.4274 | 74.87 |
| **InceptionV3** | 0.8799 | 36.37 |

### Benchmark summary on Arm64
Results from the earlier run on the `c4a-standard-4` (4 vCPU, 16 GB memory) Arm64 VM in GCP (SUSE):

| **Model** | **Average Inference Time per Batch (seconds)** | **Throughput (images/sec)** |
|------------------|-----------------------------------------------:|-----------------------------:|
| **ResNet50** | 1.2051 | 26.55 |
| **MobileNetV2** | 0.2909 | 110.02 |
| **InceptionV3** | 0.8971 | 35.67 |

### TensorFlow benchmarking comparison on Arm64 and x86_64

- **MobileNetV2** shows the largest gap: the Arm64 VM reaches about **110 images/sec** compared with about 75 images/sec on x86_64, roughly 47% higher throughput.
- **ResNet50**, the slowest model per batch in this test, delivers about 14% higher throughput on Arm64 (26.55 compared with 23.37 images/sec).
- **InceptionV3** performs about the same on both VMs, with x86_64 ahead by roughly 2% (36.37 compared with 35.67 images/sec).
- These figures come from one run on 4 vCPUs per VM and measure throughput only. Accuracy, power consumption, and cost were not measured.
- **Overall**, the results show that **TensorFlow inference runs efficiently on Arm-based Axion processors**, with throughput that matches or exceeds the comparable x86_64 VM for two of the three models, making C4A a strong option for AI inference and model prototyping.
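
To put numbers on the comparison, the following sketch computes the Arm64-to-x86_64 throughput ratio for each model from the two summary tables. A ratio above 1.0 means the Arm64 VM processed more images per second in this run. The variable names are illustrative, and the values come from a single measurement, so treat the ratios as indicative.

```python
# Throughput (images/sec) copied from the benchmark summary tables
x86_64 = {"ResNet50": 23.37, "MobileNetV2": 74.87, "InceptionV3": 36.37}
arm64 = {"ResNet50": 26.55, "MobileNetV2": 110.02, "InceptionV3": 35.67}

ratios = {model: arm64[model] / x86_64[model] for model in x86_64}

for model, ratio in ratios.items():
    print(f"{model}: Arm64/x86_64 throughput ratio = {ratio:.2f}")
```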
@@ -0,0 +1,72 @@
---
title: Install TensorFlow
weight: 4

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## TensorFlow Installation on GCP SUSE VM
TensorFlow is a widely used **open-source machine learning library** developed by Google, designed for building and deploying ML models efficiently. On Arm64 SUSE VMs, TensorFlow can run on CPU natively, or on GPU if available.

### System Preparation
Update the system and install Python 3, pip, and the `venv` module:

```console
sudo zypper refresh
sudo zypper update -y
sudo zypper install -y python3 python3-pip python3-venv
```
This ensures your system is up-to-date and installs Python with the essential tools required for TensorFlow setup.

**Verify Python version:**

Confirm that Python and pip are correctly installed and identify their versions to ensure compatibility with TensorFlow requirements.

```console
python3 --version
pip3 --version
```
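
You can run a similar check from inside Python. This sketch reports the interpreter version and CPU architecture. The minimum version `(3, 9)` is an assumption for illustration, so confirm the supported range in the TensorFlow installation documentation. On an Arm64 Linux VM, `platform.machine()` typically reports `aarch64`.

```python
import platform
import sys

# Assumed minimum for illustration; check the TensorFlow install guide.
ASSUMED_MIN_PYTHON = (3, 9)


def meets_minimum(version, minimum=ASSUMED_MIN_PYTHON):
    """Compare a (major, minor, ...) version tuple against the minimum."""
    return tuple(version[:2]) >= tuple(minimum)


print("Python:", platform.python_version())
print("Architecture:", platform.machine())
print("Meets assumed minimum:", meets_minimum(sys.version_info))
```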

### Create a Virtual Environment (Recommended)
Set up an isolated Python environment (`tf-venv`) so that TensorFlow and its dependencies don’t interfere with system-wide packages or other projects.

```console
python3 -m venv tf-venv
source tf-venv/bin/activate
```
After activation, your shell prompt is prefixed with `(tf-venv)`, which indicates that the environment is active.
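
To confirm from Python that the environment is active, you can compare `sys.prefix` with `sys.base_prefix`; they differ inside an environment created by `venv`. A minimal sketch (the function name is illustrative):

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a venv-created environment."""
    return sys.prefix != sys.base_prefix


print("Virtual environment active:", in_virtualenv())
print("Interpreter prefix:", sys.prefix)
```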

### Upgrade pip
Upgrade pip to the latest version for smooth and reliable package installation.

```console
pip install --upgrade pip
```

### Install TensorFlow
Install TensorFlow 2.20.0, the version used throughout this Learning Path:

```console
pip install tensorflow==2.20.0
```

{{% notice Note %}}
TensorFlow 2.18.0 introduced compatibility with NumPy 2.0, incorporating its updated type promotion rules and improved numerical precision.
For details, see the [TensorFlow 2.18 release notes](https://blog.tensorflow.org/2024/10/whats-new-in-tensorflow-218.html).

The [Arm Ecosystem Dashboard](https://developer.arm.com/ecosystem-dashboard/) lists TensorFlow 2.18.0 as the minimum recommended version on Arm platforms.
{{% /notice %}}

### Verify installation
Run a quick Python command to check that TensorFlow was installed successfully and print the installed version number for confirmation.

```console
python -c "import tensorflow as tf; print(tf.__version__)"
```

You should see an output similar to:
```output
2.20.0
```
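
If you want to check the installed version without importing TensorFlow, which can take a few seconds, the standard library can read the package metadata directly. A minimal sketch, where `installed_version` is a hypothetical helper:

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("tensorflow")
print("tensorflow:", version if version else "not installed in this environment")
```

Run it with the virtual environment active so that it inspects the same packages that `pip` installed.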
TensorFlow installation is complete. You can now go ahead with the baseline testing of TensorFlow in the next section.