---
title: TensorFlow Benchmarking
weight: 6

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## TensorFlow Benchmarking with tf.keras
This guide benchmarks multiple TensorFlow models (ResNet50, MobileNetV2, and InceptionV3) using dummy input data. It measures average inference time and throughput for each model running on the CPU.

`tf.keras` is **TensorFlow’s high-level API** for building, training, and benchmarking deep learning models. It provides access to **predefined architectures** such as **ResNet**, **MobileNet**, and **Inception**, making it easy to evaluate model performance on different hardware setups like **CPU**, **GPU**, or **TPU**.
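
As a quick illustration of how these predefined architectures are used (run it only after you complete the installation steps below), the following sketch builds one model and performs a single forward pass. The choice of MobileNetV2 and the input shape are example values, not part of the benchmark script:

```python
import tensorflow as tf

# Build MobileNetV2 with randomly initialized weights
# (weights="imagenet" would download pretrained weights instead)
model = tf.keras.applications.MobileNetV2(weights=None, input_shape=(224, 224, 3))

# Run one forward pass on a random "image" to confirm the model builds and runs
x = tf.random.uniform([1, 224, 224, 3])
predictions = model(x)
print(predictions.shape)  # (1, 1000): one score per ImageNet class
```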

### Activate your TensorFlow virtual environment
This step activates your isolated Python environment (`tf-venv`) where TensorFlow is installed. It ensures that all TensorFlow-related packages and dependencies run in a clean, controlled setup without affecting system-wide Python installations.

```console
source ~/tf-venv/bin/activate
python -c "import tensorflow as tf; print(tf.__version__)"
```

### Install required packages
Here, you install TensorFlow 2.20.0 and NumPy, the core libraries needed for model creation, computation, and benchmarking. NumPy supports efficient numerical operations, while TensorFlow handles deep learning workloads.

```console
pip install tensorflow==2.20.0 numpy
```
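
If you want to double-check that both packages import correctly in the virtual environment, a quick one-liner such as the following can be used (the printed versions depend on what pip installed):

```console
python -c "import tensorflow as tf, numpy as np; print(tf.__version__, np.__version__)"
```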

### Create a Python file named tf_cpu_benchmark.py
This step creates a Python script (`tf_cpu_benchmark.py`) that will run the TensorFlow model benchmarking tests.

```console
vi tf_cpu_benchmark.py
```

Paste the following code:
```python
import tensorflow as tf
import time

# List of models to benchmark
models = {
    "ResNet50": tf.keras.applications.ResNet50,
    "MobileNetV2": tf.keras.applications.MobileNetV2,
    "InceptionV3": tf.keras.applications.InceptionV3
}

batch_size = 32
num_runs = 50

for name, constructor in models.items():
    print(f"\nBenchmarking {name}...")
    # Create model without pretrained weights
    model = constructor(weights=None, input_shape=(224, 224, 3))
    # Generate dummy input
    dummy_input = tf.random.uniform([batch_size, 224, 224, 3])
    # Warm-up
    _ = model(dummy_input)
    # Benchmark
    start = time.time()
    for _ in range(num_runs):
        _ = model(dummy_input)
    end = time.time()
    avg_time = (end - start) / num_runs
    throughput = batch_size / avg_time
    print(f"{name} average inference time per batch: {avg_time:.4f} seconds")
    print(f"{name} throughput: {throughput:.2f} images/sec")
```
- **Import libraries** – Loads TensorFlow and `time` for model creation and timing.
- **Define models** – Lists three TensorFlow Keras models: **ResNet50**, **MobileNetV2**, and **InceptionV3**.
- **Set parameters** – Configures `batch_size = 32` and runs each model **50 times** for stable benchmarking (a batch-size sweep sketch follows this list).
- **Create model instances** – Initializes each model **without pretrained weights** for fair CPU testing.
- **Generate dummy input** – Creates random data shaped like real images **(224×224×3)** for inference.
- **Warm-up phase** – Runs one inference to **stabilize the model graph and memory usage**.
- **Benchmark loop** – Measures total time for 50 runs and calculates the **average inference time per batch**.
- **Compute throughput** – Calculates how many **images per second** the model can process.
- **Print results** – Displays the **average inference time and throughput** for each model.
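
As referenced in the parameters bullet above, the same timing pattern can be reused to study how throughput changes with batch size. The sketch below is an optional extension, not part of the original script; the batch sizes (1, 8, 32) and the reduced run count are illustrative assumptions:

```python
import tensorflow as tf
import time

# Single model, swept over a few example batch sizes
model = tf.keras.applications.MobileNetV2(weights=None, input_shape=(224, 224, 3))

for batch_size in (1, 8, 32):                    # illustrative batch sizes
    dummy_input = tf.random.uniform([batch_size, 224, 224, 3])
    _ = model(dummy_input)                       # warm-up for this input shape
    start = time.time()
    for _ in range(10):                          # fewer runs than the main script, for brevity
        _ = model(dummy_input)
    avg_time = (time.time() - start) / 10
    print(f"batch={batch_size}: {avg_time:.4f} s/batch, {batch_size / avg_time:.1f} images/sec")
```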

### Run the benchmark
Execute the benchmarking script:

```console
python tf_cpu_benchmark.py
```

You should see output similar to:
```output
Benchmarking ResNet50...
ResNet50 average inference time per batch: 1.2051 seconds
ResNet50 throughput: 26.55 images/sec

Benchmarking MobileNetV2...
MobileNetV2 average inference time per batch: 0.2909 seconds
MobileNetV2 throughput: 110.02 images/sec

Benchmarking InceptionV3...
InceptionV3 average inference time per batch: 0.8971 seconds
InceptionV3 throughput: 35.67 images/sec
```

### Benchmark Metrics Explanation

- **Average Inference Time per Batch (seconds):** Measures how long it takes to process one batch of input data. Lower values indicate faster inference performance.
- **Throughput (images/sec):** Indicates how many images the model can process per second; higher throughput means better overall efficiency (see the worked example after this list).
- **Model Type:** Refers to the neural network architecture used for testing (for example, ResNet50, MobileNetV2, or InceptionV3). Each model has different computational complexity.
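
As a worked example of the throughput formula, using the MobileNetV2 numbers from the sample output above:

```python
# throughput = batch_size / average inference time per batch
batch_size = 32
avg_time = 0.2909                                # seconds, MobileNetV2 from the sample output
print(f"{batch_size / avg_time:.2f} images/sec")  # ~110.00 images/sec
```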

### Benchmark summary on x86_64
For comparison, the same benchmark was run on a `c4-standard-4` (4 vCPUs, 15 GB memory) x86_64 VM in GCP running SUSE Linux:

| **Model** | **Average Inference Time per Batch (seconds)** | **Throughput (images/sec)** |
|------------------|-----------------------------------------------:|-----------------------------:|
| **ResNet50** | 1.3690 | 23.37 |
| **MobileNetV2** | 0.4274 | 74.87 |
| **InceptionV3** | 0.8799 | 36.37 |

### Benchmark summary on Arm64
Results from the earlier run on the `c4a-standard-4` (4 vCPUs, 16 GB memory) Arm64 VM in GCP running SUSE Linux:

| **Model** | **Average Inference Time per Batch (seconds)** | **Throughput (images/sec)** |
|------------------|-----------------------------------------------:|-----------------------------:|
| **ResNet50** | 1.2051 | 26.55 |
| **MobileNetV2** | 0.2909 | 110.02 |
| **InceptionV3** | 0.8971 | 35.67 |
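
To put the two tables side by side, you can compute simple throughput ratios. The values below are copied from the tables above; the snippet itself is just a convenience sketch:

```python
# Throughput (images/sec) from the x86_64 and Arm64 tables above
x86_throughput = {"ResNet50": 23.37, "MobileNetV2": 74.87, "InceptionV3": 36.37}
arm_throughput = {"ResNet50": 26.55, "MobileNetV2": 110.02, "InceptionV3": 35.67}

for model_name, x86_value in x86_throughput.items():
    ratio = arm_throughput[model_name] / x86_value
    print(f"{model_name}: Arm64 delivers {ratio:.2f}x the x86_64 throughput")
```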

### TensorFlow benchmarking comparison on Arm64 and x86_64

- **Arm64 VMs show strong performance** for lightweight CNNs like **MobileNetV2**, achieving over **110 images/sec**, indicating excellent optimization for CPU-based inference.
- **Medium-depth models** like **InceptionV3** maintain a **balanced trade-off between accuracy and latency**, confirming consistent multi-core utilization on Arm.
- **Heavier architectures** such as **ResNet50** show longer inference times, as expected, but still deliver **stable throughput**, reflecting good floating-point efficiency.
- Compared to **x86_64**, **Arm64 provides energy-efficient yet competitive performance**, particularly for **mobile, quantized, or edge AI workloads**.
- **Overall**, Arm64 demonstrates that **TensorFlow workloads can run efficiently on cloud-native Arm processors**, making them a **cost-effective and power-efficient alternative** for AI inference and model prototyping.