---
title: TensorFlow Benchmarking
weight: 6

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## TensorFlow Benchmarking with tf.keras
This guide benchmarks multiple TensorFlow models (ResNet50, MobileNetV2, and InceptionV3) using dummy input data. It measures average inference time and throughput for each model running on the CPU.

`tf.keras` is **TensorFlow’s high-level API** for building, training, and benchmarking deep learning models. It provides access to **predefined architectures** such as **ResNet**, **MobileNet**, and **Inception**, making it easy to evaluate model performance on different hardware setups like **CPU**, **GPU**, or **TPU**.
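
As a quick illustration of how these predefined architectures are used (run it only after you complete the installation steps below), the following sketch builds one model and performs a single forward pass. The choice of MobileNetV2 and the input shape are example values, not part of the benchmark script:

```python
import tensorflow as tf

# Build MobileNetV2 with randomly initialized weights
# (weights="imagenet" would download pretrained weights instead)
model = tf.keras.applications.MobileNetV2(weights=None, input_shape=(224, 224, 3))

# Run one forward pass on a random "image" to confirm the model builds and runs
x = tf.random.uniform([1, 224, 224, 3])
predictions = model(x)
print(predictions.shape)  # (1, 1000): one score per ImageNet class
```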

### Activate your TensorFlow virtual environment
This step activates your isolated Python environment (`tf-venv`) where TensorFlow is installed. It ensures that all TensorFlow-related packages and dependencies run in a clean, controlled setup without affecting system-wide Python installations.

```console
source ~/tf-venv/bin/activate
python -c "import tensorflow as tf; print(tf.__version__)"
```

### Install required packages
Here, you install TensorFlow 2.20.0 and NumPy, the core libraries needed for model creation, computation, and benchmarking. NumPy supports efficient numerical operations, while TensorFlow handles deep learning workloads.

```console
pip install tensorflow==2.20.0 numpy
```
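
If you want to double-check that both packages import correctly in the virtual environment, a quick one-liner such as the following can be used (the printed versions depend on what pip installed):

```console
python -c "import tensorflow as tf, numpy as np; print(tf.__version__, np.__version__)"
```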

### Create a Python file named tf_cpu_benchmark.py
This step creates a Python script (`tf_cpu_benchmark.py`) that will run the TensorFlow model benchmarking tests.

```console
vi tf_cpu_benchmark.py
```

Paste the following code:
```python
import tensorflow as tf
import time

# List of models to benchmark
models = {
    "ResNet50": tf.keras.applications.ResNet50,
    "MobileNetV2": tf.keras.applications.MobileNetV2,
    "InceptionV3": tf.keras.applications.InceptionV3
}

batch_size = 32
num_runs = 50

for name, constructor in models.items():
    print(f"\nBenchmarking {name}...")
    # Create model without pretrained weights
    model = constructor(weights=None, input_shape=(224, 224, 3))
    # Generate dummy input
    dummy_input = tf.random.uniform([batch_size, 224, 224, 3])
    # Warm-up
    _ = model(dummy_input)
    # Benchmark
    start = time.time()
    for _ in range(num_runs):
        _ = model(dummy_input)
    end = time.time()
    avg_time = (end - start) / num_runs
    throughput = batch_size / avg_time
    print(f"{name} average inference time per batch: {avg_time:.4f} seconds")
    print(f"{name} throughput: {throughput:.2f} images/sec")
```
- **Import libraries** – Loads TensorFlow and `time` for model creation and timing.
- **Define models** – Lists three TensorFlow Keras models: **ResNet50**, **MobileNetV2**, and **InceptionV3**.
- **Set parameters** – Configures `batch_size = 32` and runs each model **50 times** for stable benchmarking (a batch-size sweep sketch follows this list).
- **Create model instances** – Initializes each model **without pretrained weights** for fair CPU testing.
- **Generate dummy input** – Creates random data shaped like real images **(224×224×3)** for inference.
- **Warm-up phase** – Runs one inference to **stabilize the model graph and memory usage**.
- **Benchmark loop** – Measures total time for 50 runs and calculates the **average inference time per batch**.
- **Compute throughput** – Calculates how many **images per second** the model can process.
- **Print results** – Displays the **average inference time and throughput** for each model.
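
As referenced in the parameters bullet above, the same timing pattern can be reused to study how throughput changes with batch size. The sketch below is an optional extension, not part of the original script; the batch sizes (1, 8, 32) and the reduced run count are illustrative assumptions:

```python
import tensorflow as tf
import time

# Single model, swept over a few example batch sizes
model = tf.keras.applications.MobileNetV2(weights=None, input_shape=(224, 224, 3))

for batch_size in (1, 8, 32):                    # illustrative batch sizes
    dummy_input = tf.random.uniform([batch_size, 224, 224, 3])
    _ = model(dummy_input)                       # warm-up for this input shape
    start = time.time()
    for _ in range(10):                          # fewer runs than the main script, for brevity
        _ = model(dummy_input)
    avg_time = (time.time() - start) / 10
    print(f"batch={batch_size}: {avg_time:.4f} s/batch, {batch_size / avg_time:.1f} images/sec")
```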

### Run the benchmark
Execute the benchmarking script:

```console
python tf_cpu_benchmark.py
```

You should see output similar to:
```output
Benchmarking ResNet50...
ResNet50 average inference time per batch: 1.2051 seconds
ResNet50 throughput: 26.55 images/sec

Benchmarking MobileNetV2...
MobileNetV2 average inference time per batch: 0.2909 seconds
MobileNetV2 throughput: 110.02 images/sec

Benchmarking InceptionV3...
InceptionV3 average inference time per batch: 0.8971 seconds
InceptionV3 throughput: 35.67 images/sec
```

### Benchmark Metrics Explanation

- **Average Inference Time per Batch (seconds):** Measures how long it takes to process one batch of input data. Lower values indicate faster inference performance.
- **Throughput (images/sec):** Indicates how many images the model can process per second; higher throughput means better overall efficiency (see the worked example after this list).
- **Model Type:** Refers to the neural network architecture used for testing (for example, ResNet50, MobileNetV2, or InceptionV3). Each model has different computational complexity.
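
As a worked example of the throughput formula, using the MobileNetV2 numbers from the sample output above:

```python
# throughput = batch_size / average inference time per batch
batch_size = 32
avg_time = 0.2909                                # seconds, MobileNetV2 from the sample output
print(f"{batch_size / avg_time:.2f} images/sec")  # ~110.00 images/sec
```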

### Benchmark summary on x86_64
For comparison, the same benchmark was run on a `c4-standard-4` (4 vCPUs, 15 GB memory) x86_64 VM in GCP running SUSE Linux:

| **Model** | **Average Inference Time per Batch (seconds)** | **Throughput (images/sec)** |
|------------------|-----------------------------------------------:|-----------------------------:|
| **ResNet50** | 1.3690 | 23.37 |
| **MobileNetV2** | 0.4274 | 74.87 |
| **InceptionV3** | 0.8799 | 36.37 |

### Benchmark summary on Arm64
Results from the earlier run on the `c4a-standard-4` (4 vCPUs, 16 GB memory) Arm64 VM in GCP running SUSE Linux:

| **Model** | **Average Inference Time per Batch (seconds)** | **Throughput (images/sec)** |
|------------------|-----------------------------------------------:|-----------------------------:|
| **ResNet50** | 1.2051 | 26.55 |
| **MobileNetV2** | 0.2909 | 110.02 |
| **InceptionV3** | 0.8971 | 35.67 |
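
To put the two tables side by side, you can compute simple throughput ratios. The values below are copied from the tables above; the snippet itself is just a convenience sketch:

```python
# Throughput (images/sec) from the x86_64 and Arm64 tables above
x86_throughput = {"ResNet50": 23.37, "MobileNetV2": 74.87, "InceptionV3": 36.37}
arm_throughput = {"ResNet50": 26.55, "MobileNetV2": 110.02, "InceptionV3": 35.67}

for model_name, x86_value in x86_throughput.items():
    ratio = arm_throughput[model_name] / x86_value
    print(f"{model_name}: Arm64 delivers {ratio:.2f}x the x86_64 throughput")
```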

### TensorFlow benchmarking comparison on Arm64 and x86_64

- **Arm64 VMs show strong performance** for lightweight CNNs like **MobileNetV2**, achieving over **110 images/sec**, indicating excellent optimization for CPU-based inference.
- **Medium-depth models** like **InceptionV3** maintain a **balanced trade-off between accuracy and latency**, confirming consistent multi-core utilization on Arm.
- **Heavier architectures** such as **ResNet50** show longer inference times, as expected, but still deliver **stable throughput**, reflecting good floating-point efficiency.
- Compared to **x86_64**, **Arm64 provides energy-efficient yet competitive performance**, particularly for **mobile, quantized, or edge AI workloads**.
- **Overall**, Arm64 demonstrates that **TensorFlow workloads can run efficiently on cloud-native Arm processors**, making them a **cost-effective and power-efficient alternative** for AI inference and model prototyping.