Content dev

madeline-underwood · madeline-underwood · commit 11afffdf4372 · 2025-09-04T14:16:49.000Z
diff --git a/content/learning-paths/servers-and-cloud-computing/tune-network-workloads-on-bare-metal/1_setup.md b/content/learning-paths/servers-and-cloud-computing/tune-network-workloads-on-bare-metal/1_setup.md
@@ -8,7 +8,7 @@ layout: learningpathall
 
 ## Overview
 
-Tomcat is a common client–server web workload that serves HTTP/HTTPS requests. In this section, you will set up a benchmarking environment using **Apache Tomcat** (server) and **wrk2** (client) to generate load and measure performance on an Arm-based bare‑metal instance. This guide was validated on an AWS **c8g.metal‑48xl** running Ubuntu 24.04.
+Tomcat is a common client–server web workload that serves HTTP/HTTPS requests. In this section, you will set up a benchmarking environment using Apache Tomcat (server) and `wrk2` (client) to generate load and measure performance on an Arm-based bare‑metal instance. This guide was validated on an AWS `c8g.metal‑48xl` instance running Ubuntu 24.04.
 
 ## Set up the Tomcat benchmark server
 
@@ -36,7 +36,7 @@ Alternatively, you can build Tomcat [from source](https://github.com/apache/tomc
 
 ## Enable access to Tomcat examples
 
-To access the built‑in examples from your local network or external IP, modify the `context.xml` file and update the `RemoteAddrValve` to allow your clients.
+To access the built‑in examples from your local network or external IP, modify the `context.xml` file and update `RemoteAddrValve` to allow your clients.
 
 The file is located at:
 
diff --git a/content/learning-paths/servers-and-cloud-computing/tune-network-workloads-on-bare-metal/2_baseline.md b/content/learning-paths/servers-and-cloud-computing/tune-network-workloads-on-bare-metal/2_baseline.md
@@ -26,7 +26,7 @@ This baseline includes:
 - Disabling access logging
 - Setting optimal thread counts
 
-### Align IOMMU settings with Ubuntu defaults
+## Align IOMMU settings with Ubuntu defaults
 
 {{% notice Note %}}
 If you are using a cloud image (for example, AWS) with non-default kernel parameters, align IOMMU settings with the Ubuntu defaults: `iommu.strict=1` and `iommu.passthrough=0`.
@@ -60,7 +60,7 @@ You should see that under the default configuration, `iommu.strict` is enabled,
 ...
 ```
 
-### Establish a baseline on Arm Neoverse bare-metal instances
+## Establish a baseline on Arm Neoverse bare-metal instances
 
 {{% notice Note %}}
 To mirror a typical Tomcat deployment and simplify tuning, keep **8 CPU cores online** and set the remaining cores offline. Adjust the CPU range to match your instance. The example below assumes 192 CPUs (as on AWS `c8g.metal-48xl`).
@@ -115,7 +115,7 @@ To mirror a typical Tomcat deployment and simplify tuning, keep **8 CPU cores on
     Transfer/sec:    129.90MB
     ```
 
-### Disable access logging
+## Disable access logging
 
 Disabling access logs removes I/O overhead during benchmarking.
 
@@ -157,7 +157,7 @@ Disabling access logs removes I/O overhead during benchmarking.
     Transfer/sec:    144.36MB
     ```
 
-### Set optimal thread counts
+## Set optimal thread counts
 
 To minimize contention and context switching, align Tomcat’s CPU‑intensive thread count with available CPU cores.
 
diff --git a/content/learning-paths/servers-and-cloud-computing/tune-network-workloads-on-bare-metal/4_local-numa.md b/content/learning-paths/servers-and-cloud-computing/tune-network-workloads-on-bare-metal/4_local-numa.md
@@ -1,5 +1,5 @@
 ---
-title: NUMA-based Tuning
+title: NUMA-based tuning
 weight: 5
 
 ### FIXED, DO NOT MODIFY
@@ -10,7 +10,7 @@ layout: learningpathall
 
 In this section, you configure local NUMA and assess the performance uplift achieved through tuning. Cross‑NUMA data transfers generally incur higher latency than intra‑NUMA transfers, so Tomcat should be deployed on the NUMA node where the network interface resides to reduce cross‑node memory traffic and improve throughput and latency.
 
-### Configure local NUMA
+## Configure local NUMA
 
 Check NUMA topology and relative latencies:
 
@@ -78,7 +78,7 @@ NUMA:
 ...
 ```
 
-### Validate performance after NUMA tuning
+## Validate performance after NUMA tuning
 
 Restart Tomcat on the Arm Neoverse bare‑metal instance:
 
diff --git a/content/learning-paths/servers-and-cloud-computing/tune-network-workloads-on-bare-metal/5_iommu.md b/content/learning-paths/servers-and-cloud-computing/tune-network-workloads-on-bare-metal/5_iommu.md
@@ -6,28 +6,30 @@ weight: 6
 layout: learningpathall
 ---
 
-## IOMMU-based tuning
-IOMMU (Input-Output Memory Management Unit) is a hardware feature that manages how I/O devices access memory. 
-In cloud environments, SmartNICs are typically used to offload the IOMMU workload. On bare-metal systems, to align performance with the cloud, you should disable `iommu.strict` and enable `iommu.passthrough` settings to achieve better performance.
+## Tune with IOMMU
 
-### Setting IOMMU
+The IOMMU (Input-Output Memory Management Unit) controls how I/O devices access memory. In many cloud environments, SmartNICs offload IOMMU-related work. On Arm Neoverse bare‑metal systems, you can often improve Tomcat networking performance by **disabling strict mode** and **enabling passthrough** (setting `iommu.strict=0` and `iommu.passthrough=1`).
 
-1. To configure the IOMMU setting, use a text editor to modify the `grub` file by adding or updating the `GRUB_CMDLINE_LINUX` configuration.
+## Configure IOMMU settings
+
+Edit the GRUB configuration to set IOMMU to passthrough and disable strict invalidations:
 
 ```bash
 sudo vi /etc/default/grub
 ```
-then add or update:
+Add or update the kernel command line:
 ```bash
 GRUB_CMDLINE_LINUX="iommu.strict=0 iommu.passthrough=1"
 ```
 
-2. Update GRUB and reboot to apply the settings.
+Update GRUB and reboot to apply the settings:
+
 ```bash
 sudo update-grub && sudo reboot
 ```
 
-3. Verify if the settings have been successfully applied:
+Verify that IOMMU is in passthrough mode after reboot:
+
 ```bash
 sudo dmesg | grep iommu
 ```
@@ -38,24 +40,30 @@ You will notice that the IOMMU is already in passthrough mode:
 [    0.855658] iommu: Default domain type: Passthrough (set via kernel command line)
 ```
 
-### The result after configuring IOMMU
+## Validate performance after IOMMU tuning
+
+Prepare the Arm Neoverse bare‑metal server: bring the target CPUs online, align the NIC queue count, and restart Tomcat. Before you run the commands, make sure the `${net}` variable holds your NIC name (for example, `net=enP11p4s0`):
 
-1. Run the following command on the Arm Neoverse bare-metal where `Tomcat` is on:
 ```bash
 for no in {96..103}; do sudo bash -c "echo 1 > /sys/devices/system/cpu/cpu${no}/online"; done
 for no in {0..95} {104..191}; do sudo bash -c "echo 0 > /sys/devices/system/cpu/cpu${no}/online"; done
-net=$(ls /sys/class/net/ | grep 'en')
+
+# Ensure NIC queue count matches the number of online CPUs (example: 8)
 sudo ethtool -L ${net} combined 8
+
+# Restart Tomcat with a higher file‑descriptor limit
 ~/apache-tomcat-11.0.10/bin/shutdown.sh 2>/dev/null
 ulimit -n 65535 && ~/apache-tomcat-11.0.10/bin/startup.sh
 ```
 
-2. Run run `wrk2` on the `x86_64` bare-metal instance as shown:
+Run `wrk2` on the `x86_64` benchmarking client to measure throughput and latency:
+
 ```bash
 ulimit -n 65535 && wrk -c1280 -t128 -R500000 -d60 http://${tomcat_ip}:8080/examples/servlets/servlet/HelloWorldExample
 ```
 
-The result after iommu tuning should look like:
+Sample results after IOMMU tuning:
+
 ```output
   Thread Stats   Avg      Stdev     Max   +/- Stdev
     Latency     4.92s     2.49s   10.08s    62.27%
diff --git a/content/learning-paths/servers-and-cloud-computing/tune-network-workloads-on-bare-metal/6_summary.md b/content/learning-paths/servers-and-cloud-computing/tune-network-workloads-on-bare-metal/6_summary.md
@@ -6,15 +6,26 @@ weight: 7
 layout: learningpathall
 ---
 
-## Summary
-You will observe that each tuning method can bring significant performance improvements while running Tomcat as shown in the results summary below:
+## Review the results: Tomcat performance tuning on Arm Neoverse
 
-| Method          | Requests/sec | Latency-Avg |
-|:----------------|:-------------|:------------|
-| Baseline         | 357835.75    | 10.26s      |
-| NIC-Queue       | 378782.37    | 8.35s       |
-| NUMA-Local      | 363744.39    | 9.41s       |
-| IOMMU           | 428628.50    | 4.92s       |
+Each tuning technique delivered measurable gains for the Tomcat HTTP benchmark on an Arm Neoverse bare‑metal server (workload generated with **wrk2**). The table summarizes requests per second and average latency at each stage.
 
+| Method       | Requests/sec | Avg latency (s) |
+|:-------------|-------------:|----------------:|
+| Baseline     | 357,835.75   | 10.26           |
+| NIC queues   | 378,782.37   | 8.35            |
+| NUMA-local   | 363,744.39   | 9.41            |
+| IOMMU        | 428,628.50   | 4.92            |
 
-The same tuning methods can be applied as general guidance to help optimize and tune other network-based workloads.
+### Key takeaways
+
+- **IOMMU passthrough** produced the largest throughput gain: **+19.8%** vs. baseline, with a **52.0%** drop in average latency.
+- **NIC queue count alignment** improved throughput by **+5.9%** and reduced average latency by **18.6%**.
+- **NUMA locality** yielded a smaller but consistent benefit: **+1.7%** throughput and **8.3%** lower average latency.
+- Together, these techniques (IOMMU tuning, NIC queue optimization, and NUMA-aware placement) form a practical checklist for improving network workload performance on Arm Neoverse.
+
+### Next steps
+
+- Apply the same tuning pattern to other HTTP services and microservices (for example, NGINX, Envoy, or custom Jetty/Tomcat apps).
+- Re‑evaluate queue counts, CPU pinning, and IOMMU mode as you scale cores, update kernels, or change NIC drivers/firmware.
+- Track end‑to‑end SLOs (p95/p99 latency and error rates) in addition to average metrics to ensure sustained gains under real traffic.
diff --git a/content/learning-paths/servers-and-cloud-computing/tune-network-workloads-on-bare-metal/_index.md b/content/learning-paths/servers-and-cloud-computing/tune-network-workloads-on-bare-metal/_index.md
@@ -14,7 +14,7 @@ learning_objectives:
 
 prerequisites:
     - An Arm Neoverse-based bare-metal server running Ubuntu 24.04 to run Apache Tomcat (this Learning Path was tested with an AWS c8g.metal-48xl instance)
-    - Access to an x86_64 bare-metal server running Ubuntu 24.04 to run wrk2
+    - Access to an x86_64 bare-metal server running Ubuntu 24.04 to run `wrk2`
     - Basic familiarity with Java applications
 
 author: Ying Yu, Ker Liu, Rui Chang
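
The percentages quoted in the key takeaways added to `6_summary.md` follow from the table values. A minimal Python sketch that recomputes them, assuming every figure is measured relative to the Baseline row (the `results` dictionary and the `pct_change` helper are illustrative names, not part of the Learning Path):

```python
# Requests/sec and average latency (s) copied from the summary table.
results = {
    "Baseline": (357835.75, 10.26),
    "NIC queues": (378782.37, 8.35),
    "NUMA-local": (363744.39, 9.41),
    "IOMMU": (428628.50, 4.92),
}


def pct_change(new, base):
    """Return the relative change from base to new, in percent."""
    return (new - base) / base * 100.0


base_rps, base_lat = results["Baseline"]

# Map each method to (throughput gain %, latency reduction %) vs. baseline.
summary = {}
for method, (rps, lat) in results.items():
    if method == "Baseline":
        continue
    gain = round(pct_change(rps, base_rps), 1)
    drop = round(-pct_change(lat, base_lat), 1)
    summary[method] = (gain, drop)
    print(f"{method}: +{gain}% requests/sec, {drop}% lower average latency")
```

Each row is compared with the baseline here, not with the preceding row, which matches the "vs. baseline" wording in the first takeaway.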