Final tweaks

madeline-underwood · madeline-underwood · commit eac28129d40a · 2025-09-04T18:30:37.000Z
diff --git a/content/learning-paths/servers-and-cloud-computing/tune-network-workloads-on-bare-metal/1_setup.md b/content/learning-paths/servers-and-cloud-computing/tune-network-workloads-on-bare-metal/1_setup.md
@@ -8,7 +8,7 @@ layout: learningpathall
 
 ## Overview
 
-Tomcat is a common client–server web workload that serves HTTP/HTTPS requests. In this section, you will set up a benchmarking environment using Apache Tomcat (server) and `wrk2` (client) to generate load and measure performance on an Arm-based bare‑metal instance. This guide was validated on an AWS `c8g.metal‑48xl` instance running Ubuntu 24.04.
+Tomcat is a common client–server web workload that serves HTTP/HTTPS requests. In this section, you will set up a benchmarking environment using Apache Tomcat (server) and `wrk2` (client) to generate load and measure performance on an Arm-based bare‑metal instance. This Learning Path was validated on an AWS `c8g.metal‑48xl` instance running Ubuntu 24.04.
 
 ## Set up the Tomcat benchmark server
 
@@ -63,7 +63,7 @@ Allowing `.*` permits access from all IP addresses and should be used only in is
 ## Start the Tomcat server
 
 {{% notice Note %}}
-For maximum performance, ensure the per‑process limit for open file descriptors is high enough.
+For maximum performance, ensure the per‑process limit for open file descriptors is sufficient.
 {{% /notice %}}
 
 Start the server:
@@ -106,7 +106,7 @@ Ensure port **8080** is open in the security group or firewall for your Arm‑ba
 [Wrk2](https://github.com/giltene/wrk2) is a high-performance HTTP benchmarking tool specialized in generating constant throughput loads and measuring latency percentiles for web services. `wrk2` is an enhanced version of `wrk` that provides accurate latency statistics under controlled request rates, ideal for performance testing of HTTP servers.
 
 {{% notice Note %}}
-Currently, `wrk2` is only supported on **x86_64** machines. Run the client steps below on a bare‑metal x86_64 server running Ubuntu 24.04.
+Currently, `wrk2` is only supported on x86_64 machines. Run the client steps below on a bare‑metal x86_64 server running Ubuntu 24.04.
 {{% /notice %}}
 
 ## Install dependencies
@@ -140,7 +140,7 @@ sudo cp wrk /usr/local/bin
 As with Tomcat, set a high open‑files limit to avoid hitting FD caps during the run.
 {{% /notice %}}
 
-Benchmark the HelloWorld servlet running on Tomcat:
+Benchmark the `HelloWorld` servlet running on Tomcat:
 
 ```bash
 ulimit -n 65535 && wrk -c32 -t16 -R50000 -d60 http://${tomcat_ip}:8080/examples/servlets/servlet/HelloWorldExample
diff --git a/content/learning-paths/servers-and-cloud-computing/tune-network-workloads-on-bare-metal/2_baseline.md b/content/learning-paths/servers-and-cloud-computing/tune-network-workloads-on-bare-metal/2_baseline.md
@@ -11,7 +11,7 @@ layout: learningpathall
 In this section, you establish a baseline configuration before applying advanced techniques to tune the performance of Tomcat-based network workloads on an Arm Neoverse bare-metal instance.
 
 {{% notice Note %}}
-To avoid running out of file descriptors under load, raise the file‑descriptor limit on **both** the server and the client:
+To avoid running out of file descriptors under load, raise the file‑descriptor limit on *both* the server and the client:
 ```bash
 ulimit -n 65535
 ```
@@ -32,24 +32,24 @@ This baseline includes:
 If you are using a cloud image (for example, AWS) with non-default kernel parameters, align IOMMU settings with the Ubuntu defaults: `iommu.strict=1` and `iommu.passthrough=0`.
 {{% /notice %}}
 
-1. Edit GRUB and add (or update) `GRUB_CMDLINE_LINUX`:
+Edit GRUB and add (or update) `GRUB_CMDLINE_LINUX`:
 
-    ```bash
+```bash
     sudo vi /etc/default/grub
-    ```
+```
 
-    Add or update the line to include:
-    ```bash
+Add or update the line to include:
+```bash
     GRUB_CMDLINE_LINUX="iommu.strict=1 iommu.passthrough=0"
-    ```
+```
 
-2. Update GRUB and reboot to apply the settings:
+Update GRUB and reboot to apply the settings:
 
-    ```bash
+```bash
     sudo update-grub && sudo reboot
-    ```
+```
 
-3. Verify that the default settings have been successfully applied:
+Verify that the default settings have been successfully applied:
 ```bash
 sudo dmesg | grep iommu
 ```
@@ -63,23 +63,23 @@ You should see that under the default configuration, `iommu.strict` is enabled,
 ## Establish a baseline on Arm Neoverse bare-metal instances
 
 {{% notice Note %}}
-To mirror a typical Tomcat deployment and simplify tuning, keep **8 CPU cores online** and set the remaining cores offline. Adjust the CPU range to match your instance. The example below assumes 192 CPUs (as on AWS `c8g.metal-48xl`).
+To mirror a typical Tomcat deployment and simplify tuning, keep 8 CPU cores online and set the remaining cores offline. Adjust the CPU range to match your instance. The example below assumes 192 CPUs (as on AWS `c8g.metal-48xl`).
 {{% /notice %}}
 
-1. Set CPUs 8–191 offline:
+Set CPUs 8–191 offline:
 
-    ```bash
+```bash
     for no in {8..191}; do sudo bash -c "echo 0 > /sys/devices/system/cpu/cpu${no}/online"; done
-    ```
+```
 
-2. Confirm that CPUs `0–7` are online and the rest are offline:
+Confirm that CPUs `0–7` are online and the rest are offline:
 
-    ```bash
+```bash
     lscpu
-    ```
+```
 
-    Example output:
-    ```output
+Example output:
+```output
     Architecture:                aarch64
       CPU op-mode(s):            64-bit
       Byte Order:                Little Endian
@@ -89,86 +89,86 @@ To mirror a typical Tomcat deployment and simplify tuning, keep **8 CPU cores on
     Vendor ID:                   ARM
       Model name:                Neoverse-V2
     ...
-    ```
+```
 
-3. Restart Tomcat on the Arm instance:
+Restart Tomcat on the Arm instance:
 
-    ```bash
+```bash
     ~/apache-tomcat-11.0.10/bin/shutdown.sh 2>/dev/null
     ulimit -n 65535 && ~/apache-tomcat-11.0.10/bin/startup.sh
-    ```
+```
 
-4. From your `x86_64` benchmarking client, run `wrk2` (replace `<tomcat_ip>` with the server’s IP):
+From your `x86_64` benchmarking client, run `wrk2` (replace `<tomcat_ip>` with the server’s IP):
 
-    ```bash
+```bash
     ulimit -n 65535 && wrk -c1280 -t128 -R500000 -d60 http://<tomcat_ip>:8080/examples/servlets/servlet/HelloWorldExample
-    ```
+```
 
-    Example result:
-    ```output
+Example result:
+```output
       Thread Stats   Avg      Stdev     Max   +/- Stdev
         Latency    16.76s     6.59s   27.56s    56.98%
         Req/Sec     1.97k   165.05     2.33k    89.90%
       14680146 requests in 1.00m, 7.62GB read
       Socket errors: connect 1264, read 0, write 0, timeout 1748
     Requests/sec: 244449.62
     Transfer/sec:    129.90MB
-    ```
+```
 
 ## Disable access logging
 
 Disabling access logs removes I/O overhead during benchmarking.
 
-1. Edit `server.xml` and comment out (or remove) the **`org.apache.catalina.valves.AccessLogValve`** block:
+Edit `server.xml` and comment out (or remove) the **`org.apache.catalina.valves.AccessLogValve`** block:
 
-    ```bash
+```bash
     vi ~/apache-tomcat-11.0.10/conf/server.xml
-    ```
+```
 
-    ```xml
+```xml
     <!--
         <Valve className="org.apache.catalina.valves.AccessLogValve" directory="logs"
                 prefix="localhost_access_log" suffix=".txt"
                 pattern="%h %l %u %t &quot;%r&quot; %s %b" />
     -->
-    ```
+```
 
-2. Restart Tomcat:
+Restart Tomcat:
 
-    ```bash
+```bash
     ~/apache-tomcat-11.0.10/bin/shutdown.sh 2>/dev/null
     ulimit -n 65535 && ~/apache-tomcat-11.0.10/bin/startup.sh
-    ```
+```
 
-3. Re-run `wrk2`:
+Re-run `wrk2`:
 
-    ```bash
+```bash
     ulimit -n 65535 && wrk -c1280 -t128 -R500000 -d60 http://<tomcat_ip>:8080/examples/servlets/servlet/HelloWorldExample
-    ```
+```
 
-    Example result:
-    ```output
+Example result:
+```output
       Thread Stats   Avg      Stdev     Max   +/- Stdev
         Latency    16.16s     6.45s   28.26s    57.85%
         Req/Sec     2.16k     5.91     2.17k    77.50%
       16291136 requests in 1.00m, 8.45GB read
       Socket errors: connect 0, read 0, write 0, timeout 75
     Requests/sec: 271675.12
     Transfer/sec:    144.36MB
-    ```
+```
 
 ## Set optimal thread counts
 
 To minimize contention and context switching, align Tomcat’s CPU‑intensive thread count with available CPU cores.
 
-1. While `wrk2` is running, identify CPU‑intensive Tomcat threads:
+While `wrk2` is running, identify CPU‑intensive Tomcat threads:
 
-    ```bash
+```bash
     top -H -p "$(pgrep -n java)"
-    ```
+```
 
-    Example output:
-    ```output
+Example output:
+```output
     top - 08:57:29 up 20 min,  1 user,  load average: 4.17, 2.35, 1.22
     Threads: 231 total,   8 running, 223 sleeping,   0 stopped,   0 zombie
     %Cpu(s): 31.7 us, 20.2 sy,  0.0 ni, 31.0 id,  0.0 wa,  0.0 hi, 17.2 si,  0.0 st
@@ -204,24 +204,24 @@ To minimize contention and context switching, align Tomcat’s CPU‑intensive t
 ...
 ```
 
-    You’ll typically see **`http-nio-8080-e`** and **`http-nio-8080-P`** threads as CPU intensive. Because the **`http-nio-8080-P`** thread count is fixed at 1 (in current Tomcat releases), and you have 8 online CPU cores, set **`http-nio-8080-e`** to **7**.
+You’ll typically see `http-nio-8080-e` and `http-nio-8080-P` threads as CPU-intensive. Because the `http-nio-8080-P` thread count is fixed at 1 (in current Tomcat releases), and you have 8 online CPU cores, set `http-nio-8080-e` to 7.
 
-2. Edit `server.xml` and update the HTTP connector to set the worker thread counts and connection limits:
+Edit `server.xml` and update the HTTP connector to set the worker thread counts and connection limits:
 
-    ```bash
+```bash
     vi ~/apache-tomcat-11.0.10/conf/server.xml
-    ```
+```
 
-    Replace the existing connector:
-    ```xml
+Replace the existing connector:
+```xml
     <!-- Before -->
         <Connector port="8080" protocol="HTTP/1.1"
                    connectionTimeout="20000"
                    redirectPort="8443" />
-    ```
+```
 
-    With the tuned settings:
-    ```xml
+With the tuned settings:
+```xml
     <!-- After -->
         <Connector port="8080" protocol="HTTP/1.1"
                    connectionTimeout="20000"
@@ -231,25 +231,25 @@ To minimize contention and context switching, align Tomcat’s CPU‑intensive t
                    maxKeepAliveRequests="500000"
                    maxConnections="100000"
         />
-    ```
+```
 
-3. Restart Tomcat and re-run `wrk2`:
+Restart Tomcat and re-run `wrk2`:
 
-    ```bash
+```bash
     ~/apache-tomcat-11.0.10/bin/shutdown.sh 2>/dev/null
     ulimit -n 65535 && ~/apache-tomcat-11.0.10/bin/startup.sh
 
     ulimit -n 65535 && wrk -c1280 -t128 -R500000 -d60 http://<tomcat_ip>:8080/examples/servlets/servlet/HelloWorldExample
-    ```
+```
 
-    Example result:
-    ```output
+Example result:
+```output
       Thread Stats   Avg      Stdev     Max   +/- Stdev
         Latency    10.26s     4.55s   19.81s    62.51%
         Req/Sec     2.86k    89.49     3.51k    77.06%
       21458421 requests in 1.00m, 11.13GB read
     Requests/sec: 357835.75
     Transfer/sec:    190.08MB
-    ```
+```
 
 With a solid baseline in place, you’re ready to proceed to NIC queue tuning, NUMA locality optimization, and IOMMU exploration in the next sections.
diff --git a/content/learning-paths/servers-and-cloud-computing/tune-network-workloads-on-bare-metal/3_nic-queue.md b/content/learning-paths/servers-and-cloud-computing/tune-network-workloads-on-bare-metal/3_nic-queue.md
@@ -41,7 +41,7 @@ Use the following command to check the current transmit/receive queues of the ${
 ```bash
 sudo ethtool -l ${net}
 ```
-It can be observed that the number of transmit/receive queues for the ${net} network interface is currently 63.
+You can see that the number of transmit/receive queues for the ${net} network interface is currently 63:
 ```bash
 Channel parameters for enP11p4s0:
 Pre-set maximums:
diff --git a/content/learning-paths/servers-and-cloud-computing/tune-network-workloads-on-bare-metal/5_iommu.md b/content/learning-paths/servers-and-cloud-computing/tune-network-workloads-on-bare-metal/5_iommu.md
@@ -8,7 +8,7 @@ layout: learningpathall
 
 ## Tune with IOMMU
 
-IOMMU (Input–Output Memory Management Unit) controls how I/O devices access memory. In many cloud environments, SmartNICs offload IOMMU-related work. On Arm Neoverse bare‑metal systems, you can often improve Tomcat networking performance by **disabling strict mode** and **enabling passthrough** (setting `iommu.strict=0` and `iommu.passthrough=1`).
+IOMMU (Input–Output Memory Management Unit) controls how I/O devices access memory. In many cloud environments, SmartNICs offload IOMMU-related work. On Arm Neoverse bare‑metal systems, you can often improve Tomcat networking performance by disabling strict mode and enabling passthrough (setting `iommu.strict=0` and `iommu.passthrough=1`).
 
 ## Configure IOMMU settings
 
diff --git a/content/learning-paths/servers-and-cloud-computing/tune-network-workloads-on-bare-metal/_index.md b/content/learning-paths/servers-and-cloud-computing/tune-network-workloads-on-bare-metal/_index.md
@@ -6,14 +6,14 @@ minutes_to_complete: 60
 who_is_this_for: This is an advanced topic for engineers who want to tune the performance of network workloads on Arm Neoverse-based bare-metal instances.
 
 learning_objectives: 
-    - Set up a benchmarking environment using Apache Tomcat and wrk2 on an Arm Neoverse bare‑metal host
-    - Establish a reproducible baseline performance configuration (throughput and latency) before tuning
-    - Tune NIC multi‑queue, RSS/RPS/XPS, and IRQ affinity to increase throughput and stabilize latency
-    - Optimize NUMA locality by pinning Tomcat workers and interrupts to local CPUs and memory
-    - Evaluate IOMMU configuration options and select the setting that maximizes networking performance
+    - Set up Apache Tomcat and wrk2 to benchmark HTTP on an Arm Neoverse bare‑metal host
+    - Establish a reproducible baseline (file‑descriptor limits, logging, thread counts, fixed core set)
+    - Tune NIC queue count to match available cores and measure impact
+    - Improve NUMA locality by placing Tomcat on the NIC’s NUMA node and aligning worker threads with cores
+    - Compare IOMMU strict mode and IOMMU passthrough mode, and select the configuration that delivers the best performance for your workload
 
 prerequisites:
-    - An Arm Neoverse-based bare-metal server running Ubuntu 24.04 to run Apache Tomcat (this Learning Path was tested with an AWS c8g.metal-48xl instance)
+    - An Arm Neoverse-based bare-metal server running Ubuntu 24.04 to run Apache Tomcat
     - Access to an x86_64 bare-metal server running Ubuntu 24.04 to run `wrk2`
     - Basic familiarity with Java applications