Commit eac2812

Final tweaks
1 parent b608ba3 commit eac2812

5 files changed: +77 -77 lines changed

content/learning-paths/servers-and-cloud-computing/tune-network-workloads-on-bare-metal/1_setup.md

Lines changed: 4 additions & 4 deletions
Original file line number | Diff line number | Diff line change
@@ -8,7 +8,7 @@ layout: learningpathall
88

99
## Overview
1010

11-
Tomcat is a common client–server web workload that serves HTTP/HTTPS requests. In this section, you will set up a benchmarking environment using Apache Tomcat (server) and `wrk2` (client) to generate load and measure performance on an Arm-based bare‑metal instance. This guide was validated on an AWS `c8g.metal‑48xl` instance running Ubuntu 24.04.
11+
Tomcat is a common client–server web workload that serves HTTP/HTTPS requests. In this section, you will set up a benchmarking environment using Apache Tomcat (server) and `wrk2` (client) to generate load and measure performance on an Arm-based bare‑metal instance. This Learning Path was validated on an AWS `c8g.metal‑48xl` instance running Ubuntu 24.04.
1212

1313
## Set up the Tomcat benchmark server
1414

@@ -63,7 +63,7 @@ Allowing `.*` permits access from all IP addresses and should be used only in is
6363
## Start the Tomcat server
6464

6565
{{% notice Note %}}
66-
For maximum performance, ensure the per‑process limit for open file descriptors is high enough.
66+
For maximum performance, ensure the per‑process limit for open file descriptors is sufficient.
6767
{{% /notice %}}
6868

6969
Start the server:
@@ -106,7 +106,7 @@ Ensure port **8080** is open in the security group or firewall for your Arm‑ba
106106
[Wrk2](https://github.com/giltene/wrk2) is a high-performance HTTP benchmarking tool specialized in generating constant throughput loads and measuring latency percentiles for web services. `wrk2` is an enhanced version of `wrk` that provides accurate latency statistics under controlled request rates, ideal for performance testing of HTTP servers.
107107

108108
{{% notice Note %}}
109-
Currently, `wrk2` is only supported on **x86_64** machines. Run the client steps below on a bare‑metal x86_64 server running Ubuntu 24.04.
109+
Currently, `wrk2` is only supported on x86_64 machines. Run the client steps below on a bare‑metal x86_64 server running Ubuntu 24.04.
110110
{{% /notice %}}
111111

112112
## Install dependencies
@@ -140,7 +140,7 @@ sudo cp wrk /usr/local/bin
140140
As with Tomcat, set a high open‑files limit to avoid hitting FD caps during the run.
141141
{{% /notice %}}
142142

143-
Benchmark the HelloWorld servlet running on Tomcat:
143+
Benchmark the `HelloWorld` servlet running on Tomcat:
144144

145145
```bash
146146
ulimit -n 65535 && wrk -c32 -t16 -R50000 -d60 http://${tomcat_ip}:8080/examples/servlets/servlet/HelloWorldExample

content/learning-paths/servers-and-cloud-computing/tune-network-workloads-on-bare-metal/2_baseline.md

Lines changed: 65 additions & 65 deletions
@@ -11,7 +11,7 @@ layout: learningpathall
1111
In this section, you establish a baseline configuration before applying advanced techniques to tune the performance of Tomcat-based network workloads on an Arm Neoverse bare-metal instance.
1212

1313
{{% notice Note %}}
14-
To avoid running out of file descriptors under load, raise the file‑descriptor limit on **both** the server and the client:
14+
To avoid running out of file descriptors under load, raise the file‑descriptor limit on *both* the server and the client:
1515
```bash
1616
ulimit -n 65535
1717
```
@@ -32,24 +32,24 @@ This baseline includes:
3232
If you are using a cloud image (for example, AWS) with non-default kernel parameters, align IOMMU settings with the Ubuntu defaults: `iommu.strict=1` and `iommu.passthrough=0`.
3333
{{% /notice %}}
3434

35-
1. Edit GRUB and add (or update) `GRUB_CMDLINE_LINUX`:
35+
Edit GRUB and add (or update) `GRUB_CMDLINE_LINUX`:
3636

37-
```bash
37+
```bash
3838
sudo vi /etc/default/grub
39-
```
39+
```
4040

41-
Add or update the line to include:
42-
```bash
41+
Add or update the line to include:
42+
```bash
4343
GRUB_CMDLINE_LINUX="iommu.strict=1 iommu.passthrough=0"
44-
```
44+
```
4545

46-
2. Update GRUB and reboot to apply the settings:
46+
Update GRUB and reboot to apply the settings:
4747

48-
```bash
48+
```bash
4949
sudo update-grub && sudo reboot
50-
```
50+
```
5151

52-
3. Verify that the default settings have been successfully applied:
52+
Verify that the default settings have been successfully applied:
5353
```bash
5454
sudo dmesg | grep iommu
5555
```
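
As a complement to the `dmesg` check, the sketch below tests whether a given parameter appears on the kernel command line. The helper name `has_param` is hypothetical and is not part of the Learning Path. Reading `/proc/cmdline` only shows what was requested at boot, so a missing parameter means the kernel default applies, not that the feature is off.

```bash
# has_param CMDLINE PARAM
# Succeeds if PARAM appears as a whole, space-delimited word in CMDLINE.
# Hypothetical helper for illustration only.
has_param() {
  case " $1 " in
    *" $2 "*) return 0 ;;
    *) return 1 ;;
  esac
}
```

After the reboot you could call it as `has_param "$(cat /proc/cmdline)" "iommu.strict=1" && echo "iommu.strict=1 requested"`, and likewise for `iommu.passthrough=0`.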
@@ -63,23 +63,23 @@ You should see that under the default configuration, `iommu.strict` is enabled,
6363
## Establish a baseline on Arm Neoverse bare-metal instances
6464

6565
{{% notice Note %}}
66-
To mirror a typical Tomcat deployment and simplify tuning, keep **8 CPU cores online** and set the remaining cores offline. Adjust the CPU range to match your instance. The example below assumes 192 CPUs (as on AWS `c8g.metal-48xl`).
66+
To mirror a typical Tomcat deployment and simplify tuning, keep 8 CPU cores online and set the remaining cores offline. Adjust the CPU range to match your instance. The example below assumes 192 CPUs (as on AWS `c8g.metal-48xl`).
6767
{{% /notice %}}
6868

69-
1. Set CPUs 8–191 offline:
69+
Set CPUs 8–191 offline:
7070

71-
```bash
71+
```bash
7272
for no in {8..191}; do sudo bash -c "echo 0 > /sys/devices/system/cpu/cpu${no}/online"; done
73-
```
73+
```
7474

75-
2. Confirm that CPUs `0–7` are online and the rest are offline:
75+
Confirm that CPUs `0–7` are online and the rest are offline:
7676

77-
```bash
77+
```bash
7878
lscpu
79-
```
79+
```
8080

81-
Example output:
82-
```output
81+
Example output:
82+
```output
8383
Architecture: aarch64
8484
CPU op-mode(s): 64-bit
8585
Byte Order: Little Endian
@@ -89,86 +89,86 @@ To mirror a typical Tomcat deployment and simplify tuning, keep **8 CPU cores on
8989
Vendor ID: ARM
9090
Model name: Neoverse-V2
9191
...
92-
```
92+
```
9393

94-
3. Restart Tomcat on the Arm instance:
94+
Restart Tomcat on the Arm instance:
9595

96-
```bash
96+
```bash
9797
~/apache-tomcat-11.0.10/bin/shutdown.sh 2>/dev/null
9898
ulimit -n 65535 && ~/apache-tomcat-11.0.10/bin/startup.sh
99-
```
99+
```
100100

101-
4. From your `x86_64` benchmarking client, run `wrk2` (replace `<tomcat_ip>` with the server’s IP):
101+
From your `x86_64` benchmarking client, run `wrk2` (replace `<tomcat_ip>` with the server’s IP):
102102

103-
```bash
103+
```bash
104104
ulimit -n 65535 && wrk -c1280 -t128 -R500000 -d60 http://<tomcat_ip>:8080/examples/servlets/servlet/HelloWorldExample
105-
```
105+
```
106106

107-
Example result:
108-
```output
107+
Example result:
108+
```output
109109
Thread Stats Avg Stdev Max +/- Stdev
110110
Latency 16.76s 6.59s 27.56s 56.98%
111111
Req/Sec 1.97k 165.05 2.33k 89.90%
112112
14680146 requests in 1.00m, 7.62GB read
113113
Socket errors: connect 1264, read 0, write 0, timeout 1748
114114
Requests/sec: 244449.62
115115
Transfer/sec: 129.90MB
116-
```
116+
```
117117

118118
## Disable access logging
119119

120120
Disabling access logs removes I/O overhead during benchmarking.
121121

122-
1. Edit `server.xml` and comment out (or remove) the **`org.apache.catalina.valves.AccessLogValve`** block:
122+
Edit `server.xml` and comment out (or remove) the **`org.apache.catalina.valves.AccessLogValve`** block:
123123

124-
```bash
124+
```bash
125125
vi ~/apache-tomcat-11.0.10/conf/server.xml
126-
```
126+
```
127127

128-
```xml
128+
```xml
129129
<!--
130130
<Valve className="org.apache.catalina.valves.AccessLogValve" directory="logs"
131131
prefix="localhost_access_log" suffix=".txt"
132132
pattern="%h %l %u %t &quot;%r&quot; %s %b" />
133133
-->
134-
```
134+
```
135135

136-
2. Restart Tomcat:
136+
Restart Tomcat:
137137

138-
```bash
138+
```bash
139139
~/apache-tomcat-11.0.10/bin/shutdown.sh 2>/dev/null
140140
ulimit -n 65535 && ~/apache-tomcat-11.0.10/bin/startup.sh
141-
```
141+
```
142142

143-
3. Re-run `wrk2`:
143+
Re-run `wrk2`:
144144

145-
```bash
145+
```bash
146146
ulimit -n 65535 && wrk -c1280 -t128 -R500000 -d60 http://<tomcat_ip>:8080/examples/servlets/servlet/HelloWorldExample
147-
```
147+
```
148148

149-
Example result:
150-
```output
149+
Example result:
150+
```output
151151
Thread Stats Avg Stdev Max +/- Stdev
152152
Latency 16.16s 6.45s 28.26s 57.85%
153153
Req/Sec 2.16k 5.91 2.17k 77.50%
154154
16291136 requests in 1.00m, 8.45GB read
155155
Socket errors: connect 0, read 0, write 0, timeout 75
156156
Requests/sec: 271675.12
157157
Transfer/sec: 144.36MB
158-
```
158+
```
159159

160160
## Set optimal thread counts
161161

162162
To minimize contention and context switching, align Tomcat’s CPU‑intensive thread count with available CPU cores.
163163

164-
1. While `wrk2` is running, identify CPU‑intensive Tomcat threads:
164+
While `wrk2` is running, identify CPU‑intensive Tomcat threads:
165165

166-
```bash
166+
```bash
167167
top -H -p "$(pgrep -n java)"
168-
```
168+
```
169169

170-
Example output:
171-
```output
170+
Example output:
171+
```output
172172
top - 08:57:29 up 20 min, 1 user, load average: 4.17, 2.35, 1.22
173173
Threads: 231 total, 8 running, 223 sleeping, 0 stopped, 0 zombie
174174
%Cpu(s): 31.7 us, 20.2 sy, 0.0 ni, 31.0 id, 0.0 wa, 0.0 hi, 17.2 si, 0.0 st
@@ -204,24 +204,24 @@ To minimize contention and context switching, align Tomcat’s CPU‑intensive t
204204
...
205205
```
206206

207-
You’ll typically see **`http-nio-8080-e`** and **`http-nio-8080-P`** threads as CPU intensive. Because the **`http-nio-8080-P`** thread count is fixed at 1 (in current Tomcat releases), and you have 8 online CPU cores, set **`http-nio-8080-e`** to **7**.
207+
You’ll typically see `http-nio-8080-e` and `http-nio-8080-P` threads as CPU-intensive. Because the `http-nio-8080-P` thread count is fixed at 1 (in current Tomcat releases), and you have 8 online CPU cores, set `http-nio-8080-e` to 7.
208208

209-
2. Edit `server.xml` and update the HTTP connector to set the worker thread counts and connection limits:
209+
Edit `server.xml` and update the HTTP connector to set the worker thread counts and connection limits:
210210

211-
```bash
211+
```bash
212212
vi ~/apache-tomcat-11.0.10/conf/server.xml
213-
```
213+
```
214214

215-
Replace the existing connector:
216-
```xml
215+
Replace the existing connector:
216+
```xml
217217
<!-- Before -->
218218
<Connector port="8080" protocol="HTTP/1.1"
219219
connectionTimeout="20000"
220220
redirectPort="8443" />
221-
```
221+
```
222222

223-
With the tuned settings:
224-
```xml
223+
With the tuned settings:
224+
```xml
225225
<!-- After -->
226226
<Connector port="8080" protocol="HTTP/1.1"
227227
connectionTimeout="20000"
@@ -231,25 +231,25 @@ To minimize contention and context switching, align Tomcat’s CPU‑intensive t
231231
maxKeepAliveRequests="500000"
232232
maxConnections="100000"
233233
/>
234-
```
234+
```
235235

236-
3. Restart Tomcat and re-run `wrk2`:
236+
Restart Tomcat and re-run `wrk2`:
237237

238-
```bash
238+
```bash
239239
~/apache-tomcat-11.0.10/bin/shutdown.sh 2>/dev/null
240240
ulimit -n 65535 && ~/apache-tomcat-11.0.10/bin/startup.sh
241241

242242
ulimit -n 65535 && wrk -c1280 -t128 -R500000 -d60 http://<tomcat_ip>:8080/examples/servlets/servlet/HelloWorldExample
243-
```
243+
```
244244

245-
Example result:
246-
```output
245+
Example result:
246+
```output
247247
Thread Stats Avg Stdev Max +/- Stdev
248248
Latency 10.26s 4.55s 19.81s 62.51%
249249
Req/Sec 2.86k 89.49 3.51k 77.06%
250250
21458421 requests in 1.00m, 11.13GB read
251251
Requests/sec: 357835.75
252252
Transfer/sec: 190.08MB
253-
```
253+
```
254254

255255
With a solid baseline in place, you’re ready to proceed to NIC queue tuning, NUMA locality optimization, and IOMMU exploration in the next sections.
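
The baseline steps hard-code two numbers that depend on the instance size: the range of CPUs to take offline (8–191) and the executor thread count (7). The sketch below shows one way to derive both, assuming CPUs are numbered contiguously from 0 and that one poller thread is reserved, as described above. The function names `offline_range` and `executor_threads` are hypothetical.

```bash
# offline_range TOTAL KEEP
# Prints the first and last CPU IDs to take offline when KEEP CPUs
# (IDs 0..KEEP-1) stay online out of TOTAL. Hypothetical helper.
offline_range() {
  [ "$2" -lt "$1" ] || return 1     # nothing to offline if KEEP >= TOTAL
  echo "$2 $(( $1 - 1 ))"
}

# executor_threads ONLINE [POLLERS]
# Prints the executor thread count: online CPUs minus poller threads.
# POLLERS defaults to 1, matching the single poller thread noted above.
executor_threads() {
  echo $(( $1 - ${2:-1} ))
}
```

For the 192-CPU example with 8 cores kept online, `offline_range 192 8` prints `8 191` and `executor_threads 8` prints `7`, matching the values used in the steps above.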

content/learning-paths/servers-and-cloud-computing/tune-network-workloads-on-bare-metal/3_nic-queue.md

Lines changed: 1 addition & 1 deletion
@@ -41,7 +41,7 @@ Use the following command to check the current transmit/receive queues of the ${
4141
```bash
4242
sudo ethtool -l ${net}
4343
```
44-
It can be observed that the number of transmit/receive queues for the ${net} network interface is currently 63.
44+
You can see that the number of transmit/receive queues for the ${net} network interface is currently 63:
4545
```bash
4646
Channel parameters for enP11p4s0:
4747
Pre-set maximums:

content/learning-paths/servers-and-cloud-computing/tune-network-workloads-on-bare-metal/5_iommu.md

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ layout: learningpathall
88

99
## Tune with IOMMU
1010

11-
IOMMU (Input–Output Memory Management Unit) controls how I/O devices access memory. In many cloud environments, SmartNICs offload IOMMU-related work. On Arm Neoverse bare‑metal systems, you can often improve Tomcat networking performance by **disabling strict mode** and **enabling passthrough** (setting `iommu.strict=0` and `iommu.passthrough=1`).
11+
IOMMU (Input–Output Memory Management Unit) controls how I/O devices access memory. In many cloud environments, SmartNICs offload IOMMU-related work. On Arm Neoverse bare‑metal systems, you can often improve Tomcat networking performance by disabling strict mode and enabling passthrough (setting `iommu.strict=0` and `iommu.passthrough=1`).
1212

1313
## Configure IOMMU settings
1414

content/learning-paths/servers-and-cloud-computing/tune-network-workloads-on-bare-metal/_index.md

Lines changed: 6 additions & 6 deletions
@@ -6,14 +6,14 @@ minutes_to_complete: 60
66
who_is_this_for: This is an advanced topic for engineers who want to tune the performance of network workloads on Arm Neoverse-based bare-metal instances.
77

88
learning_objectives:
9-
- Set up a benchmarking environment using Apache Tomcat and wrk2 on an Arm Neoverse bare‑metal host
10-
- Establish a reproducible baseline performance configuration (throughput and latency) before tuning
11-
- Tune NIC multi‑queue, RSS/RPS/XPS, and IRQ affinity to increase throughput and stabilize latency
12-
- Optimize NUMA locality by pinning Tomcat workers and interrupts to local CPUs and memory
13-
- Evaluate IOMMU configuration options and select the setting that maximizes networking performance
9+
- Set up Apache Tomcat and wrk2 to benchmark HTTP on an Arm Neoverse bare‑metal host
10+
- Establish a reproducible baseline (file‑descriptor limits, logging, thread counts, fixed core set)
11+
- Tune NIC queue count to match available cores and measure impact
12+
- Improve NUMA locality by placing Tomcat on the NIC’s NUMA node and aligning worker threads with cores
13+
- Compare IOMMU strict mode and IOMMU passthrough mode, and select the configuration that delivers the best performance for your workload
1414

1515
prerequisites:
16-
- An Arm Neoverse-based bare-metal server running Ubuntu 24.04 to run Apache Tomcat (this Learning Path was tested with an AWS c8g.metal-48xl instance)
16+
- An Arm Neoverse-based bare-metal server running Ubuntu 24.04 to run Apache Tomcat
1717
- Access to an x86_64 bare-metal server running Ubuntu 24.04 to run `wrk2`
1818
- Basic familiarity with Java applications
1919
