
Commit b5dd3ab

Merge pull request #2261 from iJobsYuYing/main
update tune-network-workloads-on-bare-metal result with AWS c8g.metal…
2 parents 2e43b25 + f4a91ab commit b5dd3ab

7 files changed: +215 −171 lines changed

content/learning-paths/servers-and-cloud-computing/tune-network-workloads-on-bare-metal/1_setup.md

Lines changed: 31 additions & 31 deletions
@@ -11,9 +11,9 @@ layout: learningpathall
 
 There are numerous client-server and network-based workloads; Tomcat is a typical example of such applications, providing services via HTTP/HTTPS network requests.
 
-In this section, you'll set up a benchmark environment using Apache Tomcat and `wrk2` to simulate HTTP load and evaluate performance on an Arm-based bare-metal server (**`Nvidia-Grace`**).
+In this section, you'll set up a benchmark environment using Apache Tomcat and `wrk2` to simulate HTTP load and evaluate performance on an Arm-based bare-metal server, such as **`AWS c8g.metal-48xl`**.
 
-## Set up the Tomcat benchmark server on Nvidia Grace
+## Set up the Tomcat benchmark server on AWS c8g.metal-48xl
 
 [Apache Tomcat](https://tomcat.apache.org/) is an open-source Java Servlet container that runs Java web applications, handles HTTP requests, and serves dynamic content. It supports technologies such as Servlet, JSP, and WebSocket.
 
 ## Install the Java Development Kit (JDK)
@@ -30,8 +30,8 @@ sudo apt install -y openjdk-21-jdk
 Download and extract Tomcat:
 
 ```bash
-wget -c https://dlcdn.apache.org/tomcat/tomcat-11/v11.0.9/bin/apache-tomcat-11.0.9.tar.gz
-tar xzf apache-tomcat-11.0.9.tar.gz
+wget -c https://dlcdn.apache.org/tomcat/tomcat-11/v11.0.10/bin/apache-tomcat-11.0.10.tar.gz
+tar xzf apache-tomcat-11.0.10.tar.gz
 ```
 Alternatively, you can build Tomcat [from source](https://github.com/apache/tomcat).
 
@@ -41,7 +41,7 @@ To access the built-in examples from your local network or external IP, use a te
 
 The file is at:
 ```bash
-apache-tomcat-11.0.9/webapps/examples/META-INF/context.xml
+~/apache-tomcat-11.0.10/webapps/examples/META-INF/context.xml
 ```
 
 ```xml
@@ -60,17 +60,17 @@ To achieve maximum performance of Tomcat, the maximum number of file descriptors
 Start the server:
 
 ```bash
-ulimit -n 65535 && ./apache-tomcat-11.0.9/bin/startup.sh
+ulimit -n 65535 && ~/apache-tomcat-11.0.10/bin/startup.sh
 ```
 
 You should see output like:
 
 ```output
-Using CATALINA_BASE:   /home/ubuntu/apache-tomcat-11.0.9
-Using CATALINA_HOME:   /home/ubuntu/apache-tomcat-11.0.9
-Using CATALINA_TMPDIR: /home/ubuntu/apache-tomcat-11.0.9/temp
+Using CATALINA_BASE:   /home/ubuntu/apache-tomcat-11.0.10
+Using CATALINA_HOME:   /home/ubuntu/apache-tomcat-11.0.10
+Using CATALINA_TMPDIR: /home/ubuntu/apache-tomcat-11.0.10/temp
 Using JRE_HOME:        /usr
-Using CLASSPATH:       /home/ubuntu/apache-tomcat-11.0.9/bin/bootstrap.jar:/home/ubuntu/apache-tomcat-11.0.9/bin/tomcat-juli.jar
+Using CLASSPATH:       /home/ubuntu/apache-tomcat-11.0.10/bin/bootstrap.jar:/home/ubuntu/apache-tomcat-11.0.10/bin/tomcat-juli.jar
 Using CATALINA_OPTS:
 Tomcat started.
 ```
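The startup one-liners throughout this diff prepend `ulimit -n 65535` to the Tomcat launch. As a small hypothetical guard (my suggestion, not part of the learning path; `check_nofile` is an assumed helper name), you could verify the soft open-file limit before starting the server:

```shell
#!/usr/bin/env sh
# Hypothetical pre-flight check (not from the learning path): succeed only
# when the soft open-file limit meets the minimum the tuning note calls for.
check_nofile() {
    limit="$1"      # current soft limit, e.g. "$(ulimit -n)"
    required="$2"   # minimum needed, e.g. 65535
    [ "$limit" = "unlimited" ] && return 0
    [ "$limit" -ge "$required" ]
}

# Example: gate the startup line used in this diff.
if check_nofile "$(ulimit -n)" 65535; then
    echo "open-file limit OK"
else
    echo "raise the limit first: ulimit -n 65535" >&2
fi
```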
@@ -132,28 +132,28 @@ ulimit -n 65535 && wrk -c32 -t16 -R50000 -d60 http://${tomcat_ip}:8080/examples/
 You should see output similar to:
 
 ```console
-Running 1m test @ http://172.26.203.139:8080/examples/servlets/servlet/HelloWorldExample
+Running 1m test @ http://172.31.46.193:8080/examples/servlets/servlet/HelloWorldExample
   16 threads and 32 connections
-  Thread calibration: mean lat.: 0.986ms, rate sampling interval: 10ms
-  Thread calibration: mean lat.: 0.984ms, rate sampling interval: 10ms
-  Thread calibration: mean lat.: 0.999ms, rate sampling interval: 10ms
-  Thread calibration: mean lat.: 0.994ms, rate sampling interval: 10ms
-  Thread calibration: mean lat.: 0.983ms, rate sampling interval: 10ms
-  Thread calibration: mean lat.: 0.989ms, rate sampling interval: 10ms
-  Thread calibration: mean lat.: 0.991ms, rate sampling interval: 10ms
-  Thread calibration: mean lat.: 0.993ms, rate sampling interval: 10ms
-  Thread calibration: mean lat.: 0.985ms, rate sampling interval: 10ms
-  Thread calibration: mean lat.: 0.990ms, rate sampling interval: 10ms
-  Thread calibration: mean lat.: 0.987ms, rate sampling interval: 10ms
-  Thread calibration: mean lat.: 0.990ms, rate sampling interval: 10ms
-  Thread calibration: mean lat.: 0.984ms, rate sampling interval: 10ms
-  Thread calibration: mean lat.: 0.991ms, rate sampling interval: 10ms
-  Thread calibration: mean lat.: 0.978ms, rate sampling interval: 10ms
-  Thread calibration: mean lat.: 0.976ms, rate sampling interval: 10ms
+  Thread calibration: mean lat.: 3.381ms, rate sampling interval: 10ms
+  Thread calibration: mean lat.: 3.626ms, rate sampling interval: 10ms
+  Thread calibration: mean lat.: 3.020ms, rate sampling interval: 10ms
+  Thread calibration: mean lat.: 3.578ms, rate sampling interval: 10ms
+  Thread calibration: mean lat.: 3.166ms, rate sampling interval: 10ms
+  Thread calibration: mean lat.: 3.275ms, rate sampling interval: 10ms
+  Thread calibration: mean lat.: 3.454ms, rate sampling interval: 10ms
+  Thread calibration: mean lat.: 3.655ms, rate sampling interval: 10ms
+  Thread calibration: mean lat.: 3.334ms, rate sampling interval: 10ms
+  Thread calibration: mean lat.: 3.089ms, rate sampling interval: 10ms
+  Thread calibration: mean lat.: 3.365ms, rate sampling interval: 10ms
+  Thread calibration: mean lat.: 3.382ms, rate sampling interval: 10ms
+  Thread calibration: mean lat.: 3.342ms, rate sampling interval: 10ms
+  Thread calibration: mean lat.: 3.349ms, rate sampling interval: 10ms
+  Thread calibration: mean lat.: 3.023ms, rate sampling interval: 10ms
+  Thread calibration: mean lat.: 3.275ms, rate sampling interval: 10ms
   Thread Stats   Avg      Stdev     Max   +/- Stdev
-    Latency     1.00ms  454.90us   5.09ms   63.98%
-    Req/Sec     3.31k   241.68     4.89k    63.83%
-  2999817 requests in 1.00m, 1.56GB read
-Requests/sec:  49997.08
+    Latency     1.02ms  398.88us   4.24ms   66.77%
+    Req/Sec     3.30k   210.16     4.44k    70.04%
+  2999776 requests in 1.00m, 1.56GB read
+Requests/sec:  49996.87
 Transfer/sec:     26.57MB
 ```
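Most of the churn in this file is the version bump from 11.0.9 to 11.0.10 repeated across hard-coded paths. A hedged alternative (my suggestion, not something the learning path does) is to derive every path from a single version variable, so a future bump touches one line:

```shell
#!/usr/bin/env sh
# Sketch: derive the download URL and directory names from one variable.
TOMCAT_VERSION="11.0.10"
TOMCAT_MAJOR="${TOMCAT_VERSION%%.*}"            # strips ".0.10" -> "11"
TOMCAT_DIR="apache-tomcat-${TOMCAT_VERSION}"
TOMCAT_TGZ="${TOMCAT_DIR}.tar.gz"
TOMCAT_URL="https://dlcdn.apache.org/tomcat/tomcat-${TOMCAT_MAJOR}/v${TOMCAT_VERSION}/bin/${TOMCAT_TGZ}"

echo "$TOMCAT_URL"
# wget -c "$TOMCAT_URL" && tar xzf "$TOMCAT_TGZ"   # as in the diff above
```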

content/learning-paths/servers-and-cloud-computing/tune-network-workloads-on-bare-metal/2_baseline.md

Lines changed: 109 additions & 59 deletions
@@ -11,45 +11,79 @@ To achieve maximum performance, ulimit -n 65535 must be executed on both server
 {{% /notice %}}
 
 ## Optimal baseline before tuning
-- Baseline on Grace bare-metal (default configuration)
-- Baseline on Grace bare-metal (access logging disabled)
-- Baseline on Grace bare-metal (optimal thread count)
+- Align the IOMMU settings with the Ubuntu defaults
+- Baseline on Arm Neoverse bare-metal (default configuration)
+- Baseline on Arm Neoverse bare-metal (access logging disabled)
+- Baseline on Arm Neoverse bare-metal (optimal thread count)
+
+### Align the IOMMU settings with the Ubuntu defaults
+
+{{% notice Note %}}
+Because AWS ships a customized Ubuntu distribution, first align the IOMMU settings with the Ubuntu defaults: `iommu.strict=1` and `iommu.passthrough=0`.
+{{% /notice %}}
+
+1. To set the default IOMMU behavior, use a text editor to modify the GRUB configuration file, adding or updating the `GRUB_CMDLINE_LINUX` entry.
+
+```bash
+sudo vi /etc/default/grub
+```
+Then add or update:
+```bash
+GRUB_CMDLINE_LINUX="iommu.strict=1 iommu.passthrough=0"
+```
+
+2. Update GRUB and reboot to apply the settings.
+```bash
+sudo update-grub && sudo reboot
+```
+
+3. Verify that the settings have been applied.
+```bash
+sudo dmesg | grep iommu
+```
+The output confirms that `iommu.strict` is enabled and `iommu.passthrough` is disabled:
+```output
+[    0.877401] iommu: Default domain type: Translated (set via kernel command line)
+[    0.877404] iommu: DMA domain TLB invalidation policy: strict mode (set via kernel command line)
+...
+```
+
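The verification step above greps `dmesg`. Another quick check is the live kernel command line; the snippet below is a minimal sketch that scans a cmdline string for the two settings. A literal stands in for `$(cat /proc/cmdline)` (the `BOOT_IMAGE`/`root` values are made up) so the check is self-contained:

```shell
#!/usr/bin/env sh
# Sketch: confirm both IOMMU settings are present on the kernel command line.
# In real use, set: cmdline="$(cat /proc/cmdline)"
cmdline="BOOT_IMAGE=/boot/vmlinuz root=/dev/nvme0n1p1 ro iommu.strict=1 iommu.passthrough=0"

status=ok
case "$cmdline" in *"iommu.strict=1"*) ;; *) status=missing ;; esac
case "$cmdline" in *"iommu.passthrough=0"*) ;; *) status=missing ;; esac

echo "iommu settings: $status"
```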
+### Baseline on Arm Neoverse bare-metal (default configuration)
 
-### Baseline on Grace bare-metal (default configuration)
 {{% notice Note %}}
-To align with the typical deployment scenario of Tomcat, reserve 8 cores online and set all other cores offline
+To align with the typical deployment scenario of Tomcat, keep 8 cores online and set all other cores offline.
 {{% /notice %}}
 
 1. Take the remaining CPU cores offline using the command below.
 ```bash
-for no in {8..143}; do sudo bash -c "echo 0 > /sys/devices/system/cpu/cpu${no}/online"; done
+for no in {8..191}; do sudo bash -c "echo 0 > /sys/devices/system/cpu/cpu${no}/online"; done
 ```
 2. Use the following command to verify that cores 0-7 are online and the remaining cores are offline.
 ```bash
 lscpu
 ```
 You should see output similar to:
 ```output
-Architecture:           aarch64
-CPU op-mode(s):         64-bit
-Byte Order:             Little Endian
-CPU(s):                 144
-On-line CPU(s) list:    0-7
-Off-line CPU(s) list:   8-143
-Vendor ID:              ARM
-Model name:             Neoverse-V2
+Architecture:           aarch64
+CPU op-mode(s):         64-bit
+Byte Order:             Little Endian
+CPU(s):                 192
+On-line CPU(s) list:    0-7
+Off-line CPU(s) list:   8-191
+Vendor ID:              ARM
+Model name:             Neoverse-V2
 ...
 ```
 
-3. Use the following command on the Grace bare-metal where `Tomcat` is running:
+3. Use the following command on the Arm Neoverse bare-metal where `Tomcat` is running:
 ```bash
-~/apache-tomcat-11.0.9/bin/shutdown.sh 2>/dev/null
-ulimit -n 65535 && ~/apache-tomcat-11.0.9/bin/startup.sh
+~/apache-tomcat-11.0.10/bin/shutdown.sh 2>/dev/null
+ulimit -n 65535 && ~/apache-tomcat-11.0.10/bin/startup.sh
 ```
 
 4. Then use the following command on the `x86_64` bare-metal where `wrk2` is running:
 ```bash
-tomcat_ip=10.169.226.181
+tomcat_ip=172.31.46.193
 ```
 ```bash
 ulimit -n 65535 && wrk -c1280 -t128 -R500000 -d60 http://${tomcat_ip}:8080/examples/servlets/servlet/HelloWorldExample
@@ -58,20 +92,20 @@ ulimit -n 65535 && wrk -c1280 -t128 -R500000 -d60 http://${tomcat_ip}:8080/examp
 The result of the default configuration is:
 ```output
   Thread Stats   Avg      Stdev     Max   +/- Stdev
-    Latency     13.29s     3.25s   19.07s    57.79%
-    Req/Sec     347.59    430.94    0.97k    66.67%
-  3035300 requests in 1.00m, 1.58GB read
-  Socket errors: connect 1280, read 0, write 0, timeout 21760
-Requests/sec:  50517.09
-Transfer/sec:     26.84MB
+    Latency     16.76s     6.59s   27.56s    56.98%
+    Req/Sec      1.97k    165.05    2.33k    89.90%
+  14680146 requests in 1.00m, 7.62GB read
+  Socket errors: connect 1264, read 0, write 0, timeout 1748
+Requests/sec:  244449.62
+Transfer/sec:    129.90MB
 ```
 
-### Baseline on Grace bare-metal (access logging disabled)
+### Baseline on Arm Neoverse bare-metal (access logging disabled)
 To disable access logging, use a text editor to modify the `server.xml` file, commenting out or removing the **`org.apache.catalina.valves.AccessLogValve`** configuration.
 
 The file is at:
 ```bash
-vi ~/apache-tomcat-11.0.9/conf/server.xml
+vi ~/apache-tomcat-11.0.10/conf/server.xml
 ```
 
 The configuration is at the end of the file; comment it out or remove it.
@@ -83,10 +117,10 @@ The configuration is at the end of the file; comment it out or remove it.
 -->
 ```
 
-1. Use the following command on the Grace bare-metal where `Tomcat` is running:
+1. Use the following command on the Arm Neoverse bare-metal where `Tomcat` is running:
 ```bash
-~/apache-tomcat-11.0.9/bin/shutdown.sh 2>/dev/null
-ulimit -n 65535 && ~/apache-tomcat-11.0.9/bin/startup.sh
+~/apache-tomcat-11.0.10/bin/shutdown.sh 2>/dev/null
+ulimit -n 65535 && ~/apache-tomcat-11.0.10/bin/startup.sh
 ```
 
 2. Then use the following command on the `x86_64` bare-metal where `wrk2` is running:
@@ -97,15 +131,15 @@ ulimit -n 65535 && wrk -c1280 -t128 -R500000 -d60 http://${tomcat_ip}:8080/examp
 The result with access logging disabled is:
 ```output
   Thread Stats   Avg      Stdev     Max   +/- Stdev
-    Latency     12.66s     3.05s   17.87s    57.47%
-    Req/Sec     433.69    524.91    1.18k    66.67%
-  3572200 requests in 1.00m, 1.85GB read
-  Socket errors: connect 1280, read 0, write 0, timeout 21760
-Requests/sec:  59451.85
-Transfer/sec:     31.59MB
+    Latency     16.16s     6.45s   28.26s    57.85%
+    Req/Sec      2.16k      5.91    2.17k    77.50%
+  16291136 requests in 1.00m, 8.45GB read
+  Socket errors: connect 0, read 0, write 0, timeout 75
+Requests/sec:  271675.12
+Transfer/sec:    144.36MB
 ```
 
-### Baseline on Grace bare-metal (optimal thread count)
+### Baseline on Arm Neoverse bare-metal (optimal thread count)
 To minimize resource contention between threads and overhead from thread context switching, the number of CPU-intensive threads in Tomcat should be aligned with the number of CPU cores.
 
 1. While using `wrk` to load-test `Tomcat`, run:
@@ -115,23 +149,39 @@ top -H -p$(pgrep java)
 
 You should see output similar to:
 ```output
-top - 12:12:45 up 1 day, 7:04, 5 users, load average: 7.22, 3.46, 1.75
-Threads: 79 total, 8 running, 71 sleeping, 0 stopped, 0 zombie
-%Cpu(s): 3.4 us, 1.9 sy, 0.0 ni, 94.1 id, 0.0 wa, 0.0 hi, 0.5 si, 0.0 st
-MiB Mem : 964975.5 total, 602205.6 free, 12189.5 used, 356708.3 buff/cache
-MiB Swap: 0.0 total, 0.0 free, 0.0 used. 952786.0 avail Mem
+top - 08:57:29 up 20 min, 1 user, load average: 4.17, 2.35, 1.22
+Threads: 231 total, 8 running, 223 sleeping, 0 stopped, 0 zombie
+%Cpu(s): 31.7 us, 20.2 sy, 0.0 ni, 31.0 id, 0.0 wa, 0.0 hi, 17.2 si, 0.0 st
+MiB Mem : 386127.8 total, 380676.0 free, 4040.7 used, 2801.1 buff/cache
+MiB Swap: 0.0 total, 0.0 free, 0.0 used. 382087.0 avail Mem
 
   PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
-53254 yinyu01   20   0   38.0g   1.4g  28288 R  96.7   0.1   2:30.70 http-nio-8080-e
-53255 yinyu01   20   0   38.0g   1.4g  28288 R  96.7   0.1   2:30.62 http-nio-8080-e
-53256 yinyu01   20   0   38.0g   1.4g  28288 R  96.7   0.1   2:30.64 http-nio-8080-e
-53258 yinyu01   20   0   38.0g   1.4g  28288 R  96.7   0.1   2:30.62 http-nio-8080-e
-53260 yinyu01   20   0   38.0g   1.4g  28288 R  96.7   0.1   2:30.69 http-nio-8080-e
-53257 yinyu01   20   0   38.0g   1.4g  28288 R  96.3   0.1   2:30.59 http-nio-8080-e
-53259 yinyu01   20   0   38.0g   1.4g  28288 R  96.3   0.1   2:30.63 http-nio-8080-e
-53309 yinyu01   20   0   38.0g   1.4g  28288 R  95.3   0.1   2:29.69 http-nio-8080-P
-53231 yinyu01   20   0   38.0g   1.4g  28288 S   0.3   0.1   0:00.10 VM Thread
-53262 yinyu01   20   0   38.0g   1.4g  28288 S   0.3   0.1   0:00.12 GC Thread#2
+ 4677 ubuntu    20   0   36.0g   1.4g  24452 R  89.0   0.4   1:18.71 http-nio-8080-P
+ 4685 ubuntu    20   0   36.0g   1.4g  24452 R   4.7   0.4   0:04.42 http-nio-8080-A
+ 4893 ubuntu    20   0   36.0g   1.4g  24452 S   3.3   0.4   0:00.60 http-nio-8080-e
+ 4963 ubuntu    20   0   36.0g   1.4g  24452 S   3.3   0.4   0:00.66 http-nio-8080-e
+ 4924 ubuntu    20   0   36.0g   1.4g  24452 S   3.0   0.4   0:00.59 http-nio-8080-e
+ 4955 ubuntu    20   0   36.0g   1.4g  24452 S   3.0   0.4   0:00.60 http-nio-8080-e
+ 5061 ubuntu    20   0   36.0g   1.4g  24452 S   3.0   0.4   0:00.61 http-nio-8080-e
+ 4895 ubuntu    20   0   36.0g   1.4g  24452 S   2.7   0.4   0:00.58 http-nio-8080-e
+ 4907 ubuntu    20   0   36.0g   1.4g  24452 S   2.7   0.4   0:00.59 http-nio-8080-e
+ 4940 ubuntu    20   0   36.0g   1.4g  24452 S   2.7   0.4   0:00.58 http-nio-8080-e
+ 4946 ubuntu    20   0   36.0g   1.4g  24452 S   2.7   0.4   0:00.59 http-nio-8080-e
+ 4956 ubuntu    20   0   36.0g   1.4g  24452 S   2.7   0.4   0:00.65 http-nio-8080-e
+ 4959 ubuntu    20   0   36.0g   1.4g  24452 S   2.7   0.4   0:00.59 http-nio-8080-e
+ 4960 ubuntu    20   0   36.0g   1.4g  24452 R   2.7   0.4   0:00.60 http-nio-8080-e
+ 4962 ubuntu    20   0   36.0g   1.4g  24452 S   2.7   0.4   0:00.57 http-nio-8080-e
+ 4982 ubuntu    20   0   36.0g   1.4g  24452 S   2.7   0.4   0:00.63 http-nio-8080-e
+ 4983 ubuntu    20   0   36.0g   1.4g  24452 S   2.7   0.4   0:00.58 http-nio-8080-e
+ 4996 ubuntu    20   0   36.0g   1.4g  24452 S   2.7   0.4   0:00.60 http-nio-8080-e
+ 5033 ubuntu    20   0   36.0g   1.4g  24452 S   2.7   0.4   0:00.59 http-nio-8080-e
+ 5036 ubuntu    20   0   36.0g   1.4g  24452 S   2.7   0.4   0:00.66 http-nio-8080-e
+ 5056 ubuntu    20   0   36.0g   1.4g  24452 S   2.7   0.4   0:00.61 http-nio-8080-e
+ 5065 ubuntu    20   0   36.0g   1.4g  24452 S   2.7   0.4   0:00.56 http-nio-8080-e
+ 5068 ubuntu    20   0   36.0g   1.4g  24452 S   2.7   0.4   0:00.61 http-nio-8080-e
+ 5070 ubuntu    20   0   36.0g   1.4g  24452 S   2.7   0.4   0:00.60 http-nio-8080-e
+ 5071 ubuntu    20   0   36.0g   1.4g  24452 S   2.7   0.4   0:00.61 http-nio-8080-e
+...
 ```
 
 It can be observed that **`http-nio-8080-e`** and **`http-nio-8080-P`** threads are CPU-intensive.
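The observation above (which `http-nio-8080-*` threads are CPU-intensive) can be automated. Below is a hedged sketch that counts busy `http-nio` threads in a `top -H` snapshot; a three-line literal stands in for the real output, and the 80% threshold is my assumption, not a value from the learning path:

```shell
#!/usr/bin/env sh
# Sketch: count http-nio threads above an (assumed) 80% CPU threshold in a
# `top -H -p$(pgrep java)` snapshot. Field 9 is %CPU, field 12 is COMMAND.
snapshot='4677 ubuntu 20 0 36.0g 1.4g 24452 R 89.0 0.4 1:18.71 http-nio-8080-P
4685 ubuntu 20 0 36.0g 1.4g 24452 R 4.7 0.4 0:04.42 http-nio-8080-A
4893 ubuntu 20 0 36.0g 1.4g 24452 S 3.3 0.4 0:00.60 http-nio-8080-e'

busy=$(printf '%s\n' "$snapshot" \
  | awk '$12 ~ /^http-nio/ && $9 + 0 > 80 { n++ } END { print n + 0 }')
echo "busy http-nio threads: $busy"
```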
@@ -141,7 +191,7 @@ To configure the `http-nio-8080-e` thread count, use a text editor to modify the
 
 The file is at:
 ```bash
-vi ~/apache-tomcat-11.0.9/conf/server.xml
+vi ~/apache-tomcat-11.0.10/conf/server.xml
 ```
 
 
@@ -164,10 +214,10 @@ vi ~/apache-tomcat-11.0.9/conf/server.xml
 />
 ```
 
-2. Use the following command on the Grace bare-metal where `Tomcat` is running:
+2. Use the following command on the Arm Neoverse bare-metal where `Tomcat` is running:
 ```bash
-~/apache-tomcat-11.0.9/bin/shutdown.sh 2>/dev/null
-ulimit -n 65535 && ~/apache-tomcat-11.0.9/bin/startup.sh
+~/apache-tomcat-11.0.10/bin/shutdown.sh 2>/dev/null
+ulimit -n 65535 && ~/apache-tomcat-11.0.10/bin/startup.sh
 ```
 
 3. Then use the following command on the `x86_64` bare-metal where `wrk2` is running:
@@ -178,9 +228,9 @@ ulimit -n 65535 && wrk -c1280 -t128 -R500000 -d60 http://${tomcat_ip}:8080/examp
 The result with the optimal thread count is:
 ```output
   Thread Stats   Avg      Stdev     Max   +/- Stdev
-    Latency     24.34s     9.91s   41.81s    57.77%
-    Req/Sec      1.22k      4.29    1.23k    71.09%
-  9255672 requests in 1.00m, 4.80GB read
-Requests/sec:  154479.07
-Transfer/sec:     82.06MB
+    Latency     10.26s     4.55s   19.81s    62.51%
+    Req/Sec      2.86k     89.49    3.51k    77.06%
+  21458421 requests in 1.00m, 11.13GB read
+Requests/sec:  357835.75
+Transfer/sec:    190.08MB
 ```
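Comparing the three result blocks in this file: the default configuration reaches about 244,450 req/s, disabling access logging about 271,675 req/s, and the tuned thread count about 357,836 req/s. A quick integer-arithmetic check of the relative gains (a sketch using the rounded figures reported above):

```shell
#!/usr/bin/env sh
# Sketch: percentage gains of each tuning step over the default baseline,
# using the Requests/sec figures from the diff above (truncated to integers).
default_rps=244449
no_log_rps=271675
tuned_rps=357835

no_log_gain=$(( (no_log_rps - default_rps) * 100 / default_rps ))
tuned_gain=$(( (tuned_rps - default_rps) * 100 / default_rps ))

echo "access logging disabled: +${no_log_gain}% over default"
echo "optimal thread count:    +${tuned_gain}% over default"
```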
