Skip to content

Commit 786c5cd

Browse files
authored
Update example build steps and measurement results
1 parent 789398e commit 786c5cd

File tree

1 file changed

+88
-88
lines changed

1 file changed

+88
-88
lines changed

jsoniter-scala-examples/README.md

Lines changed: 88 additions & 88 deletions
Original file line numberDiff line numberDiff line change
@@ -5,7 +5,7 @@
55
Here you will learn how to build and run the simplest example and measure the startup time on Linux using
66
Debian packages
77

8-
### Build uber jar, print its size, and measure its start up time (tested with Oracle GraalVM 24-dev)
8+
### Build uber jar, print its size, and measure its start up time (tested with Oracle GraalVM 25-dev)
99

1010
```sh
1111
sudo apt install linux-tools-common linux-tools-generic
@@ -18,23 +18,23 @@ Expected output:
1818
```text
1919
Performance counter stats for './example01.jar' (100 runs):
2020
21-
275.05 msec task-clock # 2.067 CPUs utilized ( +- 0.19% )
22-
1,970 context-switches # 7.162 K/sec ( +- 0.48% )
23-
15 cpu-migrations # 54.536 /sec ( +- 2.23% )
24-
22,159 page-faults # 80.565 K/sec ( +- 0.20% )
25-
1,101,372,203 cycles # 4.004 GHz ( +- 0.20% )
26-
1,504,873,660 instructions # 1.37 insn per cycle ( +- 0.10% )
27-
297,952,673 branches # 1.083 G/sec ( +- 0.10% )
28-
9,110,277 branch-misses # 3.06% of all branches ( +- 0.11% )
29-
TopdownL1 # 14.1 % tma_backend_bound
30-
# 35.6 % tma_bad_speculation
31-
# 30.0 % tma_frontend_bound
32-
# 20.3 % tma_retiring ( +- 0.13% )
21+
202.75 msec task-clock # 2.286 CPUs utilized ( +- 0.27% )
22+
2,191 context-switches # 10.806 K/sec ( +- 1.53% )
23+
39 cpu-migrations # 192.356 /sec ( +- 2.49% )
24+
24,775 page-faults # 122.195 K/sec ( +- 0.27% )
25+
303,999,880 cpu_atom/cycles/ # 1.499 GHz ( +- 3.51% ) (71.71%)
26+
1,005,678,639 cpu_core/cycles/ # 4.960 GHz ( +- 0.44% ) (77.24%)
27+
431,508,692 cpu_atom/instructions/ # 1.42 insn per cycle ( +- 3.69% ) (71.71%)
28+
1,684,651,998 cpu_core/instructions/ # 5.54 insn per cycle ( +- 0.47% ) (77.24%)
29+
83,740,906 cpu_atom/branches/ # 413.027 M/sec ( +- 3.78% ) (71.71%)
30+
335,230,949 cpu_core/branches/ # 1.653 G/sec ( +- 0.49% ) (77.24%)
31+
2,017,405 cpu_atom/branch-misses/ # 2.41% of all branches ( +- 4.20% ) (71.71%)
32+
9,450,892 cpu_core/branch-misses/ # 11.29% of all branches ( +- 0.47% ) (77.24%)
3333
34-
0.133067 +- 0.000190 seconds time elapsed ( +- 0.14% )
34+
0.088700 +- 0.000135 seconds time elapsed ( +- 0.15% )
3535
```
3636

37-
### Build Scala JS output, print its size, and measure its start up time with `node`
37+
### Build Scala JS output, print its size, and measure its start up time with `node` (tested with node.js 22)
3838

3939
```sh
4040
curl https://raw.githubusercontent.com/creationix/nvm/master/install.sh | bash
@@ -49,23 +49,23 @@ Expected output:
4949
```text
5050
Performance counter stats for 'node ./example01.js' (100 runs):
5151
52-
21.58 msec task-clock # 0.996 CPUs utilized ( +- 0.16% )
53-
19 context-switches # 880.402 /sec ( +- 1.65% )
54-
0 cpu-migrations # 0.000 /sec
55-
2,705 page-faults # 125.341 K/sec ( +- 0.04% )
56-
94,278,347 cycles # 4.369 GHz ( +- 0.14% )
57-
168,171,694 instructions # 1.78 insn per cycle ( +- 0.01% )
58-
28,793,274 branches # 1.334 G/sec ( +- 0.02% )
59-
580,994 branch-misses # 2.02% of all branches ( +- 0.06% )
60-
TopdownL1 # 17.7 % tma_backend_bound
61-
# 14.4 % tma_bad_speculation
62-
# 31.8 % tma_frontend_bound
63-
# 36.2 % tma_retiring ( +- 0.14% )
52+
14.90 msec task-clock # 1.014 CPUs utilized ( +- 0.30% )
53+
20 context-switches # 1.343 K/sec ( +- 1.60% )
54+
2 cpu-migrations # 134.254 /sec ( +- 5.08% )
55+
2,748 page-faults # 184.466 K/sec ( +- 0.00% )
56+
45,371,808 cpu_atom/cycles/ # 3.046 GHz ( +- 2.61% ) (20.12%)
57+
75,772,761 cpu_core/cycles/ # 5.086 GHz ( +- 0.32% ) (85.22%)
58+
97,892,289 cpu_atom/instructions/ # 2.16 insn per cycle ( +- 3.38% ) (20.12%)
59+
165,775,873 cpu_core/instructions/ # 3.65 insn per cycle ( +- 0.29% ) (85.22%)
60+
16,573,542 cpu_atom/branches/ # 1.113 G/sec ( +- 3.35% ) (20.12%)
61+
28,245,004 cpu_core/branches/ # 1.896 G/sec ( +- 0.27% ) (85.22%)
62+
97,278 cpu_atom/branch-misses/ # 0.59% of all branches ( +- 11.08% ) (20.12%)
63+
573,362 cpu_core/branch-misses/ # 3.46% of all branches ( +- 0.72% ) (85.22%)
6464
65-
0.0216665 +- 0.0000338 seconds time elapsed ( +- 0.16% )
65+
0.0146864 +- 0.0000435 seconds time elapsed ( +- 0.30% )
6666
```
6767

68-
### Build Scala JS Wasm output, print its size, and measure its start up time with `node`
68+
### Build Scala JS Wasm output, print its size, and measure its start up time with `node` (tested with node.js 22)
6969

7070
```sh
7171
curl https://raw.githubusercontent.com/creationix/nvm/master/install.sh | bash
@@ -81,78 +81,78 @@ Expected output:
8181
```text
8282
Performance counter stats for 'node --experimental-wasm-exnref ./example01.js/main.js' (100 runs):
8383
84-
33.51 msec task-clock # 0.997 CPUs utilized ( +- 0.15% )
85-
66 context-switches # 1.970 K/sec ( +- 0.44% )
86-
0 cpu-migrations # 0.000 /sec
87-
3,102 page-faults # 92.567 K/sec ( +- 0.04% )
88-
146,180,631 cycles # 4.362 GHz ( +- 0.13% )
89-
239,361,867 instructions # 1.64 insn per cycle ( +- 0.01% )
90-
41,631,362 branches # 1.242 G/sec ( +- 0.01% )
91-
1,112,970 branch-misses # 2.67% of all branches ( +- 0.04% )
92-
TopdownL1 # 14.6 % tma_backend_bound
93-
# 23.2 % tma_bad_speculation
94-
# 32.0 % tma_frontend_bound
95-
# 30.3 % tma_retiring ( +- 0.12% )
84+
23.36 msec task-clock # 1.036 CPUs utilized ( +- 0.23% )
85+
61 context-switches # 2.611 K/sec ( +- 0.47% )
86+
3 cpu-migrations # 128.413 /sec ( +- 4.53% )
87+
3,156 page-faults # 135.090 K/sec ( +- 0.01% )
88+
81,924,942 cpu_atom/cycles/ # 3.507 GHz ( +- 0.93% ) (10.81%)
89+
115,425,581 cpu_core/cycles/ # 4.941 GHz ( +- 0.30% )
90+
151,891,461 cpu_atom/instructions/ # 1.85 insn per cycle ( +- 1.65% ) (10.81%)
91+
228,699,784 cpu_core/instructions/ # 2.79 insn per cycle ( +- 0.23% )
92+
26,745,975 cpu_atom/branches/ # 1.145 G/sec ( +- 1.49% ) (10.81%)
93+
39,592,483 cpu_core/branches/ # 1.695 G/sec ( +- 0.24% )
94+
459,225 cpu_atom/branch-misses/ # 1.72% of all branches ( +- 1.76% ) (10.81%)
95+
1,062,312 cpu_core/branch-misses/ # 3.97% of all branches ( +- 0.30% )
9696
97-
0.0336197 +- 0.0000488 seconds time elapsed ( +- 0.15% )
97+
0.0225519 +- 0.0000530 seconds time elapsed ( +- 0.23% )
9898
```
9999

100-
### Build GraalVM native image, print its size, and measure its start up time
100+
### Build GraalVM native image, print its size, and measure its start up time (tested with Oracle GraalVM 24)
101101

102102
```sh
103103
sudo apt install linux-tools-common linux-tools-generic gcc zlib1g-dev
104104
sudo sysctl kernel.perf_event_paranoid=1
105-
scala-cli --power package --graalvm-jvm-id graalvm-oracle:23 --native-image example01.sc --force -o example01_graalvm.bin -- --no-fallback -O3 -H:+UnlockExperimentalVMOptions -R:MaxHeapSize=16m -H:-GenLoopSafepoints -H:-ParseRuntimeOptions -H:-IncludeMethodData --initialize-at-build-time
105+
scala-cli --power package --graalvm-jvm-id graalvm-oracle:24 --native-image example01.sc --force -o example01_graalvm.bin -- --no-fallback -O3 -H:+UnlockExperimentalVMOptions -R:MaxHeapSize=16m -H:-GenLoopSafepoints -H:-ParseRuntimeOptions -H:-IncludeMethodData --initialize-at-build-time
106106
ls -l ./example01_graalvm.bin
107107
perf stat -r 100 ./example01_graalvm.bin > /dev/null
108108
```
109109
Expected output:
110110
```text
111111
Performance counter stats for './example01_graalvm.bin' (100 runs):
112112
113-
1.54 msec task-clock # 0.929 CPUs utilized ( +- 2.17% )
114-
1 context-switches # 647.939 /sec ( +- 2.93% )
113+
1.08 msec task-clock # 0.911 CPUs utilized ( +- 0.73% )
114+
1 context-switches # 927.547 /sec ( +- 4.79% )
115115
0 cpu-migrations # 0.000 /sec
116-
535 page-faults # 346.647 K/sec ( +- 0.01% )
117-
6,194,402 cycles # 4.014 GHz ( +- 0.32% )
118-
8,776,339 instructions # 1.42 insn per cycle ( +- 0.04% )
119-
1,661,644 branches # 1.077 G/sec ( +- 0.03% )
120-
18,496 branch-misses # 1.11% of all branches ( +- 0.87% )
121-
TopdownL1 # 32.3 % tma_backend_bound
122-
# 6.6 % tma_bad_speculation
123-
# 30.4 % tma_frontend_bound
124-
# 30.7 % tma_retiring ( +- 0.31% )
116+
477 page-faults # 442.440 K/sec ( +- 0.01% )
117+
4,315,458 cpu_atom/cycles/ # 4.003 GHz ( +- 1.85% ) (19.75%)
118+
3,872,466 cpu_core/cycles/ # 3.592 GHz ( +- 5.11% )
119+
10,829,542 cpu_atom/instructions/ # 2.51 insn per cycle ( +- 2.48% ) (19.75%)
120+
5,351,563 cpu_core/instructions/ # 1.24 insn per cycle ( +- 5.21% )
121+
1,856,118 cpu_atom/branches/ # 1.722 G/sec ( +- 2.33% ) (19.75%)
122+
1,000,400 cpu_core/branches/ # 927.918 M/sec ( +- 5.18% )
123+
7,651 cpu_atom/branch-misses/ # 0.41% of all branches ( +- 6.90% ) (19.75%)
124+
11,792 cpu_core/branch-misses/ # 0.64% of all branches ( +- 5.28% )
125125
126-
0.0016622 +- 0.0000371 seconds time elapsed ( +- 2.23% )
126+
0.00118361 +- 0.00000667 seconds time elapsed ( +- 0.56% )
127127
```
128128

129-
### Build Scala Native image, print its size, and measure its start up time
129+
### Build Scala Native image, print its size, and measure its start up time (tested with Scala Native 0.5.7)
130130

131131
```sh
132132
sudo apt install linux-tools-common linux-tools-generic clang libstdc++-12-dev libgc-dev
133133
sudo sysctl kernel.perf_event_paranoid=1
134-
scala-cli --power package --native-version 0.5.5 --native example01.sc --native-mode release-full --force -o example01_native.bin
134+
scala-cli --power package --native-version 0.5.7 --native example01.sc --native-mode release-full --force -o example01_native.bin
135135
ls -l ./example01_native.bin
136136
perf stat -r 100 ./example01_native.bin > /dev/null
137137
```
138138
Expected output:
139139
```text
140140
Performance counter stats for './example01_native.bin' (100 runs):
141141
142-
0.80 msec task-clock # 0.823 CPUs utilized ( +- 0.41% )
142+
0.71 msec task-clock # 0.855 CPUs utilized ( +- 0.64% )
143143
0 context-switches # 0.000 /sec
144144
0 cpu-migrations # 0.000 /sec
145-
246 page-faults # 306.115 K/sec ( +- 0.04% )
146-
3,345,318 cycles # 4.163 GHz ( +- 0.49% )
147-
5,077,992 instructions # 1.52 insn per cycle ( +- 0.05% )
148-
924,515 branches # 1.150 G/sec ( +- 0.05% )
149-
16,240 branch-misses # 1.76% of all branches ( +- 0.44% )
150-
TopdownL1 # 26.8 % tma_backend_bound
151-
# 11.7 % tma_bad_speculation
152-
# 29.4 % tma_frontend_bound
153-
# 32.1 % tma_retiring ( +- 0.48% )
145+
731 page-faults # 1.024 M/sec ( +- 0.01% )
146+
<not counted> cpu_atom/cycles/ ( +-100.00% ) (0.00%)
147+
3,769,012 cpu_core/cycles/ # 5.280 GHz ( +- 1.17% )
148+
<not counted> cpu_atom/instructions/ ( +-100.00% ) (0.00%)
149+
5,907,572 cpu_core/instructions/ # 139.11 insn per cycle ( +- 1.02% )
150+
<not counted> cpu_atom/branches/ ( +-100.01% ) (0.00%)
151+
1,055,882 cpu_core/branches/ # 1.479 G/sec ( +- 1.02% )
152+
<not counted> cpu_atom/branch-misses/ ( +-100.62% ) (0.00%)
153+
9,104 cpu_core/branch-misses/ # 84.85% of all branches ( +- 1.35% )
154154
155-
0.00097599 +- 0.00000574 seconds time elapsed ( +- 0.59% )
155+
0.00083490 +- 0.00000519 seconds time elapsed ( +- 0.62% )
156156
```
157157

158158
## RFC-8259 validation (example02)
@@ -175,16 +175,16 @@ gunzip 2023_06_430_65B0_in_network_rates.json.gz
175175
Speed of validation highly depends on performance of file reading on the particular disk system. Use the following command
176176
to measure reading speed on Linux:
177177
```sh
178-
sudo hdparm -Tt /dev/sda
178+
sudo hdparm -Tt /dev/nvme0n1p4
179179
```
180180
Here is an example of expected output:
181181
```text
182-
/dev/sda:
183-
Timing cached reads: 45532 MB in 2.00 seconds = 22806.82 MB/sec
184-
Timing buffered disk reads: 2180 MB in 3.04 seconds = 716.07 MB/sec
182+
/dev/nvme0n1p4:
183+
Timing cached reads: 48588 MB in 2.00 seconds = 24334.91 MB/sec
184+
Timing buffered disk reads: 4456 MB in 3.00 seconds = 1485.08 MB/sec
185185
```
186186

187-
### Build uber jar, print its size, and measure its running time (tested with Oracle GraalVM 24-dev)
187+
### Build uber jar, print its size, and measure its running time (tested with Oracle GraalVM 25-dev)
188188

189189
```sh
190190
scala-cli --power package --assembly example02.sc --force -o example02.jar
@@ -193,12 +193,12 @@ time ./example02.jar -J-XX:+UnlockExperimentalVMOptions -J-XX:+UseEpsilonGC -J-X
193193
```
194194
Expected output:
195195
```text
196-
real 1m8.991s
197-
user 1m5.239s
198-
sys 0m4.102s
196+
real 0m37.615s
197+
user 0m35.187s
198+
sys 0m2.662s
199199
```
200200

201-
### Build GraalVM native image, print its size, and measure its running time
201+
### Build GraalVM native image, print its size, and measure its running time (tested with Oracle GraalVM 23)
202202

203203
```sh
204204
scala-cli --power package --graalvm-jvm-id graalvm-oracle:23 --native-image example02.sc --force -o example02_graalvm.bin -- --no-fallback --gc=epsilon -O3 -H:+UnlockExperimentalVMOptions -R:MaxHeapSize=16m -H:-GenLoopSafepoints -H:-ParseRuntimeOptions -H:-IncludeMethodData --initialize-at-build-time
@@ -207,9 +207,9 @@ time ./example02_graalvm.bin < 2023_06_430_65B0_in_network_rates.json 2> /dev/nu
207207
```
208208
Expected output:
209209
```text
210-
real 1m34.353s
211-
user 1m29.199s
212-
sys 0m5.151s
210+
real 0m44.963s
211+
user 0m42.471s
212+
sys 0m2.486s
213213
```
214214

215215
You can use profile guided optimization (PGO) to improve performance of Oracle GraalVM native image, for that you need:
@@ -233,21 +233,21 @@ time ./example02_graalvm_optimized.bin < 2023_06_430_65B0_in_network_rates.json
233233
```
234234
Expected output:
235235
```text
236-
real 1m20.185s
237-
user 1m15.976s
238-
sys 0m4.207s
236+
real 0m43.674s
237+
user 0m41.213s
238+
sys 0m2.461s
239239
```
240240

241241
### Build Scala Native image, print its size, and measure its running time
242242

243243
```sh
244-
scala-cli --power package --native-version 0.5.5 --native example02.sc --native-mode release-full --native-gc none --native-lto thin --native-multithreading=false --force -o example02_native.bin
244+
scala-cli --power package --native-version 0.5.7 --native example02.sc --native-mode release-full --native-gc none --native-lto thin --native-multithreading=false --force -o example02_native.bin
245245
ls -l ./example02_native.bin
246246
time ./example02_native.bin < 2023_06_430_65B0_in_network_rates.json 2> /dev/null
247247
```
248248
Expected output:
249249
```text
250-
real 1m27.275s
251-
user 1m23.233s
252-
sys 0m4.038s
250+
real 0m49.232s
251+
user 0m46.838s
252+
sys 0m2.389s
253253
```

0 commit comments

Comments
 (0)