5
5
Here you will learn how to build and run the simplest example and measure the startup time on Linux using
6
6
Debian packages
7
7
8
- ### Build uber jar, print its size, and measure its start up time (tested with Oracle GraalVM 24 -dev)
8
+ ### Build uber jar, print its size, and measure its start up time (tested with Oracle GraalVM 25 -dev)
9
9
10
10
``` sh
11
11
sudo apt install linux-tools-common linux-tools-generic
@@ -18,23 +18,23 @@ Expected output:
18
18
``` text
19
19
Performance counter stats for './example01.jar' (100 runs):
20
20
21
- 275.05 msec task-clock # 2.067 CPUs utilized ( +- 0.19 % )
22
- 1,970 context-switches # 7.162 K/sec ( +- 0.48 % )
23
- 15 cpu-migrations # 54.536 /sec ( +- 2.23 % )
24
- 22,159 page-faults # 80.565 K/sec ( +- 0.20 % )
25
- 1,101,372,203 cycles # 4.004 GHz ( +- 0.20% )
26
- 1,504,873,660 instructions # 1.37 insn per cycle ( +- 0.10% )
27
- 297,952,673 branches # 1.083 G/sec ( +- 0.10% )
28
- 9,110,277 branch-misses # 3.06% of all branches ( +- 0.11% )
29
- TopdownL1 # 14.1 % tma_backend_bound
30
- # 35.6 % tma_bad_speculation
31
- # 30.0 % tma_frontend_bound
32
- # 20.3 % tma_retiring ( +- 0.13% )
21
+ 202.75 msec task-clock # 2.286 CPUs utilized ( +- 0.27 % )
22
+ 2,191 context-switches # 10.806 K/sec ( +- 1.53 % )
23
+ 39 cpu-migrations # 192.356 /sec ( +- 2.49 % )
24
+ 24,775 page-faults # 122.195 K/sec ( +- 0.27 % )
25
+ 303,999,880 cpu_atom/ cycles/ # 1.499 GHz ( +- 3.51% ) (71.71% )
26
+ 1,005,678,639 cpu_core/cycles/ # 4.960 GHz ( +- 0.44% ) (77.24% )
27
+ 431,508,692 cpu_atom/instructions/ # 1.42 insn per cycle ( +- 3.69% ) (71.71% )
28
+ 1,684,651,998 cpu_core/instructions/ # 5.54 insn per cycle ( +- 0.47% ) (77.24% )
29
+ 83,740,906 cpu_atom/branches/ # 413.027 M/sec ( +- 3.78% ) (71.71%)
30
+ 335,230,949 cpu_core/branches/ # 1.653 G/sec ( +- 0.49% ) (77.24%)
31
+ 2,017,405 cpu_atom/branch-misses/ # 2.41% of all branches ( +- 4.20% ) (71.71%)
32
+ 9,450,892 cpu_core/branch-misses/ # 11.29% of all branches ( +- 0.47% ) (77.24% )
33
33
34
- 0.133067 +- 0.000190 seconds time elapsed ( +- 0.14 % )
34
+ 0.088700 +- 0.000135 seconds time elapsed ( +- 0.15 % )
35
35
```
36
36
37
- ### Build Scala JS output, print its size, and measure its start up time with ` node `
37
+ ### Build Scala JS output, print its size, and measure its start up time with ` node ` (tested with node.js 22)
38
38
39
39
``` sh
40
40
curl https://raw.githubusercontent.com/creationix/nvm/master/install.sh | bash
@@ -49,23 +49,23 @@ Expected output:
49
49
``` text
50
50
Performance counter stats for 'node ./example01.js' (100 runs):
51
51
52
- 21.58 msec task-clock # 0.996 CPUs utilized ( +- 0.16 % )
53
- 19 context-switches # 880.402 /sec ( +- 1.65 % )
54
- 0 cpu-migrations # 0.000 /sec
55
- 2,705 page-faults # 125.341 K/sec ( +- 0.04 % )
56
- 94,278,347 cycles # 4.369 GHz ( +- 0.14% )
57
- 168,171,694 instructions # 1.78 insn per cycle ( +- 0.01% )
58
- 28,793,274 branches # 1.334 G/sec ( +- 0.02% )
59
- 580,994 branch-misses # 2.02% of all branches ( +- 0.06% )
60
- TopdownL1 # 17.7 % tma_backend_bound
61
- # 14.4 % tma_bad_speculation
62
- # 31.8 % tma_frontend_bound
63
- # 36.2 % tma_retiring ( +- 0.14% )
52
+ 14.90 msec task-clock # 1.014 CPUs utilized ( +- 0.30 % )
53
+ 20 context-switches # 1.343 K /sec ( +- 1.60 % )
54
+ 2 cpu-migrations # 134.254 /sec ( +- 5.08% )
55
+ 2,748 page-faults # 184.466 K/sec ( +- 0.00 % )
56
+ 45,371,808 cpu_atom/ cycles/ # 3.046 GHz ( +- 2.61% ) (20.12% )
57
+ 75,772,761 cpu_core/cycles/ # 5.086 GHz ( +- 0.32% ) (85.22% )
58
+ 97,892,289 cpu_atom/instructions/ # 2.16 insn per cycle ( +- 3.38% ) (20.12% )
59
+ 165,775,873 cpu_core/instructions/ # 3.65 insn per cycle ( +- 0.29% ) (85.22% )
60
+ 16,573,542 cpu_atom/branches/ # 1.113 G/sec ( +- 3.35% ) (20.12%)
61
+ 28,245,004 cpu_core/branches/ # 1.896 G/sec ( +- 0.27% ) (85.22%)
62
+ 97,278 cpu_atom/branch-misses/ # 0.59% of all branches ( +- 11.08% ) (20.12%)
63
+ 573,362 cpu_core/branch-misses/ # 3.46% of all branches ( +- 0.72% ) (85.22% )
64
64
65
- 0.0216665 +- 0.0000338 seconds time elapsed ( +- 0.16 % )
65
+ 0.0146864 +- 0.0000435 seconds time elapsed ( +- 0.30 % )
66
66
```
67
67
68
- ### Build Scala JS Wasm output, print its size, and measure its start up time with ` node `
68
+ ### Build Scala JS Wasm output, print its size, and measure its start up time with ` node ` (tested with node.js 22)
69
69
70
70
``` sh
71
71
curl https://raw.githubusercontent.com/creationix/nvm/master/install.sh | bash
@@ -81,78 +81,78 @@ Expected output:
81
81
``` text
82
82
Performance counter stats for 'node --experimental-wasm-exnref ./example01.js/main.js' (100 runs):
83
83
84
- 33.51 msec task-clock # 0.997 CPUs utilized ( +- 0.15 % )
85
- 66 context-switches # 1.970 K/sec ( +- 0.44 % )
86
- 0 cpu-migrations # 0.000 /sec
87
- 3,102 page-faults # 92.567 K/sec ( +- 0.04 % )
88
- 146,180,631 cycles # 4.362 GHz ( +- 0.13% )
89
- 239,361,867 instructions # 1.64 insn per cycle ( +- 0.01 % )
90
- 41,631,362 branches # 1.242 G/sec ( +- 0.01% )
91
- 1,112,970 branch-misses # 2.67% of all branches ( +- 0.04 % )
92
- TopdownL1 # 14.6 % tma_backend_bound
93
- # 23.2 % tma_bad_speculation
94
- # 32.0 % tma_frontend_bound
95
- # 30.3 % tma_retiring ( +- 0.12 % )
84
+ 23.36 msec task-clock # 1.036 CPUs utilized ( +- 0.23 % )
85
+ 61 context-switches # 2.611 K/sec ( +- 0.47 % )
86
+ 3 cpu-migrations # 128.413 /sec ( +- 4.53% )
87
+ 3,156 page-faults # 135.090 K/sec ( +- 0.01 % )
88
+ 81,924,942 cpu_atom/ cycles/ # 3.507 GHz ( +- 0.93% ) (10.81% )
89
+ 115,425,581 cpu_core/cycles/ # 4.941 GHz ( +- 0.30 % )
90
+ 151,891,461 cpu_atom/instructions/ # 1.85 insn per cycle ( +- 1.65% ) (10.81% )
91
+ 228,699,784 cpu_core/instructions/ # 2.79 insn per cycle ( +- 0.23 % )
92
+ 26,745,975 cpu_atom/branches/ # 1.145 G/sec ( +- 1.49% ) (10.81%)
93
+ 39,592,483 cpu_core/branches/ # 1.695 G/sec ( +- 0.24% )
94
+ 459,225 cpu_atom/branch-misses/ # 1.72% of all branches ( +- 1.76% ) (10.81%)
95
+ 1,062,312 cpu_core/branch-misses/ # 3.97% of all branches ( +- 0.30 % )
96
96
97
- 0.0336197 +- 0.0000488 seconds time elapsed ( +- 0.15 % )
97
+ 0.0225519 +- 0.0000530 seconds time elapsed ( +- 0.23 % )
98
98
```
99
99
100
- ### Build GraalVM native image, print its size, and measure its start up time
100
+ ### Build GraalVM native image, print its size, and measure its start up time (tested with Oracle GraalVM 24)
101
101
102
102
``` sh
103
103
sudo apt install linux-tools-common linux-tools-generic gcc zlib1g-dev
104
104
sudo sysctl kernel.perf_event_paranoid=1
105
- scala-cli --power package --graalvm-jvm-id graalvm-oracle:23 --native-image example01.sc --force -o example01_graalvm.bin -- --no-fallback -O3 -H:+UnlockExperimentalVMOptions -R:MaxHeapSize=16m -H:-GenLoopSafepoints -H:-ParseRuntimeOptions -H:-IncludeMethodData --initialize-at-build-time
105
+ scala-cli --power package --graalvm-jvm-id graalvm-oracle:24 --native-image example01.sc --force -o example01_graalvm.bin -- --no-fallback -O3 -H:+UnlockExperimentalVMOptions -R:MaxHeapSize=16m -H:-GenLoopSafepoints -H:-ParseRuntimeOptions -H:-IncludeMethodData --initialize-at-build-time
106
106
ls -l ./example01_graalvm.bin
107
107
perf stat -r 100 ./example01_graalvm.bin > /dev/null
108
108
```
109
109
Expected output:
110
110
``` text
111
111
Performance counter stats for './example01_graalvm.bin' (100 runs):
112
112
113
- 1.54 msec task-clock # 0.929 CPUs utilized ( +- 2.17 % )
114
- 1 context-switches # 647.939 /sec ( +- 2.93 % )
113
+ 1.08 msec task-clock # 0.911 CPUs utilized ( +- 0.73 % )
114
+ 1 context-switches # 927.547 /sec ( +- 4.79 % )
115
115
0 cpu-migrations # 0.000 /sec
116
- 535 page-faults # 346.647 K/sec ( +- 0.01% )
117
- 6,194,402 cycles # 4.014 GHz ( +- 0.32% )
118
- 8,776,339 instructions # 1.42 insn per cycle ( +- 0.04 % )
119
- 1,661,644 branches # 1.077 G/sec ( +- 0.03% )
120
- 18,496 branch-misses # 1.11% of all branches ( +- 0.87 % )
121
- TopdownL1 # 32.3 % tma_backend_bound
122
- # 6.6 % tma_bad_speculation
123
- # 30.4 % tma_frontend_bound
124
- # 30.7 % tma_retiring ( +- 0.31 % )
116
+ 477 page-faults # 442.440 K/sec ( +- 0.01% )
117
+ 4,315,458 cpu_atom/ cycles/ # 4.003 GHz ( +- 1.85% ) (19.75% )
118
+ 3,872,466 cpu_core/cycles/ # 3.592 GHz ( +- 5.11 % )
119
+ 10,829,542 cpu_atom/instructions/ # 2.51 insn per cycle ( +- 2.48% ) (19.75% )
120
+ 5,351,563 cpu_core/instructions/ # 1.24 insn per cycle ( +- 5.21 % )
121
+ 1,856,118 cpu_atom/branches/ # 1.722 G/sec ( +- 2.33% ) (19.75%)
122
+ 1,000,400 cpu_core/branches/ # 927.918 M/sec ( +- 5.18% )
123
+ 7,651 cpu_atom/branch-misses/ # 0.41% of all branches ( +- 6.90% ) (19.75%)
124
+ 11,792 cpu_core/branch-misses/ # 0.64% of all branches ( +- 5.28 % )
125
125
126
- 0.0016622 +- 0.0000371 seconds time elapsed ( +- 2.23 % )
126
+ 0.00118361 +- 0.00000667 seconds time elapsed ( +- 0.56 % )
127
127
```
128
128
129
- ### Build Scala Native image, print its size, and measure its start up time
129
+ ### Build Scala Native image, print its size, and measure its start up time (tested with Scala Native 0.5.7)
130
130
131
131
``` sh
132
132
sudo apt install linux-tools-common linux-tools-generic clang libstdc++-12-dev libgc-dev
133
133
sudo sysctl kernel.perf_event_paranoid=1
134
- scala-cli --power package --native-version 0.5.5 --native example01.sc --native-mode release-full --force -o example01_native.bin
134
+ scala-cli --power package --native-version 0.5.7 --native example01.sc --native-mode release-full --force -o example01_native.bin
135
135
ls -l ./example01_native.bin
136
136
perf stat -r 100 ./example01_native.bin > /dev/null
137
137
```
138
138
Expected output:
139
139
``` text
140
140
Performance counter stats for './example01_native.bin' (100 runs):
141
141
142
- 0.80 msec task-clock # 0.823 CPUs utilized ( +- 0.41 % )
142
+ 0.71 msec task-clock # 0.855 CPUs utilized ( +- 0.64 % )
143
143
0 context-switches # 0.000 /sec
144
144
0 cpu-migrations # 0.000 /sec
145
- 246 page-faults # 306.115 K /sec ( +- 0.04 % )
146
- 3,345,318 cycles # 4.163 GHz ( +- 0.49% )
147
- 5,077,992 instructions # 1.52 insn per cycle ( +- 0.05 % )
148
- 924,515 branches # 1.150 G/sec ( +- 0.05% )
149
- 16,240 branch-misses # 1.76% of all branches ( +- 0.44 % )
150
- TopdownL1 # 26.8 % tma_backend_bound
151
- # 11.7 % tma_bad_speculation
152
- # 29.4 % tma_frontend_bound
153
- # 32.1 % tma_retiring ( +- 0.48 % )
145
+ 731 page-faults # 1.024 M /sec ( +- 0.01 % )
146
+ <not counted> cpu_atom/ cycles/ ( +-100.00% ) (0.00% )
147
+ 3,769,012 cpu_core/cycles/ # 5.280 GHz ( +- 1.17 % )
148
+ <not counted> cpu_atom/instructions/ ( +-100.00% ) (0.00% )
149
+ 5,907,572 cpu_core/instructions/ # 139.11 insn per cycle ( +- 1.02 % )
150
+ <not counted> cpu_atom/branches/ ( +-100.01% ) (0.00%)
151
+ 1,055,882 cpu_core/branches/ # 1.479 G/sec ( +- 1.02% )
152
+ <not counted> cpu_atom/branch-misses/ ( +-100.62% ) (0.00%)
153
+ 9,104 cpu_core/branch-misses/ # 84.85% of all branches ( +- 1.35 % )
154
154
155
- 0.00097599 +- 0.00000574 seconds time elapsed ( +- 0.59 % )
155
+ 0.00083490 +- 0.00000519 seconds time elapsed ( +- 0.62 % )
156
156
```
157
157
158
158
## RFC-8259 validation (example02)
@@ -175,16 +175,16 @@ gunzip 2023_06_430_65B0_in_network_rates.json.gz
175
175
Speed of validation highly depends on performance of file reading on the particular disk system. Use the following command
176
176
to measure reading speed on Linux:
177
177
``` sh
178
- sudo hdparm -Tt /dev/sda
178
+ sudo hdparm -Tt /dev/nvme0n1p4
179
179
```
180
180
Here is an example of expected output:
181
181
``` text
182
- /dev/sda :
183
- Timing cached reads: 45532 MB in 2.00 seconds = 22806.82 MB/sec
184
- Timing buffered disk reads: 2180 MB in 3.04 seconds = 716.07 MB/sec
182
+ /dev/nvme0n1p4 :
183
+ Timing cached reads: 48588 MB in 2.00 seconds = 24334.91 MB/sec
184
+ Timing buffered disk reads: 4456 MB in 3.00 seconds = 1485.08 MB/sec
185
185
```
186
186
187
- ### Build uber jar, print its size, and measure its running time (tested with Oracle GraalVM 24 -dev)
187
+ ### Build uber jar, print its size, and measure its running time (tested with Oracle GraalVM 25 -dev)
188
188
189
189
``` sh
190
190
scala-cli --power package --assembly example02.sc --force -o example02.jar
@@ -193,12 +193,12 @@ time ./example02.jar -J-XX:+UnlockExperimentalVMOptions -J-XX:+UseEpsilonGC -J-X
193
193
```
194
194
Expected output:
195
195
``` text
196
- real 1m8.991s
197
- user 1m5.239s
198
- sys 0m4.102s
196
+ real 0m37.615s
197
+ user 0m35.187s
198
+ sys 0m2.662s
199
199
```
200
200
201
- ### Build GraalVM native image, print its size, and measure its running time
201
+ ### Build GraalVM native image, print its size, and measure its running time (tested with Oracle GraalVM 23)
202
202
203
203
``` sh
204
204
scala-cli --power package --graalvm-jvm-id graalvm-oracle:23 --native-image example02.sc --force -o example02_graalvm.bin -- --no-fallback --gc=epsilon -O3 -H:+UnlockExperimentalVMOptions -R:MaxHeapSize=16m -H:-GenLoopSafepoints -H:-ParseRuntimeOptions -H:-IncludeMethodData --initialize-at-build-time
@@ -207,9 +207,9 @@ time ./example02_graalvm.bin < 2023_06_430_65B0_in_network_rates.json 2> /dev/nu
207
207
```
208
208
Expected output:
209
209
``` text
210
- real 1m34.353s
211
- user 1m29.199s
212
- sys 0m5.151s
210
+ real 0m44.963s
211
+ user 0m42.471s
212
+ sys 0m2.486s
213
213
```
214
214
215
215
You can use profile guided optimization (PGO) to improve performance of Oracle GraalVM native image, for that you need:
@@ -233,21 +233,21 @@ time ./example02_graalvm_optimized.bin < 2023_06_430_65B0_in_network_rates.json
233
233
```
234
234
Expected output:
235
235
``` text
236
- real 1m20.185s
237
- user 1m15.976s
238
- sys 0m4.207s
236
+ real 0m43.674s
237
+ user 0m41.213s
238
+ sys 0m2.461s
239
239
```
240
240
241
241
### Build Scala Native image, print its size, and measure its running time
242
242
243
243
``` sh
244
- scala-cli --power package --native-version 0.5.5 --native example02.sc --native-mode release-full --native-gc none --native-lto thin --native-multithreading=false --force -o example02_native.bin
244
+ scala-cli --power package --native-version 0.5.7 --native example02.sc --native-mode release-full --native-gc none --native-lto thin --native-multithreading=false --force -o example02_native.bin
245
245
ls -l ./example02_native.bin
246
246
time ./example02_native.bin < 2023_06_430_65B0_in_network_rates.json 2> /dev/null
247
247
```
248
248
Expected output:
249
249
``` text
250
- real 1m27.275s
251
- user 1m23.233s
252
- sys 0m4.038s
250
+ real 0m49.232s
251
+ user 0m46.838s
252
+ sys 0m2.389s
253
253
```
0 commit comments