
Commit 178a668

bst vs heap: move in fully from cpp-cheat
1 parent d37344a commit 178a668

7 files changed, +266 -91 lines changed


README.adoc

Lines changed: 69 additions & 18 deletions
@@ -975,7 +975,7 @@ This setup:
 +
 --
 ** can run most examples, including those for other CPU architectures, with the notable exception of examples that rely on kernel modules
-** can run reproducible approximate performance experiments with gem5, see e.g. <<bst-vs-heap>>
+** can run reproducible approximate performance experiments with gem5, see e.g. <<bst-vs-heap-vs-hashmap>>
 --
 * from full system simulation as shown at: <<qemu-buildroot-setup-getting-started>>.
 +
@@ -9961,7 +9961,7 @@ Now you can play a fun little game with your friends:
 * make a program that solves the computation problem, and outputs output to stdout
 * write the code that runs the correct computation in the smallest number of cycles possible
 
-To find out why your program is slow, a good first step is to have a look at <<stats-txt>> file.
+To find out why your program is slow, a good first step is to have a look at the <<gem5-stats-txt>> file.
 
 ==== Skip extra benchmark instructions
 
@@ -10210,36 +10210,79 @@ Buildroot built-in libraries, mostly under Libraries > Other:
 
 There are not yet enabled, but it should be easy to so, see: <<add-new-buildroot-packages>>
 
-===== BST vs heap
+===== BST vs heap vs hashmap
 
-https://stackoverflow.com/questions/6147242/heap-vs-binary-search-tree-bst/29548834#29548834
+The following benchmark setup works both:
 
-First we build it with <<m5ops-instructions>> enabled, and then we extract the stats:
+* on host through timers + link:https://stackoverflow.com/questions/51952471/why-do-i-get-a-constant-instead-of-logarithmic-curve-for-an-insert-time-benchmar/51953081#51953081[granule]
+* gem5 with <<m5ops-instructions,dumpstats>>, which can get more precise results with `granule == 1`
+
+It has been used to answer:
+
+* BST vs heap: https://stackoverflow.com/questions/6147242/heap-vs-binary-search-tree-bst/29548834#29548834
+* `std::set`: https://stackoverflow.com/questions/2558153/what-is-the-underlying-data-structure-of-a-stl-set-in-c/51944661#51944661
+* `std::map`: https://stackoverflow.com/questions/18414579/what-data-structure-is-inside-stdmap-in-c/51945119#51945119
+
+To benchmark on the host, we do:
+
+....
+./build-userland-in-tree --force-rebuild --optimization-level 3 ./userland/cpp/bst_vs_heap_vs_hashmap.cpp
+./userland/cpp/bst_vs_heap_vs_hashmap.out | tee bst_vs_heap_vs_hashmap.dat
+gnuplot \
+  -e 'input_noext="bst_vs_heap_vs_hashmap"' \
+  -e 'heap_zoom_max=50' \
+  -e 'hashmap_zoom_max=400' \
+  ./bst-vs-heap-vs-hashmap.gnuplot \
+;
+xdg-open bst_vs_heap_vs_hashmap.tmp.png
+....
+
+The parameters `heap_zoom_max` and `hashmap_zoom_max` were chosen manually and interactively to best showcase the regions of interest in those plots.
+
+First we build the benchmark with <<m5ops-instructions>> enabled, and then we run it and extract the stats:
 
 ....
 ./build-userland \
-  --arch aarch64 \
+  --arch x86_64 \
   --ccflags='-DLKMC_M5OPS_ENABLE=1' \
-  --force-rebuild cpp/bst_vs_heap \
+  --force-rebuild userland/cpp/bst_vs_heap_vs_hashmap.cpp \
   --static \
+  --optimization-level 3 \
 ;
 ./run \
-  --arch aarch64 \
+  --arch x86_64 \
   --emulator gem5 \
   --static \
-  --userland userland/cpp/bst_vs_heap.cpp \
-  --userland-args='1000' \
+  --userland userland/cpp/bst_vs_heap_vs_hashmap.cpp \
+  --userland-args='100000' \
+  -- \
+  --cpu-type=DerivO3CPU \
+  --caches \
+  --l2cache \
+  --l1d_size=32kB \
+  --l1i_size=32kB \
+  --l2_size=256kB \
+  --l3_size=20MB \
 ;
-./bst-vs-heap --arch aarch64 > bst_vs_heap.dat
-./bst-vs-heap.gnuplot
-xdg-open bst-vs-heap.tmp.png
+./bst-vs-heap-vs-hashmap-gem5-stats --arch x86_64 | tee bst_vs_heap_vs_hashmap_gem5.dat
+gnuplot \
+  -e 'input_noext="bst_vs_heap_vs_hashmap_gem5"' \
+  -e 'heap_zoom_max=500' \
+  -e 'hashmap_zoom_max=400' \
+  ./bst-vs-heap-vs-hashmap.gnuplot \
+;
+xdg-open bst_vs_heap_vs_hashmap_gem5.tmp.png
 ....
 
+The cache sizes were chosen to match the host <<p51>> to improve the comparison. Ideally we should also use the same standard library.
+
+Note that this will take a long time, and will produce a humongous ~40Gb stats file due to: <<gem5-only-dump-selected-stats>>
+
 Sources:
 
-* link:userland/cpp/bst_vs_heap.cpp[]
-* link:bst-vs-heap[]
-* link:bst-vs-heap.gnuplot[]
+* link:userland/cpp/bst_vs_heap_vs_hashmap.cpp[]
+* link:bst-vs-heap-vs-hashmap-gem5-stats[]
+* link:bst-vs-heap-vs-hashmap.gnuplot[]
 
 ===== BLAS
 
@@ -11110,7 +11153,7 @@ Contains UART output, both from the Linux kernel or from the baremetal system.
 
 Can also be seen live on <<m5term>>.
 
-==== stats.txt
+==== gem5 stats.txt
 
 This file contains important statistics about the run:
 
@@ -11136,6 +11179,14 @@ system.cpu.dtb.inst_hits
 
 For x86, it is interesting to try and correlate `numCycles` with:
 
+===== gem5 only dump selected stats
+
+TODO
+
+https://stackoverflow.com/questions/52014953/how-to-dump-only-a-single-or-certain-selected-stats-in-gem5
+
+The goal is to prevent the stats file from becoming humongous.
+
 ==== config.ini
 
 The `config.ini` file, contains a very good high level description of the system:
@@ -12974,7 +13025,7 @@ RDTSC stores its output to EDX:EAX, even in 64-bit mode, top bits are zeroed out
 
 TODO: review this section, make a more controlled userland experiment with <<m5ops>> instrumentation.
 
-Let's have some fun and try to correlate the gem5 <<stats-txt>> `system.cpu.numCycles` cycle count with the link:https://en.wikipedia.org/wiki/Time_Stamp_Counter[x86 RDTSC instruction] that is supposed to do the same thing:
+Let's have some fun and try to correlate the gem5 <<gem5-stats-txt>> `system.cpu.numCycles` cycle count with the link:https://en.wikipedia.org/wiki/Time_Stamp_Counter[x86 RDTSC instruction] that is supposed to do the same thing:
 
 ....
 ./build-userland --static userland/arch/x86_64/inline_asm/rdtsc.S
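The README hunk above names the host measurement technique (timers plus a granule), but the benchmark source `userland/cpp/bst_vs_heap_vs_hashmap.cpp` is not shown on this page. What follows is a minimal C++ sketch of the granule idea under stated assumptions. It is not the actual program. The names `benchmark_inserts`, `time_batch`, `InsertRow` and `print_rows` are hypothetical, as are the random `int` inputs and the fixed seed. The sketch inserts the same values into a `std::priority_queue`, a `std::set` and a `std::unordered_set`. It times each batch of `granule` inserts with `std::chrono::steady_clock` and reports the average nanoseconds per insert. The output rows follow the four-column layout that `bst-vs-heap-vs-hashmap.gnuplot` plots (`using 1:2`, `1:3`, `1:4`).

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <ostream>
#include <queue>
#include <random>
#include <set>
#include <unordered_set>
#include <vector>

// Hypothetical output row, one line of the four-column .dat layout
// (1: size, 2: heap, 3: BST, 4: hash map).
struct InsertRow {
    std::size_t size;   // number of values inserted before this batch
    double heap_ns;     // average ns per std::priority_queue::push
    double bst_ns;      // average ns per std::set::insert
    double hashmap_ns;  // average ns per std::unordered_set::insert
};

// Time `granule` consecutive inserts starting at values[begin] and
// return the average time per insert in nanoseconds.
template <typename InsertFn>
double time_batch(InsertFn insert, const std::vector<int> &values,
                  std::size_t begin, std::size_t granule) {
    auto start = std::chrono::steady_clock::now();
    for (std::size_t j = begin; j < begin + granule; ++j) {
        insert(values[j]);
    }
    auto end = std::chrono::steady_clock::now();
    std::chrono::duration<double, std::nano> elapsed = end - start;
    return elapsed.count() / static_cast<double>(granule);
}

// Hypothetical driver: n random values, one row per full batch.
std::vector<InsertRow> benchmark_inserts(std::size_t n, std::size_t granule) {
    assert(granule > 0);
    std::mt19937 gen(0);  // fixed seed so runs are repeatable
    std::uniform_int_distribution<int> dist;
    std::vector<int> values(n);
    for (auto &v : values) {
        v = dist(gen);
    }

    std::priority_queue<int> heap;
    std::set<int> bst;
    std::unordered_set<int> hashmap;
    std::vector<InsertRow> rows;
    for (std::size_t i = 0; i + granule <= n; i += granule) {
        InsertRow row;
        row.size = i;
        row.heap_ns = time_batch([&](int x) { heap.push(x); }, values, i, granule);
        row.bst_ns = time_batch([&](int x) { bst.insert(x); }, values, i, granule);
        row.hashmap_ns = time_batch([&](int x) { hashmap.insert(x); }, values, i, granule);
        rows.push_back(row);
    }
    return rows;
}

// Print rows as space-separated columns, one row per line.
void print_rows(const std::vector<InsertRow> &rows, std::ostream &out) {
    for (const auto &r : rows) {
        out << r.size << ' ' << r.heap_ns << ' ' << r.bst_ns << ' '
            << r.hashmap_ns << '\n';
    }
}
```

For example, a `main` could call `print_rows(benchmark_inserts(100000, 1000), std::cout)` and redirect the output to a `.dat` file for the gnuplot script. The batching is the point of the granule. The README zooms the host heap plot to 50 ns, so a single insert costs roughly the same as a clock read. Timing each insert individually on the host would therefore mostly measure timer overhead. gem5's `dumpstats` avoids this, which is why it can use `granule == 1`.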

bst-vs-heap renamed to bst-vs-heap-vs-hashmap-gem5-stats

Lines changed: 10 additions & 4 deletions
@@ -18,13 +18,19 @@ Convert a BST vs heap stat file into a gnuplot input
         stats = self.get_stats()
         it = iter(stats)
         i = 1
-        for stat in it:
+        for heap_num_cycles in it:
             try:
-                next_stat = next(it)
+                bst_num_cycles = next(it)
+                hashmap_num_cycles = next(it)
             except StopIteration:
-                # Automatic dumpstats at end may lead to odd number of stats.
+                # Automatic dumpstats at end may lead to one extra stat at the end.
                 break
-            print('{} {} {}'.format(i, stat, next_stat))
+            print('{} {} {} {}'.format(
+                i,
+                heap_num_cycles,
+                bst_num_cycles,
+                hashmap_num_cycles,
+            ))
             i += 1
 
 if __name__ == '__main__':
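The renamed `bst-vs-heap-vs-hashmap-gem5-stats` script reads the stats in groups of three, in the order heap, BST, hash map. It drops a trailing incomplete group left over from the automatic dump at exit. To make that grouping rule explicit, here is the same logic re-expressed as a small C++ sketch. The names `group_stats`, `CycleRow` and `print_cycle_rows` are hypothetical. The input is assumed to be the cycle counts already extracted from the stats file, because the script's own `get_stats` is not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <ostream>
#include <vector>

// One output line: iteration index followed by the three cycle counts.
struct CycleRow {
    std::size_t i;
    std::uint64_t heap;
    std::uint64_t bst;
    std::uint64_t hashmap;
};

// Group a flat list of per-dump cycle counts into rows of three,
// assumed ordered heap, BST, hash map within each iteration.
// A trailing group with fewer than three values (e.g. from the automatic
// dump at exit) is dropped, like the StopIteration branch of the script.
std::vector<CycleRow> group_stats(const std::vector<std::uint64_t> &stats) {
    std::vector<CycleRow> rows;
    std::size_t i = 1;  // the script numbers rows starting from 1
    for (std::size_t k = 0; k + 3 <= stats.size(); k += 3) {
        rows.push_back({i, stats[k], stats[k + 1], stats[k + 2]});
        ++i;
    }
    // Sanity check: every complete triple became exactly one row.
    assert(rows.size() == stats.size() / 3);
    return rows;
}

// Emit "i heap bst hashmap" lines, the column layout the gnuplot script reads.
void print_cycle_rows(const std::vector<CycleRow> &rows, std::ostream &out) {
    for (const auto &r : rows) {
        out << r.i << ' ' << r.heap << ' ' << r.bst << ' ' << r.hashmap << '\n';
    }
}
```

The original code's `for`/`next` pairing mixes iteration and error handling. Stepping by three with an explicit bounds check makes the "drop the leftover" rule visible at a glance, at the cost of needing the whole list in memory.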

bst-vs-heap-vs-hashmap.gnuplot

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+#!/usr/bin/env gnuplot
+
+set terminal png noenhanced size 800, 1400
+set output input_noext . ".tmp.png"
+set multiplot layout 5,1 title "\nC++ Heap vs BST vs Hash map insert time" font ",22"
+set xlabel "container size"
+set ylabel "insert time (ns)"
+set title font ",16"
+
+set title "Heap (std::priority_queue)"
+plot input_noext . ".dat" using 1:2 notitle
+
+set title "Heap (zoom)"
+set yrange [0:heap_zoom_max]
+plot input_noext . ".dat" using 1:2 notitle
+
+set title "BST (std::set)"
+set yrange [*:*]
+plot input_noext . ".dat" using 1:3 notitle
+
+set title "Hash map (std::unordered_set)"
+set yrange [*:*]
+plot input_noext . ".dat" using 1:4 notitle
+
+set title "Hash map zoom"
+set yrange [0:hashmap_zoom_max]
+plot input_noext . ".dat" using 1:4 notitle

bst-vs-heap.gnuplot

Lines changed: 0 additions & 25 deletions
This file was deleted.

path_properties.py

Lines changed: 6 additions & 0 deletions
@@ -474,6 +474,12 @@ def get(path):
         'return2.c': {'exit_status': 2},
     }
 ),
+'cpp': (
+    {},
+    {
+        'bst_vs_heap_vs_hashmap.cpp': {'more_than_1s': True},
+    },
+),
 'gcc': (
     gnu_extension_properties,
     {

userland/cpp/bst_vs_heap.cpp

Lines changed: 0 additions & 44 deletions
This file was deleted.
