Joseph
The script used for benchmarking benchrunner on Callisto passes the benchrunner to criterion-external as follows:
os.system("criterion-external benchrunner %s Seq %d -- --csv %s.csv" % (name, size, name))
criterion-external passes benchrunner the iteration count (iters) as its first command-line argument, followed by the arguments that appear before --. Everything after -- goes to criterion; here it is the flag that tells criterion to produce a .csv file. The benchrunner starts a timer, makes iters calls to the sort (on iters pre-allocated arrays), and stops the timer; criterion reads the benchrunner's stdout.
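To make the argument layout concrete, here is a minimal sketch that builds the same command as an argument list and splits it at the -- separator. The helper names build_command and split_at_separator are hypothetical, and the benchmark name is passed here without any inner quotes, which is an assumption about what benchrunner expects.

```python
import shlex


def build_command(name, size, csv_path):
    """Return the argv list for one criterion-external run (hypothetical helper)."""
    return [
        "criterion-external", "benchrunner",
        name, "Seq", str(int(size)),   # arguments forwarded to benchrunner
        "--",                          # separator described in the issue
        "--csv", csv_path,             # flag intended for criterion
    ]


def split_at_separator(argv):
    """Split argv at the first '--' into (benchmark part, criterion flags)."""
    idx = argv.index("--")
    return argv[:idx], argv[idx + 1:]


cmd = build_command("VectorSort Quicksort", 1024, "VectorSortQuicksort.csv")
bench_part, criterion_flags = split_at_separator(cmd)
print(shlex.join(cmd))  # shell-quoted form of the command line
```

A list like this could be handed to subprocess.run instead of os.system, which avoids a layer of shell quoting around names that contain spaces.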
There is an issue where many of the data points for the vector sorts at smaller array sizes come out as NaNs. This occurred with the verified sorts as well, but not nearly as severely, and there the issue was resolved by attempting up to 2 reruns. Adding reruns to the vector sorts (I've tried up to 8) does not help.
Here are two examples where the problem persists:
VectorSortQuicksort_out.csv
VectorSortMergesort_out.csv
Joseph
I ran the sorts again several times on Callisto in order to investigate the problem further, and somehow did not run into the issue at all. I was able to produce full sets of data.
VectorSortInsertionsort_out3.csv
VectorSortMergesort_out3.csv
VectorSortQuicksort_out3.csv
VectorSortQuicksort_out4.csv
Artem
umh, heisenbug...
Is it possible to have reproduction instructions anyway? Like: "starting from a clean checkout of the repository, do X, Y, Z and you get a csv file named blah with NaNs (maybe)"...
Joseph
Instructions:
- Build the benchrunner
- Build criterion-external
- Run the script below with the two executables on your PATH or in the current directory
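Before running the script, it may help to confirm that both executables can actually be found. The following is a small sketch of such a check; find_executable is a hypothetical helper, not part of the repository. Note that a bare command name is resolved through PATH by the shell, so a copy in the current directory is only picked up that way if `.` is on PATH.

```python
import os
import shutil


def find_executable(name):
    """Look for `name` on PATH first, then in the current directory."""
    found = shutil.which(name)
    if found:
        return found
    local = os.path.join(os.getcwd(), name)
    if os.path.isfile(local) and os.access(local, os.X_OK):
        return local
    return None


# Report any of the two executables that cannot be located.
missing = [exe for exe in ("criterion-external", "benchrunner")
           if find_executable(exe) is None]
if missing:
    print("not found:", ", ".join(missing))
```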
GitHub apparently bans uploading code files as attachments, so I'll paste below:
#!/usr/bin/env python3
import os
import numpy as np

names = ["'VectorSort Quicksort'", "'VectorSort Mergesort'", "'VectorSort Insertionsort'"]
DENSITY = 12  # sample points per power of two

def bounds(name):
    # Returns (lo, hi, number of points); sizes range over 2**lo .. 2**hi.
    match name:
        case "'VectorSort Insertionsort'":
            lo = 3 # 2**n ...
            hi = 16
        case "'VectorSort Quicksort'":
            lo = 3
            hi = 24
        case "'VectorSort Mergesort'":
            lo = 3
            hi = 24
    return lo, hi, (hi-lo)*DENSITY+1

def sanitize_name(name):
    return name.replace('"', '').replace("'", '').replace(" ", '')

def dotrial(name, size):
    # Run one benchmark and read columns 1 and 4 of the CSV criterion writes.
    safe_name = sanitize_name(name)
    os.system('criterion-external benchrunner "%s" Seq %d -- --csv %s.csv' % (name, size, safe_name))
    with open("%s.csv" % safe_name, "r") as f:
        tmp = np.loadtxt(f, delimiter=",", skiprows=1, usecols=(1,4))
    os.remove("%s.csv" % safe_name)
    return tmp

def dotrial_robust(name, size):
    # Retry up to 9 times until the result has exactly two entries.
    for _ in range(3**2):
        t = dotrial(name, size)
        s = tuple(t.tolist())
        if len(s) == 2:
            return t
    return None

if __name__ == "__main__":
    for name in names:
        safe_name = sanitize_name(name)
        lo, hi, pts = bounds(name)
        with open("%s_out.csv" % safe_name, "w") as f:
            f.write("# size\tmean\tstddev\n")
        for i in np.logspace(lo, hi, pts, base=2):
            with open("%s_out.csv" % safe_name, "a") as f:
                try:
                    f.write("%d" % int(i) + "\t%f\t%f\n" % tuple(dotrial_robust(name, i).tolist()))
                except:
                    pass

The NaNs appear in "<name>_out.csv". You may be able to get more of them by replacing dotrial_robust with dotrial.
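One thing worth noting about the retry logic: as written, the length check in dotrial_robust would accept a row whose two values are NaN, since such a row still has two entries. Below is a minimal sketch of a stricter check that also rejects non-finite values. The names is_valid_sample and retry_until_valid are hypothetical, and the sketch assumes the two columns read are the mean and standard deviation, as the script's output header suggests. It is not offered as a fix for the underlying problem, only as a way to make reruns actually trigger on NaN.

```python
import numpy as np


def is_valid_sample(t):
    """True if t holds exactly two finite numbers (assumed: mean, stddev)."""
    if t is None:
        return False
    arr = np.asarray(t, dtype=float)
    return arr.shape == (2,) and bool(np.all(np.isfinite(arr)))


def retry_until_valid(run_once, attempts=9):
    """Call run_once() up to `attempts` times; return the first valid sample, else None."""
    for _ in range(attempts):
        t = run_once()
        if is_valid_sample(t):
            return t
    return None


# Stand-in for dotrial: the first call yields NaNs, the second yields finite values.
samples = iter([np.array([np.nan, np.nan]), np.array([1.5e-6, 2.0e-8])])
result = retry_until_valid(lambda: next(samples))
print(result)
```

In the script, run_once would be something like `lambda: dotrial(name, size)`.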
Copied from https://github.com/michaelborkowski/lh-array-sort/issues/28.