Benchmarking vector sorts with criterion-external produces NaNs for many smaller array inputs #6

@Mikah-Kainen

Joseph

The script used for benchmarking benchrunner on Callisto passes the benchrunner to criterion-external as follows:

os.system("criterion-external benchrunner %s Seq %d -- --csv %s.csv" % (name, size, name))

criterion-external passes benchrunner the iteration count (iters) as its first command-line argument, followed by the arguments that appear before --. Everything after -- goes to criterion; here that is the flag instructing it to produce a .csv file. The benchrunner times iters calls to the sort (on iters pre-allocated arrays), starting the timer before the calls and stopping it after them, and criterion reads from the benchrunner's stdout.
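
The argument layout described above can be sketched roughly as follows. This is only an illustration: `build_command` and `split_at_separator` are hypothetical helpers that are not part of the benchmarking script or of criterion-external, and the iters argument that criterion-external prepends is not modelled.

```python
import shlex

def build_command(name, size, csv_path):
    """Return the argv list for one criterion-external invocation."""
    return [
        "criterion-external", "benchrunner", name, "Seq", str(size),
        "--", "--csv", csv_path,
    ]

def split_at_separator(argv):
    """Split argv at the first "--" into (before, after)."""
    idx = argv.index("--")
    return argv[:idx], argv[idx + 1:]

cmd = build_command("VectorSort Quicksort", 1024, "VectorSortQuicksort.csv")
bench_part, criterion_part = split_at_separator(cmd)

# Print the command as a single shell-quoted line.
print(shlex.join(cmd))
```

Here `bench_part` holds everything up to the separator (the part that reaches the benchrunner), and `criterion_part` holds the options meant for criterion.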

Many of the data points for the vector sorts at smaller array sizes come out as NaNs. This also occurred with the verified sorts, but far less severely, and there it was resolved by attempting up to 2 reruns. Adding reruns to the vector sorts (I've tried up to 8) does not help.
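
For reference, a rerun policy of this kind might look like the sketch below. It is an assumption about the general shape, not the code actually used for the verified sorts: `run_with_reruns` and `run_trial` are hypothetical names, and the trial is assumed to return a (mean, stddev) pair.

```python
import math

def run_with_reruns(run_trial, max_reruns=2):
    """Call run_trial() until it returns only finite values.

    run_trial is a hypothetical callable returning (mean, stddev).
    Returns the first all-finite result, or None if the initial attempt
    and every rerun contained a NaN or infinity.
    """
    for _ in range(1 + max_reruns):
        result = run_trial()
        if all(math.isfinite(x) for x in result):
            return result
    return None

# Example with a fake trial that yields NaNs once, then valid numbers.
attempts = iter([(float("nan"), float("nan")), (1.5e-6, 2.0e-8)])
print(run_with_reruns(lambda: next(attempts)))
```

Checking with `math.isfinite` makes the retry condition depend on the values themselves rather than on how many values came back.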

Here are two examples where the problem persists:
VectorSortQuicksort_out.csv
VectorSortMergesort_out.csv


Joseph

I ran the sorts again several times on Callisto in order to investigate the problem further, and somehow did not run into the issue at all. I was able to produce full sets of data.

VectorSortInsertionsort_out3.csv
VectorSortMergesort_out3.csv
VectorSortQuicksort_out3.csv
VectorSortQuicksort_out4.csv


Artem

umh, heisenbug...

Is it possible to have reproduction instructions anyway? Like: "starting from a clean checkout of the repository, do X, Y, Z and you get a csv file named blah with NaNs (maybe)"...


Joseph

Instructions:

  1. Build the benchrunner
  2. Build criterion-external
  3. Run the script below with the two executables on the PATH or in the current directory
    GitHub apparently bans uploading code files as attachments, so I'll paste it below:
#!/usr/bin/env python3
# Requires Python 3.10+ (match statement) and NumPy.
import os
import numpy as np

names = ["'VectorSort Quicksort'", "'VectorSort Mergesort'", "'VectorSort Insertionsort'"]

DENSITY = 12  # sample points per power of two

def bounds(name):
    # Array sizes run from 2**lo to 2**hi.
    match name:
        case "'VectorSort Insertionsort'":
            lo = 3
            hi = 16
        case "'VectorSort Quicksort'":
            lo = 3
            hi = 24
        case "'VectorSort Mergesort'":
            lo = 3
            hi = 24
    return lo, hi, (hi-lo)*DENSITY+1

def sanitize_name(name):
    # Strip quotes and spaces so the name can be used as a file name.
    return name.replace('"', '').replace("'", '').replace(" ", '')

def dotrial(name, size):
    # Run one benchmark and read columns 1 and 4 (written out below as
    # mean and stddev) from the CSV that criterion produces.
    safe_name = sanitize_name(name)
    os.system('criterion-external benchrunner "%s" Seq %d -- --csv %s.csv' % (name, size, safe_name))
    with open("%s.csv" % safe_name, "r") as f:
        tmp = np.loadtxt(f, delimiter=",", skiprows=1, usecols=(1,4))
    os.remove("%s.csv" % safe_name)
    return tmp

def dotrial_robust(name, size):
    # Retry up to 9 times. Note: any result with two values is accepted,
    # so a row containing NaNs is returned as-is rather than retried.
    for _ in range(3**2):
        t = dotrial(name, size)
        s = tuple(t.tolist())
        if len(s) == 2:
            return t
    return None


if __name__ == "__main__":
    for name in names:
        safe_name = sanitize_name(name)
        lo, hi, pts = bounds(name)
        with open("%s_out.csv" % safe_name, "w") as f:
            f.write("# size\tmean\tstddev\n")
        for i in np.logspace(lo, hi, pts, base=2):
            with open("%s_out.csv" % safe_name, "a") as f:
                try:
                    f.write("%d" % int(i) + "\t%f\t%f\n" % tuple(dotrial_robust(name, i).tolist()))
                except Exception:
                    # No usable result for this size; skip it.
                    pass

The NaNs appear in "<name>_out.csv". You may be able to get more of them by replacing dotrial_robust with dotrial.
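
To see which sizes came out as NaNs in one of these output files, something like the following could be used. It is a minimal sketch that assumes the tab-separated "size, mean, stddev" layout written by the script above; `nan_sizes` is a hypothetical helper, not part of the repository.

```python
import math

def nan_sizes(lines):
    """Return the sizes whose mean or stddev column is NaN.

    Expects tab-separated lines of the form "size<TAB>mean<TAB>stddev";
    blank lines and lines starting with "#" are skipped.
    """
    bad = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        size, mean, stddev = line.split("\t")
        if math.isnan(float(mean)) or math.isnan(float(stddev)):
            bad.append(int(size))
    return bad

# Small in-memory sample in the same layout as "<name>_out.csv".
sample = ["# size\tmean\tstddev\n", "8\tnan\tnan\n", "16\t0.000002\t0.000000\n"]
print(nan_sizes(sample))
```

An open file object can be passed in place of `sample`, since the function only iterates over lines.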


Copied from https://github.com/michaelborkowski/lh-array-sort/issues/28.


Labels: bug
