Benchmarking vector sorts with `criterion-external` produces `NaNs` for many smaller array inputs

_Joseph_

The [script used for benchmarking](https://github.com/michaelborkowski/lh-array-sort/blob/a19a1886b9b0b66230fe45f3afe6fcad189dd687/benchmarks/scripts/sweep_seq.py) `benchrunner` on Callisto invokes it through `criterion-external` as follows:
```python
os.system("criterion-external benchrunner %s Seq %d -- --csv %s.csv" % (name, size, name))
```
`criterion-external` calls `benchrunner` with the iteration count (`iters`) as its first command-line argument, followed by the arguments that precede `--`. The arguments after `--` go to criterion itself; here `--csv` tells it to write a `.csv` file. The benchrunner wraps `iters` calls to the sort (on `iters` pre-allocated arrays) in a timer, and criterion reads the measurement from `stdout`.
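
For comparison, here is a minimal sketch of the same invocation using `subprocess.run` with an argument list instead of `os.system`, so the benchmark name needs no shell quoting. `criterion_cmd` and `run_trial` are hypothetical helpers. The sketch assumes both executables are on `PATH` and that `benchrunner` expects the name (e.g. `VectorSort Quicksort`) as a single unquoted argument, which is what the shell produces when `name` is a quoted string such as `'VectorSort Quicksort'`.

```python
import shlex
import subprocess

def criterion_cmd(name, size, csv_path):
    # Arguments before "--" are forwarded to benchrunner (criterion-external
    # prepends the iteration count itself); arguments after "--" go to criterion.
    return ["criterion-external", "benchrunner", name, "Seq", str(int(size)),
            "--", "--csv", csv_path]

def run_trial(name, size, csv_path):
    # No shell is involved, so a name containing spaces stays one argument.
    subprocess.run(criterion_cmd(name, size, csv_path), check=True)

print(shlex.join(criterion_cmd("VectorSort Quicksort", 1024, "VectorSortQuicksort.csv")))
# criterion-external benchrunner 'VectorSort Quicksort' Seq 1024 -- --csv VectorSortQuicksort.csv
```

Printing with `shlex.join` shows the equivalent shell command line, which is handy for rerunning a single failing size by hand.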

Many of the data points for the vector sorts at smaller array sizes come out as NaN. The verified sorts showed this too, though far less severely, and there allowing up to 2 reruns resolved it. Adding reruns to the vector sorts (I've tried up to 8) does not help.
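
A rerun can only help if the retry condition actually checks for NaN. Here is a minimal sketch of such a loop; `retry_until_finite` and the `trial` callable (one run returning a `(mean, stddev)` pair, like `dotrial` in the reproduction script) are hypothetical names for illustration.

```python
import math

def retry_until_finite(trial, attempts=3):
    """Run trial() up to `attempts` times; return the first (mean, stddev)
    pair in which both values are finite, or None if every run failed."""
    for _ in range(attempts):
        mean, stddev = trial()
        if math.isfinite(mean) and math.isfinite(stddev):
            return mean, stddev
    return None

# Hypothetical trial that reports NaN on its first two runs.
runs = iter([(float("nan"), 0.0), (float("nan"), float("nan")), (1.5e-6, 2.0e-8)])
print(retry_until_finite(lambda: next(runs), attempts=3))  # (1.5e-06, 2e-08)
```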

Here are two examples where the problem persists: 
[VectorSortQuicksort_out.csv](https://github.com/user-attachments/files/19105546/VectorSortQuicksort_out.csv)
[VectorSortMergesort_out.csv](https://github.com/user-attachments/files/19105545/VectorSortMergesort_out.csv)

---

_Joseph_

To investigate further, I reran the sorts several times on Callisto and, oddly, did not run into the issue at all. Every run produced a full set of data:

[VectorSortInsertionsort_out3.csv](https://github.com/user-attachments/files/19108175/VectorSortInsertionsort_out3.csv)
[VectorSortMergesort_out3.csv](https://github.com/user-attachments/files/19108173/VectorSortMergesort_out3.csv)
[VectorSortQuicksort_out3.csv](https://github.com/user-attachments/files/19108172/VectorSortQuicksort_out3.csv)
[VectorSortQuicksort_out4.csv](https://github.com/user-attachments/files/19108174/VectorSortQuicksort_out4.csv)

---

_Artem_

umh, heisenbug...

Could you provide reproduction instructions anyway? Something like: "starting from a clean checkout of the repository, do X, Y, Z and you get a CSV file named blah with NaNs (maybe)"...

---

_Joseph_

Instructions: 

1. Build the benchrunner.
2. Build [criterion-external](https://github.com/rrnewton/criterion-external).
3. Run the script below with both executables on your `PATH` or in the current directory.

GitHub apparently doesn't allow uploading code files as attachments, so here it is inline:
```python
#!/usr/bin/env python3
import os
import numpy as np

names = ["'VectorSort Quicksort'", "'VectorSort Mergesort'", "'VectorSort Insertionsort'"]

DENSITY = 12
def bounds(name): 
    match name: 
        case "'VectorSort Insertionsort'": 
            lo = 3  # 2**n ...
            hi = 16
        case "'VectorSort Quicksort'": 
            lo = 3
            hi = 24
        case "'VectorSort Mergesort'": 
            lo = 3
            hi = 24
    return lo, hi, (hi-lo)*DENSITY+1

def sanitize_name(name): 
    return name.replace('"', '').replace("'", '').replace(" ", '')

def dotrial(name, size):
    safe_name = sanitize_name(name)
    os.system('criterion-external benchrunner "%s" Seq %d -- --csv %s.csv' % (name, size, safe_name))
    with open("%s.csv" % safe_name, "r") as f:
        tmp = np.loadtxt(f, delimiter=",", skiprows=1, usecols=(1,4))
        os.remove("%s.csv" % safe_name)
        return tmp

def dotrial_robust(name, size):
    # Rerun up to 3**2 = 9 times until criterion reports a NaN-free (mean, stddev) pair.
    for _ in range(3**2):
        t = dotrial(name, size)
        if len(t) == 2 and not np.isnan(t).any():
            return t
    return None


if __name__ == "__main__": 
    for name in names: 
        safe_name = sanitize_name(name)
        lo, hi, pts = bounds(name)
        with open("%s_out.csv" % safe_name, "w") as f: 
            f.write("# size\tmean\tstddev\n")
        for i in np.logspace(lo, hi, pts, base=2): 
            with open("%s_out.csv" % safe_name, "a") as f: 
                try: 
                    f.write("%d" % int(i) + "\t%f\t%f\n" % tuple(dotrial_robust(name, i).tolist()))
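                except Exception:
                    # Skip sizes where every attempt failed (dotrial_robust returned None) or a trial raised.
                    pass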
```
The NaNs appear in `<name>_out.csv` (with quotes and spaces stripped from the name, e.g. `VectorSortQuicksort_out.csv`). You may be able to get more of them by replacing `dotrial_robust` with `dotrial`.
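
To see which sizes are affected, here is a small sketch that lists the rows of a `<name>_out.csv` file whose mean or standard deviation is NaN. `nan_sizes` and the file name `find_nans.py` are hypothetical; the sketch assumes the tab-separated layout the script writes (a `# size	mean	stddev` header, then one row per size, with NaN printed as `nan` by `%f`).

```python
import math
import sys

def nan_sizes(lines):
    """Return the sizes whose mean or stddev is NaN, given the lines of a <name>_out.csv file."""
    bad = []
    for line in lines:
        # Skip the "# size\tmean\tstddev" header and any blank lines.
        if line.startswith("#") or not line.strip():
            continue
        size, mean, stddev = line.split("\t")
        # float() accepts "nan" (as written by "%f") and ignores the trailing newline.
        if math.isnan(float(mean)) or math.isnan(float(stddev)):
            bad.append(int(size))
    return bad

if __name__ == "__main__":
    # Usage: python3 find_nans.py VectorSortQuicksort_out.csv VectorSortMergesort_out.csv
    for path in sys.argv[1:]:
        with open(path) as f:
            print(path, nan_sizes(f))
```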

---

_Copied from https://github.com/michaelborkowski/lh-array-sort/issues/28_.
