Skip to content

Conversation

@jazullo
Copy link
Collaborator

@jazullo jazullo commented Mar 29, 2025

This branch implements 5 things in 5 commits:

  1. using the direct implementation of allocScratch instead of the messy construction with flattenCallback for all variations of mergesort with the allocation proposal. 277f6a1

  2. using the alloc proposal in parallel mergesort ace12b5

  3. a cilksort using the same fixes applied to the rest of the sorts including the necessary INLIN(ABL)E pragmas and worker patterns. This cilksort was readded to the benchrunner. 36b3978

  4. scripts to run benchrunner and collect stats into CSV c05e64e

  5. Light performance-optimization-minded update to PiecewiseFallbackSort along what was done before to other, core sorts (Merge in particular). 7c19278

TODO:

  1. perform the necessary liquid proofs wherever they are broken due to the par fix and the new cilksort function.
  2. the cilksort does not parallelize like the mergesort does (despite that the parallel merge(sort)s are identical).

Profiling and further parallelization fixes would help the paper.

@jazullo jazullo marked this pull request as draft March 29, 2025 01:21
@ulysses4ever
Copy link
Collaborator

I'd be happy to review but for this to happen I'd hope that we could first merge the other two PRs that this one is based on, so that the diff only had new stuff. If this is not possible I can do it anyways. @michaelborkowski do you have an ETA for cleaning up your monad-par PR? Or maybe you want help?

In general, please, try to avoid mix several things in one PR (e.g. cilksort and allocSctatch).

@michaelborkowski
Copy link
Owner

@ulysses4ever I'll get that cleaned up by tomorrow so we can move forward with the PRs.

Copy link
Collaborator

@ulysses4ever ulysses4ever left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you squash commits to have just three: cilksort-related, alloc-related, py-criterion-related?

@@ -0,0 +1,5 @@
The script `criterionmethodology.py` is my implementation of a benchrunner-runner that uses the criterion methodology. We take as input some program which takes `iters` as a command-line argument, times a function of interest in a tight loop which repeats `iters` many times, and then prints to stdout the batchtime (total loop time) and selftimed (total loop time divided by iters). The essense of criterion is then to sweep `iters` and perform a linear regression against iters and batchtime. The slope is the mean and the y-intercept represents some notion of shared overhead, insensitive to `iters`. Ultimately, criterion serves as a way to benchmark tasks with very short execution times, as startup overhead can be ignored.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's very text-heavy. What would help with it is examples of how you run it and what outputs you expect.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you leave a comment somewhere as to why we need another QuickSort? It'd be great to get the main one to work for cilksort.

in quickSortBtw (cpy ? lem_equal_slice_bag xs2 cpy 0 n) 0 n
else let (Ur hd, xs2) = A.get2 0 xs1
tmp = makeArray n hd in
A.copy2 0 0 n xs2 tmp ? lem_copy_equal_slice xs2 0 tmp 0 n & \(xs2', cpy0) ->
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we use let here?


MAKE_PLOT = False

def linear_regression_with_std(x, y):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd expect Python to have something like this in one of the libraries (should be easy to Google).

@ulysses4ever
Copy link
Collaborator

One more thing: any new sort should also be exercised in CI, which means adding a line analogous to what we have here:

cabal run benchrunner -- 5 Insertionsort Seq 100
cabal run benchrunner -- 5 Mergesort Seq 100
cabal run benchrunner -- 5 Mergesort Par 100 +RTS -N2
cabal run benchrunner -- 5 "VectorSort Insertionsort" Seq 100
cabal run benchrunner -- 5 "VectorSort Mergesort" Seq 100
cabal run benchrunner -- 5 "VectorSort Quicksort" Seq 100
cabal run benchrunner -- 5 "CSort Insertionsort" Seq 100
cabal run benchrunner -- 5 "CSort Mergesort" Seq 100
cabal run benchrunner -- 5 "CSort Quicksort" Seq 100

@ulysses4ever ulysses4ever force-pushed the benchmark-ready-par branch from d87c3f3 to eafa2b1 Compare April 2, 2025 14:31
@ulysses4ever
Copy link
Collaborator

One important consideration when switching to allocScratch: you have to double-check that performance doesn't degrade. I think it very well may. Especially if the new combinators are not marked as INLINABLE.

@ulysses4ever
Copy link
Collaborator

My local experiments don't show any performance degradation in Mergesort Seq or Par on this branch.

@ulysses4ever
Copy link
Collaborator

I modified the description of this PR to have a list of 5 things this PR implements, which map nicely on the commits from this branch (thanks a lot to Joseph for creating a clean Git history in particular). I plan to merge some of these things in separate PRs and move on. (Some things may need to keep lingering here.) First up is the updated Cilksort: #27

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants