
Conversation

@L0neGamer

continuing the work of #493

Not yet ready for merging; I want to try to run the benchmarks in a reliable way.
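
For context, here is a minimal sketch of the kind of change involved; the function below is a hypothetical stand-in written for illustration, not the actual containers code:

import GHC.Stack (HasCallStack)
import qualified Data.Map as M

-- With a HasCallStack constraint, the error thrown on a missing key reports
-- the caller's source location, not just the line inside the library.
lookupOrDie :: (HasCallStack, Ord k) => k -> M.Map k v -> v
lookupOrDie k m = case M.lookup k m of
  Just v  -> v
  Nothing -> error "lookupOrDie: key not found"

A call such as lookupOrDie 3 M.empty then aborts with a call stack pointing at the use site, which is the debugging benefit being weighed against any runtime cost in the benchmarks discussed below.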

L0neGamer changed the title from "Hascallstack partial" to "Adding HasCallStack to Partial functions" on Sep 10, 2025
@L0neGamer
Author

Do you have any recommendations for running benchmarks? I ran all of the benchmarks three times and got wildly different results each time. Ideally I'd run them one suite at a time... I guess I can manually chain the commands for that. I await further guidance.

@meooow25
Contributor

Hi, I need some time to review this, since I first need to understand how HasCallStack works and the current consensus on using it in boot libraries.

About the benchmarks, what have you tried? I would run the benchmarks for the changed functions using tasty patterns, e.g.

$ cabal run map-benchmarks -- -p lookup

(See also CONTRIBUTING.md)

@L0neGamer
Author

I ran cabal bench all before the changes and after them, then checked out master again and reran, and got very different results each time.

I can certainly try running individual benchmarks; in that case I'll probably run each benchmark component on its own.

@treeowl
Contributor

treeowl commented Sep 14, 2025

@meooow25, the constraints are used in base, so there shouldn't be any issue using them in other boot libraries.

@Lysxia
Contributor

Lysxia commented Sep 15, 2025

One common source of benchmark noise is other processes taking up CPU, like a browser. In case you haven't tried already, running benchmarks on a fresh boot helps reduce interference from background processes.

And if you think there could be something weird going on you can ask for someone else to try it too to gather more data. (I'd be happy to volunteer here.)

@treeowl
Contributor

treeowl commented Sep 15, 2025

FWIW, I always run killall -STOP firefox before running benchmarks and killall -CONT firefox after, but I don't claim to be good at achieving a "quiet" system.

@L0neGamer
Author

Here are the results of benchmarking all the benchmarks in order one after the other: https://gist.github.com/L0neGamer/d0722ef0aea167684d87b63f786bb33f

Here are the percentage differences of the end results of each:

intmap-benchmarks: +17.8%
intset-benchmarks: -2.08%
map-benchmarks: -20.9%
tree-benchmarks: +5.33%
sequence-benchmarks: -4%
set-benchmarks: ~0%
graph-benchmarks: -31.1%
set-operations-intmap: -3.73%
set-operations-intset: -1.94%
set-operations-map: -3.69%
set-operations-set: -7.43%
lookupge-intmap: +4.22%
lookupge-map: +40.5%

However, the last result was a difference of only 3 seconds, so it is probably just a bit swingy, especially as all of the results in that suite are better in the HasCallStack run.

Benchmarks were performed with a bunch of background processes killed, by running the script below:

(cabal bench intmap-benchmarks && \
  cabal bench intset-benchmarks && \
  cabal bench map-benchmarks && \
  cabal bench tree-benchmarks && \
  cabal bench sequence-benchmarks && \
  cabal bench set-benchmarks && \
  cabal bench graph-benchmarks && \
  cabal bench set-operations-intmap && \
  cabal bench set-operations-intset && \
  cabal bench set-operations-map && \
  cabal bench set-operations-set && \
  cabal bench lookupge-intmap && \
  cabal bench lookupge-map) > bench.out

L0neGamer marked this pull request as ready for review on September 15, 2025 at 21:01
@jappeace

It needs to take the standard deviations (i.e. the run-to-run noise) into account to see whether the differences are within the margin of error for each bench suite.

But I think these checks are kind of excessive. Nobody complained about adding HasCallStack to unordered-containers, so I doubt they will if it's added here. And if they do, just revert the ones they complain about 😅
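
(If the suites use tasty-bench, there is also a knob for this: something like

$ cabal run map-benchmarks -- --stdev 2

keeps measuring each case until the estimated relative standard deviation drops below 2%, at the cost of longer runs.)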

@Bodigrim
Contributor

Here are the percentage differences of the end results of each:

What do you mean by "end results"? Are you comparing total time as reported in "All XX tests passed"? This would make little sense.

Could you use --csv and --baseline?
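
For example, assuming the suites use tasty-bench, the workflow would be roughly the following (the csv file name and branch name are only placeholders):

$ git checkout master
$ cabal run map-benchmarks -- --csv baseline.csv
$ git checkout <this-branch>
$ cabal run map-benchmarks -- --baseline baseline.csv

tasty-bench then reports, per benchmark, whether it is faster or slower than the recorded baseline, and --fail-if-slower N turns regressions of more than N% into outright failures. That is much easier to interpret than comparing total wall-clock times.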

@L0neGamer
Author

I'll look into running each of those later.

Would the maintainers be interested if I took this opportunity to add the x-partial warning to all of these functions as well?
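
For concreteness, a sketch of what such a pragma looks like with GHC 9.8's warning categories; the function name here is made up purely to show the syntax, not a proposal for any particular function:

import qualified Data.Map as M

{-# WARNING in "x-partial" mapMin
    "This is a partial function; it throws an error on an empty map." #-}
mapMin :: M.Map k a -> (k, a)
mapMin = M.findMin  -- findMin itself errors on an empty map

Anyone calling mapMin would then get a warning at the use site, suppressible with -Wno-x-partial.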

@meooow25
Contributor

Would the maintainers be interested if I took this opportunity to add the x-partial warning to all of these functions as well?

A no from me on that, sorry. I find the warnings on head and tail annoying enough.
I'm all for documenting clearly which functions are partial and the conditions under which they error. This was done in #1140; if I missed any, do let me know. The rest is on the user: if they use the function, they know what they are doing, and there is no reason to annoy them when they compile their code.

@L0neGamer
Author

I don't think people should necessarily have to program with the documentation open.

I think the point of these warnings is that they're guards for beginner work or corporate codebases. For the average project or a competent Haskeller's small proof of concept they're annoying, but they help ensure code quality by stopping exceptions before they can happen.

In any case where you "know a value should be in a map" or similar, that call is always one refactor away from becoming a ticking time bomb of an exception. That doesn't feel very "fearless refactoring" to me.
Additionally, I don't think the Haskell answer to "this is a function that can fail" is "code better"; it's "ensure the user of the code is aware of the failure so they can handle it", be that by ignoring the warning or by using more explicit control flow.

If a user wants to use these potentially unsafe, throwy functions, they are more than able to ignore the warnings or turn them off. But those of us who want to program safely have no choice but to try to stay vigilant all the time.

@L0neGamer
Author

I've been a bit busy; here are the results from recording a baseline on master and then running against that baseline on this PR:
https://gist.github.com/L0neGamer/d0722ef0aea167684d87b63f786bb33f#file-vs-baseline-out

I count 1008 "same as baseline", 97 "more than baseline" (average ~16%), and 71 "less than baseline" (average ~14%).

In both cases I ran cabal clean and cabal build all, then ran a script similar to the one I posted before, modified to output the CSVs and then measure against the baseline. In neither case was I doing anything else on my computer (and nothing else was open).

@L0neGamer
Author

As much as I would love to add x-partial warnings onto these functions, I don't think adding HasCallStack should be blocked on that. What do I need to do to keep this moving?
