Valencia jets 🍊 #202
Conversation
Codecov Report

@@ Coverage Diff @@
##             main     #202      +/-   ##
==========================================
+ Coverage   80.22%   81.12%   +0.89%
==========================================
  Files          21       21
  Lines        1315     1388      +73
==========================================
+ Hits         1055     1126      +71
- Misses        260      262       +2
Cool! I think the beam distance was not quite correct before. After expressing it differently, I now get much better closure in the test vs. fastjet. The test does not pass at the same level of precision as the other tests, but I think they're still quite close. Please let me know if there's anything obvious that I can do to improve on this: I am not super happy with the introduction of

In any case, I'd say this is now ready for people to take a look!
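For context, the Valencia beam distance from arXiv:1404.4294 is d_iB = E^(2β) sin^(2γ)θ. Here is a minimal sketch of that formula in Python; the function name and defaults are illustrative only, not the package's actual API:

```python
import math

def valencia_beam_distance(E, theta, beta=1.0, gamma=1.0):
    """Valencia beam distance d_iB = E^(2*beta) * sin(theta)^(2*gamma),
    following arXiv:1404.4294. Illustrative sketch, not the package API."""
    return E ** (2 * beta) * math.sin(theta) ** (2 * gamma)

# gamma controls how steeply the beam distance falls off near the beam axis,
# which is what gives the algorithm its background resistance.
```

With β = γ = 1, a particle at 90° to the beam has d_iB = E², while a particle collinear with the beam has d_iB → 0, so soft forward particles are cheaply clustered with the beam.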
clean up `@inbounds`-related code
Hello @mattleblanc

Great stuff - I am very happy to see this more background-resistant e+e- algorithm here. A quick test does show a significant regression for the speed of the other e+e- algorithms, viz., from main:

➜ examples git:(main) ✗ julia --project instrumented-jetreco.jl --algorithm=Durham ../test/data/events.eeH.hepmc3.zst -m 64
Processed 100 events 64 times
Average time per event 56.561605937499984 ± 15.908230984544156 μs
Lowest time per event 38.716570000000004 μs

➜ examples git:(main) ✗ julia --project instrumented-jetreco.jl --algorithm=EEKt -p 1 -R 1 ../test/data/events.eeH.hepmc3.zst -m 64
Processed 100 events 64 times
Average time per event 53.38911937499999 ± 17.78962680436358 μs
Lowest time per event 41.09395000000001 μs

to this PR:

➜ examples git:(c0b228b) ✗ julia --project instrumented-jetreco.jl --algorithm=Durham ../test/data/events.eeH.hepmc3.zst -m 64
Processed 100 events 64 times
Average time per event 76.76526390624997 ± 13.610044399883003 μs
Lowest time per event 61.225550000000005 μs

➜ examples git:(c0b228b) ✗ julia --project instrumented-jetreco.jl --algorithm=EEKt -p 1 -R 1 ../test/data/events.eeH.hepmc3.zst -m 64
Processed 100 events 64 times
Average time per event 76.19724343749999 ± 13.723145317291415 μs
Lowest time per event 65.72833 μs

That is about 1.58x slower (it's on an old i7 MacBook Pro). It's also a bit concerning why the results are only good at the
@graeme-a-stewart Thanks for pointing out that regression -- I've been running the benchmarks over the weekend, and on my machine (M3 MacBook Air) the Durham and EEKt performance now ~matches the state before this PR. I see the following:

Main:

This PR:

For completeness, the current VLC benchmark is consistently a bit slower, but not too bad:

I don't love the duplication of code that these changes introduce when selecting functions based on
I missed quite a bit, and I'm now confused why there are two different implementations, `_ee_genkt_algorithm` and `_ee_genkt_algorithm_durham`, when previously we had only `_ee_genkt_algorithm`. These functions seem to be very similar, with a few differences which I'm not sure are intentional.
Also, what's the particular benefit of replacing if-else checking of an enum with singleton-type dispatch for the helper functions? Some of the duplication can surely be removed, but it still looks like extra boilerplate.
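To make the design question concrete, here is the general pattern sketched in Python: an explicit if/elif chain over an enum versus a lookup table keyed on the algorithm. The algorithm names mirror the package, but the factor values are placeholders, not the real ones:

```python
from enum import Enum, auto

class Algorithm(Enum):
    DURHAM = auto()
    EEKT = auto()
    VALENCIA = auto()

# Branching style: one function with an explicit if/elif on the enum.
def dij_factor_branching(alg, R):
    if alg is Algorithm.DURHAM:
        return 1.0            # placeholder value
    elif alg is Algorithm.EEKT:
        return 1.0 / R ** 2   # placeholder value
    elif alg is Algorithm.VALENCIA:
        return 1.0            # placeholder value
    raise ValueError(alg)

# Dispatch style: a table keyed on the algorithm, no branching at the call site.
_DIJ_FACTORS = {
    Algorithm.DURHAM: lambda R: 1.0,
    Algorithm.EEKT: lambda R: 1.0 / R ** 2,
    Algorithm.VALENCIA: lambda R: 1.0,
}

def dij_factor_dispatch(alg, R):
    return _DIJ_FACTORS[alg](R)
```

Both forms compute the same thing; the trade-off is readability versus how well the compiler can specialise each path, which is what the review comment is questioning.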
Hi all -- thanks for the feedback. I had tried out the Val-based approach after one of the initial points from @Moelf, but I agree that was not the right path to follow. I have implemented everything related to VLC into

I have also been running the benchmarking as I have gone: I do not believe that this implementation should be significantly slower for Durham & EEKt than the existing version. The VLC implementation is a bit slower than the benchmark I posted above, but I would prefer at this point to try moving forward with this PR and to optimise things further in the future.
Valencia metric (independent of `dij_factor`).
"""
@inline function dij_dist(eereco, i, j, dij_factor, algorithm, R = 4.0)
    if !(algorithm isa JetAlgorithm.Algorithm)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is really hot, so adding a conditional check here is bad. By the time we are this deep in the algorithm we should be able to assume that `algorithm isa JetAlgorithm.Algorithm`.
Removing this check gains about 5% on my machine.
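The general trade-off being discussed -- a defensive type check inside a hot kernel -- can be sketched generically in Python. This is not the Julia code under review, just an illustration of the pattern:

```python
import timeit

def kernel_unchecked(a, b):
    # Hot-path kernel: no defensive checks, caller guarantees the types.
    return min(a, b) * (a + b)

def kernel_checked(a, b):
    # Same kernel, guarded by a per-call type check.
    if not isinstance(a, float) or not isinstance(b, float):
        raise TypeError("expected floats")
    return min(a, b) * (a + b)

# Time both variants; the checked version pays for the guard on every call.
t_checked = timeit.timeit(lambda: kernel_checked(1.5, 2.5), number=100_000)
t_unchecked = timeit.timeit(lambda: kernel_unchecked(1.5, 2.5), number=100_000)
print(f"checked: {t_checked:.4f}s  unchecked: {t_unchecked:.4f}s")
```

In Julia the check may even be compiled away when the call is type-stable, but the usual pattern either way is to validate once at the API boundary and keep the inner loop check-free.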
Interesting, how are you benchmarking?
Removing this check doesn't change the performance on my end by an amount that seems significant:
With the check:
> julia --project=examples/ /Users/mleblanc/dev/durham/JetReconstruction.jl/examples/instrumented-jetreco.jl /Users/mleblanc/dev/durham/JetReconstruction.jl/test/data/events.eeH.hepmc3.zst -m 1000 --algorithm=Durham
Processed 100 events 1000 times
Average time per event 23.182571699999997 ± 1.9527639664864915 μs
Lowest time per event 20.9175 μs
Without the check:
> julia --project=examples/ /Users/mleblanc/dev/durham/JetReconstruction.jl/examples/instrumented-jetreco.jl /Users/mleblanc/dev/durham/JetReconstruction.jl/test/data/events.eeH.hepmc3.zst -m 1000 --algorithm=Durham
Processed 100 events 1000 times
Average time per event 22.837387519999996 ± 1.3930053222057048 μs
Lowest time per event 21.34125 μs
Performance definitely got a lot better now, but we're still losing about 10% in the Durham algorithm. There are a lot of

It may be that we need a refactorisation that separates Valencia at a higher level, so that in all of the tighter loops the flow path has less branching.
Hi @graeme-a-stewart, thanks for the quick feedback. I've made the simple changes, but I am puzzled by your comments about performance. As above, I see consistent performance between this branch and an unmodified version of the code. Profiling an unmodified version at commit eb63e6c, I see this:

Happy to make more changes, but I just want to check whether this is consistent with the way you're doing things before changing anything major.
Are we benchmarking on the same architecture? I fear this is one of those x86_64 vs. macOS M-series problems. FWIW, on an AMD Zen4:

# this PR
(vlc)> julia --project=examples/ ./examples/instrumented-jetreco.jl test/data/events.eeH.hepmc3.zst -m 1000 --algorithm=Durham
Processed 100 events 1000 times
Average time per event 29.74854201999998 ± 2.9077949972547454 μs
Lowest time per event 27.48308 μs

# main
(main)> julia --project=examples/ ./examples/instrumented-jetreco.jl test/data/events.eeH.hepmc3.zst -m 1000 --algorithm=Durham
Processed 100 events 1000 times
Average time per event 29.813233310000008 ± 4.091690424744847 μs
Lowest time per event 26.534920000000003 μs
Thanks @mattleblanc. What do you get comparing the head of your branch with the head of main? I am seeing an ~8-10% regression, but it's an old x86 MacBook Pro I'm using. Tomorrow I'll test on other platforms - maybe this is just an oddity...
I'm running on an M3 MacBook Air; I see the following after rebasing my reference setup with the latest comments on
So here is what I get, comparing your branch, Matt, against main. I ran

julia --project instrumented-jetreco.jl --algorithm=Durham ../test/data/events.eeH.hepmc3.zst -m 1000

in each case and looked at the lowest time achieved (though the average shows the same pattern).

So I do maintain there is a 4-5% regression in the speed of the Durham algorithm with the current Valencia implementation. Likewise for

The slowdown on the old MacBook is a bit worse; however, the fairly modern Linux machine is a pretty typical production machine and it's suffering too.
OK, thanks for confirming these numbers. What's your preferred solution? I can try to factor the VLC code out without duplicating too much code under the hood. I'll benchmark future changes on an Alma9 machine so that I'm more likely to catch any differences in performance.
I have some ideas I'd like to try tomorrow to better understand the problem and possible mitigations. Hopefully we can avoid a total separation of the algorithm logic!

At these margins, where we are squeezing out all the performance we can, we do see different behaviour between Apple Silicon and Intel chips, as @Moelf noted. See also, e.g., #151. So having a spread of architectures is certainly useful.
Just getting back to this again, I took some benchmarks with Julia 1.11.7 on an AMD Ryzen 7 5700G and on a MacBook Pro M4:

So the M4 doesn't show a regression (and is also blazingly fast 🚀). I'd still like to understand/recover the regression on Linux x86.
…-- VLC tests pass now with higher precision
…arders to avoid branching, fast path when p isa int
…recover performance
It seemed cleanest to just open a new PR. The original description is below, I've also included the initial feedback from @Moelf.
This PR aims to implement the Valencia (VLC) jet clustering algorithm as another option to study in the context of lepton colliders. Many thanks to my summer student @EthanLynn916 for the initial Julia implementation that this PR is based on!
The implementation follows the description in arXiv:1404.4294, and so depends on several parameters:
The extra flexibility given by the VLC algorithm provides some handles that allow for additional suppression of backgrounds due to ISR / BIB / etc. that present themselves in high-energy lepton collisions. It was originally developed for use at CLIC, and is of interest to study in the context of a future muon collider.
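As a reference for the metric being implemented, the Valencia inter-particle distance from arXiv:1404.4294 is d_ij = 2 min(E_i^{2β}, E_j^{2β})(1 − cos θ_ij)/R². A small Python sketch of the formula (names and defaults here are illustrative, not the package's API):

```python
import math

def valencia_dij(Ei, Ej, theta_ij, beta=1.0, R=1.0):
    """Valencia inter-particle distance from arXiv:1404.4294:
    d_ij = 2 * min(Ei^(2*beta), Ej^(2*beta)) * (1 - cos(theta_ij)) / R^2
    Illustrative sketch only, not the JetReconstruction.jl API."""
    return (2.0 * min(Ei ** (2 * beta), Ej ** (2 * beta))
            * (1.0 - math.cos(theta_ij)) / R ** 2)

# With beta = 1 this reduces to a Durham/ee-kt style distance scaled by 1/R^2;
# beta != 1 is the extra handle for suppressing soft backgrounds.
```

Together with the beam distance d_iB = E^{2β} sin^{2γ}θ, varying β and γ tunes how aggressively soft and forward radiation is discarded.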
This PR touches EEAlgorithm to add the necessary distance functions and parameters. I think this new code is factorized and does not change the performance of the existing algorithms, but I am continuing to validate the work.
In particular, some of the new tests are not yet passing due to larger differences in jet rapidity than I'd expected. I'm hopeful that #198 might help bring things slightly closer to the C++ output, so I am waiting to test with that merged ... in the meantime, I'm happy to get feedback on this draft PR so that it can be improved.
🍻 MLB