
Conversation

@mattleblanc
Contributor

It seemed cleanest to just open a new PR. The original description is below; I've also included the initial feedback from @Moelf.


This PR aims to implement the Valencia (VLC) jet clustering algorithm as another option to study in the context of lepton colliders. Many thanks to my summer student @EthanLynn916 for the initial Julia implementation that this PR is based on!

The implementation follows the description in arXiv:1404.4294, and so depends on several parameters (a sketch of the distance measures follows the list below):

R: The jet radius parameter.
β: Corresponds to the existing power p of the algorithm.
γ: The angular exponent parameter used in the Valencia beam distance.
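
For reference, here is a minimal sketch of the two VLC distance measures as I read them from arXiv:1404.4294; this is illustrative only, with made-up helper names, not the code added by this PR:

# Illustrative sketch only (hypothetical helpers, not this PR's implementation):
# the VLC inter-particle and beam distances as given in arXiv:1404.4294.
vlc_dij(Ei, Ej, cosθij; β = 1.0, R = 1.0) =
    min(Ei^(2β), Ej^(2β)) * (1 - cosθij) / R^2   # inter-particle distance d_ij

vlc_diB(Ei, sinθiB; β = 1.0, γ = 1.0) =
    Ei^(2β) * sinθiB^(2γ)                        # beam distance d_iB

# Example: two 50 GeV particles at θ_ij = 0.5 rad, θ_iB = 0.5 rad, R = 1, β = γ = 1
vlc_dij(50.0, 50.0, cos(0.5)), vlc_diB(50.0, sin(0.5))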

The extra flexibility given by the VLC algorithm provides some handles that allow for additional suppression of backgrounds due to ISR / BIB / etc. that present themselves in high-energy lepton collisions. It was originally developed for use at CLIC, and is of interest to study in the context of a future muon collider.

This PR touches EEAlgorithm.jl to add the necessary distance functions and parameters. I think this new code is factorised and does not change the performance of the existing algorithms, but I am continuing to validate the work.

In particular, some of the new tests are not yet passing due to larger differences in jet rapidity than I'd expected. I'm hopeful that #198 might help bring things slightly closer to the C++ output, so I am waiting to test with that merged. In the meantime, I'm happy to get feedback on this draft PR so that it can be improved.

🍻 MLB

@codecov

codecov bot commented Aug 13, 2025

Codecov Report

❌ Patch coverage is 97.72727% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.12%. Comparing base (ae0ac95) to head (c4ec5cc).

Files with missing lines    Patch %    Lines
src/EEAlgorithm.jl          97.65%     3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #202      +/-   ##
==========================================
+ Coverage   80.22%   81.12%   +0.89%     
==========================================
  Files          21       21              
  Lines        1315     1388      +73     
==========================================
+ Hits         1055     1126      +71     
- Misses        260      262       +2     


@mattleblanc mattleblanc changed the title DRAFT: Valencia jets 🍊 Valencia jets 🍊 Aug 13, 2025
@mattleblanc
Contributor Author

Cool! I think the beam distance was not quite correct before. After expressing it differently, I now get much better closure in the test vs. fastjet.

The test does not pass at the same level of precision as the other tests, but I think the results are still quite close. Please let me know if there's anything obvious that I can do to improve on this:

rap_test = isapprox(jet.rap, rap_ref; atol = 1e-4)
phi_test = isapprox(norm_phi, normalised_phi_ref; atol = 1e-4)
pt_test = isapprox(jet.pt, pt_ref; rtol = 1e-5)

I am not super happy with the introduction of ComparisonTestValencia to _common.jl, but it seemed preferable to changing the footprint of ComparisonTest to take the extra arguments ...

In any case I'd say this is now ready for people to take a look!

Member

@Moelf left a comment

clean up `@inbounds`-related stuff

@graeme-a-stewart
Member

Hello @mattleblanc

Great stuff - I am very happy to see this more background resistant e+e- algorithm here.

A quick test does show a significant regression for the speed of the other e+e- algorithms, viz., from main:

➜  examples git:(main) ✗ julia --project instrumented-jetreco.jl --algorithm=Durham ../test/data/events.eeH.hepmc3.zst -m  64
Processed 100 events 64 times
Average time per event 56.561605937499984 ± 15.908230984544156 μs
Lowest time per event 38.716570000000004 μs
➜  examples git:(main) ✗ julia --project instrumented-jetreco.jl --algorithm=EEKt -p 1 -R 1 ../test/data/events.eeH.hepmc3.zst -m 64
Processed 100 events 64 times
Average time per event 53.38911937499999 ± 17.78962680436358 μs
Lowest time per event 41.09395000000001 μs

to this PR:

➜  examples git:(c0b228b) ✗ julia --project instrumented-jetreco.jl --algorithm=Durham ../test/data/events.eeH.hepmc3.zst -m 64 
Processed 100 events 64 times
Average time per event 76.76526390624997 ± 13.610044399883003 μs
Lowest time per event 61.225550000000005 μs
➜  examples git:(c0b228b) ✗ julia --project instrumented-jetreco.jl --algorithm=EEKt -p 1 -R 1 ../test/data/events.eeH.hepmc3.zst -m 64
Processed 100 events 64 times
Average time per event 76.19724343749999 ± 13.723145317291415 μs
Lowest time per event 65.72833 μs

That is about ×1.58 slower on the lowest times (this is on an old i7 MacBook Pro).

It's also a bit concerning that the results only agree at the $10^{-4}$ level, which is ×100 worse than for the other algorithms. We should try to see why that is: do we have some sensitive non-optimal cancellations in the maths? There would be a good case here for trying Anton's PrecisionCarriers.jl package (https://indico.cern.ch/event/1488852/timetable/#88-precisioncarriersjl-easy-de).
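
As a generic illustration of the kind of cancellation that can eat precision (not a diagnosis of this particular code): computing 1 - cos(θ) directly loses digits at small angles, while the algebraically equivalent 2*sin(θ/2)^2 does not.

# Generic precision illustration, unrelated to any specific line of this PR.
θ = 1.0e-7
naive  = 1 - cos(θ)        # catastrophic cancellation: only a couple of accurate digits
stable = 2 * sin(θ / 2)^2  # ≈ θ^2 / 2, accurate to full Float64 precision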

@mattleblanc mattleblanc changed the title Valencia jets 🍊 WIP: Valencia jets 🍊 Aug 18, 2025
@mattleblanc mattleblanc changed the title WIP: Valencia jets 🍊 Valencia jets 🍊 Aug 18, 2025
@mattleblanc
Contributor Author

@graeme-a-stewart Thanks for pointing out that regression -- I've been running the benchmarks over the weekend, and on my machine (M3 MacBook Air) the Durham and EEKt performance now roughly matches the state before this PR. I see the following:

Main:

m3leblanc:examples mleblanc$ julia --project=/Users/mleblanc/dev/baseline/JetReconstruction.jl/examples/ --startup-file=no /Users/mleblanc/dev/baseline/JetReconstruction.jl/examples/instrumented-jetreco.jl --algorithm=Durham /Users/mleblanc/dev/durham/JetReconstruction.jl/test/data/events.eeH.hepmc3.zst -m 640
Precompiling JetReconstruction...
  1 dependency successfully precompiled in 2 seconds. 66 already precompiled.
Processed 100 events 640 times
Average time per event 22.380134656249982 ± 1.3076253982945982 μs
Lowest time per event 20.94416 μs

This PR:

m3leblanc:JetReconstruction.jl mleblanc$ julia --project=/Users/mleblanc/dev/durham/JetReconstruction.jl/examples/ --startup-file=no /Users/mleblanc/dev/durham/JetReconstruction.jl/examples/instrumented-jetreco.jl --algorithm=Durham /Users/mleblanc/dev/durham/JetReconstruction.jl/test/data/events.eeH.hepmc3.zst -m 640 -p=1.0
Precompiling JetReconstruction...
  1 dependency successfully precompiled in 2 seconds. 66 already precompiled.
Processed 100 events 640 times
Average time per event 23.147814531249995 ± 2.522429492157901 μs
Lowest time per event 20.16833 μs

For completeness, the current VLC benchmark is consistently a bit slower, but not too bad:

m3leblanc:JetReconstruction.jl mleblanc$ julia --project=/Users/mleblanc/dev/durham/JetReconstruction.jl/examples/ --startup-file=no /Users/mleblanc/dev/durham/JetReconstruction.jl/examples/instrumented-jetreco.jl --algorithm=Valencia /Users/mleblanc/dev/durham/JetReconstruction.jl/test/data/events.eeH.hepmc3.zst -m 640 -p=1.0
Processed 100 events 640 times
Average time per event 27.71035881249998 ± 2.689635681328953 μs
Lowest time per event 24.585 μs

I don't love the duplication of code that these changes introduce when selecting functions based on `::Val{JetAlgorithm...}` dispatch, so I'm happy to try another approach if you'd prefer that and have a suggestion.

Member

@m-fila left a comment

I missed quite a bit and I'm now confused about why there are two different implementations, _ee_genkt_algorithm and _ee_genkt_algorithm_durham, when previously we had only _ee_genkt_algorithm. These functions seem very similar, with a few differences that I'm not sure are intentional.

Also, what's the particular benefit of replacing if-else checks on the enum with singleton type dispatch for the helper functions? Some of the duplication can certainly be removed, but it still looks like extra boilerplate.
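
For readers following the thread, a minimal, self-contained sketch of the two styles being compared; the names and the toy distance below are made up for illustration and are not the package's API.

# Hypothetical example contrasting enum branching with singleton (Val) dispatch.
@enum Algo Durham Valencia

# Style 1: a single method that branches on the enum value at run time.
function beam_dist(alg::Algo, E, sinθ; β = 1.0, γ = 1.0)
    alg == Valencia ? E^(2β) * sinθ^(2γ) : one(E)   # `one(E)` is just a placeholder
end

# Style 2: one method per algorithm, selected by a Val singleton type,
# so the branch is resolved by dispatch rather than a run-time check.
beam_dist(::Val{Valencia}, E, sinθ; β = 1.0, γ = 1.0) = E^(2β) * sinθ^(2γ)
beam_dist(::Val{Durham}, E, sinθ; β = 1.0, γ = 1.0) = one(E)

beam_dist(Valencia, 50.0, 0.5)        # enum branch
beam_dist(Val(Valencia), 50.0, 0.5)   # singleton dispatch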

@mattleblanc
Contributor Author

Hi all -- thanks for the feedback. I had tried out the Val-based approach after one of the initial points from @Moelf, but I agree that was not the right path to follow. I have implemented everything related to VLC in EEAlgorithm.jl in a way that now more closely follows the original implementation.

I have also been running the benchmarking as I have gone: I do not believe that this implementation should be significantly slower for Durham & EEKt than the existing version. The VLC implementation is a bit slower than the benchmark I posted above, but I would prefer at this point to move forward with this PR and to optimise things further in the future.

# Valencia
Processed 100 events 640 times
Average time per event 46.44745081249997 ± 2.30957344599381 μs
Lowest time per event 44.26292 μs

# Durham
Processed 100 events 640 times
Average time per event 23.304417093749965 ± 1.6160987724851137 μs
Lowest time per event 21.285 μs

@mattleblanc mattleblanc requested review from Moelf and m-fila August 24, 2025 23:07
An inline review comment on the new dij_dist code:

Valencia metric (independent of `dij_factor`).
"""
@inline function dij_dist(eereco, i, j, dij_factor, algorithm, R = 4.0)
if !(algorithm isa JetAlgorithm.Algorithm)

Member

This function is really hot, so adding a conditional check here is bad. By the time we are this deep in the algorithm we should be able to assume that algorithm isa JetAlgorithm.Algorithm.

Removing this check gains about 5% on my machine.
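
A tiny, self-contained sketch of the general pattern being suggested (hypothetical names, not the package code): validate the argument once at the entry point, so the hot inner function never re-checks it.

# Hypothetical illustration: the check lives at the API boundary, not in the hot path.
@enum Algo Durham EEKt Valencia

@inline pair_dist(alg::Algo, Ei, Ej, cosθ) = min(Ei^2, Ej^2) * (1 - cosθ)  # toy metric

function cluster(E, cosθ, alg)
    alg isa Algo || throw(ArgumentError("unsupported algorithm: $alg"))  # checked once
    s = 0.0
    for k in eachindex(E, cosθ)
        s += pair_dist(alg, E[k], E[begin], cosθ[k])   # no per-call validation here
    end
    return s
end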

Contributor Author
@mattleblanc commented Aug 27, 2025

Interesting, how are you benchmarking?

Removing this check doesn't change the performance on my end by an amount that seems significant:

With the check:

> julia --project=examples/  /Users/mleblanc/dev/durham/JetReconstruction.jl/examples/instrumented-jetreco.jl /Users/mleblanc/dev/durham/JetReconstruction.jl/test/data/events.eeH.hepmc3.zst -m 1000 --algorithm=Durham
Processed 100 events 1000 times
Average time per event 23.182571699999997 ± 1.9527639664864915 μs
Lowest time per event 20.9175 μs

Without the check:

> julia --project=examples/  /Users/mleblanc/dev/durham/JetReconstruction.jl/examples/instrumented-jetreco.jl /Users/mleblanc/dev/durham/JetReconstruction.jl/test/data/events.eeH.hepmc3.zst -m 1000 --algorithm=Durham
Processed 100 events 1000 times
Average time per event 22.837387519999996 ± 1.3930053222057048 μs
Lowest time per event 21.34125 μs

@graeme-a-stewart
Member

Performance definitely got a lot better now, but we're still losing about 10% in the Durham algorithm.

There are a lot of if algorithm == decision points now, which are all quite deep in loops. Even removing unneeded checks I can't recover the last 5-7% of performance. The function signatures got a little more complex as well.

It may be that we need a refactorisation that separates Valencia at a higher level, so that in all of the tighter loops the flow path has less branching.
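
One way to picture "separating Valencia at a higher level" is a function barrier: pick the distance function once, outside the loop, so the inner loop compiles without per-iteration branching. A minimal, self-contained sketch with made-up names (the distances are schematic, not the package's exact formulae):

dij_durham(Ei, Ej, cosθ) = 2 * min(Ei^2, Ej^2) * (1 - cosθ)                      # schematic
dij_valencia(β, R) = (Ei, Ej, cosθ) -> min(Ei^(2β), Ej^(2β)) * (1 - cosθ) / R^2  # schematic

function sum_distances(dij, E, cosθ)   # stand-in for the real clustering loop
    s = 0.0
    for k in eachindex(E, cosθ)
        s += dij(E[k], E[begin], cosθ[k])
    end
    return s
end

E, cosθ = [50.0, 30.0, 20.0], [0.1, -0.3, 0.7]
use_valencia = true
dij = use_valencia ? dij_valencia(1.0, 1.0) : dij_durham   # the only algorithm branch
sum_distances(dij, E, cosθ)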

@mattleblanc
Contributor Author

mattleblanc commented Aug 27, 2025

> Performance definitely got a lot better now, but we're still losing about 10% in the Durham algorithm.
>
> There are a lot of if algorithm == decision points now, which are all quite deep in loops. Even removing unneeded checks I can't recover the last 5-7% of performance. The function signatures got a little more complex as well.
>
> It may be that we need a refactorisation that separates Valencia at a higher level, so that in all of the tighter loops the flow path has less branching.

Hi @graeme-a-stewart, thanks for the quick feedback. I've made the simple changes, but I am puzzled by your comments about performance. As above, I see consistent performance between this branch and an unmodified version of the code. Profiling an unmodified version at commit eb63e6c, I see this:

m3leblanc:JetReconstruction.jl mleblanc$ julia --project=examples/  examples/instrumented-jetreco.jl test/data/events.eeH.hepmc3.zst -m 1000 --algorithm=Durham
Processed 100 events 1000 times
Average time per event 22.186680800000023 ± 1.5031461324091082 μs
Lowest time per event 20.755 μs

Happy to make more changes, but I just want to check whether this is consistent with the way you're doing things before changing anything major.

@Moelf
Member

Moelf commented Aug 27, 2025

Are we benchmarking on the same architecture? I fear this is one of those x86_64 vs. macOS M-series problems.

FWIW, on an AMD Zen4:

# this PR
(vlc)> julia --project=examples/ ./examples/instrumented-jetreco.jl test/data/events.eeH.hepmc3.zst -m 1000 --algorithm=Durham
Processed 100 events 1000 times
Average time per event 29.74854201999998 ± 2.9077949972547454 μs
Lowest time per event 27.48308 μs

# main
(main)> julia --project=examples/ ./examples/instrumented-jetreco.jl test/data/events.eeH.hepmc3.zst -m 1000 --algorithm=Durham
Processed 100 events 1000 times
Average time per event 29.813233310000008 ± 4.091690424744847 μs
Lowest time per event 26.534920000000003 μs

@graeme-a-stewart
Member

> Hi @graeme-a-stewart, thanks for the quick feedback. I've made the simple changes, but I am puzzled by your comments about performance. As above, I see consistent performance between this branch and an unmodified version of the code. Profiling an unmodified version at commit eb63e6c, I see this:
>
> m3leblanc:JetReconstruction.jl mleblanc$ julia --project=examples/  examples/instrumented-jetreco.jl test/data/events.eeH.hepmc3.zst -m 1000 --algorithm=Durham
> Processed 100 events 1000 times
> Average time per event 22.186680800000023 ± 1.5031461324091082 μs
> Lowest time per event 20.755 μs
>
> Happy to make more changes, but I just want to check whether this is consistent with the way you're doing things before changing anything major.

Thanks @mattleblanc

What do you get comparing the head of your branch with the head of main? I am seeing ~8-10% regression, but it's an old x86 MacBook Pro I'm using.

Tomorrow I'll test on other platforms - maybe this is just an oddity...

@mattleblanc
Contributor Author

I'm running on an M3 MacBook Air. After rebasing my reference setup onto the latest commits on main (up to d3c97f3), I see the following; it's not obvious that the performance changed:

m3leblanc:JetReconstruction.jl mleblanc$ julia --project=examples/  examples/instrumented-jetreco.jl test/data/events.eeH.hepmc3.zst -m 1000 --algorithm=Durham
Precompiling JetReconstruction...
  1 dependency successfully precompiled in 2 seconds. 66 already precompiled.
Processed 100 events 1000 times
Average time per event 23.20044566999999 ± 1.9812033289848472 μs
Lowest time per event 20.8875 μs

@graeme-a-stewart
Member

So here is what I get, comparing your branch Matt (7155768) with current main (d3c97f39). This is running

julia --project instrumented-jetreco.jl --algorithm=Durham ../test/data/events.eeH.hepmc3.zst -m 1000

in each case and looking at the lowest time achieved (though the average shows the same pattern).

Machine                          vlc (μs)   main (μs)   Ratio (main/vlc)
MacBook Pro x86_64 2.2 GHz i7    40.4       38.3        0.95
Alma 9, Ryzen 7 5700G 3.8 GHz    35.6       34.1        0.96

So I do maintain there is a 4-5% regression in the speed of the Durham algorithm with the current Valencia implementation.

Likewise for EEKt with R=1.0 p=1:

Machine                          vlc (μs)   main (μs)   Ratio (main/vlc)
MacBook Pro x86_64 2.2 GHz i7    44.3       40.6        0.92
Alma 9, Ryzen 7 5700G 3.8 GHz    38.8       36.8        0.95

The slowdown on the old MacBook is a bit worse; however, the fairly modern Linux machine is a pretty typical production machine and it's suffering too.

@mattleblanc
Contributor Author

OK, thanks for confirming these numbers. What's your preferred solution? I can try to factorise the VLC code away without duplicating too much code under the hood.

I'll benchmark future changes on an alma9 machine so that I'm more likely to catch any differences in performance.

@graeme-a-stewart
Member

I have some ideas I'd like to try tomorrow to try and better understand the problem and mitigations. Hopefully we can avoid a total separation of the algorithm logic!

@graeme-a-stewart
Member

> I'll benchmark future changes on an alma9 machine so that I'm more likely to catch any differences in performance.

When we are down to squeezing out the last few percent of performance like this, we do see different behaviour between Apple Silicon and x86 chips, as @Moelf noted. See also, e.g., #151. So having a spread of architectures is certainly useful.

@graeme-a-stewart
Member

Just getting back to this, I ran the benchmarks again with Julia 1.11.7 on an AMD Ryzen 7 5700G and on a MacBook Pro M4:

Machine        Main branch (μs)   Valencia branch (μs)   Ratio (main/vlc)
Ryzen 5700G    34.1               37.1                   0.92
M4             17.1               17.1                   1.00

So the M4 doesn't show a regression (and is also blazingly fast 🚀 ).

I'd still like to understand/recover the regression on Linux x86.
