@stephenswat
Member

stephenswat commented Jul 23, 2025

This commit improves the performance of our seed finding by computing the linearised circles for the doublets once and caching them, rather than recomputing them multiple times. It also tweaks the kernel launch parameters to improve occupancy.
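The idea can be sketched on the host in plain C++ (not CUDA, and not the actual traccc kernels). The types `spacepoint`, `doublet` and `lin_circle`, the functions `make_lin_circle`, `cache_lin_circles` and `count_compatible_pairs`, and the simplified transform and compatibility cut are all assumptions made for illustration. The sketch also assumes that the two spacepoints of a doublet never coincide in the transverse plane.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical, simplified types for illustration; not the traccc EDM.
struct spacepoint { float x, y, z; };
struct doublet { std::size_t mid, other; };  // indices into the spacepoint list
struct lin_circle { float u, v, cot_theta; };

// Transform `other` into a frame centred on `mid` and rotated so that `mid`
// lies on the local x-axis, then invert the transverse distance.
inline lin_circle make_lin_circle(const spacepoint& mid,
                                  const spacepoint& other, bool is_bottom) {
    const float r_mid = std::hypot(mid.x, mid.y);
    assert(r_mid > 0.f);
    const float cos_phi = mid.x / r_mid;
    const float sin_phi = mid.y / r_mid;
    const float dx = other.x - mid.x;
    const float dy = other.y - mid.y;
    const float dz = other.z - mid.z;
    const float x = dx * cos_phi + dy * sin_phi;
    const float y = dy * cos_phi - dx * sin_phi;
    const float inv_dr2 = 1.f / (x * x + y * y);
    // Flip the sign for bottom doublets so both kinds share one convention.
    const float sign = is_bottom ? -1.f : 1.f;
    return {x * inv_dr2, y * inv_dr2, sign * dz * std::sqrt(inv_dr2)};
}

// Stage 1: run the transform exactly once per doublet and store the result
// at the doublet's own index.
inline std::vector<lin_circle> cache_lin_circles(
    const std::vector<spacepoint>& sps, const std::vector<doublet>& doublets,
    bool is_bottom) {
    std::vector<lin_circle> cache;
    cache.reserve(doublets.size());
    for (const doublet& d : doublets) {
        cache.push_back(make_lin_circle(sps[d.mid], sps[d.other], is_bottom));
    }
    return cache;
}

// Stage 2: the pairwise loop only reads the cache. Without it, the transform
// would run again for every (bottom, top) combination that is visited.
inline std::size_t count_compatible_pairs(const std::vector<lin_circle>& bottom,
                                          const std::vector<lin_circle>& top,
                                          float max_delta_cot_theta) {
    std::size_t n = 0;
    for (const lin_circle& b : bottom) {
        for (const lin_circle& t : top) {
            if (std::fabs(b.cot_theta - t.cot_theta) <= max_delta_cot_theta) {
                ++n;
            }
        }
    }
    return n;
}
```

The trade-off is extra memory for one `lin_circle` per doublet in exchange for fewer repeated transforms in the triplet stages.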

@stephenswat added the performance (Performance-relevant changes) label on Jul 23, 2025
@stephenswat changed the title from "Cache linearised circles in seed finding" to "Improve performance of triplet finding" on Jul 23, 2025

@krasznaa
Member

To write it down for myself as well... Here my thinking is to wait for #1087 to be sorted out first. To get an even better view of how well this re-organization will do for us. (With memory vs. runtime.) As I keep worrying about memory use...

@stephenswat
Member Author

> To write it down for myself as well... Here my thinking is to wait for #1087 to be sorted out first. To get an even better view of how well this re-organization will do for us. (With memory vs. runtime.) As I keep worrying about memory use...

Okay... 😕

@stephenswat force-pushed the perf/cache_lincircles branch from 948d3ad to 1d331cd on August 18, 2025 11:46


@stephenswat force-pushed the perf/cache_lincircles branch from 1d331cd to aa94d94 on November 5, 2025 10:39

@stephenswat
Member Author

Physics performance summary

Here is a summary of the physics performance effects of this PR. Command used:

traccc_seeding_example_cuda --input-directory=/data/Acts/odd-simulations-20240506/geant4_ttbar_mu200 --digitization-file=geometries/odd/odd-digi-geometric-config.json --detector-file=geometries/odd/odd-detray_geometry_detray.json --grid-file=geometries/odd/odd-detray_surface_grids_detray.json --material-file=geometries/odd/odd-detray_material_detray.json --input-events=10 --use-acts-geom-source=on --check-performance --truth-finding-min-track-candidates=5 --truth-finding-min-pt=1.0 --truth-finding-min-z=-150 --truth-finding-max-z=150 --truth-finding-max-r=10 --seed-matching-ratio=0.99 --track-matching-ratio=0.5 --track-candidates-range=5:100 --seedfinder-vertex-range=-150:150

Seeding performance

Total number of seeds went from 298344 to 298342 (-0.0%)

Seeding plots



Track finding performance

Total number of found tracks went from 55977 to 55968 (-0.0%)

Finding plots









Track fitting performance

Total number of fitted tracks went from 55977 to 55968 (-0.0%)

Fitting plots












Note

This is an automated message produced on the explicit request of a human being.

@stephenswat
Member Author

Performance summary

Here is a summary of the performance effects of this PR:

Graphical

Tabular

| Kernel | Reciprocal throughput (d668f0f) | Reciprocal throughput (aa94d94) | Delta | Parallelism (d668f0f) | Parallelism (aa94d94) |
|---|---|---|---|---|---|
| propagate_to_next_surface | 7.95 ms | 7.95 ms | -0.1% | 3.40 | 3.40 |
| fit_forward | 2.37 ms | 2.37 ms | 0.2% | 6.17 | 6.17 |
| fit_backward | 1.28 ms | 1.28 ms | 0.1% | 4.58 | 4.58 |
| find_tracks | 1.14 ms | 1.14 ms | 0.4% | 1.87 | 1.87 |
| ccl_kernel | 831.33 μs | 832.57 μs | 0.1% | 1.37 | 1.37 |
| Thrust::sort | 708.30 μs | 709.51 μs | 0.2% | 3.76 | 3.76 |
| count_doublets | 629.28 μs | 632.30 μs | 0.5% | 1.61 | 1.61 |
| find_doublets | 448.77 μs | 451.65 μs | 0.6% | 3.08 | 3.08 |
| count_triplets | 589.81 μs | 306.94 μs | -48.0% | 1.02 | 1.03 |
| find_triplets | 172.21 μs | 116.70 μs | -32.2% | 1.31 | 1.49 |
| make_mid_bot_lincircles | | 101.25 μs | nan | | 1.03 |
| select_seeds | 53.49 μs | 53.27 μs | -0.4% | 1.34 | 1.34 |
| make_mid_top_lincircles | | 30.34 μs | nan | | 1.12 |
| remove_duplicates | 27.04 μs | 27.31 μs | 1.0% | 22.54 | 22.36 |
| populate_grid | 23.30 μs | 23.39 μs | 0.4% | 1.22 | 1.22 |
| count_grid_capacities | 22.12 μs | 22.14 μs | 0.1% | 1.22 | 1.22 |
| update_triplet_weights | 15.07 μs | 15.40 μs | 2.2% | 1.27 | 1.27 |
| apply_interaction | 15.15 μs | 15.15 μs | 0.0% | 6.51 | 6.51 |
| estimate_track_params | 14.38 μs | 14.36 μs | -0.1% | 2.15 | 2.15 |
| fit_prelude | 13.42 μs | 13.41 μs | -0.1% | 18.26 | 18.25 |
| form_spacepoints | 13.16 μs | 13.24 μs | 0.6% | 1.48 | 1.48 |
| fill_finding_propagation_sort_keys | 9.33 μs | 9.32 μs | -0.2% | 7.31 | 7.31 |
| reduce_triplet_counts | 6.29 μs | 6.31 μs | 0.3% | 3.08 | 3.08 |
| build_tracks | 4.80 μs | 4.85 μs | 1.1% | 12.27 | 12.28 |
| unknown | 2.06 μs | 2.06 μs | 0.0% | 9.85 | 9.85 |
| fill_finding_duplicate_removal_sort_keys | 1.86 μs | 1.85 μs | -0.4% | 32.60 | 32.70 |
| fill_fitting_sort_keys | 186.85 ns | 187.65 ns | 0.4% | 18.37 | 18.36 |
| fill_prefix_sum | 171.98 ns | 171.98 ns | -0.0% | 341.30 | 341.30 |
| Total | 16.34 ms | 16.15 ms | -1.2% | 3.55 | 3.59 |

Important

All metrics in this report are given as reciprocal throughput, not as wallclock runtime.
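The Delta column appears to be the relative change `(after - before) / before`, which reproduces the count_triplets, find_triplets and Total rows. The minimal sketch below applies the same formula to the triplet stage as a whole, counting the two new lincircle kernels against the saving. Grouping those four kernels together is an assumption made here, and `delta_percent` and `triplet_stage` are hypothetical helper names, not part of any reporting tool.

```cpp
#include <cassert>

// Relative change in percent, with the sign convention of the Delta column:
// negative means the kernel got faster.
inline double delta_percent(double before, double after) {
    assert(before > 0.0);
    return 100.0 * (after - before) / before;
}

// Reciprocal throughput in microseconds, copied from the table.
struct stage_cost {
    double before_us;
    double after_us;
};

inline stage_cost triplet_stage() {
    // Before: count_triplets + find_triplets.
    const double before = 589.81 + 172.21;
    // After: the same two kernels plus the two new lincircle kernels.
    const double after = 306.94 + 116.70 + 101.25 + 30.34;
    return {before, after};
}
```

With these inputs the stage goes from 762.02 μs to 555.23 μs, a net change of roughly -27%, which is smaller than the per-kernel deltas because the cost of building the cache is included.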

Note

This is an automated message produced upon the explicit request of a human being.
