-
Notifications
You must be signed in to change notification settings - Fork 56
Track Fit/State SoA, main branch (2025.08.05.) #1112
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Track Fit/State SoA, main branch (2025.08.05.) #1112
Conversation
core/include/traccc/finding/details/combinatorial_kalman_filter.hpp
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a very concerning amount of added complexity... I really hope you manage to make this a bit simpler.
| /// @param bound_params bound parameter | ||
| /// | ||
| /// @return true if the update succeeds | ||
| template <typename track_state_backend_t> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this track_state_backend_t here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is the BASE type from track_state_collection.hpp. "Backend" seems like an appropriate term for it.
The templating is needed here because in the CKF this code gets called on a "standalone" traccc::edm::track_state object. While in the KF it is called on an element of an SoA container. The "backend" describes that in one case the trk_state is operating on values, and in the other it operates on references.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah let's not do that. Simply accept a template parameter track_state_t and use that. This more complicated system of accepting template arguments serves no purpose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We had this exact discussion on a previous PR already.
If not for this type of templating, then we'd need a concept for traccc::edm::track_state<T>. But the intention here is really that the user should only be able to provide traccc::edm::track_state objects here. Not something that maybe has the same accessor functions as it does.
9b98ec3 to
e204f3d
Compare
|
The code should now be working correctly. 🤔 It took me a bit embarrassing amount of time to debug it, but the reason that there were two tests failing earlier is that with the new EDM I changed the behaviour of While the device code has always produced tracks with failed fits as well. (Since it's much easier to just allocate the memory for all found tracks, and then just flag the ones that failed the fit.) Since I believe the device code's behaviour is actually more correct, during the update I switched the host code to do the same. But the host tests were written with the assumption that all tracks received from the fitting algorithm would've been successfully fitted. Even though some of the test fits are known to fail. Long story short, I disabled some of the detailed checks in the tests for the failed fits. I think I also finished with every other part of the update at this point, so this is generally ready for serious review. |
stephenswat
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me just slap on one of these to make sure we don't make any hasty decisions.
In the meanwhile, could you help me understand how this change affects #1052? I.e. what kinds of changes will be necessary following this to get that PR in order?
Performance summaryHere is a summary of the performance effects of this PR: GraphicalTabular
Important All metrics in this report are given as reciprocal throughput, not as wallclock runtime. Note This is an automated message produced on the explicit request of a human being. |
I looked at that PR just now. I believe the changes to it will be pretty mechanical. This PR doesn't fundamentally change the data model (yet). The current track state "container" (composed of a 1D + a jagged vector) is split into 2 SoA collections. But the type is still different from the type used as the output of the track finding. So the changes needed for #1052 will follow a recipe. My eventual goal here is to come up with a https://github.com/acts-project/acts/blob/main/Core/include/Acts/TrackFitting/KalmanFitter.hpp#L1062 (The fit there rather modifies the track container in place, which I'm not super keen on, but we'll see.) But I believe your changes with track finding will need to go in before that fundamental of a rewrite of the GPU EDM. |
This is a bit disappointing. 😦 As normal, there's a noticeable wiggle on kernels that this code doesn't touch, but it still seems that the fitting functions are a little slower. 😕 I'll do some tests of my own as well then. Edit: Actually, it's "just" |
|
Unfortunately the situation is quite a bit worse than I thought. 😭 Running So, this is definitely no bueno like this as is. |
Wait so this does not yet deduplicate the data model between finding and fitting? Will that be a follow-up PR? |
d8b94dc to
89174cd
Compare
Yes, it will be. Now... to merge Note though that there is a whole lot less of data duplication with this PR than is the current status quo. However the interfaces for the output of track finding and track fitting stay separated in this PR. |
| // filtered_params | ||
| vecmem::edm::type::vector<bound_track_parameters<ALGEBRA>>, | ||
| // smoothed_params | ||
| vecmem::edm::type::vector<bound_track_parameters<ALGEBRA>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While you are at it, do we need to keep both the filtered and the smoothed parameters? If we deduplicate these we can save a lot of space. We can use additional memory in the fitting algorithm to keep track of any additional parameters we need, but the final EDM should not need both.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 Are you saying that the "state" should just be "chi2" and "params", with the main track container associating as many states to the different measurements as many are needed?
That does sound interesting. But in this specific PR I would really prefer to do just the AoS->SoA conversion. If we mix in more fundamental re-designs as well, things will be even harder to review. 🤔
| // ndf | ||
| vecmem::edm::type::vector<detray::dscalar<ALGEBRA>>, | ||
| // chi2 | ||
| vecmem::edm::type::vector<detray::dscalar<ALGEBRA>>, | ||
| // pval | ||
| vecmem::edm::type::vector<detray::dscalar<ALGEBRA>>, | ||
| // nholes | ||
| vecmem::edm::type::vector<unsigned int>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the rationale behind removing the fit quality object here, as the fit quality values are almost always used together? You are essentially quadrupling the number of memory accesses required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's really not because of the access pattern that @beomki-yeo created traccc::fit_quality back in the day. The reason that this struct was added was because the same quality variables are used in the output of track finding and it track fitting.
So if you want to complain about something, you should've asked why we have duplication between these variable definitions between track_fit_collection and track_candidate_collection.
This duplication is one of the main reasons that we'll need to merge these two data models together.
But I'm really not convinced that using a struct for those variables instead of a fully SoA layout, makes much of a difference. With detailed profiling we may decide in the future that some of these individual float-s should be merged into small structs. But I don't think that even then we will group the exact same variables together that are in traccc::fit_quality at the moment.
| struct view { | ||
| /// The fitted tracks | ||
| track_fit_collection<ALGEBRA>::view tracks; | ||
| /// The track states used for the fit | ||
| track_state_collection<ALGEBRA>::view states; | ||
| /// The measurements used for the fit | ||
| measurement_collection_types::const_view measurements; | ||
| }; | ||
|
|
||
| struct const_view { | ||
| /// The fitted tracks | ||
| track_fit_collection<ALGEBRA>::const_view tracks; | ||
| /// The track states used for the fit | ||
| track_state_collection<ALGEBRA>::const_view states; | ||
| /// The measurements used for the fit | ||
| measurement_collection_types::const_view measurements; | ||
| }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I deeply dislike this design, as any algorithm that requires e.g. finding data and fitting data will need to receive the measurement data twice. I appreciate that you want to enforce consistency between the data, but then you actually need to embed the accessor methods in this struct. Otherwise, as you are doing now, you are just wasting space, complicating the code, and risking pointer aliasing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now no part of the code needs track_candidate_container and track_fit_container at the same time.
For the longer future, this is something that the merging of the data models will take care of. With that in place, we shouldn't have duplication anymore.
I do very strongly believe though that having such simple structs for grouping together collections that belong together, is a pretty good way of organising the data.
| auto result_tracks_view = vecmem::get_data(result.tracks); | ||
| typename edm::track_fit_collection<algebra_t>::device | ||
| result_tracks_device{result_tracks_view}; | ||
| typename edm::track_fit_collection<algebra_t>::device::proxy_type | ||
| fitted_track_device = | ||
| result_tracks_device.at(result_tracks_device.size() - 1); | ||
| auto result_states_view = vecmem::get_data(result.states); | ||
| typename fitter_t::state fitter_state( | ||
| fitted_track_device, | ||
| typename edm::track_state_collection<algebra_t>::device{ | ||
| result_states_view}, | ||
| measurements, seqs_buffer); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be simplified? This amount of boilerplate code really isn't acceptable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of the long names could be replaced by auto, though I fear that it will make it even harder to understand what types are being used exactly.
The code here is largely due to the design decision we/I made many years ago. That "view" and "device" objects would be separate types. So now we need to always create device objects out of the views.
As long as we don't change that (we can have a discussion about it, but it's definitely not a black or white question whether we should), we will need to be explicit about all these conversions. As they are not made implicitly. (Again, to hopefully make it a bit easier to debug code. As auto-conversions could really get out of hand with things like this.)
| result.push_back(std::move(fitter_state.m_fit_res), | ||
| std::move(input_states)); | ||
| } else { | ||
| if (fit_status != kalman_fitter_status::SUCCESS) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
??
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please elaborate. Don't understand the question.
I commented about exactly his in: #1112 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the device code also records all tracks, this might be OK
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only thing that I'm not sure about all of a sudden is whether the host algorithm correctly sets the failure mode on the tracks. It might be that with the current setup of the PR we just get a generic UNKNOWN (the default value) for the track fit state.
So at least kalman_fitter_status may still need to be translated to a track_fit_outcome (as it is called in this PR) value. 🤔
| /// @param bound_params bound parameter | ||
| /// | ||
| /// @return true if the update succeeds | ||
| template <typename track_state_backend_t> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah let's not do that. Simply accept a template parameter track_state_t and use that. This more complicated system of accepting template arguments serves no purpose.
| template <typename T> | ||
| void stat_plot_tool::fill(stat_plot_cache& cache, | ||
| const edm::track_fit<T>& fit_res) const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is T supposed to be algebra type here? If so please name it more appropriately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
T here is the same "backend" that you asked about in gain_matric_updater.
I can of course rename T into track_state_backend_t here as well. 🤔
I think I more or less understand the situation by now. But maybe let me start with things that were, as it turns out, not clear to me previously:
So, once I use just The reason that the overall application ridiculously slows down is something that may be best to fix in a follow-up PR. 🤔 Since there are actually multiple problems with how copies are made by the "full chain algorithms" at the moment. 😦
So, as bad as this is, now that at least I think I understand what the situation is, I think it may be best to power on with the PR as it is. 🤔 |
Indeed. 👍 This is also the reason that our throughput benchmarks have been based on the single-threaded example from day one. It's the only meaningful way to measure the performance of individual kernels. |
89174cd to
ff6f57a
Compare
…::device. To make it a bit easier to deal with the slightly complex EDMs of track finding and fitting in the code.
Taking into account that the updated code now returns tracks that have not been successfully fitted. Previously it would not return the tracks for which the fit failed.
But now as a function in the examples code. Not as part of traccc::core.
Based on the PR feedback.
ff6f57a to
ca7c1a0
Compare
|
stephenswat
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Give the number of important changes blocked by this, let's just get it in... 😞


















It's time for the next iteration of the SoA migration. 🤔
I replaced the current
traccc::track_state_container_typestypes with 2 new collection, and a "helper type".traccc::edm::track_fit_collection<ALGEBRA>is a collection that stores overall information about the fitted tracks. Similar to what the "header" oftraccc::track_state_container_typesdoes;state_indices) which points at which track states belong to the tracks;traccc::edm::track_state_collection<ALGEBRA>is a 1D collection of track states. Every element describes one track state, with an index (measurement_index) to the measurement that the state is defined on top of;traccc::edm::track_fit_container<ALGEBRA>is a type similar totraccc::edm::track_candidate_container<ALGEBRA>. It defines simple structs that collect objects that need to be used together to be able to make sense of them.The code by now is in a semi-functional state. All of the code "runs", but seemingly the host code is producing buggy output. (Two host unit tests fail, while the device unit tests all succeed. Plus the comparisons with
traccc_seq_example_cudaandtraccc_seq_example_alpakashow differences between the host and device results.) This issue absolutely needs to be figured out before this could go in. But I anyway wanted to open a PR already, to make everyone start getting used to the new classes.Once/if this PR is sorted out, I plan the following developments:
traccc::measurementto an SoA layout should be fairly trivial once this goes in, so I will do that next;I didn't do any performance checks yet, since the code's behaviour is questionable at the moment. But hopefully with these changes we will both need less memory (no copies of the measurement properties) and will speed up a little.