Throughput Updates, main branch (2025.08.18.) #1130
Merged

Following #1112, here I try to restore the overall throughput of the GPU applications. To cut to the chase, this is the throughput I see with the code as it was just before #1112 was merged:
And with this PR's code I see:
There is unfortunately still a slight drop, which I still intend to look into a bit more. However, in this new version the host code gets a more representative description of the reconstructed tracks than was available before #1112. (The track-to-state jagged indices are now copied back to the host, while in the old version all of that information stayed on the device.)
Finally, about the PR:
vecmem::host_memory_resource, leaving anything more specific to the full chain algorithm classes.

But as I said at the start, I'll still look at this a bit more to see whether it can be made a little faster / more efficient.