Consolidate multiple tensor copies to reduce API overhead #15750
|
This also helps for the Vulkan backend, when HVV is disabled. When HVV is enabled, I suspect it is a marginal slowdown (extra memcopies) but in a very quick test I couldn't measure it. |
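To make the trade-off in this comment concrete, here is a minimal sketch of the general idea: trading one extra host-side memcpy per tensor for fewer copy calls. It is not the code from this PR and does not use the ggml backend API; `FakeDevice`, `PendingCopy`, `upload_individually`, and `upload_consolidated` are hypothetical names invented for illustration. The sketch also assumes the destination ranges are contiguous and in order, which a real backend cannot assume in general.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a device buffer. It counts "API calls" so the
// effect of consolidation is observable without a real GPU backend.
struct FakeDevice {
    std::vector<uint8_t> memory;
    size_t api_calls = 0;

    explicit FakeDevice(size_t size) : memory(size) {}

    // Models one host-to-device copy call (fixed per-call overhead in a real backend).
    void upload(size_t dst_offset, const void * src, size_t size) {
        assert(dst_offset + size <= memory.size());
        std::memcpy(memory.data() + dst_offset, src, size);
        ++api_calls;
    }
};

// One pending tensor copy: where it goes, where it comes from, how big it is.
struct PendingCopy {
    size_t       dst_offset;
    const void * src;
    size_t       size;
};

// Baseline: one API call per tensor.
void upload_individually(FakeDevice & dev, const std::vector<PendingCopy> & copies) {
    for (const PendingCopy & c : copies) {
        dev.upload(c.dst_offset, c.src, c.size);
    }
}

// Consolidated: gather all sources into one staging buffer (the extra memcpy),
// then issue a single API call. Returns false without copying anything if the
// destinations are not contiguous, so the caller can fall back to the baseline.
bool upload_consolidated(FakeDevice & dev, const std::vector<PendingCopy> & copies) {
    if (copies.empty()) {
        return true;
    }
    std::vector<uint8_t> staging;
    size_t expected_offset = copies.front().dst_offset;
    for (const PendingCopy & c : copies) {
        if (c.dst_offset != expected_offset) {
            return false; // gap or reordering: assumption violated
        }
        const uint8_t * bytes = static_cast<const uint8_t *>(c.src);
        staging.insert(staging.end(), bytes, bytes + c.size);
        expected_offset += c.size;
    }
    dev.upload(copies.front().dst_offset, staging.data(), staging.size());
    return true;
}
```

With two contiguous pending copies, the baseline makes two calls and the consolidated path makes one, and both leave the same bytes in the device buffer. Whether the saved call overhead outweighs the extra staging copy depends on the backend and the transfer sizes.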
|
Here's the performance improvement I'm seeing from this change on RTX 5090 with HVV disabled: Would be nice to get this merged. |
|
This cannot be merged in this way; it's too hacky and not guaranteed to work with every buffer type or in more complex scenarios, such as multiple GPUs. I am also currently making some significant changes to |
|
Thanks for reviewing, @slaren. Closing.
Fixes #15749