-
Notifications
You must be signed in to change notification settings - Fork 15k
[libc] Polish GPU benchmarking #153900
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
[libc] Polish GPU benchmarking #153900
Changes from all commits
Commits
Show all changes
4 commits
Select commit
Hold shift + click to select a range
45e974b
Clean up headers and CMake deps
leandrolcampos 607f4e7
Remove dead `BenchmarkLogger`
leandrolcampos 6642c9a
Fix precision-loss warning in NVPTX version of `latency()`
leandrolcampos 71dd568
Round up scaled iteration count to ensure growth
leandrolcampos File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file was deleted.
Oops, something went wrong.
This file was deleted.
Oops, something went wrong.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Weird that we need this here and not in the AMDGPU version, seems weird that these wouldn't be the same type. What were you seeing here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I had to touch the NVPTX version (and not the AMDGPU one) is simply that only the NVPTX
latency()
had this instruction at the end:In the ctype benches, we instantiate
latency<int (*)(int), char>
. The function returnsint
, but the template parameterT
(the input type) ischar
. That line therefore assigns anint
to avolatile char
, which produces:On AMDGPU, there is no such instruction in
latency()
(it seems to rely on thev_or_b32
asm to use the value), and it also has a small-type special-case for the asm operand (char/bool), so there’s no narrowing assignment there.I didn’t dive deeper here because
latency()
isn’t used by the math benchmarks; I only wanted to make the ctype benches warning-free.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I vaguely remember needing that because the PTX optimizer would defy the inline assembly so I still needed it. That's fine it needed.