-
Notifications
You must be signed in to change notification settings - Fork 76
Return n_regs for binaries compiled explicitly with a register size mode option #2391
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Contributor
Author
|
@etiotto this is ready for review |
9b64373 to
3187248
Compare
etiotto
reviewed
Oct 15, 2024
victor-eds
reviewed
Oct 15, 2024
Contributor
victor-eds
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good but NITS
3187248 to
6bd80e4
Compare
etiotto
approved these changes
Oct 17, 2024
aa37c82 to
6a606b5
Compare
victor-eds
approved these changes
Oct 18, 2024
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Intel Data Center Max GPUs will dynamically scale the number of hardware threads available per XVE depending on the specified GRF mode. With small GRF mode (default), a single hardware thread can access 128 GRF registers and each XVE engine has 8 hardware threads. In large GRF mode, a single hardware thread can access 256 GRF registers but each XVE engine only has 4 hardware threads. There is also an auto mode.
(see the docs for more info)
This PR adds support for populating the
n_regsparameter returned from loading a binary with information about the selected GRF mode. Because L0 does not return the number of registers and our register size info does not work like NVIDIA, the semantics are a bit different from upstream Triton. We only return a value if the user has specified a small or large GRF mode build flag. The purpose of returningn_regsin upstream Triton/Torch Inductor is b/c NVIDIA can dynamically adjust occupancy of a SM based on the register pressure per warp. This means high register pressure can result in fewer running warps which reduces parallelism and performance. Theoretically, you can have many different "GRF modes" on a NVIDIA GPU as you adjust SM occupancy. For Intel GPUs, the choice is binary - large or small - and the performance penalty for register spills in small always outweighs any parallelism gains (at least, in our testing so far). It is not clear that returning 128 is actionable as further reductions in register usage will not effect occupancy - only the large GRF mode effects occupancy. So, I focused on making sure large GRF mode was properly handled and other cases were handled as we were able, with any ambiguous case returning 0 (which will cause torch inductor to skip any register-specific optimization).The approach to returning GRF size is dependent on parsing the build flags passed to the binary loader. Because the build flags are modified in the
make_spvstep during generation of native code instead of a SPIRV file, this approach should work for the native code POC recently merged in #2148.Note that I had to introduce exceptions into our
driver.ccode to make the error handling acceptable. This cleaned up a lot of the code, and I believe should be acceptable both because we already depend on c++ indriver.c(just not in the external signatures) and because exceptions are used in other parts of the Triton codebase.I marked this as a draft PR because I would like to do a bit more testing, but it is ready for review.
Close #1641