
Conversation

@alexbaden (Contributor) commented Oct 1, 2024

Intel Data Center Max GPUs dynamically scale the number of hardware threads available per XVE depending on the specified GRF mode. In small GRF mode (the default), a single hardware thread can access 128 GRF registers and each XVE has 8 hardware threads. In large GRF mode, a single hardware thread can access 256 GRF registers but each XVE has only 4 hardware threads. There is also an auto mode.

(see the docs for more info)
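For reference, here is a minimal sketch of how large GRF mode can be requested at module build time through Level Zero build flags. The helper name is mine, and the exact flag spelling should be treated as an assumption rather than code from this PR:

```cpp
#include <level_zero/ze_api.h>
#include <cstddef>
#include <cstdint>

// Build a SPIR-V module with large GRF mode requested via build flags.
// "-ze-opt-large-register-file" is the commonly documented spelling;
// treat the exact flag string as an assumption here.
// (Error handling omitted for brevity.)
ze_module_handle_t buildLargeGrfModule(ze_context_handle_t ctx,
                                       ze_device_handle_t dev,
                                       const uint8_t *spirv, size_t size) {
  ze_module_desc_t desc = {};
  desc.stype = ZE_STRUCTURE_TYPE_MODULE_DESC;
  desc.format = ZE_MODULE_FORMAT_IL_SPIRV; // SPIR-V input
  desc.pInputModule = spirv;
  desc.inputSize = size;
  desc.pBuildFlags = "-ze-opt-large-register-file"; // 256 GRF, 4 threads/XVE
  ze_module_handle_t module = nullptr;
  zeModuleCreate(ctx, dev, &desc, &module, nullptr);
  return module;
}
```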

This PR adds support for populating the n_regs parameter returned from loading a binary with information about the selected GRF mode. Because Level Zero does not report the number of registers, and our register-size information does not behave like NVIDIA's, the semantics are a bit different from upstream Triton: we only return a value if the user has specified a small or large GRF mode build flag.

Upstream Triton/Torch Inductor return n_regs because NVIDIA GPUs can dynamically adjust the occupancy of an SM based on the register pressure per warp, so high register pressure can mean fewer running warps, which reduces parallelism and performance. Theoretically, an NVIDIA GPU has many different "GRF modes" as SM occupancy adjusts. For Intel GPUs the choice is binary, large or small, and the performance penalty for register spills in small mode always outweighs any parallelism gains (at least in our testing so far). It is not clear that returning 128 is actionable, since further reductions in register usage will not affect occupancy; only large GRF mode affects occupancy. So I focused on making sure large GRF mode was properly handled, handled the other cases where we could, and returned 0 in any ambiguous case (which causes Torch Inductor to skip any register-specific optimization).
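To make the return shape concrete, here is a minimal sketch of how a loader can hand n_regs back to Python, modeled on upstream Triton's (module, function, n_regs, n_spills) tuple; the helper name is hypothetical, and the 0-means-unknown convention follows the description above:

```cpp
#include <Python.h>
#include <cstdint>

// Package load_binary results for Python. n_regs == 0 signals "unknown",
// which makes Torch Inductor skip register-pressure-based heuristics.
static PyObject *buildLoadBinaryResult(uint64_t module, uint64_t function,
                                       int nRegs, int nSpills) {
  return Py_BuildValue("(KKii)", module, function, nRegs, nSpills);
}
```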

The approach to returning the GRF size depends on parsing the build flags passed to the binary loader. Because the build flags are modified in the make_spv step during generation of native code rather than a SPIR-V file, this approach should also work for the native-code POC recently merged in #2148.
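A minimal sketch of that flag parsing, under the policy described above (large GRF → 256, explicitly requested small GRF → 128, anything ambiguous → 0). The substrings matched below are assumptions; the backend's actual flag strings may differ:

```cpp
#include <string>

// Derive n_regs from the build flags passed to the binary loader.
static int grfModeToNRegs(const std::string &buildFlags) {
  // Large GRF mode: 256 registers, 4 hardware threads per XVE.
  if (buildFlags.find("large-register-file") != std::string::npos ||
      buildFlags.find("256-GRF-per-thread") != std::string::npos)
    return 256;
  // Explicitly requested small GRF mode: 128 registers, 8 threads per XVE.
  if (buildFlags.find("128-GRF-per-thread") != std::string::npos)
    return 128;
  // Auto or unspecified: ambiguous, so report 0 ("unknown").
  return 0;
}
```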

Note that I had to introduce exceptions into our driver.c code to make the error handling acceptable. This cleaned up a lot of the code, and I believe it should be acceptable both because we already depend on C++ in driver.c (just not in the external signatures) and because exceptions are used in other parts of the Triton codebase.
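As an illustration of the pattern (the helper and function names here are mine, not the PR's): Level Zero calls are wrapped in a helper that throws on failure, and each exported entry point catches at the Python C-API boundary so exceptions never escape into C callers:

```cpp
#include <Python.h>
#include <level_zero/ze_api.h>
#include <sstream>
#include <stdexcept>

// Convert a failing Level Zero call into a C++ exception.
static void checkL0(ze_result_t result, const char *call) {
  if (result != ZE_RESULT_SUCCESS) {
    std::ostringstream oss;
    oss << call << " failed with ze_result_t " << static_cast<int>(result);
    throw std::runtime_error(oss.str());
  }
}

// Exceptions must not cross the Python C-API boundary, so each exported
// function catches and converts them to a Python RuntimeError.
static PyObject *initDevices(PyObject *self, PyObject *args) {
  try {
    checkL0(zeInit(ZE_INIT_FLAG_GPU_ONLY), "zeInit");
    Py_RETURN_NONE;
  } catch (const std::exception &e) {
    PyErr_SetString(PyExc_RuntimeError, e.what());
    return NULL;
  }
}
```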

I marked this as a draft PR because I would like to do a bit more testing, but it is otherwise ready for review.

Closes #1641

@alexbaden marked this pull request as ready for review October 9, 2024 20:23
@alexbaden (Contributor, Author) commented:
@etiotto this is ready for review

@victor-eds (Contributor) left a comment:

Overall looks good but NITS

@alexbaden merged commit de5302d into main Oct 18, 2024
4 checks passed
@alexbaden deleted the alex/n_regs branch October 18, 2024 13:20


Development

Successfully merging this pull request may close these issues.

[Pytorch Upstream] Intel Triton is not ready to return n_regs for a compiled_binary. (#1641)
