Narrow blanket SPIR-V legalization work in optimizer recipes #6612
Open
AnastaZIuk wants to merge 5 commits into KhronosGroup:main from
s-perron (Collaborator) reviewed on Mar 26, 2026 and left a comment:
I have responded on the corresponding DXC PR: microsoft/DirectXShaderCompiler#8283 (review).
Comment on lines 133 to 134:
    // Make sure uses and definitions are in the same function.
    .RegisterPass(CreateInlineExhaustivePass())
What's the purpose here?
Collaborator
Inlining enables many other optimizations. We do not implement inter-procedural optimizations. If you are going to copy-propagate something that is written in one function and used in another, the two have to be inlined into the same function.
Oh, I just hacked a NBL_REF_ARG via expanding vk::ext_reference on regular function inout parameters, so we have fewer copies, but yes, makes sense.
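The inlining point made in this thread can be illustrated with a small hand-written C++ analogy. This is only a sketch: it is plain C++ rather than SPIR-V, it is not optimizer output, and every function name in it is hypothetical. It shows why a value written in one function and read in another can only be forwarded by a function-local pass once the callee has been inlined.

```cpp
// Hand-written C++ analogy; not SPIR-V and not optimizer output.
// All function names here are hypothetical.

// The definition (the write) lives in the callee...
static void WriteValue(int& slot) { slot = 42; }

// ...and the use lives in the caller. An analysis confined to a single
// function sees only an opaque call that may write to `tmp`.
int BeforeInlining() {
  int tmp = 0;
  WriteValue(tmp);
  return tmp + 1;
}

// After inlining, the store and the load sit in the same function body.
int AfterInlining() {
  int tmp = 0;
  tmp = 42;  // body of WriteValue, substituted at the call site
  return tmp + 1;
}

// Now a function-local copy-propagation step can forward 42 into the use.
int AfterCopyPropagation() { return 42 + 1; }
```

All three functions compute the same result; only the last form is reachable without any reasoning across a call boundary.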
Summary
SSARewriteMode
RegisterLegalizationPasses(bool preserve_interface, bool include_loop_unroll, SSARewriteMode ssa_rewrite_mode)
ssa_rewrite_mode
include_loop_unroll
LoopControl::Unroll
SSARewriteMode::SpecialTypes
OpImageTexelPointer image operands in LocalSingleStoreElim
Root cause
The current SPIR-V optimizer recipes still carry old blanket unroll decisions that inflate the module and then pay for expensive cleanup over that self-inflated IR.
LoopControl::Unroll as an IR hint is not the problem. The expensive part is treating that hint as a blanket request to immediately materialize full unroll in the generic optimizer path even when legality does not require it.
The same pattern existed in the generic SSA cleanup path. The hard legality constraints are narrow and concentrated around special cases such as opaque or resource-like objects, but the old recipe was still paying for broader cleanup over generic IR.
DXC can still provide targeted producer-side signals for the narrower correctness-sensitive cases in microsoft/DirectXShaderCompiler#8283.
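As a rough sketch of the narrowing described above, the toy model below builds a pass list from the same three parameters that appear in the RegisterLegalizationPasses signature in the summary. Everything else is an assumption made for illustration: the function name BuildLegalizationRecipe, the enum values None and All, the string labels, and the pass order are hypothetical and are not taken from the SPIRV-Tools sources. Only SSARewriteMode::SpecialTypes and the parameter names come from this PR.

```cpp
#include <string>
#include <vector>

// Toy model only. SpecialTypes is named in the PR summary; None and All
// are hypothetical values added to make the contrast visible.
enum class SSARewriteMode {
  None,          // hypothetical: no SSA rewrite at all
  SpecialTypes,  // from the PR: restrict to opaque/resource-like objects
  All            // hypothetical: the old blanket behavior
};

// Hypothetical stand-in for a recipe builder. The returned strings are
// descriptive labels, not real spirv-opt flags or pass names.
std::vector<std::string> BuildLegalizationRecipe(
    bool preserve_interface, bool include_loop_unroll,
    SSARewriteMode ssa_rewrite_mode) {
  std::vector<std::string> passes;
  passes.push_back("inline");

  // Full unroll is materialized only when the caller asks for it,
  // instead of on every legalization run.
  if (include_loop_unroll) passes.push_back("unroll-loops");

  switch (ssa_rewrite_mode) {
    case SSARewriteMode::None:
      break;
    case SSARewriteMode::SpecialTypes:
      passes.push_back("ssa-rewrite(special-types)");
      break;
    case SSARewriteMode::All:
      passes.push_back("ssa-rewrite(all)");
      break;
  }

  passes.push_back(preserve_interface
                       ? "dead-code-elim(preserve-interface)"
                       : "dead-code-elim");
  return passes;
}
```

In this model, a producer that knows legality does not need full unroll would call BuildLegalizationRecipe(false, false, SSARewriteMode::SpecialTypes) and get a three-entry recipe, while the blanket configuration BuildLegalizationRecipe(false, true, SSARewriteMode::All) yields four entries, including the unroll step.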
The narrower recipe also exposed one existing image-atomic cleanup dependency. In that path local single-store elimination could rewrite through copied image values and leave OpImageTexelPointer with a non-pointer image operand. This branch now fixes that directly instead of restoring blanket cleanup.
Spec basis
The core SPIR-V specification is direct here:
Spec:
Khronos guidance on offline SPIR-V transforms is also direct:
Whitepaper:
The same spec split also matters for SSA cleanup. The hard legality rules are narrow:
Spec:
And core SPIR-V is explicit that generic storage-class reasoning does not automatically apply to intermediate SSA values:
Spec:
For the image-atomic follow-up, OpImageTexelPointer is also explicit:
Spec:
Benchmark
19.161 s -> 3.02282 s (median of 3 runs)
Validation
CodeGenSPIRV on the companion DXC branch passes with 1438 expected passes, 2 expected failures, and 0 unexpected failures
Companion DXC PR:
microsoft/DirectXShaderCompiler#8283