-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[AMDGPU] Try to reuse in v_cndmask the register with constant from compare. #148236
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds an AMDGPU DAG combiner optimization that detects when a SELECT reuses the same constant as a preceding SETCC and folds it to avoid redundant constant materialization. It introduces a helper to extract the constant and argument, implements foldCmpSelectWithSharedConstant, and hooks it into performSelectCombine. Extensive new FileCheck tests for integer and floating-point types verify both foldable and non-foldable patterns.
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp | Implement foldCmpSelectWithSharedConstant and integrate it |
| llvm/test/CodeGen/AMDGPU/select-cmp-shared-constant-int.ll | Add Integer SELECT/SETCC folding tests |
| llvm/test/CodeGen/AMDGPU/select-cmp-shared-constant-fp.ll | Add Floating-point SELECT/SETCC folding tests |
Comments suppressed due to low confidence (1)
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp:4947
- This optimization is tested for scalar integer and floating-point types. Consider adding tests for vector types (e.g., v4i32, v2f32) to ensure the combiner handles vector SELECT patterns correctly.
if (SDValue Folded = foldCmpSelectWithSharedConstant(N, DCI, Subtarget))
1ef8b84 to
0617b6e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Adds a DAG combine to merge SETCC+SELECT patterns that reuse the same constant, so the register loaded for the compare can be reused in the conditional mask. Also includes a comprehensive set of new tests covering integer and floating-point types (I8/I16/I32/I64, F16, F32, F64, BF16) to validate both folded and non-folded cases.
- Introduce
foldCmpSelectWithSharedConstantto detect and fold SELECTs that share a constant with their SETCC. - Hook the new fold into
performSelectCombinein AMDGPUISelLowering. - Add new LLVM IR tests in
select-cmp-shared-constant-int.llandselect-cmp-shared-constant-fp.ll.
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp | Added foldCmpSelectWithSharedConstant and integrate it into select combining logic. |
| llvm/test/CodeGen/AMDGPU/select-cmp-shared-constant-int.ll | New tests for integer constant reuse in cndmask folding. |
| llvm/test/CodeGen/AMDGPU/select-cmp-shared-constant-fp.ll | New tests for floating-point constant reuse in cndmask folding. |
Comments suppressed due to low confidence (1)
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp:4845
- [nitpick] The comment refers to patterns like "setccinv", but the function currently only handles ISD::SETCC with equality and inequality codes. Consider updating the comment to accurately describe the supported patterns or extending the code to explicitly handle inverted SETCC nodes if needed.
// Detect when CMP and SELECT use the same constant and fold them to avoid
Copilot
AI
Jul 14, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Casting the AMDGPUSubtarget pointer to GCNSubtarget may obscure the actual type relationship. Consider using the AMDGPUSubtarget interface directly or adding a runtime check/assertion to ensure the cast is safe, improving clarity for future maintainers.
| const GCNSubtarget *GCNST = static_cast<const GCNSubtarget *>(ST); | |
| const GCNSubtarget *GCNST = dyn_cast<GCNSubtarget>(ST); | |
| assert(GCNST && "ST is not of type GCNSubtarget"); |
Copilot
AI
Jul 14, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The code duplicates logic for floating-point and integer immediate eligibility checks. Refactoring the common pattern into a helper or template function could reduce duplication and improve readability.
For some targets, the optimization X == Const ? X : Y → X == Const ? Const : Y can cause extra register usage or redundant immediate encoding for the constant in cndmask generated from the ternary operation. This patch detects such cases and reuses the register from the compare instruction that already holds the constant, instead of materializing it again for cndmask. The optimization avoids immediates that can be encoded into cndmask instruction (including +-0.0), as well as !isNormal() constants.
0617b6e to
57436f3
Compare
For some targets, the optimization X == Const ? X : Y → X == Const ? Const : Y
can cause extra register usage or redundant immediate encoding for the constant
in cndmask generated from the ternary operation.
This patch detects such cases and reuses the register from the compare instruction
that already holds the constant, instead of materializing it again for cndmask.
The optimization avoids immediates that can be encoded into cndmask instruction
(including +-0.0), as well as !isNormal() constants.