[DirectX] Support ConstExpr GEPs #148986

Closed
27 changes: 27 additions & 0 deletions llvm/lib/Target/DirectX/DXILDataScalarization.cpp
@@ -305,6 +305,33 @@ bool DataScalarizerVisitor::visitGetElementPtrInst(GetElementPtrInst &GEPI) {
  Type *OrigGEPType = GEPI.getSourceElementType();
  Type *NewGEPType = OrigGEPType;
  bool NeedsTransform = false;
  // Check if the pointer operand is a ConstantExpr GEP
  if (auto *PtrOpGEPCE = dyn_cast<ConstantExpr>(PtrOperand);
      PtrOpGEPCE && PtrOpGEPCE->getOpcode() == Instruction::GetElementPtr) {
    if (GlobalVariable *NewGlobal =
@Icohedron (Contributor), Jul 15, 2025:

Note: The pointer operand of a ConstantExpr GEP can be another ConstantExpr GEP.

Example:

define void @global_nested_geps() {
; CHECK-LABEL: define void @global_nested_geps(
; CHECK: {{.*}} = load i32, ptr getelementptr inbounds ([24 x i32], ptr @a.1dim, i32 0, i32 6), align 4
; CHECK-NEXT: ret void
%1 = load i32, i32* getelementptr inbounds ([4 x i32], [4 x i32]* getelementptr inbounds ([3 x [4 x i32]], [3 x [4 x i32]]* getelementptr inbounds ([2 x [3 x [4 x i32]]], [2 x [3 x [4 x i32]]]* @a, i32 0, i32 0), i32 0, i32 1), i32 0, i32 2), align 4
ret void
}

@farzonl (Member), Jul 16, 2025:

I was afraid of that. Then we will have to make this recursive for ConstantExprs. It sucks that we will have to do this twice: once for data scalarization and then again for flattening. I'm wondering if it would make more sense to undo all the ConstantExprs before these passes.
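As a rough illustration of what "undoing" nested constant expressions recursively could look like, here is a toy model in plain C++. It deliberately does not use the LLVM API: `ToyConstGEP` and `lowerToInstructions` are hypothetical names invented for this sketch, and the printed "instructions" are simplified strings rather than real IR.

```cpp
#include <memory>
#include <string>
#include <vector>

// Toy stand-in for a constant GEP expression (hypothetical, not llvm::ConstantExpr).
struct ToyConstGEP {
  std::string Global;                // root global name; used only when Base is null
  std::vector<int> Indices;          // constant indices of this nesting level
  std::unique_ptr<ToyConstGEP> Base; // nested constant GEP used as the pointer, or null
};

// Recursively emits one pseudo-instruction per nesting level, innermost
// first, and returns the name of the value holding this level's result.
std::string lowerToInstructions(const ToyConstGEP &E,
                                std::vector<std::string> &Out) {
  std::string Ptr =
      E.Base ? lowerToInstructions(*E.Base, Out) : "@" + E.Global;
  std::string Name = "%gep" + std::to_string(Out.size());
  std::string Text = Name + " = gep " + Ptr;
  for (int I : E.Indices)
    Text += ", " + std::to_string(I);
  Out.push_back(Text);
  return Name;
}
```

Because the recursion bottoms out at the global, any nesting depth is handled by the same code path, which is the property the comment above asks for.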

Contributor:

I don't think it will ever naturally occur though. It may suffice to assert that the ConstantExpr GEP's pointer operand is a global variable and deal with it later if it does become an actual problem.

AFAIK there isn't a good reason to codegen a multiply-nested ConstantExpr GEP in the first place when you can codegen a single ConstantExpr GEP instead because all the indices are ConstantInts.
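To make the "single ConstantExpr GEP" point concrete, the sketch below folds a chain of constant GEP index lists into one list. It is a simplified model in plain C++, not LLVM code: `foldGEPChain` is a hypothetical helper, and it assumes each outer GEP's source element type is exactly the type the inner GEP lands on (which holds for the nested example earlier in this thread).

```cpp
#include <cassert>
#include <vector>

// Folds a chain of constant GEP index lists, given innermost first,
// into the index list of one equivalent GEP (under the stated assumption).
std::vector<long> foldGEPChain(const std::vector<std::vector<long>> &Chain) {
  std::vector<long> Folded;
  for (const auto &Level : Chain) {
    assert(!Level.empty() && "a GEP has at least one index");
    if (Folded.empty()) {
      Folded = Level;
      continue;
    }
    // The leading index of an outer GEP offsets the element that the inner
    // GEP's last index already selected, so the two are added together.
    Folded.back() += Level.front();
    Folded.insert(Folded.end(), Level.begin() + 1, Level.end());
  }
  return Folded;
}
```

For the three-level example on `@a`, the per-level index lists `(0, 0)`, `(0, 1)`, `(0, 2)` fold to `(0, 0, 1, 2)`, that is, element `a[0][1][2]`.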

Member:

The following PR supersedes this one and will fix the multiple-nesting case: #150082

            lookupReplacementGlobal(PtrOpGEPCE->getOperand(0))) {
      GetElementPtrInst *NestedGEP =
          cast<GetElementPtrInst>(PtrOpGEPCE->getAsInstruction());
      NestedGEP->insertBefore(GEPI.getIterator());

      // Create a new GEP with the replaced global directly
      IRBuilder<> Builder(&GEPI);
      Type *NewNestedGEPType = NewGlobal->getValueType();

      // Extract indices from the ConstantExpr GEP
      SmallVector<Value *, MaxVecSize> NestedIndices(NestedGEP->indices());
      Value *NewNestedGEP =
          Builder.CreateGEP(NewNestedGEPType, NewGlobal, NestedIndices,
                            NestedGEP->getName(), NestedGEP->getNoWrapFlags());

      // Update the outer GEP to use the new nested GEP
      GEPI.setOperand(GEPI.getPointerOperandIndex(), NewNestedGEP);
      NestedGEP->replaceAllUsesWith(NewNestedGEP);
      NestedGEP->eraseFromParent();
      // Return true to indicate that we've modified the instruction
      return true;
    }
  }

  if (GlobalVariable *NewGlobal = lookupReplacementGlobal(PtrOperand)) {
    NewGEPType = NewGlobal->getValueType();
@@ -0,0 +1,39 @@
; RUN: opt -S -passes='dxil-data-scalarization' -mtriple=dxil-pc-shadermodel6.4-library %s | FileCheck %s --check-prefixes=SCHECK,CHECK
; RUN: opt -S -passes='dxil-data-scalarization,dxil-flatten-arrays' -mtriple=dxil-pc-shadermodel6.4-library %s | FileCheck %s --check-prefixes=FCHECK,CHECK

@aTile = hidden addrspace(3) global [10 x [10 x <4 x i32>]] zeroinitializer, align 16
@bTile = hidden addrspace(3) global [10 x [10 x i32]] zeroinitializer, align 16

define void @CSMain() {
; CHECK-LABEL: define void @CSMain() {
; CHECK-NEXT: [[ENTRY:.*:]]
; CHECK-NEXT: [[AFRAGPACKED_I_SCALARIZE:%.*]] = alloca [4 x i32], align 16
; SCHECK-NEXT: [[TMP0:%.*]] = getelementptr inbounds [10 x <4 x i32>], ptr addrspace(3) getelementptr inbounds ([10 x [10 x [4 x i32]]], ptr addrspace(3) @aTile.scalarized, i32 0, i32 1), i32 0, i32 2
; FCHECK-NEXT: [[TMP0:%.*]] = load <4 x i32>, ptr addrspace(3) getelementptr inbounds ([400 x i32], ptr addrspace(3) @aTile.scalarized.1dim, i32 0, i32 48), align 16
; SCHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, ptr addrspace(3) [[TMP0]], align 16
; SCHECK-NEXT: store <4 x i32> [[TMP1]], ptr [[AFRAGPACKED_I_SCALARIZE]], align 16
; SCHECK-NEXT: ret void
;
entry:
%aFragPacked.i = alloca <4 x i32>, align 16
%0 = load <4 x i32>, ptr addrspace(3) getelementptr inbounds ([10 x <4 x i32>], ptr addrspace(3) getelementptr inbounds ([10 x [10 x <4 x i32>]], ptr addrspace(3) @aTile, i32 0, i32 1), i32 0, i32 2), align 16
store <4 x i32> %0, ptr %aFragPacked.i, align 16
ret void
}

define void @Main() {
; CHECK-LABEL: define void @Main() {
; CHECK-NEXT: [[ENTRY:.*:]]
; CHECK-NEXT: [[BFRAGPACKED_I:%.*]] = alloca i32, align 16
; SCHECK-NEXT: [[TMP0:%.*]] = getelementptr inbounds [10 x i32], ptr addrspace(3) getelementptr inbounds ([10 x [10 x i32]], ptr addrspace(3) @bTile, i32 0, i32 1), i32 0, i32 2
; FCHECK-NEXT: [[TMP0:%.*]] = load i32, ptr addrspace(3) getelementptr inbounds ([100 x i32], ptr addrspace(3) @bTile.1dim, i32 0, i32 12), align 16
; SCHECK-NEXT: [[TMP1:%.*]] = load i32, ptr addrspace(3) [[TMP0]], align 16
; SCHECK-NEXT: store i32 [[TMP1]], ptr [[BFRAGPACKED_I]], align 16
; SCHECK-NEXT: ret void
;
entry:
%bFragPacked.i = alloca i32, align 16
%0 = load i32, ptr addrspace(3) getelementptr inbounds ([10 x i32], ptr addrspace(3) getelementptr inbounds ([10 x [10 x i32]], ptr addrspace(3) @bTile, i32 0, i32 1), i32 0, i32 2), align 16
store i32 %0, ptr %bFragPacked.i, align 16
ret void
}
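
The flattened indices in the FCHECK lines above (48 and 12) are consistent with ordinary row-major linearization of the array dimensions. The following sketch checks that arithmetic in plain C++. `flattenIndex` is a hypothetical helper written for illustration, not the pass's implementation. It assumes the leading pointer-level `i32 0` of the GEP has been dropped and treats any missing trailing index as zero.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major linear index of Idx within an array whose extents are Dims.
// Trailing dimensions without an index are treated as index 0.
std::size_t flattenIndex(const std::vector<std::size_t> &Dims,
                         const std::vector<std::size_t> &Idx) {
  assert(Idx.size() <= Dims.size() && "more indices than dimensions");
  std::size_t Flat = 0;
  for (std::size_t D = 0; D < Dims.size(); ++D) {
    std::size_t I = D < Idx.size() ? Idx[D] : 0;
    assert(I < Dims[D] && "index out of bounds");
    Flat = Flat * Dims[D] + I;
  }
  return Flat;
}
```

With the scalarized `@aTile` viewed as `[10 x [10 x [4 x i32]]]`, indices `(1, 2)` give `1*40 + 2*4 = 48`; for `@bTile` as `[10 x [10 x i32]]`, indices `(1, 2)` give `1*10 + 2 = 12`.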