-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[Draft][DAGCombiner] Preserve debug location of original load in fold (conv (load x)) #160236
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be notified. If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers. If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums. |
@llvm/pr-subscribers-debuginfo @llvm/pr-subscribers-backend-amdgpu Author: None (jwu10003) Changes[DAGCombiner] Preserve debug location of original load in fold (conv (load x)) This patch fixes a debug information loss issue during the combine of a conversion (e.g., bitcast) with a load into a new load: The newly created load node was incorrectly using the debug location ( This change ensures the new load inherits the debug location from Full diff: https://github.com/llvm/llvm-project/pull/160236.diff 2 Files Affected:
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index a6ba6e518899f..4cb0a35aa7b25 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -16703,7 +16703,7 @@ SDValue DAGCombiner::visitBITCAST(SDNode *N) {
}
}
SDValue Load =
- DAG.getLoad(VT, SDLoc(N), LN0->getChain(), LN0->getBasePtr(),
+ DAG.getLoad(VT, SDLoc(LN0), LN0->getChain(), LN0->getBasePtr(),
LN0->getMemOperand());
DAG.ReplaceAllUsesOfValueWith(N0.getValue(1), Load.getValue(1));
return Load;
diff --git a/llvm/test/CodeGen/AMDGPU/combine-conv-load.ll b/llvm/test/CodeGen/AMDGPU/combine-conv-load.ll
new file mode 100644
index 0000000000000..900c973b712ae
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/combine-conv-load.ll
@@ -0,0 +1,41 @@
+; RUN: llc -mtriple=amdgcn -mcpu=gfx942 < %s | FileCheck %s
+
+; CHECK-LABEL: test:
+; CHECK: .loc 1 8 16 ; test.py:8:16
+; CHECK-NEXT: s_load_dword
+
+; Function Attrs: alwaysinline mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: readwrite)
+define amdgpu_kernel void @test(ptr addrspace(1) inreg readonly captures(none) %0, ptr addrspace(1) inreg writeonly captures(none) %1, ptr addrspace(1) inreg readnone captures(none) %2, ptr addrspace(1) inreg readnone captures(none) %3) local_unnamed_addr #0 !dbg !4 {
+ %5 = tail call i32 @llvm.amdgcn.workitem.id.x(), !dbg !7
+ %6 = and i32 %5, 255, !dbg !7
+ %7 = icmp eq i32 %6, 0, !dbg !7
+ br i1 %7, label %8, label %10, !dbg !7
+
+8: ; preds = %4
+ %9 = load <1 x float>, ptr addrspace(1) %0, align 4, !dbg !8, !amdgpu.noclobber !6
+ store <1 x float> %9, ptr addrspace(1) %1, align 4, !dbg !7
+ br label %10, !dbg !7
+
+10: ; preds = %8, %4
+ ret void, !dbg !9
+}
+
+; Function Attrs: alwaysinline nocallback nofree nosync nounwind speculatable willreturn memory(none)
+declare noundef range(i32 0, 1024) i32 @llvm.amdgcn.workitem.id.x() #1
+
+attributes #0 = { alwaysinline mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: readwrite) "amdgpu-agpr-alloc"="0" "amdgpu-flat-work-group-size"="1,256" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "amdgpu-waves-per-eu"="1,1" "denormal-fp-math-f32"="ieee" "uniform-work-group-size"="false" }
+attributes #1 = { alwaysinline nocallback nofree nosync nounwind speculatable willreturn memory(none) }
+
+!llvm.dbg.cu = !{!0}
+!llvm.module.flags = !{!2, !3}
+
+!0 = distinct !DICompileUnit(language: DW_LANG_C, file: !1, producer: "triton", isOptimized: true, runtimeVersion: 0, emissionKind: LineTablesOnly)
+!1 = !DIFile(filename: "test.py", directory: "/path")
+!2 = !{i32 2, !"Debug Info Version", i32 3}
+!3 = !{i32 1, !"amdhsa_code_object_version", i32 500}
+!4 = distinct !DISubprogram(name: "test", linkageName: "test", scope: !1, file: !1, line: 7, type: !5, scopeLine: 7, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0)
+!5 = !DISubroutineType(cc: DW_CC_normal, types: !6)
+!6 = !{}
+!7 = !DILocation(line: 9, column: 20, scope: !4)
+!8 = !DILocation(line: 8, column: 16, scope: !4)
+!9 = !DILocation(line: 9, column: 4, scope: !4)
|
@@ -16703,7 +16703,7 @@ SDValue DAGCombiner::visitBITCAST(SDNode *N) { | |||
} | |||
} | |||
SDValue Load = | |||
DAG.getLoad(VT, SDLoc(N), LN0->getChain(), LN0->getBasePtr(), | |||
DAG.getLoad(VT, SDLoc(LN0), LN0->getChain(), LN0->getBasePtr(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would getMergedLocation work better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
; Function Attrs: alwaysinline mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: readwrite) | ||
define amdgpu_kernel void @test(ptr addrspace(1) inreg readonly captures(none) %0, ptr addrspace(1) inreg writeonly captures(none) %1, ptr addrspace(1) inreg readnone captures(none) %2, ptr addrspace(1) inreg readnone captures(none) %3) local_unnamed_addr #0 !dbg !4 { | ||
%5 = tail call i32 @llvm.amdgcn.workitem.id.x(), !dbg !7 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use named values in tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Delete redundant IR, retaining only the necessary load and store operations.
store <1 x float> %9, ptr addrspace(1) %1, align 4, !dbg !7 | ||
br label %10, !dbg !7 | ||
|
||
10: ; preds = %8, %4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't need multiple blocks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't need multiple blocks?
done
@@ -0,0 +1,41 @@ | |||
; RUN: llc -mtriple=amdgcn -mcpu=gfx942 < %s | FileCheck %s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
; RUN: llc -mtriple=amdgcn -mcpu=gfx942 < %s | FileCheck %s | |
; RUN: llc -mtriple=amdgcn -mcpu=gfx942 < %s | FileCheck %s |
Test belongs in test/DebugInfo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test belongs in test/DebugInfo
done
; CHECK: .loc 1 8 16 ; test.py:8:16 | ||
; CHECK-NEXT: s_load_dword | ||
|
||
; Function Attrs: alwaysinline mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: readwrite) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
; Function Attrs: alwaysinline mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: readwrite) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
attributes #0 = { alwaysinline mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: readwrite) "amdgpu-agpr-alloc"="0" "amdgpu-flat-work-group-size"="1,256" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "amdgpu-waves-per-eu"="1,1" "denormal-fp-math-f32"="ieee" "uniform-work-group-size"="false" } | ||
attributes #1 = { alwaysinline nocallback nofree nosync nounwind speculatable willreturn memory(none) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can drop all of this
attributes #0 = { alwaysinline mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: readwrite) "amdgpu-agpr-alloc"="0" "amdgpu-flat-work-group-size"="1,256" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "amdgpu-waves-per-eu"="1,1" "denormal-fp-math-f32"="ieee" "uniform-work-group-size"="false" } | |
attributes #1 = { alwaysinline nocallback nofree nosync nounwind speculatable willreturn memory(none) } | |
attributes #0 = { nounwind } | |
attributes #1 = { alwaysinline nocallback nofree nosync nounwind speculatable willreturn memory(none) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can drop all of this
done
define void @test(ptr addrspace(1) inreg readonly captures(none) %0, ptr addrspace(1) inreg writeonly captures(none) %1) local_unnamed_addr !dbg !4 { | ||
%3 = load <1 x float>, ptr addrspace(1) %0, align 4, !dbg !8, !amdgpu.noclobber !6 | ||
store <1 x float> %3, ptr addrspace(1) %1, align 4, !dbg !7 | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
%3 = load <1 x float>, ptr addrspace(1) %0, align 4, !dbg !8, !amdgpu.noclobber !6 | ||
store <1 x float> %3, ptr addrspace(1) %1, align 4, !dbg !7 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
%3 = load <1 x float>, ptr addrspace(1) %0, align 4, !dbg !8, !amdgpu.noclobber !6 | |
store <1 x float> %3, ptr addrspace(1) %1, align 4, !dbg !7 | |
%ld = load <1 x float>, ptr addrspace(1) %arg0, align 4, !dbg !8, !amdgpu.noclobber !6 | |
store <1 x float> %ld, ptr addrspace(1) %arg1, align 4, !dbg !7 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
; CHECK: .loc 1 8 16 prologue_end ; test.py:8:16 | ||
; CHECK-NEXT: s_load_dword | ||
|
||
define void @test(ptr addrspace(1) inreg readonly captures(none) %0, ptr addrspace(1) inreg writeonly captures(none) %1) local_unnamed_addr !dbg !4 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
define void @test(ptr addrspace(1) inreg readonly captures(none) %0, ptr addrspace(1) inreg writeonly captures(none) %1) local_unnamed_addr !dbg !4 { | |
define void @test(ptr addrspace(1) inreg readonly captures(none) %arg0, ptr addrspace(1) inreg writeonly captures(none) %arg1) local_unnamed_addr !dbg !4 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
…(load x)) This patch fixes a debug information loss issue during the combine of a conversion (e.g., bitcast) with a load into a new load: `fold (conv (load x)) -> (load (conv*)x)`. The newly created load node was incorrectly using the debug location (`SDLoc`) of the conversion operation (the `conv` node, `N`) instead of the location of the original load operation (the `load` node, `LN0`). The location of the conversion operation often points to compiler-internal instructions and provides little value for source-level debugging. In contrast, the original load's location accurately represents the source of the data access in the user's code. This change ensures the new load inherits the debug location from `LN0` by using `SDLoc(LN0)`, which improves debugging experience and fixes a test case failure observed in the Triton compiler.
13de15a
to
e647079
Compare
@arsenm do you know how to execute the "Build and Test Linux" and "Build and Test Windows" ? |
You can test this locally with the following command:git-clang-format --diff origin/main HEAD --extensions cpp -- llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
View the diff from clang-format here.diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index 4cb0a35aa..d80f20317 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -16702,9 +16702,8 @@ SDValue DAGCombiner::visitBITCAST(SDNode *N) {
LN0->getMemOperand()->clearRanges();
}
}
- SDValue Load =
- DAG.getLoad(VT, SDLoc(LN0), LN0->getChain(), LN0->getBasePtr(),
- LN0->getMemOperand());
+ SDValue Load = DAG.getLoad(VT, SDLoc(LN0), LN0->getChain(),
+ LN0->getBasePtr(), LN0->getMemOperand());
DAG.ReplaceAllUsesOfValueWith(N0.getValue(1), Load.getValue(1));
return Load;
}
|
[DAGCombiner] Preserve debug location of original load in fold (conv (load x))
This patch fixes a debug information loss issue during the combine of a conversion (e.g., bitcast) with a load into a new load:
fold (conv (load x)) -> (load (conv*)x)
.The newly created load node was incorrectly using the debug location (
SDLoc
) of the conversion operation (theconv
node,N
) instead of the location of the original load operation (theload
node,LN0
). The location of the conversion operation often points to compiler-internal instructions and provides little value for source-level debugging. In contrast, the original load's location accurately represents the source of the data access in the user's code.This change ensures the new load inherits the debug location from
LN0
by usingSDLoc(LN0)
, which improves debugging experience and fixes a test case failure observed in the Triton compiler.