Skip to content

Conversation

dwblaikie
Copy link
Collaborator

For structured bindings that use custom get specializations, the
resulting LLVM IR ascribes the load of the result of get to the
binding's declaration, rather than the place where the binding is
referenced - this caused awkward sequencing in the debug info where,
when stepping through the code you'd step back to the binding
declaration every time there was a reference to the binding.

To fix that - when we cross into IRGening a binding - suppress the debug
info location of that subexpression.

I don't represent this as a great bit of API design - certainly open to
ideas, but putting it out here as a place to start.

It's /possible/ this is an incomplete fix, even - if the binding decl
had other subexpressions, those would still get their location applied &
it'd likely be wrong.

So maybe that's a direction to go with to productionize this - add a new
location scoped device that suppresses any overriding - this might be
more robust. How do people feel about that?

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:codegen IR generation bugs: mangling, exceptions, etc. labels Aug 14, 2025
@llvmbot
Copy link
Member

llvmbot commented Aug 14, 2025

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-codegen

Author: David Blaikie (dwblaikie)

Changes

For structured bindings that use custom get specializations, the
resulting LLVM IR ascribes the load of the result of get to the
binding's declaration, rather than the place where the binding is
referenced - this caused awkward sequencing in the debug info where,
when stepping through the code you'd step back to the binding
declaration every time there was a reference to the binding.

To fix that - when we cross into IRGening a binding - suppress the debug
info location of that subexpression.

I don't represent this as a great bit of API design - certainly open to
ideas, but putting it out here as a place to start.

It's /possible/ this is an incomplete fix, even - if the binding decl
had other subexpressions, those would still get their location applied &
it'd likely be wrong.

So maybe that's a direction to go with to productionize this - add a new
location scoped device that suppresses any overriding - this might be
more robust. How do people feel about that?


Full diff: https://github.com/llvm/llvm-project/pull/153637.diff

3 Files Affected:

  • (modified) clang/lib/CodeGen/CGExpr.cpp (+10-6)
  • (modified) clang/lib/CodeGen/CodeGenFunction.h (+4-2)
  • (modified) clang/test/CodeGenCXX/debug-info-structured-binding.cpp (+6-1)
diff --git a/clang/lib/CodeGen/CGExpr.cpp b/clang/lib/CodeGen/CGExpr.cpp
index d5df6dd3e303c..b85fffc97e5c0 100644
--- a/clang/lib/CodeGen/CGExpr.cpp
+++ b/clang/lib/CodeGen/CGExpr.cpp
@@ -1565,12 +1565,13 @@ LValue CodeGenFunction::EmitCheckedLValue(const Expr *E, TypeCheckKind TCK) {
 /// length type, this is not possible.
 ///
 LValue CodeGenFunction::EmitLValue(const Expr *E,
-                                   KnownNonNull_t IsKnownNonNull) {
+                                   KnownNonNull_t IsKnownNonNull, bool SuppressLocation) {
   // Running with sufficient stack space to avoid deeply nested expressions
   // cause a stack overflow.
   LValue LV;
-  CGM.runWithSufficientStackSpace(
-      E->getExprLoc(), [&] { LV = EmitLValueHelper(E, IsKnownNonNull); });
+  CGM.runWithSufficientStackSpace(E->getExprLoc(), [&] {
+    LV = EmitLValueHelper(E, IsKnownNonNull, SuppressLocation);
+  });
 
   if (IsKnownNonNull && !LV.isKnownNonNull())
     LV.setKnownNonNull();
@@ -1586,8 +1587,11 @@ static QualType getConstantExprReferredType(const FullExpr *E,
 }
 
 LValue CodeGenFunction::EmitLValueHelper(const Expr *E,
-                                         KnownNonNull_t IsKnownNonNull) {
-  ApplyDebugLocation DL(*this, E);
+                                         KnownNonNull_t IsKnownNonNull,
+                                         bool SuppressLocation) {
+  std::optional<ApplyDebugLocation> DL;
+  if (!SuppressLocation)
+    DL.emplace(*this, E);
   switch (E->getStmtClass()) {
   default: return EmitUnsupportedLValue(E, "l-value expression");
 
@@ -3325,7 +3329,7 @@ LValue CodeGenFunction::EmitDeclRefLValue(const DeclRefExpr *E) {
       auto *FD = LambdaCaptureFields.lookup(BD);
       return EmitCapturedFieldLValue(*this, FD, CXXABIThisValue);
     }
-    return EmitLValue(BD->getBinding());
+    return EmitLValue(BD->getBinding(), NotKnownNonNull, true);
   }
 
   // We can form DeclRefExprs naming GUID declarations when reconstituting
diff --git a/clang/lib/CodeGen/CodeGenFunction.h b/clang/lib/CodeGen/CodeGenFunction.h
index ad318f289ee83..e9ccee9c9c86b 100644
--- a/clang/lib/CodeGen/CodeGenFunction.h
+++ b/clang/lib/CodeGen/CodeGenFunction.h
@@ -4225,10 +4225,12 @@ class CodeGenFunction : public CodeGenTypeCache {
   /// variable length type, this is not possible.
   ///
   LValue EmitLValue(const Expr *E,
-                    KnownNonNull_t IsKnownNonNull = NotKnownNonNull);
+                    KnownNonNull_t IsKnownNonNull = NotKnownNonNull,
+                    bool SuppressLocation = false);
 
 private:
-  LValue EmitLValueHelper(const Expr *E, KnownNonNull_t IsKnownNonNull);
+  LValue EmitLValueHelper(const Expr *E, KnownNonNull_t IsKnownNonNull,
+                          bool SuppressLocation);
 
 public:
   /// Same as EmitLValue but additionally we generate checking code to
diff --git a/clang/test/CodeGenCXX/debug-info-structured-binding.cpp b/clang/test/CodeGenCXX/debug-info-structured-binding.cpp
index 5fbd54c16382c..d1de46807f5f0 100644
--- a/clang/test/CodeGenCXX/debug-info-structured-binding.cpp
+++ b/clang/test/CodeGenCXX/debug-info-structured-binding.cpp
@@ -7,6 +7,7 @@
 // CHECK: #dbg_declare(ptr %{{[0-9]+}}, ![[VAR_4:[0-9]+]], !DIExpression(DW_OP_deref, DW_OP_plus_uconst, 4),
 // CHECK: #dbg_declare(ptr %z1, ![[VAR_5:[0-9]+]], !DIExpression()
 // CHECK: #dbg_declare(ptr %z2, ![[VAR_6:[0-9]+]], !DIExpression()
+// CHECK: load ptr, ptr %z1, {{.*}}!dbg ![[Z1_DEBUG_LOC:[0-9]+]]
 // CHECK: ![[VAR_0]] = !DILocalVariable(name: "a"
 // CHECK: ![[VAR_1]] = !DILocalVariable(name: "x1", scope: !{{[0-9]+}}, file: !{{[0-9]+}}, line: {{[0-9]+}}, type: !{{[0-9]+}})
 // CHECK: ![[VAR_2]] = !DILocalVariable(name: "y1", scope: !{{[0-9]+}}, file: !{{[0-9]+}}, line: {{[0-9]+}}, type: !{{[0-9]+}})
@@ -46,5 +47,9 @@ int f() {
   auto [x1, y1] = a;
   auto &[x2, y2] = a;
   auto [z1, z2] = B{1, 2};
-  return x1 + y1 + x2 + y2 + z1 + z2;
+  return x1 + y1 + x2 + y2 + 
+// CHECK: ![[Z1_DEBUG_LOC]] = !DILocation(line: [[@LINE+1]]
+     z1  //
+     +   //
+     z2; //
 }

Copy link

github-actions bot commented Aug 14, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@efriedma-quic
Copy link
Collaborator

Really, the AST design is weird: we're not supposed to do any computation when you reference parts of the binding. We represent it as an expression just to reduce the code required to handle bindings. As such, the set of expressions actually allowed in a binding are quite limited. So this might be sufficient.

But this at least deserves a comment in the code where we handle BindingDecl. And testcases for all the forms of bindings (array, vector, member, complex, tuple).

@labath
Copy link
Collaborator

labath commented Aug 15, 2025

It'd be nice to have this fixed. I don't think I can say anything about the implementation.

Copy link
Member

@Michael137 Michael137 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I run into this stepping behaviour once in a while. Happy to see it get fixed

Given what @rjmccall, if we don't know of any sub-expressions that could cause this logic to fail, then happy to keep it as proposed. But agreed, a comment and some more tests would be nice

You might be able to use the tests I wanted to land (but haven't yet) in: #122265, to sanity check this (or even move some of those tests here). I think that covers all the possible expressions I found back then, but wouldn't be surprised if there's more

…location, not the binding's declaration's location

For structured bindings that use custom `get` specializations, the
resulting LLVM IR ascribes the load of the result of `get` to the
binding's declaration, rather than the place where the binding is
referenced - this caused awkward sequencing in the debug info where,
when stepping through the code you'd step back to the binding
declaration every time there was a reference to the binding.

To fix that - when we cross into IRGening a binding - suppress the debug
info location of that subexpression.

I don't represent this as a great bit of API design - certainly open to
ideas, but putting it out here as a place to start.

It's /possible/ this is an incomplete fix, even - if the binding decl
had other subexpressions, those would still get their location applied &
it'd likely be wrong.

So maybe that's a direction to go with to productionize this - add a new
location scoped device that suppresses any overriding - this might be
more robust. How do people feel about that?
@dwblaikie dwblaikie force-pushed the structure_binding_reference_debug_info branch from ab63c6d to 45dfb2a Compare September 2, 2025 20:42
@dwblaikie
Copy link
Collaborator Author

Found a pre-existinrg scoped debug-suppression helper and refactored that to be reusable in this case.
Added test coverage.

@dwblaikie dwblaikie enabled auto-merge (squash) September 2, 2025 20:43
@dwblaikie dwblaikie merged commit 665e875 into llvm:main Sep 2, 2025
9 checks passed
lakshayk-nv pushed a commit to lakshayk-nv/llvm-project that referenced this pull request Sep 3, 2025
…location, not the binding's declaration's location (llvm#153637)

For structured bindings that use custom `get` specializations, the
resulting LLVM IR ascribes the load of the result of `get` to the
binding's declaration, rather than the place where the binding is
referenced - this caused awkward sequencing in the debug info where,
when stepping through the code you'd step back to the binding
declaration every time there was a reference to the binding.

To fix that - when we cross into IRGening a binding - suppress the debug
info location of that subexpression.

I don't represent this as a great bit of API design - certainly open to
ideas, but putting it out here as a place to start.

It's /possible/ this is an incomplete fix, even - if the binding decl
had other subexpressions, those would still get their location applied &
it'd likely be wrong.

So maybe that's a direction to go with to productionize this - add a new
location scoped device that suppresses any overriding - this might be
more robust. How do people feel about that?
// leads to jumpy debug stepping behavior where the location/debugger jump
// back to the binding declaration, then back to the expression referencing
// the binding)
DisableDebugLocationUpdates D(*this);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

@Michael137
Copy link
Member

Michael137 commented Sep 3, 2025

Can we cherry-pick this to release/21.x? I'm planning to pull this into the Apple LLVM fork for the 21.x release

David, I'll prepare the cherry-pick and CC you on the PR for review

@Michael137
Copy link
Member

/cherry-pick 665e875

@llvmbot
Copy link
Member

llvmbot commented Sep 3, 2025

/pull-request #156664

@llvmbot llvmbot moved this from Needs Triage to Done in LLVM Release Status Sep 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:codegen IR generation bugs: mangling, exceptions, etc. clang Clang issues not falling into any other category
Projects
Development

Successfully merging this pull request may close these issues.

5 participants