Skip to content

Conversation

@sstwcw
Copy link
Contributor

@sstwcw sstwcw commented Sep 16, 2025

When the style is
{AlignConsecutiveDeclarations: true, Cpp11BracedListStyle: false}, the program would sometimes align the lambda body with the outside. Like this.

const volatile auto result{ []() {
  const auto        something = 1;
  return 2;
} };

This patch stops it. Now the output looks like this.

const volatile auto result{ []() {
  const auto something = 1;
  return 2;
} };

Fixes #53699.

The problem was with how the AlignTokenSequence function tracked levels. The stack was pushed once when a token was more nested than the previous one. It was popped once when a token was less nested than the previous one. With the option Cpp11BracedListStyle disabled, the [ token was deeper than the previous token by 2 levels. Both its indentation level and nesting level were more than that of the previous one. But the stack was only pushed once. The following tokens popped the 2 levels separately. The function treated the body of the lambda expression as on the same level as the outside.

The problem is fixed this way. The function AlignTokens which calls the function AlignTokenSequence already uses a simpler and more robust way to track levels. It remembers the outermost level it should align. It compares the token's level with the outermost level. It does not need a stack. So it is immune to the problem. This patch makes the function AlignTokenSequence take as a parameter the indices of the tokens matched by the function AlignTokens. This way it does not have to contain the logic again.

Now the function AlignTokenSequence only aligns tokens already matched by the function AlignTokens. It is easy to see that the assertion on line 453 holds. The workaround on line 354 is not needed any more. The test on line 20310 added at the same time as the assertion ensures that the assertion hold.

The stack in the function AlignTokenSequence is kept for now. It is still used to figure out when things inside a level should move along with the outer level. Since the stack has the problem, the developer plans to move the logic into token annotator. It already uses a stack.

When the style is
`{AlignConsecutiveDeclarations: true, Cpp11BracedListStyle: false}`,
the program would sometimes align the lambda body with the outside.
Like this.

```C++
const volatile auto result{ []() {
  const auto        something = 1;
  return 2;
} };
```

This patch stops it.  Now the output looks like this.

```C++
const volatile auto result{ []() {
  const auto something = 1;
  return 2;
} };
```

Fixes llvm#53699.

The problem was with how the `AlignTokenSequence` function tracked
levels.  The stack was pushed once when a token was more nested than the
previous one.  It was popped once when a token was less nested than the
previous one.  With the option `Cpp11BracedListStyle` disabled, the `[`
token was deeper than the previous token by 2 levels.  Both its
indentation level and nesting level were more than that of the previous
one.  But the stack was only pushed once.  The following tokens popped
the 2 levels separately.  The function treated the body of the lambda
expression as on the same level as the outside.

The problem is fixed this way.  The function `AlignTokens` which calls
the function `AlignTokenSequence` already uses a simpler and more robust
way to track levels.  It remembers the outermost level it should align.
It compares the token's level with the outermost level.  It does not
need a stack.  So it is immune to the problem.  This patch makes the
function `AlignTokenSequence` take as a parameter the indices of the
tokens matched by the function `AlignTokens`.  This way it does not have
to contain the logic again.

Now the function `AlignTokenSequence` only aligns tokens already matched
by the function `AlignTokens`.  It is easy to see that the assertion on
line 453 holds.  The workaround on line 354 is not needed any more.  The
test on line 20310 added at the same time as the assertion ensures that
the assertion hold.

The stack in the function `AlignTokenSequence` is kept for now.  It is
still used to figure out when things inside a level should move along
with the outer level.  Since the stack has the problem, the developer
plans to move the logic into token annotator.  It already uses a stack.
@llvmbot
Copy link
Member

llvmbot commented Sep 16, 2025

@llvm/pr-subscribers-clang-format

Author: None (sstwcw)

Changes

When the style is
{AlignConsecutiveDeclarations: true, Cpp11BracedListStyle: false}, the program would sometimes align the lambda body with the outside. Like this.

const volatile auto result{ []() {
  const auto        something = 1;
  return 2;
} };

This patch stops it. Now the output looks like this.

const volatile auto result{ []() {
  const auto something = 1;
  return 2;
} };

Fixes #53699.

The problem was with how the AlignTokenSequence function tracked levels. The stack was pushed once when a token was more nested than the previous one. It was popped once when a token was less nested than the previous one. With the option Cpp11BracedListStyle disabled, the [ token was deeper than the previous token by 2 levels. Both its indentation level and nesting level were more than that of the previous one. But the stack was only pushed once. The following tokens popped the 2 levels separately. The function treated the body of the lambda expression as on the same level as the outside.

The problem is fixed this way. The function AlignTokens which calls the function AlignTokenSequence already uses a simpler and more robust way to track levels. It remembers the outermost level it should align. It compares the token's level with the outermost level. It does not need a stack. So it is immune to the problem. This patch makes the function AlignTokenSequence take as a parameter the indices of the tokens matched by the function AlignTokens. This way it does not have to contain the logic again.

Now the function AlignTokenSequence only aligns tokens already matched by the function AlignTokens. It is easy to see that the assertion on line 453 holds. The workaround on line 354 is not needed any more. The test on line 20310 added at the same time as the assertion ensures that the assertion hold.

The stack in the function AlignTokenSequence is kept for now. It is still used to figure out when things inside a level should move along with the outer level. Since the stack has the problem, the developer plans to move the logic into token annotator. It already uses a stack.


Full diff: https://github.com/llvm/llvm-project/pull/159140.diff

3 Files Affected:

  • (modified) clang/lib/Format/TokenAnnotator.cpp (+3-2)
  • (modified) clang/lib/Format/WhitespaceManager.cpp (+26-23)
  • (modified) clang/unittests/Format/FormatTest.cpp (+5)
diff --git a/clang/lib/Format/TokenAnnotator.cpp b/clang/lib/Format/TokenAnnotator.cpp
index d97f56751ea69..e7081aac66c98 100644
--- a/clang/lib/Format/TokenAnnotator.cpp
+++ b/clang/lib/Format/TokenAnnotator.cpp
@@ -6486,13 +6486,14 @@ void TokenAnnotator::printDebugInfo(const AnnotatedLine &Line) const {
                << "):\n";
   const FormatToken *Tok = Line.First;
   while (Tok) {
-    llvm::errs() << " M=" << Tok->MustBreakBefore
+    llvm::errs() << " I=" << Tok->IndentLevel << " M=" << Tok->MustBreakBefore
                  << " C=" << Tok->CanBreakBefore
                  << " T=" << getTokenTypeName(Tok->getType())
                  << " S=" << Tok->SpacesRequiredBefore
                  << " F=" << Tok->Finalized << " B=" << Tok->BlockParameterCount
                  << " BK=" << Tok->getBlockKind() << " P=" << Tok->SplitPenalty
-                 << " Name=" << Tok->Tok.getName() << " L=" << Tok->TotalLength
+                 << " Name=" << Tok->Tok.getName() << " N=" << Tok->NestingLevel
+                 << " L=" << Tok->TotalLength
                  << " PPK=" << Tok->getPackingKind() << " FakeLParens=";
     for (prec::Level LParen : Tok->FakeLParens)
       llvm::errs() << LParen << "/";
diff --git a/clang/lib/Format/WhitespaceManager.cpp b/clang/lib/Format/WhitespaceManager.cpp
index cc3cc0f6906cc..64b0c486a359e 100644
--- a/clang/lib/Format/WhitespaceManager.cpp
+++ b/clang/lib/Format/WhitespaceManager.cpp
@@ -12,6 +12,7 @@
 //===----------------------------------------------------------------------===//
 
 #include "WhitespaceManager.h"
+#include "llvm/ADT/ArrayRef.h"
 #include "llvm/ADT/STLExtras.h"
 #include "llvm/ADT/SmallVector.h"
 #include <algorithm>
@@ -279,20 +280,19 @@ void WhitespaceManager::calculateLineBreakInformation() {
 }
 
 // Align a single sequence of tokens, see AlignTokens below.
-// Column - The token for which Matches returns true is moved to this column.
+// Column - The tokens indexed in Matches are moved to this column.
 // RightJustify - Whether it is the token's right end or left end that gets
 // moved to that column.
-template <typename F>
 static void
 AlignTokenSequence(const FormatStyle &Style, unsigned Start, unsigned End,
-                   unsigned Column, bool RightJustify, F &&Matches,
+                   unsigned Column, bool RightJustify,
+                   ArrayRef<const unsigned> Matches,
                    SmallVector<WhitespaceManager::Change, 16> &Changes) {
-  bool FoundMatchOnLine = false;
   int Shift = 0;
 
   // ScopeStack keeps track of the current scope depth. It contains indices of
   // the first token on each scope.
-  // We only run the "Matches" function on tokens from the outer-most scope.
+  // The "Matches" indices should only have tokens from the outer-most scope.
   // However, we do need to pay special attention to one class of tokens
   // that are not in the outer-most scope, and that is function parameters
   // which are split across multiple lines, as illustrated by this example:
@@ -314,6 +314,9 @@ AlignTokenSequence(const FormatStyle &Style, unsigned Start, unsigned End,
 
   for (unsigned i = Start; i != End; ++i) {
     auto &CurrentChange = Changes[i];
+    if (!Matches.empty() && Matches[0] < i)
+      Matches.consume_front();
+    assert(Matches.empty() || Matches[0] >= i);
     if (!ScopeStack.empty() &&
         CurrentChange.indentAndNestingLevel() <
             Changes[ScopeStack.back()].indentAndNestingLevel()) {
@@ -340,24 +343,15 @@ AlignTokenSequence(const FormatStyle &Style, unsigned Start, unsigned End,
 
     if (CurrentChange.NewlinesBefore > 0 && !SkipMatchCheck) {
       Shift = 0;
-      FoundMatchOnLine = false;
     }
 
     // If this is the first matching token to be aligned, remember by how many
     // spaces it has to be shifted, so the rest of the changes on the line are
     // shifted by the same amount
-    if (!FoundMatchOnLine && !SkipMatchCheck && Matches(CurrentChange)) {
-      FoundMatchOnLine = true;
+    if (!Matches.empty() && Matches[0] == i) {
       Shift = Column - (RightJustify ? CurrentChange.TokenLength : 0) -
               CurrentChange.StartOfTokenColumn;
       CurrentChange.Spaces += Shift;
-      // FIXME: This is a workaround that should be removed when we fix
-      // http://llvm.org/PR53699. An assertion later below verifies this.
-      if (CurrentChange.NewlinesBefore == 0) {
-        CurrentChange.Spaces =
-            std::max(CurrentChange.Spaces,
-                     static_cast<int>(CurrentChange.Tok->SpacesRequiredBefore));
-      }
     }
 
     if (Shift == 0)
@@ -532,12 +526,14 @@ static unsigned AlignTokens(const FormatStyle &Style, F &&Matches,
                             bool RightJustify = false) {
   // We arrange each line in 3 parts. The operator to be aligned (the anchor),
   // and text to its left and right. In the aligned text the width of each part
-  // will be the maximum of that over the block that has been aligned. Maximum
-  // widths of each part so far. When RightJustify is true and ACS.PadOperators
-  // is false, the part from start of line to the right end of the anchor.
-  // Otherwise, only the part to the left of the anchor. Including the space
-  // that exists on its left from the start. Not including the padding added on
-  // the left to right-justify the anchor.
+  // will be the maximum of that over the block that has been aligned.
+
+  // Maximum widths of each part so far.
+  // When RightJustify is true and ACS.PadOperators is false, the part from
+  // start of line to the right end of the anchor. Otherwise, only the part to
+  // the left of the anchor. Including the space that exists on its left from
+  // the start. Not including the padding added on the left to right-justify the
+  // anchor.
   unsigned WidthLeft = 0;
   // The operator to be aligned when RightJustify is true and ACS.PadOperators
   // is false. 0 otherwise.
@@ -550,6 +546,9 @@ static unsigned AlignTokens(const FormatStyle &Style, F &&Matches,
   unsigned StartOfSequence = 0;
   unsigned EndOfSequence = 0;
 
+  // The positions of the tokens to be aligned.
+  SmallVector<unsigned> MatchedIndices;
+
   // Measure the scope level (i.e. depth of (), [], {}) of the first token, and
   // abort when we hit any token in a higher scope than the starting one.
   auto IndentAndNestingLevel = StartAt < Changes.size()
@@ -578,7 +577,7 @@ static unsigned AlignTokens(const FormatStyle &Style, F &&Matches,
   auto AlignCurrentSequence = [&] {
     if (StartOfSequence > 0 && StartOfSequence < EndOfSequence) {
       AlignTokenSequence(Style, StartOfSequence, EndOfSequence,
-                         WidthLeft + WidthAnchor, RightJustify, Matches,
+                         WidthLeft + WidthAnchor, RightJustify, MatchedIndices,
                          Changes);
     }
     WidthLeft = 0;
@@ -586,6 +585,7 @@ static unsigned AlignTokens(const FormatStyle &Style, F &&Matches,
     WidthRight = 0;
     StartOfSequence = 0;
     EndOfSequence = 0;
+    MatchedIndices.clear();
   };
 
   unsigned i = StartAt;
@@ -637,8 +637,10 @@ static unsigned AlignTokens(const FormatStyle &Style, F &&Matches,
 
     // If there is more than one matching token per line, or if the number of
     // preceding commas, do not match anymore, end the sequence.
-    if (FoundMatchOnLine || CommasBeforeMatch != CommasBeforeLastMatch)
+    if (FoundMatchOnLine || CommasBeforeMatch != CommasBeforeLastMatch) {
+      MatchedIndices.push_back(i);
       AlignCurrentSequence();
+    }
 
     CommasBeforeLastMatch = CommasBeforeMatch;
     FoundMatchOnLine = true;
@@ -684,6 +686,7 @@ static unsigned AlignTokens(const FormatStyle &Style, F &&Matches,
       WidthAnchor = NewAnchor;
       WidthRight = NewRight;
     }
+    MatchedIndices.push_back(i);
   }
 
   EndOfSequence = i;
diff --git a/clang/unittests/Format/FormatTest.cpp b/clang/unittests/Format/FormatTest.cpp
index d9db06667d802..cd8e67d3a2425 100644
--- a/clang/unittests/Format/FormatTest.cpp
+++ b/clang/unittests/Format/FormatTest.cpp
@@ -20312,6 +20312,11 @@ TEST_F(FormatTest, AlignConsecutiveDeclarations) {
                "  return 2;\n"
                "} };",
                BracedAlign);
+  verifyFormat("const volatile auto result{ []() {\n"
+               "  const auto something = 1;\n"
+               "  return 2;\n"
+               "} };",
+               BracedAlign);
   verifyFormat("int foo{ []() {\n"
                "  int bar{ 0 };\n"
                "  return 0;\n"

Copy link
Contributor

@HazardyKnusperkeks HazardyKnusperkeks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, please wait a day or two before the merge.

AlignTokenSequence(const FormatStyle &Style, unsigned Start, unsigned End,
unsigned Column, bool RightJustify, F &&Matches,
unsigned Column, bool RightJustify,
ArrayRef<const unsigned> Matches,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the const redundant?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems so. I removed it. I thought the array ref type allowed changing the data just like the span type. By the way, why is there an array ref type when there is already the span type?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean std::span in C++20? I haven't used it. FWIW, LLVM is still at C++17.

@sstwcw sstwcw merged commit a0c75a9 into llvm:main Sep 25, 2025
9 checks passed
@sstwcw sstwcw deleted the format-align branch September 25, 2025 16:46
mahesh-attarde pushed a commit to mahesh-attarde/llvm-project that referenced this pull request Oct 3, 2025
…led (llvm#159140)

When the style is
`{AlignConsecutiveDeclarations: true, Cpp11BracedListStyle: false}`, the
program would sometimes align the lambda body with the outside. Like
this.

```C++
const volatile auto result{ []() {
  const auto        something = 1;
  return 2;
} };
```

This patch stops it.  Now the output looks like this.

```C++
const volatile auto result{ []() {
  const auto something = 1;
  return 2;
} };
```

Fixes llvm#53699.

The problem was with how the `AlignTokenSequence` function tracked
levels. The stack was pushed once when a token was more nested than the
previous one. It was popped once when a token was less nested than the
previous one. With the option `Cpp11BracedListStyle` disabled, the `[`
token was deeper than the previous token by 2 levels. Both its
indentation level and nesting level were more than that of the previous
one. But the stack was only pushed once. The following tokens popped the
2 levels separately. The function treated the body of the lambda
expression as on the same level as the outside.

The problem is fixed this way. The function `AlignTokens` which calls
the function `AlignTokenSequence` already uses a simpler and more robust
way to track levels. It remembers the outermost level it should align.
It compares the token's level with the outermost level. It does not need
a stack. So it is immune to the problem. This patch makes the function
`AlignTokenSequence` take as a parameter the indices of the tokens
matched by the function `AlignTokens`. This way it does not have to
contain the logic again.

Now the function `AlignTokenSequence` only aligns tokens already matched
by the function `AlignTokens`. It is easy to see that the assertion on
line 453 holds. The workaround on line 354 is not needed any more. The
test on line 20310 added at the same time as the assertion ensures that
the assertion hold.

The stack in the function `AlignTokenSequence` is kept for now. It is
still used to figure out when things inside a level should move along
with the outer level. Since the stack has the problem, the developer
plans to move the logic into token annotator. It already uses a stack.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[clang-format] Unrelated declarations are aligned with Cpp11BracedListStyle: false

4 participants