[clang-format] Align within the level with Cpp11BracedListStyle disabled #159140

sstwcw · 2025-09-16T17:46:47Z

When the style is
{AlignConsecutiveDeclarations: true, Cpp11BracedListStyle: false}, the program would sometimes align the lambda body with the outside. Like this.

const volatile auto result{ []() {
  const auto        something = 1;
  return 2;
} };

This patch stops it. Now the output looks like this.

const volatile auto result{ []() {
  const auto something = 1;
  return 2;
} };

Fixes #53699.

The problem was with how the AlignTokenSequence function tracked levels. The stack was pushed once when a token was more nested than the previous one. It was popped once when a token was less nested than the previous one. With the option Cpp11BracedListStyle disabled, the [ token was deeper than the previous token by 2 levels. Both its indentation level and nesting level were more than that of the previous one. But the stack was only pushed once. The following tokens popped the 2 levels separately. The function treated the body of the lambda expression as on the same level as the outside.

The problem is fixed this way. The function AlignTokens which calls the function AlignTokenSequence already uses a simpler and more robust way to track levels. It remembers the outermost level it should align. It compares the token's level with the outermost level. It does not need a stack. So it is immune to the problem. This patch makes the function AlignTokenSequence take as a parameter the indices of the tokens matched by the function AlignTokens. This way it does not have to contain the logic again.

Now the function AlignTokenSequence only aligns tokens already matched by the function AlignTokens. It is easy to see that the assertion on line 453 holds. The workaround on line 354 is not needed any more. The test on line 20310 added at the same time as the assertion ensures that the assertion hold.

The stack in the function AlignTokenSequence is kept for now. It is still used to figure out when things inside a level should move along with the outer level. Since the stack has the problem, the developer plans to move the logic into token annotator. It already uses a stack.

When the style is `{AlignConsecutiveDeclarations: true, Cpp11BracedListStyle: false}`, the program would sometimes align the lambda body with the outside. Like this. ```C++ const volatile auto result{ []() { const auto something = 1; return 2; } }; ``` This patch stops it. Now the output looks like this. ```C++ const volatile auto result{ []() { const auto something = 1; return 2; } }; ``` Fixes llvm#53699. The problem was with how the `AlignTokenSequence` function tracked levels. The stack was pushed once when a token was more nested than the previous one. It was popped once when a token was less nested than the previous one. With the option `Cpp11BracedListStyle` disabled, the `[` token was deeper than the previous token by 2 levels. Both its indentation level and nesting level were more than that of the previous one. But the stack was only pushed once. The following tokens popped the 2 levels separately. The function treated the body of the lambda expression as on the same level as the outside. The problem is fixed this way. The function `AlignTokens` which calls the function `AlignTokenSequence` already uses a simpler and more robust way to track levels. It remembers the outermost level it should align. It compares the token's level with the outermost level. It does not need a stack. So it is immune to the problem. This patch makes the function `AlignTokenSequence` take as a parameter the indices of the tokens matched by the function `AlignTokens`. This way it does not have to contain the logic again. Now the function `AlignTokenSequence` only aligns tokens already matched by the function `AlignTokens`. It is easy to see that the assertion on line 453 holds. The workaround on line 354 is not needed any more. The test on line 20310 added at the same time as the assertion ensures that the assertion hold. The stack in the function `AlignTokenSequence` is kept for now. It is still used to figure out when things inside a level should move along with the outer level. Since the stack has the problem, the developer plans to move the logic into token annotator. It already uses a stack.

llvmbot · 2025-09-16T17:47:29Z

@llvm/pr-subscribers-clang-format

Author: None (sstwcw)

Changes

When the style is
{AlignConsecutiveDeclarations: true, Cpp11BracedListStyle: false}, the program would sometimes align the lambda body with the outside. Like this.

const volatile auto result{ []() {
  const auto        something = 1;
  return 2;
} };

This patch stops it. Now the output looks like this.

const volatile auto result{ []() {
  const auto something = 1;
  return 2;
} };

Fixes #53699.

The problem was with how the AlignTokenSequence function tracked levels. The stack was pushed once when a token was more nested than the previous one. It was popped once when a token was less nested than the previous one. With the option Cpp11BracedListStyle disabled, the [ token was deeper than the previous token by 2 levels. Both its indentation level and nesting level were more than that of the previous one. But the stack was only pushed once. The following tokens popped the 2 levels separately. The function treated the body of the lambda expression as on the same level as the outside.

The problem is fixed this way. The function AlignTokens which calls the function AlignTokenSequence already uses a simpler and more robust way to track levels. It remembers the outermost level it should align. It compares the token's level with the outermost level. It does not need a stack. So it is immune to the problem. This patch makes the function AlignTokenSequence take as a parameter the indices of the tokens matched by the function AlignTokens. This way it does not have to contain the logic again.

Now the function AlignTokenSequence only aligns tokens already matched by the function AlignTokens. It is easy to see that the assertion on line 453 holds. The workaround on line 354 is not needed any more. The test on line 20310 added at the same time as the assertion ensures that the assertion hold.

The stack in the function AlignTokenSequence is kept for now. It is still used to figure out when things inside a level should move along with the outer level. Since the stack has the problem, the developer plans to move the logic into token annotator. It already uses a stack.

Full diff: https://github.com/llvm/llvm-project/pull/159140.diff

3 Files Affected:

(modified) clang/lib/Format/TokenAnnotator.cpp (+3-2)
(modified) clang/lib/Format/WhitespaceManager.cpp (+26-23)
(modified) clang/unittests/Format/FormatTest.cpp (+5)

diff --git a/clang/lib/Format/TokenAnnotator.cpp b/clang/lib/Format/TokenAnnotator.cpp
index d97f56751ea69..e7081aac66c98 100644
--- a/clang/lib/Format/TokenAnnotator.cpp
+++ b/clang/lib/Format/TokenAnnotator.cpp
@@ -6486,13 +6486,14 @@ void TokenAnnotator::printDebugInfo(const AnnotatedLine &Line) const {
                << "):\n";
   const FormatToken *Tok = Line.First;
   while (Tok) {
-    llvm::errs() << " M=" << Tok->MustBreakBefore
+    llvm::errs() << " I=" << Tok->IndentLevel << " M=" << Tok->MustBreakBefore
                  << " C=" << Tok->CanBreakBefore
                  << " T=" << getTokenTypeName(Tok->getType())
                  << " S=" << Tok->SpacesRequiredBefore
                  << " F=" << Tok->Finalized << " B=" << Tok->BlockParameterCount
                  << " BK=" << Tok->getBlockKind() << " P=" << Tok->SplitPenalty
-                 << " Name=" << Tok->Tok.getName() << " L=" << Tok->TotalLength
+                 << " Name=" << Tok->Tok.getName() << " N=" << Tok->NestingLevel
+                 << " L=" << Tok->TotalLength
                  << " PPK=" << Tok->getPackingKind() << " FakeLParens=";
     for (prec::Level LParen : Tok->FakeLParens)
       llvm::errs() << LParen << "/";
diff --git a/clang/lib/Format/WhitespaceManager.cpp b/clang/lib/Format/WhitespaceManager.cpp
index cc3cc0f6906cc..64b0c486a359e 100644
--- a/clang/lib/Format/WhitespaceManager.cpp
+++ b/clang/lib/Format/WhitespaceManager.cpp
@@ -12,6 +12,7 @@
 //===----------------------------------------------------------------------===//
 
 #include "WhitespaceManager.h"
+#include "llvm/ADT/ArrayRef.h"
 #include "llvm/ADT/STLExtras.h"
 #include "llvm/ADT/SmallVector.h"
 #include <algorithm>
@@ -279,20 +280,19 @@ void WhitespaceManager::calculateLineBreakInformation() {
 }
 
 // Align a single sequence of tokens, see AlignTokens below.
-// Column - The token for which Matches returns true is moved to this column.
+// Column - The tokens indexed in Matches are moved to this column.
 // RightJustify - Whether it is the token's right end or left end that gets
 // moved to that column.
-template <typename F>
 static void
 AlignTokenSequence(const FormatStyle &Style, unsigned Start, unsigned End,
-                   unsigned Column, bool RightJustify, F &&Matches,
+                   unsigned Column, bool RightJustify,
+                   ArrayRef<const unsigned> Matches,
                    SmallVector<WhitespaceManager::Change, 16> &Changes) {
-  bool FoundMatchOnLine = false;
   int Shift = 0;
 
   // ScopeStack keeps track of the current scope depth. It contains indices of
   // the first token on each scope.
-  // We only run the "Matches" function on tokens from the outer-most scope.
+  // The "Matches" indices should only have tokens from the outer-most scope.
   // However, we do need to pay special attention to one class of tokens
   // that are not in the outer-most scope, and that is function parameters
   // which are split across multiple lines, as illustrated by this example:
@@ -314,6 +314,9 @@ AlignTokenSequence(const FormatStyle &Style, unsigned Start, unsigned End,
 
   for (unsigned i = Start; i != End; ++i) {
     auto &CurrentChange = Changes[i];
+    if (!Matches.empty() && Matches[0] < i)
+      Matches.consume_front();
+    assert(Matches.empty() || Matches[0] >= i);
     if (!ScopeStack.empty() &&
         CurrentChange.indentAndNestingLevel() <
             Changes[ScopeStack.back()].indentAndNestingLevel()) {
@@ -340,24 +343,15 @@ AlignTokenSequence(const FormatStyle &Style, unsigned Start, unsigned End,
 
     if (CurrentChange.NewlinesBefore > 0 && !SkipMatchCheck) {
       Shift = 0;
-      FoundMatchOnLine = false;
     }
 
     // If this is the first matching token to be aligned, remember by how many
     // spaces it has to be shifted, so the rest of the changes on the line are
     // shifted by the same amount
-    if (!FoundMatchOnLine && !SkipMatchCheck && Matches(CurrentChange)) {
-      FoundMatchOnLine = true;
+    if (!Matches.empty() && Matches[0] == i) {
       Shift = Column - (RightJustify ? CurrentChange.TokenLength : 0) -
               CurrentChange.StartOfTokenColumn;
       CurrentChange.Spaces += Shift;
-      // FIXME: This is a workaround that should be removed when we fix
-      // http://llvm.org/PR53699. An assertion later below verifies this.
-      if (CurrentChange.NewlinesBefore == 0) {
-        CurrentChange.Spaces =
-            std::max(CurrentChange.Spaces,
-                     static_cast<int>(CurrentChange.Tok->SpacesRequiredBefore));
-      }
     }
 
     if (Shift == 0)
@@ -532,12 +526,14 @@ static unsigned AlignTokens(const FormatStyle &Style, F &&Matches,
                             bool RightJustify = false) {
   // We arrange each line in 3 parts. The operator to be aligned (the anchor),
   // and text to its left and right. In the aligned text the width of each part
-  // will be the maximum of that over the block that has been aligned. Maximum
-  // widths of each part so far. When RightJustify is true and ACS.PadOperators
-  // is false, the part from start of line to the right end of the anchor.
-  // Otherwise, only the part to the left of the anchor. Including the space
-  // that exists on its left from the start. Not including the padding added on
-  // the left to right-justify the anchor.
+  // will be the maximum of that over the block that has been aligned.
+
+  // Maximum widths of each part so far.
+  // When RightJustify is true and ACS.PadOperators is false, the part from
+  // start of line to the right end of the anchor. Otherwise, only the part to
+  // the left of the anchor. Including the space that exists on its left from
+  // the start. Not including the padding added on the left to right-justify the
+  // anchor.
   unsigned WidthLeft = 0;
   // The operator to be aligned when RightJustify is true and ACS.PadOperators
   // is false. 0 otherwise.
@@ -550,6 +546,9 @@ static unsigned AlignTokens(const FormatStyle &Style, F &&Matches,
   unsigned StartOfSequence = 0;
   unsigned EndOfSequence = 0;
 
+  // The positions of the tokens to be aligned.
+  SmallVector<unsigned> MatchedIndices;
+
   // Measure the scope level (i.e. depth of (), [], {}) of the first token, and
   // abort when we hit any token in a higher scope than the starting one.
   auto IndentAndNestingLevel = StartAt < Changes.size()
@@ -578,7 +577,7 @@ static unsigned AlignTokens(const FormatStyle &Style, F &&Matches,
   auto AlignCurrentSequence = [&] {
     if (StartOfSequence > 0 && StartOfSequence < EndOfSequence) {
       AlignTokenSequence(Style, StartOfSequence, EndOfSequence,
-                         WidthLeft + WidthAnchor, RightJustify, Matches,
+                         WidthLeft + WidthAnchor, RightJustify, MatchedIndices,
                          Changes);
     }
     WidthLeft = 0;
@@ -586,6 +585,7 @@ static unsigned AlignTokens(const FormatStyle &Style, F &&Matches,
     WidthRight = 0;
     StartOfSequence = 0;
     EndOfSequence = 0;
+    MatchedIndices.clear();
   };
 
   unsigned i = StartAt;
@@ -637,8 +637,10 @@ static unsigned AlignTokens(const FormatStyle &Style, F &&Matches,
 
     // If there is more than one matching token per line, or if the number of
     // preceding commas, do not match anymore, end the sequence.
-    if (FoundMatchOnLine || CommasBeforeMatch != CommasBeforeLastMatch)
+    if (FoundMatchOnLine || CommasBeforeMatch != CommasBeforeLastMatch) {
+      MatchedIndices.push_back(i);
       AlignCurrentSequence();
+    }
 
     CommasBeforeLastMatch = CommasBeforeMatch;
     FoundMatchOnLine = true;
@@ -684,6 +686,7 @@ static unsigned AlignTokens(const FormatStyle &Style, F &&Matches,
       WidthAnchor = NewAnchor;
       WidthRight = NewRight;
     }
+    MatchedIndices.push_back(i);
   }
 
   EndOfSequence = i;
diff --git a/clang/unittests/Format/FormatTest.cpp b/clang/unittests/Format/FormatTest.cpp
index d9db06667d802..cd8e67d3a2425 100644
--- a/clang/unittests/Format/FormatTest.cpp
+++ b/clang/unittests/Format/FormatTest.cpp
@@ -20312,6 +20312,11 @@ TEST_F(FormatTest, AlignConsecutiveDeclarations) {
                "  return 2;\n"
                "} };",
                BracedAlign);
+  verifyFormat("const volatile auto result{ []() {\n"
+               "  const auto something = 1;\n"
+               "  return 2;\n"
+               "} };",
+               BracedAlign);
   verifyFormat("int foo{ []() {\n"
                "  int bar{ 0 };\n"
                "  return 0;\n"

clang/lib/Format/WhitespaceManager.cpp

HazardyKnusperkeks

LGTM, please wait a day or two before the merge.

clang/lib/Format/WhitespaceManager.cpp

owenca · 2025-09-20T09:45:09Z

clang/lib/Format/WhitespaceManager.cpp

 AlignTokenSequence(const FormatStyle &Style, unsigned Start, unsigned End,
-                   unsigned Column, bool RightJustify, F &&Matches,
+                   unsigned Column, bool RightJustify,
+                   ArrayRef<const unsigned> Matches,


Is the const redundant?

Seems so. I removed it. I thought the array ref type allowed changing the data just like the span type. By the way, why is there an array ref type when there is already the span type?

You mean std::span in C++20? I haven't used it. FWIW, LLVM is still at C++17.

…led (llvm#159140) When the style is `{AlignConsecutiveDeclarations: true, Cpp11BracedListStyle: false}`, the program would sometimes align the lambda body with the outside. Like this. ```C++ const volatile auto result{ []() { const auto something = 1; return 2; } }; ``` This patch stops it. Now the output looks like this. ```C++ const volatile auto result{ []() { const auto something = 1; return 2; } }; ``` Fixes llvm#53699. The problem was with how the `AlignTokenSequence` function tracked levels. The stack was pushed once when a token was more nested than the previous one. It was popped once when a token was less nested than the previous one. With the option `Cpp11BracedListStyle` disabled, the `[` token was deeper than the previous token by 2 levels. Both its indentation level and nesting level were more than that of the previous one. But the stack was only pushed once. The following tokens popped the 2 levels separately. The function treated the body of the lambda expression as on the same level as the outside. The problem is fixed this way. The function `AlignTokens` which calls the function `AlignTokenSequence` already uses a simpler and more robust way to track levels. It remembers the outermost level it should align. It compares the token's level with the outermost level. It does not need a stack. So it is immune to the problem. This patch makes the function `AlignTokenSequence` take as a parameter the indices of the tokens matched by the function `AlignTokens`. This way it does not have to contain the logic again. Now the function `AlignTokenSequence` only aligns tokens already matched by the function `AlignTokens`. It is easy to see that the assertion on line 453 holds. The workaround on line 354 is not needed any more. The test on line 20310 added at the same time as the assertion ensures that the assertion hold. The stack in the function `AlignTokenSequence` is kept for now. It is still used to figure out when things inside a level should move along with the outer level. Since the stack has the problem, the developer plans to move the logic into token annotator. It already uses a stack.

llvmbot added the clang-format label Sep 16, 2025

HazardyKnusperkeks reviewed Sep 17, 2025

View reviewed changes

clang/lib/Format/WhitespaceManager.cpp Outdated Show resolved Hide resolved

clang/lib/Format/WhitespaceManager.cpp Outdated Show resolved Hide resolved

Format

224b5d9

HazardyKnusperkeks approved these changes Sep 18, 2025

View reviewed changes

owenca approved these changes Sep 20, 2025

View reviewed changes

Remove extra const qualifier

50eaa88

sstwcw merged commit a0c75a9 into llvm:main Sep 25, 2025
9 checks passed

sstwcw deleted the format-align branch September 25, 2025 16:46

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[clang-format] Align within the level with Cpp11BracedListStyle disabled #159140

[clang-format] Align within the level with Cpp11BracedListStyle disabled #159140

Uh oh!

sstwcw commented Sep 16, 2025

Uh oh!

llvmbot commented Sep 16, 2025

Uh oh!

Uh oh!

Uh oh!

HazardyKnusperkeks left a comment

Uh oh!

Uh oh!

owenca Sep 20, 2025

Uh oh!

sstwcw Sep 20, 2025

Uh oh!

owenca Sep 20, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

[clang-format] Align within the level with Cpp11BracedListStyle disabled #159140

[clang-format] Align within the level with Cpp11BracedListStyle disabled #159140

Uh oh!

Conversation

sstwcw commented Sep 16, 2025

Uh oh!

llvmbot commented Sep 16, 2025

Uh oh!

Uh oh!

Uh oh!

HazardyKnusperkeks left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

owenca Sep 20, 2025

Choose a reason for hiding this comment

Uh oh!

sstwcw Sep 20, 2025

Choose a reason for hiding this comment

Uh oh!

owenca Sep 20, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants