[clang] Make -dump-tokens option align tokens #164894

alexpaniman · 2025-10-23T21:01:13Z

When using -Xclang -dump-tokens, the lexer dump output is currently difficult to read because the data are misaligned. The existing implementation simply separates the token name, spelling, flags, and location using '\t', which results in inconsistent spacing.
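Why a single '\t' gives uneven columns can be sketched with a little arithmetic. The helpers below are hypothetical and not part of Clang; they assume a terminal with tab stops every 8 columns and model the old prefix as the token name, a space, and the quoted spelling.

```cpp
#include <cstddef>
#include <string>

// Column (0-based) at which text following a '\t' starts, assuming tab stops
// every TabWidth columns. The default of 8 is an assumption about the terminal.
std::size_t columnAfterTab(std::size_t PrefixLength, std::size_t TabWidth = 8) {
  return (PrefixLength / TabWidth + 1) * TabWidth;
}

// Hypothetical model of the old dump prefix: name, space, quoted spelling,
// followed by a single tab before the flags.
std::size_t oldFlagsColumn(const std::string &Name,
                           const std::string &Spelling) {
  const std::string Prefix = Name + " '" + Spelling + "'";
  return columnAfterTab(Prefix.size());
}
```

Under these assumptions, `oldFlagsColumn("arrow", "->")` and `oldFlagsColumn("numeric_constant", "5")` land on different tab stops, so the fields that follow do not line up from one row to the next.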

For example, the current output for the example provided in this patch looks like this (BEFORE THIS PR):

Changes

This small PR improves the readability of the token dump by:

Adding padding after the token name and after the spelling (the padding amount was chosen empirically to produce good average alignment).
Swapping the order of location and flags (since flags can take up a lot of space and disrupt alignment).
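The two changes listed above can be approximated with the standard library alone. This is only a sketch: `padField` and `formatTokenLine` are hypothetical names, and the real patch uses `llvm::formatv` with the `{0,-16}` and `{0,-32}` layouts rather than manual padding. As with those layouts, a value longer than the field width is not truncated; it simply overflows the column.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: left-justify Text in a field of Width columns and
// append one separator space. Text longer than Width is left untouched.
std::string padField(const std::string &Text, std::size_t Width) {
  std::string Out = Text;
  if (Out.size() < Width)
    Out.append(Width - Out.size(), ' ');
  Out.push_back(' ');
  return Out;
}

// Hypothetical helper: build one dump line in the new column order, with the
// location placed right after the padded name and spelling.
std::string formatTokenLine(const std::string &Name,
                            const std::string &Spelling,
                            const std::string &Loc) {
  return padField(Name, 16) + padField("'" + Spelling + "'", 32) +
         "Loc=<" + Loc + ">";
}
```

Because the name and spelling fields have fixed widths, the `Loc=<...>` field starts at the same column for every token whose name and spelling fit, and the variable-length flags trail at the end of the line where they cannot push other fields around.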

The result is a more readable output (AFTER THIS PR):

llvmbot · 2025-10-23T21:02:39Z

@llvm/pr-subscribers-clang

Author: None (alexpaniman)

Full diff: https://github.com/llvm/llvm-project/pull/164894.diff

2 Files Affected:

(modified) clang/lib/Lex/Preprocessor.cpp (+11-8)
(added) clang/test/Preprocessor/dump-tokens.cpp (+16)

diff --git a/clang/lib/Lex/Preprocessor.cpp b/clang/lib/Lex/Preprocessor.cpp
index e003ad3a95570..fcf2369453d47 100644
--- a/clang/lib/Lex/Preprocessor.cpp
+++ b/clang/lib/Lex/Preprocessor.cpp
@@ -59,6 +59,7 @@
 #include "llvm/ADT/StringRef.h"
 #include "llvm/Support/Capacity.h"
 #include "llvm/Support/ErrorHandling.h"
+#include "llvm/Support/FormatVariadic.h"
 #include "llvm/Support/MemoryBuffer.h"
 #include "llvm/Support/raw_ostream.h"
 #include <algorithm>
@@ -234,14 +235,20 @@ void Preprocessor::FinalizeForModelFile() {
 }
 
 void Preprocessor::DumpToken(const Token &Tok, bool DumpFlags) const {
-  llvm::errs() << tok::getTokenName(Tok.getKind());
+  llvm::errs() << llvm::formatv("{0,-16} ", tok::getTokenName(Tok.getKind()));
 
-  if (!Tok.isAnnotation())
-    llvm::errs() << " '" << getSpelling(Tok) << "'";
+  std::string Spelling;
+  if (!Tok.isAnnotation()) {
+    Spelling = llvm::formatv("{0,-32} ", "'" + getSpelling(Tok) + "'");
+  }
+  llvm::errs() << Spelling;
 
   if (!DumpFlags) return;
 
-  llvm::errs() << "\t";
+  llvm::errs() << "Loc=<";
+  DumpLocation(Tok.getLocation());
+  llvm::errs() << ">";
+
   if (Tok.isAtStartOfLine())
     llvm::errs() << " [StartOfLine]";
   if (Tok.hasLeadingSpace())
@@ -253,10 +260,6 @@ void Preprocessor::DumpToken(const Token &Tok, bool DumpFlags) const {
     llvm::errs() << " [UnClean='" << StringRef(Start, Tok.getLength())
                  << "']";
   }
-
-  llvm::errs() << "\tLoc=<";
-  DumpLocation(Tok.getLocation());
-  llvm::errs() << ">";
 }
 
 void Preprocessor::DumpLocation(SourceLocation Loc) const {
diff --git a/clang/test/Preprocessor/dump-tokens.cpp b/clang/test/Preprocessor/dump-tokens.cpp
new file mode 100644
index 0000000000000..3774894943b87
--- /dev/null
+++ b/clang/test/Preprocessor/dump-tokens.cpp
@@ -0,0 +1,16 @@
+// RUN: %clang_cc1 -dump-tokens %s 2>&1 | FileCheck %s
+
+->                           // CHECK: arrow            '->'
+5                            // CHECK: numeric_constant '5'
+id                           // CHECK: identifier       'id'
+&                            // CHECK: amp              '&'
+)                            // CHECK: r_paren          ')'
+unsigned                     // CHECK: unsigned         'unsigned'
+~                            // CHECK: tilde            '~'
+long_variable_name_very_long // CHECK: identifier       'long_variable_name_very_long'
+union                        // CHECK: union            'union'
+42                           // CHECK: numeric_constant '42'
+j                            // CHECK: identifier       'j'
+&=                           // CHECK: ampequal         '&='
+15                           // CHECK: numeric_constant '15'
+

Fznamznon · 2025-10-24T15:55:10Z

clang/lib/Lex/Preprocessor.cpp

+  std::string Spelling;
+  if (!Tok.isAnnotation()) {
+    Spelling = llvm::formatv("{0,-32} ", "'" + getSpelling(Tok) + "'");
+  }
+  llvm::errs() << Spelling;


Why a new variable?

Suggested change

-  std::string Spelling;
-  if (!Tok.isAnnotation()) {
-    Spelling = llvm::formatv("{0,-32} ", "'" + getSpelling(Tok) + "'");
-  }
-  llvm::errs() << Spelling;
+  if (!Tok.isAnnotation())
+    llvm::errs() << llvm::formatv("{0,-32} ", "'" + getSpelling(Tok) + "'");

alexpaniman · 2025-10-24

I agree, it's not necessary. Fixed.

alexpaniman · 2025-10-24

I remembered why: I probably wanted consistent spacing for annotation tokens too (which have no spelling). I changed it to work as I intended.

Fznamznon · 2025-10-29

Are annotation tokens ever printed this way? If so, could you please add a test with an example?

AaronBallman

Thank you for this, I really like the new output compared to the old!

AaronBallman · 2025-11-04T18:57:00Z

clang/lib/Lex/Preprocessor.cpp

+    Spelling = "'" + getSpelling(Tok) + "'";
+  }
+
+  llvm::errs() << llvm::formatv("{0,-32} ", Spelling);


Should this line be included in the !Tok.isAnnotation() block? Otherwise, we're intentionally printing an empty string?
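The trade-off behind this question can be illustrated with a small standard-library sketch. `spellingColumn` is a hypothetical stand-in for the spelling part of `DumpToken`, not the actual Clang code; it pads unconditionally, which is what the revised snippet above does.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical stand-in for the spelling column: annotation tokens have no
// spelling, so they contribute an empty string.
std::string spellingColumn(bool IsAnnotation, const std::string &Spelling) {
  std::string Quoted;
  if (!IsAnnotation)
    Quoted = "'" + Spelling + "'";

  std::ostringstream OS;
  // Pad unconditionally: an empty string still occupies 32 columns plus the
  // separator, so whatever is printed next starts at the same offset.
  OS << std::left << std::setw(32) << Quoted << ' ';
  return OS.str();
}
```

With this shape, an annotation token produces a field of spaces as wide as a short spelling would, so the location column stays aligned. Moving the print inside the `!IsAnnotation` branch would avoid emitting the blank field but would shift everything after it to the left for annotation tokens.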

AaronBallman · 2025-11-04T18:57:57Z

clang/test/Preprocessor/dump-tokens.cpp

@@ -0,0 +1,16 @@
+// RUN: %clang_cc1 -dump-tokens %s 2>&1 | FileCheck %s


Suggested change

-// RUN: %clang_cc1 -dump-tokens %s 2>&1 | FileCheck %s
+// RUN: %clang_cc1 -dump-tokens %s 2>&1 | FileCheck %s --strict-whitespace

This way we can test that the whitespace is actually honored (https://llvm.org/docs/CommandGuide/FileCheck.html#cmdoption-FileCheck-strict-whitespace).
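A rough model of why the flag matters: by default FileCheck canonicalizes horizontal whitespace, so a run of spaces or tabs in the pattern matches any such run in the output. The functions below are a hypothetical approximation of that behavior for illustration only; they are not FileCheck's implementation and ignore its other matching rules.

```cpp
#include <string>

// Hypothetical approximation of default matching: collapse each run of
// spaces and tabs into a single space before comparing.
std::string canonicalizeWhitespace(const std::string &Line) {
  std::string Out;
  bool InRun = false;
  for (char C : Line) {
    if (C == ' ' || C == '\t') {
      if (!InRun)
        Out.push_back(' ');
      InRun = true;
    } else {
      Out.push_back(C);
      InRun = false;
    }
  }
  return Out;
}

// Substring match of Pattern against Output, with or without the
// whitespace canonicalization step.
bool checkLineMatches(const std::string &Pattern, const std::string &Output,
                      bool StrictWhitespace) {
  if (StrictWhitespace)
    return Output.find(Pattern) != std::string::npos;
  return canonicalizeWhitespace(Output).find(
             canonicalizeWhitespace(Pattern)) != std::string::npos;
}
```

In this model, the padded pattern `arrow            '->'` still matches the unpadded output `arrow '->'` unless strict matching is requested, which means that without the flag the test would pass even if the alignment regressed.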

[clang] Make -dump-tokens option align tokens

805d4a5

llvmbot added the clang and clang:frontend labels Oct 23, 2025

Fznamznon reviewed Oct 24, 2025

Fznamznon requested review from AaronBallman and tbaederr October 24, 2025 15:57

alexpaniman added 2 commits October 24, 2025 19:52

[clang] Remove unnecessary variable from Preprocessor::DumpToken

194f5ad

[clang] Ensure consistent spacing for annotations too in Preprocessor::DumpToken

ba0ba38

alexpaniman requested a review from Fznamznon October 24, 2025 17:14

AaronBallman reviewed Nov 4, 2025
