@alexpaniman

When using -Xclang -dump-tokens, the lexer dump output is currently difficult to read because its columns are misaligned. The existing implementation prints the token name and its spelling separated by a single space, then separates the flags and the location with '\t', which results in inconsistent spacing.

For example, on the test file added in this patch, the current output looks like this (BEFORE THIS PR):

Screenshot (before): https://github.com/user-attachments/assets/ad893958-6d57-4a76-8838-7fc56e37e6a7

Changes

This small PR improves the readability of the token dump by:

  • Padding the token name to 16 columns and the quoted spelling to 32 columns (the widths were chosen empirically to give good alignment on average).
  • Printing the location before the flags, since the flag list varies in length and would otherwise push the location out of alignment.

The result is a more readable output (AFTER THIS PR):

Screenshot (after): https://github.com/user-attachments/assets/c24f24e5-a431-42cc-b5b6-232bac5c635e

@llvmbot added the labels clang (Clang issues not falling into any other category) and clang:frontend (Language frontend issues, e.g. anything involving "Sema") on Oct 23, 2025
@llvmbot (Member) commented Oct 23, 2025

@llvm/pr-subscribers-clang

Author: None (alexpaniman)

Full diff: https://github.com/llvm/llvm-project/pull/164894.diff

2 Files Affected:

  • (modified) clang/lib/Lex/Preprocessor.cpp (+11-8)
  • (added) clang/test/Preprocessor/dump-tokens.cpp (+16)
diff --git a/clang/lib/Lex/Preprocessor.cpp b/clang/lib/Lex/Preprocessor.cpp
index e003ad3a95570..fcf2369453d47 100644
--- a/clang/lib/Lex/Preprocessor.cpp
+++ b/clang/lib/Lex/Preprocessor.cpp
@@ -59,6 +59,7 @@
 #include "llvm/ADT/StringRef.h"
 #include "llvm/Support/Capacity.h"
 #include "llvm/Support/ErrorHandling.h"
+#include "llvm/Support/FormatVariadic.h"
 #include "llvm/Support/MemoryBuffer.h"
 #include "llvm/Support/raw_ostream.h"
 #include <algorithm>
@@ -234,14 +235,20 @@ void Preprocessor::FinalizeForModelFile() {
 }
 
 void Preprocessor::DumpToken(const Token &Tok, bool DumpFlags) const {
-  llvm::errs() << tok::getTokenName(Tok.getKind());
+  llvm::errs() << llvm::formatv("{0,-16} ", tok::getTokenName(Tok.getKind()));
 
-  if (!Tok.isAnnotation())
-    llvm::errs() << " '" << getSpelling(Tok) << "'";
+  std::string Spelling;
+  if (!Tok.isAnnotation()) {
+    Spelling = llvm::formatv("{0,-32} ", "'" + getSpelling(Tok) + "'");
+  }
+  llvm::errs() << Spelling;
 
   if (!DumpFlags) return;
 
-  llvm::errs() << "\t";
+  llvm::errs() << "Loc=<";
+  DumpLocation(Tok.getLocation());
+  llvm::errs() << ">";
+
   if (Tok.isAtStartOfLine())
     llvm::errs() << " [StartOfLine]";
   if (Tok.hasLeadingSpace())
@@ -253,10 +260,6 @@ void Preprocessor::DumpToken(const Token &Tok, bool DumpFlags) const {
     llvm::errs() << " [UnClean='" << StringRef(Start, Tok.getLength())
                  << "']";
   }
-
-  llvm::errs() << "\tLoc=<";
-  DumpLocation(Tok.getLocation());
-  llvm::errs() << ">";
 }
 
 void Preprocessor::DumpLocation(SourceLocation Loc) const {
diff --git a/clang/test/Preprocessor/dump-tokens.cpp b/clang/test/Preprocessor/dump-tokens.cpp
new file mode 100644
index 0000000000000..3774894943b87
--- /dev/null
+++ b/clang/test/Preprocessor/dump-tokens.cpp
@@ -0,0 +1,16 @@
+// RUN: %clang_cc1 -dump-tokens %s 2>&1 | FileCheck %s
+
+->                           // CHECK: arrow            '->'
+5                            // CHECK: numeric_constant '5'
+id                           // CHECK: identifier       'id'
+&                            // CHECK: amp              '&'
+)                            // CHECK: r_paren          ')'
+unsigned                     // CHECK: unsigned         'unsigned'
+~                            // CHECK: tilde            '~'
+long_variable_name_very_long // CHECK: identifier       'long_variable_name_very_long'
+union                        // CHECK: union            'union'
+42                           // CHECK: numeric_constant '42'
+j                            // CHECK: identifier       'j'
+&=                           // CHECK: ampequal         '&='
+15                           // CHECK: numeric_constant '15'
+

Comment on lines 240 to 244
  std::string Spelling;
  if (!Tok.isAnnotation()) {
    Spelling = llvm::formatv("{0,-32} ", "'" + getSpelling(Tok) + "'");
  }
  llvm::errs() << Spelling;
Contributor

Why a new variable?

Suggested change
-  std::string Spelling;
-  if (!Tok.isAnnotation()) {
-    Spelling = llvm::formatv("{0,-32} ", "'" + getSpelling(Tok) + "'");
-  }
-  llvm::errs() << Spelling;
+  if (!Tok.isAnnotation())
+    llvm::errs() << llvm::formatv("{0,-32} ", "'" + getSpelling(Tok) + "'");

Author

I agree, it's not necessary. Fixed it.

Author

Now I remember: I wanted consistent spacing for annotation tokens too (they have no spelling), so I changed the code to work as I originally intended.

@Fznamznon (Contributor), Oct 29, 2025

Are annotation tokens ever printed this way? If yes, could you please add a test with an example?

@alexpaniman requested a review from Fznamznon on October 24, 2025 17:14
@AaronBallman (Collaborator) left a comment

Thank you for this, I really like the new output compared to the old!

    Spelling = "'" + getSpelling(Tok) + "'";
  }

  llvm::errs() << llvm::formatv("{0,-32} ", Spelling);
Collaborator

Should this line be included in the !Tok.isAnnotation() block? Otherwise, we're intentionally printing an empty string?

@@ -0,0 +1,16 @@
// RUN: %clang_cc1 -dump-tokens %s 2>&1 | FileCheck %s
Collaborator

Suggested change
-// RUN: %clang_cc1 -dump-tokens %s 2>&1 | FileCheck %s
+// RUN: %clang_cc1 -dump-tokens %s 2>&1 | FileCheck %s --strict-whitespace

This way we can test that the whitespace is actually honored (https://llvm.org/docs/CommandGuide/FileCheck.html#cmdoption-FileCheck-strict-whitespace).


Labels

clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category

4 participants