[clang-format] Add an option to format numeric literal case #151590

30Wedge · 2025-07-31T20:41:08Z

Some languages have the flexibility to use upper or lower case characters interchangeably in integer and float literal definitions.

I'd like to be able to enforce a consistent case style in one of my projects, so I added this clang-format style option to control it.

With this .clang-format configuration:

    NumericLiteralCaseStyle:
      UpperCasePrefix: Never
      UpperCaseHexDigit: Always
      UpperCaseSuffix: Never

This line of code:

    unsigned long long  0XdEaDbEeFUll;

gets reformatted into this line of code:

    unsigned long long 0xDEADBEEFull;

I'm new to this project, so please let me know if I missed something in the process. I modeled this PR on IntegerLiteralSeparatorFixer.
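
For readers skimming the thread, here is a minimal standalone sketch of the transformation described above. It is not the code from this PR: `CaseRule`, `applyCase`, and `formatHexLiteral` are hypothetical names, and the sketch assumes a hexadecimal integer literal with no digit separators, no fractional part, and no user-defined suffix.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical tri-state setting: lower case, leave unchanged, or upper case.
enum class CaseRule { Lower, Leave, Upper };

// Apply one rule to the half-open index range [Begin, End) of Text.
static void applyCase(std::string &Text, std::size_t Begin, std::size_t End,
                      CaseRule Rule) {
  for (std::size_t I = Begin; I < End; ++I) {
    const unsigned char C = static_cast<unsigned char>(Text[I]);
    if (Rule == CaseRule::Lower)
      Text[I] = static_cast<char>(std::tolower(C));
    else if (Rule == CaseRule::Upper)
      Text[I] = static_cast<char>(std::toupper(C));
  }
}

// Simplified assumption: only hexadecimal integer literals are handled.
// Anything else is returned unchanged.
std::string formatHexLiteral(std::string Literal, CaseRule Prefix,
                             CaseRule HexDigit, CaseRule Suffix) {
  if (Literal.size() < 3 || Literal[0] != '0' ||
      (Literal[1] != 'x' && Literal[1] != 'X'))
    return Literal;

  // The digits run from just after the prefix to the first non-hex character.
  std::size_t DigitsEnd = 2;
  while (DigitsEnd < Literal.size() &&
         std::isxdigit(static_cast<unsigned char>(Literal[DigitsEnd])))
    ++DigitsEnd;

  applyCase(Literal, 0, 2, Prefix);                      // "0X" part
  applyCase(Literal, 2, DigitsEnd, HexDigit);            // hex digits
  applyCase(Literal, DigitsEnd, Literal.size(), Suffix); // e.g. "Ull"
  return Literal;
}
```

With a lower case prefix, upper case digits, and a lower case suffix, `formatHexLiteral("0XdEaDbEeFUll", ...)` yields `0xDEADBEEFull`, matching the example above.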

llvmbot · 2025-07-31T20:41:52Z

@llvm/pr-subscribers-clang

Author: Andy MacGregor (30Wedge)

Changes

Some languages have the flexibility to use upper or lower case characters interchangeably in integer and float literal definitions.

I'd like to be able to enforce a consistent case style in one of my projects, so I added this clang-format style option to control it.

With this .clang-format configuration:

    NumericLiteralCase:
      PrefixCase: -1
      HexDigitCase: 1
      SuffixCase: -1

This line of code:

    unsigned long long  0XdEaDbEeFUll;

gets reformatted into this line of code:

    unsigned long long 0xDEADBEEFull;

I'm new to this project, so please let me know if I missed something in the process. I modeled this PR on IntegerLiteralSeparatorFixer.

Patch is 34.32 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/151590.diff

9 Files Affected:

(modified) clang/docs/ClangFormatStyleOptions.rst (+72-1)
(modified) clang/docs/ReleaseNotes.rst (+2)
(modified) clang/include/clang/Format/Format.h (+49)
(modified) clang/lib/Format/CMakeLists.txt (+1)
(modified) clang/lib/Format/Format.cpp (+19)
(added) clang/lib/Format/NumericLiteralCaseFixer.cpp (+368)
(added) clang/lib/Format/NumericLiteralCaseFixer.h (+32)
(modified) clang/unittests/Format/CMakeLists.txt (+1)
(added) clang/unittests/Format/NumericLiteralCaseTest.cpp (+354)

diff --git a/clang/docs/ClangFormatStyleOptions.rst b/clang/docs/ClangFormatStyleOptions.rst
index 02986a94a656c..abc73b0ae183c 100644
--- a/clang/docs/ClangFormatStyleOptions.rst
+++ b/clang/docs/ClangFormatStyleOptions.rst
@@ -4555,7 +4555,6 @@ the configuration (without a prefix: ``Auto``).
     So inserting a trailing comma counteracts bin-packing.
 
 
-
 .. _IntegerLiteralSeparator:
 
 **IntegerLiteralSeparator** (``IntegerLiteralSeparatorStyle``) :versionbadge:`clang-format 16` :ref:`¶ <IntegerLiteralSeparator>`
@@ -5076,6 +5075,78 @@ the configuration (without a prefix: ``Auto``).
 
   For example: TESTSUITE
 
+.. _NumericLiteralCase:
+
+**NumericLiteralCase** (``NumericLiteralCaseStyle``) :versionbadge:`clang-format 22` :ref:`¶ <NumericLiteralCase>`
+  Controls character case in numeric literals.
+
+  Possible values for each nested configuration flag:
+
+  * ``0`` (Default) Do not modify characters.
+
+  * ``-1`` Convert characters to lower case.
+
+  * ``1`` Convert characters to upper case.
+
+  .. code-block:: yaml
+
+    # Example of usage:
+    NumericLiteralCase:
+      PrefixCase: -1
+      HexDigitCase: 1
+      FloatExponentSeparatorCase: 0
+      SuffixCase: -1
+
+  .. code-block:: c++
+
+    // Lower case prefix, upper case hexadecimal digits, lower case suffix
+    unsigned long long a = 0xDEADBEEFull;
+
+  Nested configuration flags:
+
+  * ``int PrefixCase`` Control numeric constant prefix case.
+
+    .. code-block:: c++
+
+      // PrefixCase: 1
+      int a = 0B101 | 0XF0;
+      // PrefixCase: -1
+      int b = 0b101 | 0xF0;
+      // PrefixCase: 0
+      int c = 0b101 | 0XF0;
+
+  * ``int HexDigitCase`` Control hexadecimal digit case.
+
+    .. code-block:: c++
+
+      // HexDigitCase: 1
+      int a = 0xBEAD;
+      // HexDigitCase: -1
+      int b = 0xbead;
+      // HexDigitCase: 0
+      int c = 0xBeAd;
+
+  * ``int FloatExponentSeparatorCase`` Control exponent separator case.
+
+    .. code-block:: c++
+
+      // FloatExponentSeparatorCase: 1
+      float a = 6.02E+23;
+      // FloatExponentSeparatorCase: -1
+      float b = 6.02e+23;
+
+  * ``int SuffixCase`` Control suffix case.
+
+    .. code-block:: c++
+
+      // SuffixCase: 1
+      unsigned long long a = 1ULL;
+      // SuffixCase: -1
+      unsigned long long b = 1ull;
+      // SuffixCase: 0
+      unsigned long long c = 1uLL;
+
+
 .. _ObjCBinPackProtocolList:
 
 **ObjCBinPackProtocolList** (``BinPackStyle``) :versionbadge:`clang-format 7` :ref:`¶ <ObjCBinPackProtocolList>`
diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index 4a2edae7509de..f45363f86c135 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -230,6 +230,8 @@ AST Matchers
 
 clang-format
 ------------
+- Add ``NumericLiteralCase`` option for enforcing character case in
+  numeric literals.
 
 libclang
 --------
diff --git a/clang/include/clang/Format/Format.h b/clang/include/clang/Format/Format.h
index 31582a40de866..301db5012b980 100644
--- a/clang/include/clang/Format/Format.h
+++ b/clang/include/clang/Format/Format.h
@@ -3100,6 +3100,54 @@ struct FormatStyle {
   /// \version 11
   TrailingCommaStyle InsertTrailingCommas;
 
+  /// Character case format for different components of a numeric literal.
+  ///
+  /// For all options, ``0`` leaves the case unchanged, ``-1``
+  /// uses lower case, and ``1`` uses upper case.
+  ///
+  struct NumericLiteralCaseStyle {
+    /// Format numeric constant prefixes.
+    /// \code{.text}
+    ///   /* -1: lower case */ b = 0x01;
+    ///   /*  0: don't care */
+    ///   /*  1: upper case */ b = 0X01;
+    /// \endcode
+    int8_t PrefixCase;
+    /// Format hexadecimal digit case.
+    /// \code{.text}
+    ///   /* -1: lower case */ b = 0xabcdef;
+    ///   /*  0: don't care */
+    ///   /*  1: upper case */ b = 0xABCDEF;
+    /// \endcode
+    int8_t HexDigitCase;
+    /// Format exponent separator character case in floating point literals.
+    /// \code{.text}
+    ///   /* -1: lower case */ b = 6.02e23;
+    ///   /*  0: don't care */
+    ///   /*  1: upper case */ b = 6.02E23;
+    /// \endcode
+    int8_t FloatExponentSeparatorCase;
+    /// Format suffix case. This option excludes case-specific reserved
+    /// suffixes, such as ``min`` in C++.
+    /// \code{.text}
+    ///   /* -1: lower case */ b = 10u;
+    ///   /*  0: don't care */
+    ///   /*  1: upper case */ b = 10U;
+    /// \endcode
+    int8_t SuffixCase;
+
+    bool operator==(const NumericLiteralCaseStyle &R) const {
+      return PrefixCase == R.PrefixCase && HexDigitCase == R.HexDigitCase &&
+             FloatExponentSeparatorCase == R.FloatExponentSeparatorCase &&
+             SuffixCase == R.SuffixCase;
+    }
+  };
+
+  /// Format numeric literals for languages that support flexible character case
+  /// in numeric literal constants.
+  /// \version 22
+  NumericLiteralCaseStyle NumericLiteralCase;
+
   /// Separator format of integer literals of different bases.
   ///
   /// If negative, remove separators. If  ``0``, leave the literal as is. If
@@ -5424,6 +5472,7 @@ struct FormatStyle {
            IndentWrappedFunctionNames == R.IndentWrappedFunctionNames &&
            InsertBraces == R.InsertBraces &&
            InsertNewlineAtEOF == R.InsertNewlineAtEOF &&
+           NumericLiteralCase == R.NumericLiteralCase &&
            IntegerLiteralSeparator == R.IntegerLiteralSeparator &&
            JavaImportGroups == R.JavaImportGroups &&
            JavaScriptQuotes == R.JavaScriptQuotes &&
diff --git a/clang/lib/Format/CMakeLists.txt b/clang/lib/Format/CMakeLists.txt
index 9f4939824fdb8..a003f1a951af6 100644
--- a/clang/lib/Format/CMakeLists.txt
+++ b/clang/lib/Format/CMakeLists.txt
@@ -13,6 +13,7 @@ add_clang_library(clangFormat
   MacroExpander.cpp
   MatchFilePath.cpp
   NamespaceEndCommentsFixer.cpp
+  NumericLiteralCaseFixer.cpp
   ObjCPropertyAttributeOrderFixer.cpp
   QualifierAlignmentFixer.cpp
   SortJavaScriptImports.cpp
diff --git a/clang/lib/Format/Format.cpp b/clang/lib/Format/Format.cpp
index 063780721423f..711a3e7501328 100644
--- a/clang/lib/Format/Format.cpp
+++ b/clang/lib/Format/Format.cpp
@@ -16,6 +16,7 @@
 #include "DefinitionBlockSeparator.h"
 #include "IntegerLiteralSeparatorFixer.h"
 #include "NamespaceEndCommentsFixer.h"
+#include "NumericLiteralCaseFixer.h"
 #include "ObjCPropertyAttributeOrderFixer.h"
 #include "QualifierAlignmentFixer.h"
 #include "SortJavaScriptImports.h"
@@ -382,6 +383,16 @@ struct ScalarEnumerationTraits<FormatStyle::IndentExternBlockStyle> {
   }
 };
 
+template <> struct MappingTraits<FormatStyle::NumericLiteralCaseStyle> {
+  static void mapping(IO &IO, FormatStyle::NumericLiteralCaseStyle &Base) {
+    IO.mapOptional("PrefixCase", Base.PrefixCase);
+    IO.mapOptional("HexDigitCase", Base.HexDigitCase);
+    IO.mapOptional("FloatExponentSeparatorCase",
+                   Base.FloatExponentSeparatorCase);
+    IO.mapOptional("SuffixCase", Base.SuffixCase);
+  }
+};
+
 template <> struct MappingTraits<FormatStyle::IntegerLiteralSeparatorStyle> {
   static void mapping(IO &IO, FormatStyle::IntegerLiteralSeparatorStyle &Base) {
     IO.mapOptional("Binary", Base.Binary);
@@ -1093,6 +1104,7 @@ template <> struct MappingTraits<FormatStyle> {
     IO.mapOptional("InsertBraces", Style.InsertBraces);
     IO.mapOptional("InsertNewlineAtEOF", Style.InsertNewlineAtEOF);
     IO.mapOptional("InsertTrailingCommas", Style.InsertTrailingCommas);
+    IO.mapOptional("NumericLiteralCase", Style.NumericLiteralCase);
     IO.mapOptional("IntegerLiteralSeparator", Style.IntegerLiteralSeparator);
     IO.mapOptional("JavaImportGroups", Style.JavaImportGroups);
     IO.mapOptional("JavaScriptQuotes", Style.JavaScriptQuotes);
@@ -1618,6 +1630,9 @@ FormatStyle getLLVMStyle(FormatStyle::LanguageKind Language) {
   LLVMStyle.InsertBraces = false;
   LLVMStyle.InsertNewlineAtEOF = false;
   LLVMStyle.InsertTrailingCommas = FormatStyle::TCS_None;
+  LLVMStyle.NumericLiteralCase = {/*PrefixCase=*/0, /*HexDigitCase=*/0,
+                                  /*FloatExponentSeparatorCase=*/0,
+                                  /*SuffixCase=*/0};
   LLVMStyle.IntegerLiteralSeparator = {
       /*Binary=*/0,  /*BinaryMinDigits=*/0,
       /*Decimal=*/0, /*DecimalMinDigits=*/0,
@@ -3872,6 +3887,10 @@ reformat(const FormatStyle &Style, StringRef Code,
     return IntegerLiteralSeparatorFixer().process(Env, Expanded);
   });
 
+  Passes.emplace_back([&](const Environment &Env) {
+    return NumericLiteralCaseFixer().process(Env, Expanded);
+  });
+
   if (Style.isCpp()) {
     if (Style.QualifierAlignment != FormatStyle::QAS_Leave)
       addQualifierAlignmentFixerPasses(Expanded, Passes);
diff --git a/clang/lib/Format/NumericLiteralCaseFixer.cpp b/clang/lib/Format/NumericLiteralCaseFixer.cpp
new file mode 100644
index 0000000000000..88adaf83fe381
--- /dev/null
+++ b/clang/lib/Format/NumericLiteralCaseFixer.cpp
@@ -0,0 +1,368 @@
+//===--- NumericLiteralCaseFixer.cpp -----------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+///
+/// \file
+/// This file implements NumericLiteralCaseFixer that standardizes character
+/// case within numeric literal constants.
+///
+//===----------------------------------------------------------------------===//
+
+#include "NumericLiteralCaseFixer.h"
+
+#include "llvm/ADT/StringExtras.h"
+
+#include <algorithm>
+
+namespace clang {
+namespace format {
+
+using CharTransformFn = char (*)(char C);
+namespace {
+
+/// @brief Collection of std::transform predicates for each part of a numeric
+/// literal
+struct FormatParameters {
+  FormatParameters(FormatStyle::LanguageKind Language,
+                   const FormatStyle::NumericLiteralCaseStyle &CaseStyle);
+
+  CharTransformFn Prefix;
+  CharTransformFn HexDigit;
+  CharTransformFn FloatExponentSeparator;
+  CharTransformFn Suffix;
+
+  char Separator;
+};
+
+/// @brief Parse a single numeric constant from text into ranges that are
+/// appropriate for applying NumericLiteralCaseStyle rules.
+class QuickNumericalConstantParser {
+public:
+  QuickNumericalConstantParser(const StringRef &IntegerLiteral,
+                               const FormatParameters &Transforms);
+
+  /// @brief Reformats the numeric constant if needed.
+  /// Calling this method invalidates the object's state.
+  /// @return std::nullopt if no reformatting is required. std::optional<>
+  /// containing the reformatted string otherwise.
+  std::optional<std::string> formatIfNeeded() &&;
+
+private:
+  const StringRef &IntegerLiteral;
+  const FormatParameters &Transforms;
+
+  std::string Formatted;
+
+  std::string::iterator PrefixBegin;
+  std::string::iterator PrefixEnd;
+  std::string::iterator HexDigitBegin;
+  std::string::iterator HexDigitEnd;
+  std::string::iterator FloatExponentSeparatorBegin;
+  std::string::iterator FloatExponentSeparatorEnd;
+  std::string::iterator SuffixBegin;
+  std::string::iterator SuffixEnd;
+
+  void parse();
+  void applyFormatting();
+};
+
+} // namespace
+
+static char noOpTransform(char C) { return C; }
+
+static CharTransformFn getTransform(int8_t config_value) {
+  switch (config_value) {
+  case -1:
+    return llvm::toLower;
+  case 1:
+    return llvm::toUpper;
+  default:
+    return noOpTransform;
+  }
+}
+
+/// @brief Test if Suffix matches a C++ literal reserved by the library.
+/// Matches against all suffixes reserved in the C++23 standard
+static bool matchesReservedSuffix(StringRef Suffix) {
+  static const std::set<StringRef> ReservedSuffixes = {
+      "h", "min", "s", "ms", "us", "ns", "il", "i", "if", "d", "y",
+  };
+
+  return ReservedSuffixes.find(Suffix) != ReservedSuffixes.end();
+}
+
+FormatParameters::FormatParameters(
+    FormatStyle::LanguageKind Language,
+    const FormatStyle::NumericLiteralCaseStyle &CaseStyle)
+    : Prefix(getTransform(CaseStyle.PrefixCase)),
+      HexDigit(getTransform(CaseStyle.HexDigitCase)),
+      FloatExponentSeparator(
+          getTransform(CaseStyle.FloatExponentSeparatorCase)),
+      Suffix(getTransform(CaseStyle.SuffixCase)) {
+  switch (Language) {
+  case FormatStyle::LK_CSharp:
+  case FormatStyle::LK_Java:
+  case FormatStyle::LK_JavaScript:
+    Separator = '_';
+    break;
+  case FormatStyle::LK_C:
+  case FormatStyle::LK_Cpp:
+  case FormatStyle::LK_ObjC:
+  default:
+    Separator = '\'';
+  }
+}
+
+QuickNumericalConstantParser::QuickNumericalConstantParser(
+    const StringRef &IntegerLiteral, const FormatParameters &Transforms)
+    : IntegerLiteral(IntegerLiteral), Transforms(Transforms),
+      Formatted(IntegerLiteral), PrefixBegin(Formatted.begin()),
+      PrefixEnd(Formatted.begin()), HexDigitBegin(Formatted.begin()),
+      HexDigitEnd(Formatted.begin()),
+      FloatExponentSeparatorBegin(Formatted.begin()),
+      FloatExponentSeparatorEnd(Formatted.begin()),
+      SuffixBegin(Formatted.begin()), SuffixEnd(Formatted.begin()) {}
+
+void QuickNumericalConstantParser::parse() {
+  auto Cur = Formatted.begin();
+  auto End = Formatted.cend();
+
+  bool IsHex = false;
+  bool IsFloat = false;
+
+  // Find the range that contains the prefix.
+  PrefixBegin = Cur;
+  if (*Cur != '0') {
+  } else {
+    ++Cur;
+    const char C = *Cur;
+    switch (C) {
+    case 'x':
+    case 'X':
+      IsHex = true;
+      ++Cur;
+      break;
+    case 'b':
+    case 'B':
+      ++Cur;
+      break;
+    case 'o':
+    case 'O':
+      // JavaScript uses 0o as octal prefix.
+      ++Cur;
+      break;
+    default:
+      break;
+    }
+  }
+  PrefixEnd = Cur;
+
+  // Find the range that contains hex digits.
+  HexDigitBegin = Cur;
+  if (IsHex) {
+    while (Cur != End) {
+      const char C = *Cur;
+      if (llvm::isHexDigit(C)) {
+      } else if (C == Transforms.Separator) {
+      } else if (C == '.') {
+        IsFloat = true;
+      } else {
+        break;
+      }
+      ++Cur;
+    }
+  }
+  HexDigitEnd = Cur;
+  if (Cur == End)
+    return;
+
+  // Find the range that contains a floating point exponent separator.
+  // Hex digits have already been scanned through the decimal point.
+  // Decimal/octal/binary literals must fast forward through the decimal first.
+  if (!IsHex) {
+    while (Cur != End) {
+      const char C = *Cur;
+      if (llvm::isDigit(C)) {
+      } else if (C == Transforms.Separator) {
+      } else if (C == '.') {
+        IsFloat = true;
+      } else {
+        break;
+      }
+      ++Cur;
+    }
+  }
+
+  const char LSep = IsHex ? 'p' : 'e';
+  const char USep = IsHex ? 'P' : 'E';
+  // The next character of a floating point literal will either be the
+  // separator, or the start of a suffix.
+  FloatExponentSeparatorBegin = Cur;
+  if (IsFloat && Cur != End) {
+    const char C = *Cur;
+    if ((C == LSep) || (C == USep))
+      ++Cur;
+  }
+  FloatExponentSeparatorEnd = Cur;
+  if (Cur == End)
+    return;
+
+  // Fast forward through the exponent part of a floating point literal.
+  if (!IsFloat) {
+  } else if (FloatExponentSeparatorBegin == FloatExponentSeparatorEnd) {
+  } else {
+    while (Cur != End) {
+      const char C = *Cur;
+      if (llvm::isDigit(C)) {
+      } else if (C == '+') {
+      } else if (C == '-') {
+      } else {
+        break;
+      }
+      ++Cur;
+    }
+  }
+  if (Cur == End)
+    return;
+
+  // Find the range containing a suffix if any.
+  SuffixBegin = Cur;
+  size_t const SuffixLen = End - Cur;
+  StringRef suffix(&(*SuffixBegin), SuffixLen);
+  if (!matchesReservedSuffix(suffix)) {
+    while (Cur != End) {
+      const char C = *Cur;
+      if (C == '_') {
+        // In C++, it is idiomatic, but NOT standard to define user-defined
+        // literals with a leading '_'. Omit user defined literals from
+        // transformation.
+        break;
+      } else {
+      }
+      ++Cur;
+    }
+  }
+  SuffixEnd = Cur;
+}
+
+void QuickNumericalConstantParser::applyFormatting() {
+
+  auto Start = Formatted.cbegin();
+  auto End = Formatted.cend();
+
+  assert((Start <= PrefixBegin) && (End >= PrefixBegin) &&
+         "PrefixBegin is out of bounds");
+  assert((Start <= PrefixEnd) && (End >= PrefixEnd) &&
+         "PrefixEnd is out of bounds");
+  assert((Start <= HexDigitBegin) && (End >= HexDigitBegin) &&
+         "HexDigitBegin is out of bounds");
+  assert((Start <= HexDigitEnd) && (End >= HexDigitEnd) &&
+         "HexDigitEnd is out of bounds");
+  assert((Start <= FloatExponentSeparatorBegin) &&
+         (End >= FloatExponentSeparatorBegin) &&
+         "FloatExponentSeparatorBegin is out of bounds");
+  assert((Start <= FloatExponentSeparatorEnd) &&
+         (End >= FloatExponentSeparatorEnd) &&
+         "FloatExponentSeparatorEnd is out of bounds");
+  assert((Start <= SuffixBegin) && (End >= SuffixBegin) &&
+         "SuffixBegin is out of bounds");
+  assert((Start <= SuffixEnd) && (End >= SuffixEnd) &&
+         "SuffixEnd is out of bounds");
+
+  std::transform(PrefixBegin, PrefixEnd, PrefixBegin, Transforms.Prefix);
+  std::transform(HexDigitBegin, HexDigitEnd, HexDigitBegin,
+                 Transforms.HexDigit);
+  std::transform(FloatExponentSeparatorBegin, FloatExponentSeparatorEnd,
+                 FloatExponentSeparatorBegin,
+                 Transforms.FloatExponentSeparator);
+  std::transform(SuffixBegin, SuffixEnd, SuffixBegin, Transforms.Suffix);
+}
+
+std::optional<std::string> QuickNumericalConstantParser::formatIfNeeded() && {
+  parse();
+  applyFormatting();
+
+  return (Formatted == IntegerLiteral)
+             ? std::nullopt
+             : std::make_optional<std::string>(std::move(Formatted));
+}
+
+std::pair<tooling::Replacements, unsigned>
+NumericLiteralCaseFixer::process(const Environment &Env,
+                                 const FormatStyle &Style) {
+  switch (Style.Language) {
+  case FormatStyle::LK_C:
+  case FormatStyle::LK_Cpp:
+  case FormatStyle::LK_ObjC:
+  case FormatStyle::LK_CSharp:
+  case FormatStyle::LK_Java:
+  case FormatStyle::LK_JavaScript:
+    break;
+  default:
+    return {};
+  }
+
+  const auto &CaseStyle = Style.NumericLiteralCase;
+
+  const FormatStyle::NumericLiteralCaseStyle no_case_style{};
+  const bool SkipCaseFormatting = CaseStyle == no_case_style;
+
+  if (SkipCaseFormatting)
+    return {};
+
+  const FormatParameters Transforms{Style.Language, CaseStyle};
+
+  const auto &SourceMgr = Env.getSourceManager();
+  AffectedRangeManager AffectedRangeMgr(SourceMgr, Env.getCharRanges());
+
+  const auto ID = Env.getFileID();
+  const auto LangOpts = getFormattingLangOpts(Style);
+  Lexer Lex(ID, SourceMgr.getBufferOrFake(ID), SourceMgr, LangOpts);
+  Lex.SetCommentRetentionState(true);
+
+  Token Tok;
+  tooling::Replacements Result;
+  bool Skip = false;
+
+  while (!Lex.LexFromRawLexer(Tok)) {
+    // Skip tokens that are too small to contain a formattable literal.
+    auto Length = Tok.getLength();
+    if (Length < 2)
+      continue;
+
+    // Service clang-format off/on comments.
+    auto Location = Tok.getLocation();
+    auto Text = StringRef(SourceMgr.getCharacterData(Location), Length);
+    if (Tok.is(tok::comment)) {
+      if (isClangFormatOff(Text))
+        Skip = true;
+      else if (isClangFormatOn(Text))
+        Skip = false;
+      continue;
+    }
+
+    if (Skip || Tok.isNot(tok::numeric_constant) ||
+        !AffectedRangeMgr.affectsCharSourceRange(
+            CharSourceRange::getCharRange(Location, Tok.getEndLoc()))) {
+      continue;
+    }
+
+    const auto Formatted =
+        QuickNumericalConstantParser(Text, Transforms).formatIfNeeded();
+    if (Formatted) {
+      assert(*Formatted != Text && "QuickNumericalConstantParser returned an "
+                                   "unchanged value instead of nullopt");
+      cantFail(Result.add(
+          tooling::Replacement(SourceMgr, Location, Length, *Formatted)));
+    }
+  }
+
+  return {Result, 0};
+}
+
+} // namespace format
+} // namespace clang
diff --git a/clang/lib/Format/NumericLiteralCaseFixer.h b/clang/lib/Format/NumericLiteralCaseFixer.h
new file mode 100644
index 0000000000000..265d7343c468b
--- /dev/null
+++ b/clang/lib/Format/NumericLiteralCaseFixer.h
@@ -0,0 +1,32 @@
+//===--- NumericLiteralCaseFixer.h -------------------------*- C++ -*-===//
+//
+// Part of the LLVM Projec...
[truncated]
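
Because the patch is cut off above, the following is only a rough standalone illustration of the exponent-separator rule it describes (`e`/`E` for decimal literals, `p`/`P` for hexadecimal ones). `formatExponentSeparator` is a hypothetical helper, not a function from the patch. It assumes the literal has no user-defined suffix containing the separator letter, and it reuses the patch's `-1` / `0` / `1` convention.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: Rule is -1 (lower case), 0 (unchanged), or 1 (upper).
std::string formatExponentSeparator(std::string Literal, int Rule) {
  if (Rule == 0)
    return Literal;

  // Hexadecimal floats use 'p'/'P'; decimal floats use 'e'/'E'.
  const bool IsHex = Literal.size() > 1 && Literal[0] == '0' &&
                     (Literal[1] == 'x' || Literal[1] == 'X');
  const char Lower = IsHex ? 'p' : 'e';
  const char Upper = IsHex ? 'P' : 'E';

  // Skip the "0x" prefix so it is never mistaken for part of the number.
  for (std::size_t I = IsHex ? 2 : 0; I < Literal.size(); ++I) {
    if (Literal[I] == Lower || Literal[I] == Upper) {
      Literal[I] = Rule < 0 ? Lower : Upper;
      break; // Only the first match is treated as the separator.
    }
  }
  return Literal;
}
```

Picking the separator letter from the prefix is what keeps a hexadecimal integer such as `0xE` untouched: in that case `E` is a digit, not a separator.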

llvmbot · 2025-07-31T20:41:53Z

@llvm/pr-subscribers-clang-format

Author: Andy MacGregor (30Wedge)

+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+///
+/// \file
+/// This file implements NumericLiteralCaseFixer that standardizes character
+/// case within numeric literal constants.
+///
+//===----------------------------------------------------------------------===//
+
+#include "NumericLiteralCaseFixer.h"
+
+#include "llvm/ADT/StringExtras.h"
+
+#include <algorithm>
+
+namespace clang {
+namespace format {
+
+using CharTransformFn = char (*)(char C);
+namespace {
+
+/// @brief Collection of std::transform predicates for each part of a numeric
+/// literal
+struct FormatParameters {
+  FormatParameters(FormatStyle::LanguageKind Language,
+                   const FormatStyle::NumericLiteralCaseStyle &CaseStyle);
+
+  CharTransformFn Prefix;
+  CharTransformFn HexDigit;
+  CharTransformFn FloatExponentSeparator;
+  CharTransformFn Suffix;
+
+  char Separator;
+};
+
+/// @brief Parse a single numeric constant from text into ranges that are
+/// appropriate for applying NumericLiteralCaseStyle rules.
+class QuickNumericalConstantParser {
+public:
+  QuickNumericalConstantParser(const StringRef &IntegerLiteral,
+                               const FormatParameters &Transforms);
+
+  /// @brief Reformats the numeric constant if needed.
+  /// Calling this method invalidates the object's state.
+  /// @return std::nullopt if no reformatting is required. std::optional<>
+  /// containing the reformatted string otherwise.
+  std::optional<std::string> formatIfNeeded() &&;
+
+private:
+  const StringRef &IntegerLiteral;
+  const FormatParameters &Transforms;
+
+  std::string Formatted;
+
+  std::string::iterator PrefixBegin;
+  std::string::iterator PrefixEnd;
+  std::string::iterator HexDigitBegin;
+  std::string::iterator HexDigitEnd;
+  std::string::iterator FloatExponentSeparatorBegin;
+  std::string::iterator FloatExponentSeparatorEnd;
+  std::string::iterator SuffixBegin;
+  std::string::iterator SuffixEnd;
+
+  void parse();
+  void applyFormatting();
+};
+
+} // namespace
+
+static char noOpTransform(char C) { return C; }
+
+static CharTransformFn getTransform(int8_t config_value) {
+  switch (config_value) {
+  case -1:
+    return llvm::toLower;
+  case 1:
+    return llvm::toUpper;
+  default:
+    return noOpTransform;
+  }
+}
+
+/// @brief Test if Suffix matches a C++ literal reserved by the library.
+/// Matches against all suffixes reserved in the C++23 standard
+static bool matchesReservedSuffix(StringRef Suffix) {
+  static const std::set<StringRef> ReservedSuffixes = {
+      "h", "min", "s", "ms", "us", "ns", "il", "i", "if", "d", "y",
+  };
+
+  return ReservedSuffixes.find(Suffix) != ReservedSuffixes.end();
+}
+
+FormatParameters::FormatParameters(
+    FormatStyle::LanguageKind Language,
+    const FormatStyle::NumericLiteralCaseStyle &CaseStyle)
+    : Prefix(getTransform(CaseStyle.PrefixCase)),
+      HexDigit(getTransform(CaseStyle.HexDigitCase)),
+      FloatExponentSeparator(
+          getTransform(CaseStyle.FloatExponentSeparatorCase)),
+      Suffix(getTransform(CaseStyle.SuffixCase)) {
+  switch (Language) {
+  case FormatStyle::LK_CSharp:
+  case FormatStyle::LK_Java:
+  case FormatStyle::LK_JavaScript:
+    Separator = '_';
+    break;
+  case FormatStyle::LK_C:
+  case FormatStyle::LK_Cpp:
+  case FormatStyle::LK_ObjC:
+  default:
+    Separator = '\'';
+  }
+}
+
+QuickNumericalConstantParser::QuickNumericalConstantParser(
+    const StringRef &IntegerLiteral, const FormatParameters &Transforms)
+    : IntegerLiteral(IntegerLiteral), Transforms(Transforms),
+      Formatted(IntegerLiteral), PrefixBegin(Formatted.begin()),
+      PrefixEnd(Formatted.begin()), HexDigitBegin(Formatted.begin()),
+      HexDigitEnd(Formatted.begin()),
+      FloatExponentSeparatorBegin(Formatted.begin()),
+      FloatExponentSeparatorEnd(Formatted.begin()),
+      SuffixBegin(Formatted.begin()), SuffixEnd(Formatted.begin()) {}
+
+void QuickNumericalConstantParser::parse() {
+  auto Cur = Formatted.begin();
+  auto End = Formatted.cend();
+
+  bool IsHex = false;
+  bool IsFloat = false;
+
+  // Find the range that contains the prefix.
+  PrefixBegin = Cur;
+  if (*Cur != '0') {
+  } else {
+    ++Cur;
+    const char C = *Cur;
+    switch (C) {
+    case 'x':
+    case 'X':
+      IsHex = true;
+      ++Cur;
+      break;
+    case 'b':
+    case 'B':
+      ++Cur;
+      break;
+    case 'o':
+    case 'O':
+      // JavaScript uses 0o as octal prefix.
+      ++Cur;
+      break;
+    default:
+      break;
+    }
+  }
+  PrefixEnd = Cur;
+
+  // Find the range that contains hex digits.
+  HexDigitBegin = Cur;
+  if (IsHex) {
+    while (Cur != End) {
+      const char C = *Cur;
+      if (llvm::isHexDigit(C)) {
+      } else if (C == Transforms.Separator) {
+      } else if (C == '.') {
+        IsFloat = true;
+      } else {
+        break;
+      }
+      ++Cur;
+    }
+  }
+  HexDigitEnd = Cur;
+  if (Cur == End)
+    return;
+
+  // Find the range that contains a floating point exponent separator.
+  // Hex digits have already been scanned through the decimal point.
+  // Decimal/octal/binary literals must fast forward through the decimal first.
+  if (!IsHex) {
+    while (Cur != End) {
+      const char C = *Cur;
+      if (llvm::isDigit(C)) {
+      } else if (C == Transforms.Separator) {
+      } else if (C == '.') {
+        IsFloat = true;
+      } else {
+        break;
+      }
+      ++Cur;
+    }
+  }
+
+  const char LSep = IsHex ? 'p' : 'e';
+  const char USep = IsHex ? 'P' : 'E';
+  // The next character of a floating point literal will either be the
+  // separator, or the start of a suffix.
+  FloatExponentSeparatorBegin = Cur;
+  if (IsFloat && Cur != End) {
+    const char C = *Cur;
+    if ((C == LSep) || (C == USep))
+      ++Cur;
+  }
+  FloatExponentSeparatorEnd = Cur;
+  if (Cur == End)
+    return;
+
+  // Fast forward through the exponent part of a floating point literal.
+  if (!IsFloat) {
+  } else if (FloatExponentSeparatorBegin == FloatExponentSeparatorEnd) {
+  } else {
+    while (Cur != End) {
+      const char C = *Cur;
+      if (llvm::isDigit(C)) {
+      } else if (C == '+') {
+      } else if (C == '-') {
+      } else {
+        break;
+      }
+      ++Cur;
+    }
+  }
+  if (Cur == End)
+    return;
+
+  // Find the range containing a suffix if any.
+  SuffixBegin = Cur;
+  size_t const SuffixLen = End - Cur;
+  StringRef suffix(&(*SuffixBegin), SuffixLen);
+  if (!matchesReservedSuffix(suffix)) {
+    while (Cur != End) {
+      const char C = *Cur;
+      if (C == '_') {
+        // In C++, it is idiomatic, but NOT standard to define user-defined
+        // literals with a leading '_'. Omit user defined literals from
+        // transformation.
+        break;
+      } else {
+      }
+      ++Cur;
+    }
+  }
+  SuffixEnd = Cur;
+}
+
+void QuickNumericalConstantParser::applyFormatting() {
+
+  auto Start = Formatted.cbegin();
+  auto End = Formatted.cend();
+
+  assert((Start <= PrefixBegin) && (End >= PrefixBegin) &&
+         "PrefixBegin is out of bounds");
+  assert((Start <= PrefixEnd) && (End >= PrefixEnd) &&
+         "PrefixEnd is out of bounds");
+  assert((Start <= HexDigitBegin) && (End >= HexDigitBegin) &&
+         "HexDigitBegin is out of bounds");
+  assert((Start <= HexDigitEnd) && (End >= HexDigitEnd) &&
+         "HexDigitEnd is out of bounds");
+  assert((Start <= FloatExponentSeparatorBegin) &&
+         (End >= FloatExponentSeparatorBegin) &&
+         "FloatExponentSeparatorBegin is out of bounds");
+  assert((Start <= FloatExponentSeparatorEnd) &&
+         (End >= FloatExponentSeparatorEnd) &&
+         "FloatExponentSeparatorEnd is out of bounds");
+  assert((Start <= SuffixBegin) && (End >= SuffixBegin) &&
+         "SuffixBegin is out of bounds");
+  assert((Start <= SuffixEnd) && (End >= SuffixEnd) &&
+         "SuffixEnd is out of bounds");
+
+  std::transform(PrefixBegin, PrefixEnd, PrefixBegin, Transforms.Prefix);
+  std::transform(HexDigitBegin, HexDigitEnd, HexDigitBegin,
+                 Transforms.HexDigit);
+  std::transform(FloatExponentSeparatorBegin, FloatExponentSeparatorEnd,
+                 FloatExponentSeparatorBegin,
+                 Transforms.FloatExponentSeparator);
+  std::transform(SuffixBegin, SuffixEnd, SuffixBegin, Transforms.Suffix);
+}
+
+std::optional<std::string> QuickNumericalConstantParser::formatIfNeeded() && {
+  parse();
+  applyFormatting();
+
+  return (Formatted == IntegerLiteral)
+             ? std::nullopt
+             : std::make_optional<std::string>(std::move(Formatted));
+}
+
+std::pair<tooling::Replacements, unsigned>
+NumericLiteralCaseFixer::process(const Environment &Env,
+                                 const FormatStyle &Style) {
+  switch (Style.Language) {
+  case FormatStyle::LK_C:
+  case FormatStyle::LK_Cpp:
+  case FormatStyle::LK_ObjC:
+  case FormatStyle::LK_CSharp:
+  case FormatStyle::LK_Java:
+  case FormatStyle::LK_JavaScript:
+    break;
+  default:
+    return {};
+  }
+
+  const auto &CaseStyle = Style.NumericLiteralCase;
+
+  const FormatStyle::NumericLiteralCaseStyle no_case_style{};
+  const bool SkipCaseFormatting = CaseStyle == no_case_style;
+
+  if (SkipCaseFormatting)
+    return {};
+
+  const FormatParameters Transforms{Style.Language, CaseStyle};
+
+  const auto &SourceMgr = Env.getSourceManager();
+  AffectedRangeManager AffectedRangeMgr(SourceMgr, Env.getCharRanges());
+
+  const auto ID = Env.getFileID();
+  const auto LangOpts = getFormattingLangOpts(Style);
+  Lexer Lex(ID, SourceMgr.getBufferOrFake(ID), SourceMgr, LangOpts);
+  Lex.SetCommentRetentionState(true);
+
+  Token Tok;
+  tooling::Replacements Result;
+  bool Skip = false;
+
+  while (!Lex.LexFromRawLexer(Tok)) {
+    // Skip tokens that are too small to contain a formattable literal.
+    auto Length = Tok.getLength();
+    if (Length < 2)
+      continue;
+
+    // Service clang-format off/on comments.
+    auto Location = Tok.getLocation();
+    auto Text = StringRef(SourceMgr.getCharacterData(Location), Length);
+    if (Tok.is(tok::comment)) {
+      if (isClangFormatOff(Text))
+        Skip = true;
+      else if (isClangFormatOn(Text))
+        Skip = false;
+      continue;
+    }
+
+    if (Skip || Tok.isNot(tok::numeric_constant) ||
+        !AffectedRangeMgr.affectsCharSourceRange(
+            CharSourceRange::getCharRange(Location, Tok.getEndLoc()))) {
+      continue;
+    }
+
+    const auto Formatted =
+        QuickNumericalConstantParser(Text, Transforms).formatIfNeeded();
+    if (Formatted) {
+      assert(*Formatted != Text && "QuickNumericalConstantParser returned an "
+                                   "unchanged value instead of nullopt");
+      cantFail(Result.add(
+          tooling::Replacement(SourceMgr, Location, Length, *Formatted)));
+    }
+  }
+
+  return {Result, 0};
+}
+
+} // namespace format
+} // namespace clang
diff --git a/clang/lib/Format/NumericLiteralCaseFixer.h b/clang/lib/Format/NumericLiteralCaseFixer.h
new file mode 100644
index 0000000000000..265d7343c468b
--- /dev/null
+++ b/clang/lib/Format/NumericLiteralCaseFixer.h
@@ -0,0 +1,32 @@
+//===--- NumericLiteralCaseFixer.h -------------------------*- C++ -*-===//
+//
+// Part of the LLVM Projec...
[truncated]
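
The fixer above splits a literal into prefix, hex-digit, exponent-letter, and suffix ranges and applies a separate case transform to each. Below is a minimal standalone sketch of that idea, restricted to hex integer literals. The names `applyCase` and `normalizeHexInteger` are hypothetical and not part of the patch; the -1 / 0 / 1 convention mirrors `getTransform` in the diff (lower, leave, upper). It skips everything else the real parser handles (floats, reserved suffixes, user-defined literals).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: apply a case rule to the half-open range [Begin, End).
// Rule follows the diff's convention: -1 = lower, 0 = leave, 1 = upper.
static void applyCase(std::string &S, std::size_t Begin, std::size_t End,
                      int Rule) {
  assert(Begin <= End && End <= S.size() && "range out of bounds");
  for (std::size_t I = Begin; I < End; ++I) {
    const auto C = static_cast<unsigned char>(S[I]);
    if (Rule < 0)
      S[I] = static_cast<char>(std::tolower(C));
    else if (Rule > 0)
      S[I] = static_cast<char>(std::toupper(C));
  }
}

// Simplified sketch: only handles literals of the form 0x<hex digits><suffix>,
// with ' accepted as a digit separator. Anything else is returned unchanged.
std::string normalizeHexInteger(std::string Literal, int Prefix, int HexDigit,
                                int Suffix) {
  if (Literal.size() < 2 || Literal[0] != '0' ||
      (Literal[1] != 'x' && Literal[1] != 'X'))
    return Literal;

  // Find the end of the hex-digit range.
  std::size_t DigitsEnd = 2;
  while (DigitsEnd < Literal.size() &&
         (std::isxdigit(static_cast<unsigned char>(Literal[DigitsEnd])) ||
          Literal[DigitsEnd] == '\''))
    ++DigitsEnd;

  applyCase(Literal, 0, 2, Prefix);                       // "0x" / "0X"
  applyCase(Literal, 2, DigitsEnd, HexDigit);             // digits
  applyCase(Literal, DigitsEnd, Literal.size(), Suffix);  // e.g. "ull"
  return Literal;
}
```

With lower-case prefix, upper-case digits, and lower-case suffix, `normalizeHexInteger("0XdEaDbEeFUll", -1, 1, -1)` yields `0xDEADBEEFull`.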

JustinStitt

I left a few comments, most of them nits or typos. This isn't a code area I've reviewed much hence the minimal code comments.

Looks great though!

clang/docs/ClangFormatStyleOptions.rst

clang/docs/ReleaseNotes.rst

clang/include/clang/Format/Format.h

clang/lib/Format/NumericLiteralCaseFixer.cpp

JustinStitt · 2025-07-31T21:37:41Z

Looking at the CI it seems the clang/test/Format/docs_updated.test test failed. This may be due to incorrect formatting of your style option in ClangFormatStyleOptions.rst but I am not actually sure. Check it out, though.

clang/docs/ClangFormatStyleOptions.rst

HazardyKnusperkeks

A bit to do, but I like the proposed feature.

clang/include/clang/Format/Format.h

clang/lib/Format/Format.cpp

clang/lib/Format/NumericLiteralCaseFixer.cpp

HazardyKnusperkeks · 2025-07-31T21:48:48Z

> Looking at the CI it seems the clang/test/Format/docs_updated.test test failed. This may be due to incorrect formatting of your style option in ClangFormatStyleOptions.rst but I am not actually sure. Check it out, though.

Most likely the docs were not generated with the script, but manually, otherwise it wouldn't mismatch with version 21 vs 22.

30Wedge · 2025-08-02T22:50:18Z

> Most likely the docs were not generated with the script, but manually

Yup! I manually edited ClangFormatStyleOptions.rst in the first commit. Much easier to have a script do it for you 🙂

In the fixup, this file is regenerated with: python clang/docs/tools/dump_format_style.py

JustinStitt

Again, can't give any comprehensive code critiques but you addressed my nits and such so LGTM.

github-actions · 2025-08-03T08:35:02Z

✅ With the latest revision this PR passed the C/C++ code formatter.

owenca · 2025-08-03T08:37:42Z

Please wait for @HazardyKnusperkeks and @mydeveloperday.

HazardyKnusperkeks

I'm not finished reviewing the changes, but must leave now. ;)

clang/include/clang/Format/Format.h

clang/lib/Format/NumericLiteralCaseFixer.cpp

owenca · 2025-08-03T23:48:29Z

I suggest NumericLiteralCase, Prefix, HexDigit, ExponentLetter, and Suffix for the option names and Leave, Lower, and Upper for the enum values. For example:

NumericLiteralCase:
  Prefix:         Lower # 0x, 0b, etc.
  HexDigit:       Upper # ABCDEF
  ExponentLetter: Lower # e and p
  Suffix:         Lower # ull, bf16, etc.
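
A rough sketch of how the suggested names could map onto C++ types, assuming an enum with `Leave`, `Lower`, and `Upper` values and one field per option. The type names `LetterCase` and `NumericLiteralCase` and the function `formatDecimalFloat` are illustrative only and are not taken from the merged patch. The function handles just decimal floating literals, to show `ExponentLetter` and `Suffix` in action.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical enum using the value names suggested in the review.
enum class LetterCase { Leave, Lower, Upper };

// Hypothetical option struct; every field defaults to Leave (no change).
struct NumericLiteralCase {
  LetterCase Prefix = LetterCase::Leave;
  LetterCase HexDigit = LetterCase::Leave;
  LetterCase ExponentLetter = LetterCase::Leave;
  LetterCase Suffix = LetterCase::Leave;
};

static char applyCase(char C, LetterCase Rule) {
  const auto U = static_cast<unsigned char>(C);
  switch (Rule) {
  case LetterCase::Lower:
    return static_cast<char>(std::tolower(U));
  case LetterCase::Upper:
    return static_cast<char>(std::toupper(U));
  case LetterCase::Leave:
    break;
  }
  return C;
}

// Simplified sketch: decimal floating literals only, e.g. 1.5E+10F.
std::string formatDecimalFloat(std::string Literal,
                               const NumericLiteralCase &Style) {
  const std::size_t N = Literal.size();
  std::size_t I = 0;
  auto IsDigitAt = [&](std::size_t P) {
    return std::isdigit(static_cast<unsigned char>(Literal[P])) != 0;
  };

  // Skip the significand: digits, the decimal point, and ' separators.
  while (I < N && (IsDigitAt(I) || Literal[I] == '.' || Literal[I] == '\''))
    ++I;

  // Optional exponent: letter, optional sign, digits.
  if (I < N && (Literal[I] == 'e' || Literal[I] == 'E')) {
    Literal[I] = applyCase(Literal[I], Style.ExponentLetter);
    ++I;
    if (I < N && (Literal[I] == '+' || Literal[I] == '-'))
      ++I;
    while (I < N && IsDigitAt(I))
      ++I;
  }

  // Whatever remains is treated as the suffix.
  for (; I < N; ++I)
    Literal[I] = applyCase(Literal[I], Style.Suffix);
  assert(I == N && "every character was visited");
  return Literal;
}
```

With `ExponentLetter` and `Suffix` both set to `Lower`, `1.5E+10F` becomes `1.5e+10f`. A default-constructed style leaves the literal untouched.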

clang/lib/Format/NumericLiteralCaseFixer.cpp

clang/include/clang/Format/Format.h

clang/lib/Format/NumericLiteralCaseFixer.cpp

HazardyKnusperkeks · 2025-08-04T09:35:42Z

> I suggest NumericLiteralCase, Prefix, HexDigit, ExponentLetter, and Suffix for the option names and Leave, Lower, and Upper for the enum values. For example:
>
> NumericLiteralCase:
>   Prefix:         Lower # 0x, 0b, etc.
>   HexDigit:       Upper # ABCDEF
>   ExponentLetter: Lower # e and p
>   Suffix:         Lower # ull, bf16, etc.

That is way better.

30Wedge · 2025-08-04T20:15:52Z

> I suggest NumericLiteralCase, Prefix, HexDigit, ExponentLetter, and Suffix for the option names and Leave, Lower, and Upper for the enum values.

I agree. So much more concise.

owenca

It seems that we don't need to add a separate formatting pass for this new option as changing the case of letters in numeric literals has no impact on any existing passes. IMO, the best place to handle this is in FormatTokenLexer::getNextToken(). For example:

--- a/clang/lib/Format/FormatTokenLexer.cpp
+++ b/clang/lib/Format/FormatTokenLexer.cpp
@@ -1313,6 +1313,9 @@ FormatToken *FormatTokenLexer::getNextToken() {
     }
     WhitespaceLength += Text.size();
     readRawToken(*FormatTok);
+    if (FormatTok->Finalized || FormatTok->isNot(tok::numeric_constant))
+      continue;
+    // Handle Style.NumericLiteralCase here.
   }
 
   if (FormatTok->is(tok::unknown))

30Wedge · 2025-08-05T19:39:45Z

> IMO, the best place to handle this is in FormatTokenLexer::getNextToken().

I see you have much more experience in this part of the codebase, but I have some hangups because I don't understand the implications of doing this move myself. Here's what I'm thinking; are these valid concerns?

The FormatTokenLexer class seems like it does the work of parsing a file into tokens for use downstream by all of the other formatters. What are the implications of running NumericLiteralCaseFixer or any other reformatting at this stage for all consumers? Do consumers of lexed FormatTokens assume they are receiving a faithful representation of the underlying file? Are there architecture issues regarding separation of concerns that come up if we do formatting directly in the lexing stage?

Separately, I think that all classes that subclass TokenAnalyzer and use TokenAnalyzer::process() would wind up calling FormatTokenLexer::lex() -- which would end up running NumericLiteralCaseFixer reformatting redundantly for each other pass. Please correct me if I'm reading this wrong.
I could see how that is still lower overhead than adding a totally separate pass, but it still seems odd to run the same reformatting function many separate times in unrelated passes.
Maybe it would be better to add NumericLiteralCaseFixer into some other existing pass instead?
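
The redundancy concern can be illustrated with a toy model. This is only a sketch under the assumption stated in the comment (every pass re-lexes the input); `ToyToken` and `countFixerCalls` are hypothetical names and do not model clang-format's real `FormatTokenLexer` or `TokenAnalyzer`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a lexed token; not clang-format's FormatToken.
struct ToyToken {
  std::string Text;
  bool IsNumeric;
};

// Count how often a per-literal fixer would run.
// FixInLexer == true:  the fixer runs inside lexing, and every pass lexes.
// FixInLexer == false: the fixer runs once, in a single dedicated pass.
std::size_t countFixerCalls(const std::vector<ToyToken> &Tokens,
                            std::size_t NumPasses, bool FixInLexer) {
  assert(NumPasses > 0 && "the model needs at least one pass");
  std::size_t NumericTokens = 0;
  for (const ToyToken &T : Tokens)
    if (T.IsNumeric)
      ++NumericTokens;

  // Assumption: each of the NumPasses passes re-lexes the whole input.
  return FixInLexer ? NumPasses * NumericTokens : NumericTokens;
}
```

For an input with two numeric literals and four passes, the model counts eight fixer calls when the work lives in the lexer and two when it lives in one pass.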

owenca · 2025-08-06T01:52:07Z

> > IMO, the best place to handle this is in FormatTokenLexer::getNextToken().
>
> I see you have much more experience in this part of the codebase, but I have some hangups because I don't understand the implications of doing this move myself. Here's what I'm thinking; are these valid concerns?
>
> The FormatTokenLexer class seems like it does the work of parsing a file into tokens for use downstream by all of the other formatters. What are the implications of running NumericLiteralCaseFixer or any other reformatting at this stage for all consumers? Do consumers of lexed FormatTokens assume they are receiving a faithful representation of the underlying file? Are there architecture issues regarding separation of concerns that come up if we do formatting directly in the lexing stage?
>
> Separately, I think that all classes that subclass TokenAnalyzer and use TokenAnalyzer::process() would wind up calling FormatTokenLexer::lex() -- which would end up running NumericLiteralCaseFixer reformatting redundantly for each other pass. Please correct me if I'm reading this wrong. I could see how that is still lower overhead than adding a totally separate pass, but it still seems odd to run the same reformatting function many separate times in unrelated passes. Maybe it would be better to add NumericLiteralCaseFixer into some other existing pass instead?

You are absolutely right! I was wrong about handling NumericLiteralCase in FormatTokenLexer and totally agree with you that it'd be better to do that in an existing pass.

30Wedge · 2025-08-07T15:13:14Z

Cool, thanks for hearing me out! I am working on handling NumericLiteralCase in the same pass as IntegerLiteralSeparatorFixer; that seemed natural since they both only modify single numeric_constant tokens. I won't have time to get to it until next week.

clang/lib/Format/NumericLiteralCaseFixer.h

clang/lib/Format/NumericLiteralCaseFixer.cpp

clang/lib/Format/NumericLiteralCaseFixer.h

clang/include/clang/Format/Format.h

clang/lib/Format/CMakeLists.txt

clang/lib/Format/Format.cpp

clang/include/clang/Format/Format.h

clang/lib/Format/NumericLiteralCaseFixer.cpp

clang/unittests/Format/NumericLiteralCaseTest.cpp

mydeveloperday

nice!

owenca

Other compiler warnings:

clang/unittests/Format/NumericLiteralCaseTest.cpp:65:23: warning: unused variable 'K' [-Wunused-variable]
   65 |   constexpr StringRef K{"k = 0x0;"};
      |                       ^
clang/unittests/Format/NumericLiteralCaseTest.cpp:66:23: warning: unused variable 'L' [-Wunused-variable]
   66 |   constexpr StringRef L{"l = 0xA;"};
      |                       ^

clang/unittests/Format/NumericLiteralCaseTest.cpp

clang/lib/Format/NumericLiteralCaseFixer.cpp

clang/include/clang/Format/Format.h

owenca

You need to do e.g. ninja clean if you missed compiler warnings.

clang/unittests/Format/NumericLiteralCaseTest.cpp

clang/lib/Format/NumericLiteralCaseFixer.cpp

clang/unittests/Format/NumericLiteralCaseTest.cpp

clang/lib/Format/NumericLiteralCaseFixer.cpp

clang/include/clang/Format/Format.h

clang/unittests/Format/NumericLiteralCaseTest.cpp

owenca

LGTM except a few missed nits.

clang/unittests/Format/NumericLiteralCaseTest.cpp

30Wedge · 2025-09-10T13:52:20Z

Addressed missed nits #151590 (comment), squashed commits and rebased onto main. Thank you all for the review comments.

clang/lib/Format/NumericLiteralCaseFixer.cpp

github-actions · 2025-09-12T08:17:59Z

@30Wedge Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!

llvmbot added the clang and clang-format labels Jul 31, 2025

JustinStitt reviewed Jul 31, 2025
clang/docs/ClangFormatStyleOptions.rst (outdated)

JustinStitt reviewed Jul 31, 2025
clang/docs/ClangFormatStyleOptions.rst (outdated)

HazardyKnusperkeks reviewed Jul 31, 2025

JustinStitt reviewed Jul 31, 2025
clang/lib/Format/NumericLiteralCaseFixer.cpp (outdated)

30Wedge requested review from HazardyKnusperkeks and JustinStitt August 2, 2025 22:59

JustinStitt approved these changes Aug 3, 2025

owenca requested a review from mydeveloperday August 3, 2025 08:38

HazardyKnusperkeks reviewed Aug 3, 2025

HazardyKnusperkeks reviewed Aug 4, 2025

30Wedge requested review from HazardyKnusperkeks and owenca August 4, 2025 20:16

owenca requested changes Aug 5, 2025

30Wedge requested a review from HazardyKnusperkeks August 21, 2025 16:42

owenca reviewed Aug 22, 2025

30Wedge requested a review from owenca August 26, 2025 00:23

owenca reviewed Aug 26, 2025
clang/lib/Format/NumericLiteralCaseFixer.cpp (outdated)
clang/unittests/Format/NumericLiteralCaseTest.cpp (outdated)

mydeveloperday reviewed Aug 26, 2025

30Wedge requested a review from owenca August 27, 2025 02:12

HazardyKnusperkeks approved these changes Aug 28, 2025

owenca reviewed Aug 29, 2025
clang/include/clang/Format/Format.h (outdated)

owenca requested changes Aug 29, 2025

owenca reviewed Aug 29, 2025
clang/unittests/Format/NumericLiteralCaseTest.cpp (outdated)
clang/unittests/Format/NumericLiteralCaseTest.cpp (outdated)

owenca reviewed Aug 30, 2025
clang/lib/Format/NumericLiteralCaseFixer.cpp (outdated)

owenca reviewed Aug 30, 2025

owenca reviewed Sep 9, 2025
clang/lib/Format/NumericLiteralCaseFixer.cpp (outdated)
clang/include/clang/Format/Format.h (outdated)
clang/unittests/Format/NumericLiteralCaseTest.cpp (outdated)

owenca reviewed Sep 9, 2025
clang/unittests/Format/NumericLiteralCaseTest.cpp (outdated)
clang/unittests/Format/NumericLiteralCaseTest.cpp (outdated)

30Wedge requested a review from owenca September 9, 2025 20:40

owenca approved these changes Sep 10, 2025
clang/unittests/Format/NumericLiteralCaseTest.cpp (outdated)
clang/unittests/Format/NumericLiteralCaseTest.cpp (outdated)
clang/unittests/Format/NumericLiteralCaseTest.cpp (outdated)

owenca changed the title ~~[clang-format] Add an option to format integer and float literal case~~ [clang-format] Add an option to format numeric literal case Sep 10, 2025

Commit ad46ad8: [clang-format] Add an option to format numeric literal case

30Wedge force-pushed the format-integer-case branch from 9d1fd48 to ad46ad8 September 10, 2025 13:50

owenca removed the clang label Sep 11, 2025

owenca reviewed Sep 12, 2025
clang/lib/Format/NumericLiteralCaseFixer.cpp (outdated)

Commit 7803d3b: Update clang/lib/Format/NumericLiteralCaseFixer.cpp

llvmbot added the clang label Sep 12, 2025

owenca removed the clang label Sep 12, 2025

owenca merged commit 220d705 into llvm:main Sep 12, 2025
9 of 11 checks passed
