@30Wedge 30Wedge commented Jul 31, 2025

Some languages allow upper- and lower-case characters to be used interchangeably in integer and floating-point literals.

I'd like to be able to enforce a consistent case style in one of my projects, so I added this clang-format style option to control it.

With this .clang-format configuration:

    NumericLiteralCase:
      UpperCasePrefix: Never
      UpperCaseHexDigit: Always
      UpperCaseSuffix: Never

This line of code:

    unsigned long long  x = 0XdEaDbEeFUll;

gets reformatted into this line of code:

    unsigned long long x = 0xDEADBEEFull;

I'm new to this project, so please let me know if I missed something in the process. I modeled this PR on IntegerLiteralSeparatorFixer.
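
For readers who want to see the transformation concretely, here is a minimal standalone sketch of the rewrite described above. It is not the code from this PR: `recaseHexLiteral`, `applyCase` and the `Case` enum are hypothetical names, and the sketch only handles hexadecimal integer literals (no binary or octal prefixes, no floats, no user-defined suffixes).

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the three per-component settings.
enum class Case { Leave, Lower, Upper };

// Applies one case rule to the half-open range [begin, end) of s.
static void applyCase(std::string &s, std::size_t begin, std::size_t end,
                      Case c) {
  for (std::size_t i = begin; i < end; ++i) {
    const unsigned char ch = static_cast<unsigned char>(s[i]);
    if (c == Case::Lower)
      s[i] = static_cast<char>(std::tolower(ch));
    else if (c == Case::Upper)
      s[i] = static_cast<char>(std::toupper(ch));
  }
}

// Recases a hexadecimal integer literal such as "0XdEaDbEeFUll".
// Anything that does not start with 0x/0X is returned unchanged.
std::string recaseHexLiteral(std::string lit, Case prefix, Case digits,
                             Case suffix) {
  if (lit.size() < 3 || lit[0] != '0' || (lit[1] != 'x' && lit[1] != 'X'))
    return lit;

  // Scan past the hex digits; '\'' is the C++ digit separator.
  std::size_t digitsEnd = 2;
  while (digitsEnd < lit.size() &&
         (std::isxdigit(static_cast<unsigned char>(lit[digitsEnd])) ||
          lit[digitsEnd] == '\''))
    ++digitsEnd;

  applyCase(lit, 0, 2, prefix);                   // "0x" / "0X"
  applyCase(lit, 2, digitsEnd, digits);           // hex digits
  applyCase(lit, digitsEnd, lit.size(), suffix);  // e.g. "ull"
  return lit;
}
```

Calling `recaseHexLiteral("0XdEaDbEeFUll", Case::Lower, Case::Upper, Case::Lower)` returns `0xDEADBEEFull`, which corresponds to the configuration shown above.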


@llvmbot added the clang (Clang issues not falling into any other category) and clang-format labels on Jul 31, 2025
@llvmbot
Member

llvmbot commented Jul 31, 2025

@llvm/pr-subscribers-clang

Author: Andy MacGregor (30Wedge)

Changes

Some languages allow upper- and lower-case characters to be used interchangeably in integer and floating-point literals.

I'd like to be able to enforce a consistent case style in one of my projects, so I added this clang-format style option to control it.

With this .clang-format configuration:

    NumericLiteralCase:
      PrefixCase: -1
      HexDigitCase: 1
      SuffixCase: -1

This line of code:

    unsigned long long  x = 0XdEaDbEeFUll;

gets reformatted into this line of code:

    unsigned long long x = 0xDEADBEEFull;

I'm new to this project, so please let me know if I missed something in the process. I modeled this PR on IntegerLiteralSeparatorFixer.


Patch is 34.32 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/151590.diff

9 Files Affected:

  • (modified) clang/docs/ClangFormatStyleOptions.rst (+72-1)
  • (modified) clang/docs/ReleaseNotes.rst (+2)
  • (modified) clang/include/clang/Format/Format.h (+49)
  • (modified) clang/lib/Format/CMakeLists.txt (+1)
  • (modified) clang/lib/Format/Format.cpp (+19)
  • (added) clang/lib/Format/NumericLiteralCaseFixer.cpp (+368)
  • (added) clang/lib/Format/NumericLiteralCaseFixer.h (+32)
  • (modified) clang/unittests/Format/CMakeLists.txt (+1)
  • (added) clang/unittests/Format/NumericLiteralCaseTest.cpp (+354)
diff --git a/clang/docs/ClangFormatStyleOptions.rst b/clang/docs/ClangFormatStyleOptions.rst
index 02986a94a656c..abc73b0ae183c 100644
--- a/clang/docs/ClangFormatStyleOptions.rst
+++ b/clang/docs/ClangFormatStyleOptions.rst
@@ -4555,7 +4555,6 @@ the configuration (without a prefix: ``Auto``).
     So inserting a trailing comma counteracts bin-packing.
 
 
-
 .. _IntegerLiteralSeparator:
 
 **IntegerLiteralSeparator** (``IntegerLiteralSeparatorStyle``) :versionbadge:`clang-format 16` :ref:`¶ <IntegerLiteralSeparator>`
@@ -5076,6 +5075,78 @@ the configuration (without a prefix: ``Auto``).
 
   For example: TESTSUITE
 
+.. _NumericLiteralCase:
+
+**NumericLiteralCase** (``NumericLiteralCaseStyle``) :versionbadge:`clang-format 22` :ref:`¶ <NumericLiteralCase>`
+  Controls character case in numeric literals.
+
+  Possible values for each nested configuration flag:
+
+  * ``0`` (Default) Do not modify characters.
+
+  * ``-1`` Convert characters to lower case.
+
+  * ``1`` Convert characters to upper case.
+
+  .. code-block:: yaml
+
+    # Example of usage:
+    NumericLiteralCase:
+      PrefixCase: -1
+      HexDigitCase: 1
+      FloatExponentSeparatorCase: 0
+      SuffixCase: -1
+
+  .. code-block:: c++
+
+    // Lower case prefix, upper case hexadecimal digits, lower case suffix
+    unsigned long long x = 0xDEADBEEFull;
+
+  Nested configuration flags:
+
+  * ``int PrefixCase`` Control numeric constant prefix case.
+
+    .. code-block:: c++
+
+      // PrefixCase: 1
+      int a = 0B101 | 0XF0;
+      // PrefixCase: -1
+      int b = 0b101 | 0xF0;
+      // PrefixCase: 0
+      int c = 0b101 | 0XF0;
+
+  * ``int HexDigitCase`` Control hexadecimal digit case.
+
+    .. code-block:: c++
+
+      // HexDigitCase: 1
+      int a = 0xBEAD;
+      // HexDigitCase: -1
+      int b = 0xbead;
+      // HexDigitCase: 0
+      int c = 0xBeAd;
+
+  * ``int FloatExponentSeparatorCase`` Control exponent separator case.
+
+    .. code-block:: c++
+
+      // FloatExponentSeparatorCase: 1
+      float a = 6.02E+23;
+      // FloatExponentSeparatorCase: -1
+      float b = 6.02e+23;
+
+  * ``int SuffixCase`` Control suffix case.
+
+    .. code-block:: c++
+
+      // SuffixCase: 1
+      unsigned long long a = 1ULL;
+      // SuffixCase: -1
+      unsigned long long b = 1ull;
+      // SuffixCase: 0
+      unsigned long long c = 1uLL;
+
+
 .. _ObjCBinPackProtocolList:
 
 **ObjCBinPackProtocolList** (``BinPackStyle``) :versionbadge:`clang-format 7` :ref:`¶ <ObjCBinPackProtocolList>`
diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index 4a2edae7509de..f45363f86c135 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -230,6 +230,8 @@ AST Matchers
 
 clang-format
 ------------
+- Add ``NumericLiteralCase`` option for enforcing character case in
+  numeric literals.
 
 libclang
 --------
diff --git a/clang/include/clang/Format/Format.h b/clang/include/clang/Format/Format.h
index 31582a40de866..301db5012b980 100644
--- a/clang/include/clang/Format/Format.h
+++ b/clang/include/clang/Format/Format.h
@@ -3100,6 +3100,54 @@ struct FormatStyle {
   /// \version 11
   TrailingCommaStyle InsertTrailingCommas;
 
+  /// Character case format for different components of a numeric literal.
+  ///
+  /// For all options, ``0`` leaves the case unchanged, ``-1``
+  /// uses lower case, and ``1`` uses upper case.
+  ///
+  struct NumericLiteralCaseStyle {
+    /// Format numeric constant prefixes.
+    /// \code{.text}
+    ///   /* -1: lower case */ b = 0x01;
+    ///   /*  0: don't care */
+    ///   /*  1: upper case */ b = 0X01;
+    /// \endcode
+    int8_t PrefixCase;
+    /// Format hexadecimal digit case.
+    /// \code{.text}
+    ///   /* -1: lower case */ b = 0xabcdef;
+    ///   /*  0: don't care */
+    ///   /*  1: upper case */ b = 0xABCDEF;
+    /// \endcode
+    int8_t HexDigitCase;
+    /// Format exponent separator character case in floating point literals.
+    /// \code{.text}
+    ///   /* -1: lower case */ b = 6.02e23;
+    ///   /*  0: don't care */
+    ///   /*  1: upper case */ b = 6.02E23;
+    /// \endcode
+    int8_t FloatExponentSeparatorCase;
+    /// Format suffix case. This option excludes case-specific reserved
+    /// suffixes, such as ``min`` in C++.
+    /// \code{.text}
+    ///   /* -1: lower case */ b = 10u;
+    ///   /*  0: don't care */
+    ///   /*  1: upper case */ b = 10U;
+    /// \endcode
+    int8_t SuffixCase;
+
+    bool operator==(const NumericLiteralCaseStyle &R) const {
+      return PrefixCase == R.PrefixCase && HexDigitCase == R.HexDigitCase &&
+             FloatExponentSeparatorCase == R.FloatExponentSeparatorCase &&
+             SuffixCase == R.SuffixCase;
+    }
+  };
+
+  /// Format numeric literals for languages that support flexible character case
+  /// in numeric literal constants.
+  /// \version 22
+  NumericLiteralCaseStyle NumericLiteralCase;
+
   /// Separator format of integer literals of different bases.
   ///
   /// If negative, remove separators. If  ``0``, leave the literal as is. If
@@ -5424,6 +5472,7 @@ struct FormatStyle {
            IndentWrappedFunctionNames == R.IndentWrappedFunctionNames &&
            InsertBraces == R.InsertBraces &&
            InsertNewlineAtEOF == R.InsertNewlineAtEOF &&
+           NumericLiteralCase == R.NumericLiteralCase &&
            IntegerLiteralSeparator == R.IntegerLiteralSeparator &&
            JavaImportGroups == R.JavaImportGroups &&
            JavaScriptQuotes == R.JavaScriptQuotes &&
diff --git a/clang/lib/Format/CMakeLists.txt b/clang/lib/Format/CMakeLists.txt
index 9f4939824fdb8..a003f1a951af6 100644
--- a/clang/lib/Format/CMakeLists.txt
+++ b/clang/lib/Format/CMakeLists.txt
@@ -13,6 +13,7 @@ add_clang_library(clangFormat
   MacroExpander.cpp
   MatchFilePath.cpp
   NamespaceEndCommentsFixer.cpp
+  NumericLiteralCaseFixer.cpp
   ObjCPropertyAttributeOrderFixer.cpp
   QualifierAlignmentFixer.cpp
   SortJavaScriptImports.cpp
diff --git a/clang/lib/Format/Format.cpp b/clang/lib/Format/Format.cpp
index 063780721423f..711a3e7501328 100644
--- a/clang/lib/Format/Format.cpp
+++ b/clang/lib/Format/Format.cpp
@@ -16,6 +16,7 @@
 #include "DefinitionBlockSeparator.h"
 #include "IntegerLiteralSeparatorFixer.h"
 #include "NamespaceEndCommentsFixer.h"
+#include "NumericLiteralCaseFixer.h"
 #include "ObjCPropertyAttributeOrderFixer.h"
 #include "QualifierAlignmentFixer.h"
 #include "SortJavaScriptImports.h"
@@ -382,6 +383,16 @@ struct ScalarEnumerationTraits<FormatStyle::IndentExternBlockStyle> {
   }
 };
 
+template <> struct MappingTraits<FormatStyle::NumericLiteralCaseStyle> {
+  static void mapping(IO &IO, FormatStyle::NumericLiteralCaseStyle &Base) {
+    IO.mapOptional("PrefixCase", Base.PrefixCase);
+    IO.mapOptional("HexDigitCase", Base.HexDigitCase);
+    IO.mapOptional("FloatExponentSeparatorCase",
+                   Base.FloatExponentSeparatorCase);
+    IO.mapOptional("SuffixCase", Base.SuffixCase);
+  }
+};
+
 template <> struct MappingTraits<FormatStyle::IntegerLiteralSeparatorStyle> {
   static void mapping(IO &IO, FormatStyle::IntegerLiteralSeparatorStyle &Base) {
     IO.mapOptional("Binary", Base.Binary);
@@ -1093,6 +1104,7 @@ template <> struct MappingTraits<FormatStyle> {
     IO.mapOptional("InsertBraces", Style.InsertBraces);
     IO.mapOptional("InsertNewlineAtEOF", Style.InsertNewlineAtEOF);
     IO.mapOptional("InsertTrailingCommas", Style.InsertTrailingCommas);
+    IO.mapOptional("NumericLiteralCase", Style.NumericLiteralCase);
     IO.mapOptional("IntegerLiteralSeparator", Style.IntegerLiteralSeparator);
     IO.mapOptional("JavaImportGroups", Style.JavaImportGroups);
     IO.mapOptional("JavaScriptQuotes", Style.JavaScriptQuotes);
@@ -1618,6 +1630,9 @@ FormatStyle getLLVMStyle(FormatStyle::LanguageKind Language) {
   LLVMStyle.InsertBraces = false;
   LLVMStyle.InsertNewlineAtEOF = false;
   LLVMStyle.InsertTrailingCommas = FormatStyle::TCS_None;
+  LLVMStyle.NumericLiteralCase = {/*PrefixCase=*/0, /*HexDigitCase=*/0,
+                                  /*FloatExponentSeparatorCase=*/0,
+                                  /*SuffixCase=*/0};
   LLVMStyle.IntegerLiteralSeparator = {
       /*Binary=*/0,  /*BinaryMinDigits=*/0,
       /*Decimal=*/0, /*DecimalMinDigits=*/0,
@@ -3872,6 +3887,10 @@ reformat(const FormatStyle &Style, StringRef Code,
     return IntegerLiteralSeparatorFixer().process(Env, Expanded);
   });
 
+  Passes.emplace_back([&](const Environment &Env) {
+    return NumericLiteralCaseFixer().process(Env, Expanded);
+  });
+
   if (Style.isCpp()) {
     if (Style.QualifierAlignment != FormatStyle::QAS_Leave)
       addQualifierAlignmentFixerPasses(Expanded, Passes);
diff --git a/clang/lib/Format/NumericLiteralCaseFixer.cpp b/clang/lib/Format/NumericLiteralCaseFixer.cpp
new file mode 100644
index 0000000000000..88adaf83fe381
--- /dev/null
+++ b/clang/lib/Format/NumericLiteralCaseFixer.cpp
@@ -0,0 +1,368 @@
+//===--- NumericLiteralCaseFixer.cpp -----------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+///
+/// \file
+/// This file implements NumericLiteralCaseFixer that standardizes character
+/// case within numeric literal constants.
+///
+//===----------------------------------------------------------------------===//
+
+#include "NumericLiteralCaseFixer.h"
+
+#include "llvm/ADT/StringExtras.h"
+
+#include <algorithm>
+
+namespace clang {
+namespace format {
+
+using CharTransformFn = char (*)(char C);
+namespace {
+
+/// @brief Collection of std::transform unary operations for each part of a
+/// numeric literal.
+struct FormatParameters {
+  FormatParameters(FormatStyle::LanguageKind Language,
+                   const FormatStyle::NumericLiteralCaseStyle &CaseStyle);
+
+  CharTransformFn Prefix;
+  CharTransformFn HexDigit;
+  CharTransformFn FloatExponentSeparator;
+  CharTransformFn Suffix;
+
+  char Separator;
+};
+
+/// @brief Parse a single numeric constant from text into ranges that are
+/// appropriate for applying NumericLiteralCaseStyle rules.
+class QuickNumericalConstantParser {
+public:
+  QuickNumericalConstantParser(const StringRef &IntegerLiteral,
+                               const FormatParameters &Transforms);
+
+  /// @brief Reformats the numeric constant if needed.
+  /// Calling this method invalidates the object's state.
+  /// @return std::nullopt if no reformatting is required; otherwise a
+  /// std::optional<> containing the reformatted string.
+  std::optional<std::string> formatIfNeeded() &&;
+
+private:
+  const StringRef &IntegerLiteral;
+  const FormatParameters &Transforms;
+
+  std::string Formatted;
+
+  std::string::iterator PrefixBegin;
+  std::string::iterator PrefixEnd;
+  std::string::iterator HexDigitBegin;
+  std::string::iterator HexDigitEnd;
+  std::string::iterator FloatExponentSeparatorBegin;
+  std::string::iterator FloatExponentSeparatorEnd;
+  std::string::iterator SuffixBegin;
+  std::string::iterator SuffixEnd;
+
+  void parse();
+  void applyFormatting();
+};
+
+} // namespace
+
+static char noOpTransform(char C) { return C; }
+
+static CharTransformFn getTransform(int8_t config_value) {
+  switch (config_value) {
+  case -1:
+    return llvm::toLower;
+  case 1:
+    return llvm::toUpper;
+  default:
+    return noOpTransform;
+  }
+}
+
+/// @brief Test if Suffix matches a C++ literal reserved by the library.
+/// Matches against all suffixes reserved in the C++23 standard
+static bool matchesReservedSuffix(StringRef Suffix) {
+  static const std::set<StringRef> ReservedSuffixes = {
+      "h", "min", "s", "ms", "us", "ns", "il", "i", "if", "d", "y",
+  };
+
+  return ReservedSuffixes.find(Suffix) != ReservedSuffixes.end();
+}
+
+FormatParameters::FormatParameters(
+    FormatStyle::LanguageKind Language,
+    const FormatStyle::NumericLiteralCaseStyle &CaseStyle)
+    : Prefix(getTransform(CaseStyle.PrefixCase)),
+      HexDigit(getTransform(CaseStyle.HexDigitCase)),
+      FloatExponentSeparator(
+          getTransform(CaseStyle.FloatExponentSeparatorCase)),
+      Suffix(getTransform(CaseStyle.SuffixCase)) {
+  switch (Language) {
+  case FormatStyle::LK_CSharp:
+  case FormatStyle::LK_Java:
+  case FormatStyle::LK_JavaScript:
+    Separator = '_';
+    break;
+  case FormatStyle::LK_C:
+  case FormatStyle::LK_Cpp:
+  case FormatStyle::LK_ObjC:
+  default:
+    Separator = '\'';
+  }
+}
+
+QuickNumericalConstantParser::QuickNumericalConstantParser(
+    const StringRef &IntegerLiteral, const FormatParameters &Transforms)
+    : IntegerLiteral(IntegerLiteral), Transforms(Transforms),
+      Formatted(IntegerLiteral), PrefixBegin(Formatted.begin()),
+      PrefixEnd(Formatted.begin()), HexDigitBegin(Formatted.begin()),
+      HexDigitEnd(Formatted.begin()),
+      FloatExponentSeparatorBegin(Formatted.begin()),
+      FloatExponentSeparatorEnd(Formatted.begin()),
+      SuffixBegin(Formatted.begin()), SuffixEnd(Formatted.begin()) {}
+
+void QuickNumericalConstantParser::parse() {
+  auto Cur = Formatted.begin();
+  auto End = Formatted.cend();
+
+  bool IsHex = false;
+  bool IsFloat = false;
+
+  // Find the range that contains the prefix.
+  PrefixBegin = Cur;
+  if (*Cur != '0') {
+  } else {
+    ++Cur;
+    const char C = *Cur;
+    switch (C) {
+    case 'x':
+    case 'X':
+      IsHex = true;
+      ++Cur;
+      break;
+    case 'b':
+    case 'B':
+      ++Cur;
+      break;
+    case 'o':
+    case 'O':
+      // JavaScript uses 0o as octal prefix.
+      ++Cur;
+      break;
+    default:
+      break;
+    }
+  }
+  PrefixEnd = Cur;
+
+  // Find the range that contains hex digits.
+  HexDigitBegin = Cur;
+  if (IsHex) {
+    while (Cur != End) {
+      const char C = *Cur;
+      if (llvm::isHexDigit(C)) {
+      } else if (C == Transforms.Separator) {
+      } else if (C == '.') {
+        IsFloat = true;
+      } else {
+        break;
+      }
+      ++Cur;
+    }
+  }
+  HexDigitEnd = Cur;
+  if (Cur == End)
+    return;
+
+  // Find the range that contains a floating point exponent separator.
+  // Hex digits have already been scanned through the decimal point.
+  // Decimal/octal/binary literals must fast forward through the decimal first.
+  if (!IsHex) {
+    while (Cur != End) {
+      const char C = *Cur;
+      if (llvm::isDigit(C)) {
+      } else if (C == Transforms.Separator) {
+      } else if (C == '.') {
+        IsFloat = true;
+      } else {
+        break;
+      }
+      ++Cur;
+    }
+  }
+
+  const char LSep = IsHex ? 'p' : 'e';
+  const char USep = IsHex ? 'P' : 'E';
+  // The next character of a floating point literal will either be the
+  // separator, or the start of a suffix.
+  FloatExponentSeparatorBegin = Cur;
+  if (IsFloat) {
+    const char C = *Cur;
+    if ((C == LSep) || (C == USep))
+      ++Cur;
+  }
+  FloatExponentSeparatorEnd = Cur;
+  if (Cur == End)
+    return;
+
+  // Fast forward through the exponent part of a floating point literal.
+  if (!IsFloat) {
+  } else if (FloatExponentSeparatorBegin == FloatExponentSeparatorEnd) {
+  } else {
+    while (Cur != End) {
+      const char C = *Cur;
+      if (llvm::isDigit(C)) {
+      } else if (C == '+') {
+      } else if (C == '-') {
+      } else {
+        break;
+      }
+      ++Cur;
+    }
+  }
+  if (Cur == End)
+    return;
+
+  // Find the range containing a suffix if any.
+  SuffixBegin = Cur;
+  size_t const SuffixLen = End - Cur;
+  StringRef suffix(&(*SuffixBegin), SuffixLen);
+  if (!matchesReservedSuffix(suffix)) {
+    while (Cur != End) {
+      const char C = *Cur;
+      if (C == '_') {
+        // In C++, it is idiomatic, but NOT standard to define user-defined
+        // literals with a leading '_'. Omit user defined literals from
+        // transformation.
+        break;
+      } else {
+      }
+      ++Cur;
+    }
+  }
+  SuffixEnd = Cur;
+}
+
+void QuickNumericalConstantParser::applyFormatting() {
+
+  auto Start = Formatted.cbegin();
+  auto End = Formatted.cend();
+
+  assert((Start <= PrefixBegin) && (End >= PrefixBegin) &&
+         "PrefixBegin is out of bounds");
+  assert((Start <= PrefixEnd) && (End >= PrefixEnd) &&
+         "PrefixEnd is out of bounds");
+  assert((Start <= HexDigitBegin) && (End >= HexDigitBegin) &&
+         "HexDigitBegin is out of bounds");
+  assert((Start <= HexDigitEnd) && (End >= HexDigitEnd) &&
+         "HexDigitEnd is out of bounds");
+  assert((Start <= FloatExponentSeparatorBegin) &&
+         (End >= FloatExponentSeparatorBegin) &&
+         "FloatExponentSeparatorBegin is out of bounds");
+  assert((Start <= FloatExponentSeparatorEnd) &&
+         (End >= FloatExponentSeparatorEnd) &&
+         "FloatExponentSeparatorEnd is out of bounds");
+  assert((Start <= SuffixBegin) && (End >= SuffixBegin) &&
+         "SuffixBegin is out of bounds");
+  assert((Start <= SuffixEnd) && (End >= SuffixEnd) &&
+         "SuffixEnd is out of bounds");
+
+  std::transform(PrefixBegin, PrefixEnd, PrefixBegin, Transforms.Prefix);
+  std::transform(HexDigitBegin, HexDigitEnd, HexDigitBegin,
+                 Transforms.HexDigit);
+  std::transform(FloatExponentSeparatorBegin, FloatExponentSeparatorEnd,
+                 FloatExponentSeparatorBegin,
+                 Transforms.FloatExponentSeparator);
+  std::transform(SuffixBegin, SuffixEnd, SuffixBegin, Transforms.Suffix);
+}
+
+std::optional<std::string> QuickNumericalConstantParser::formatIfNeeded() && {
+  parse();
+  applyFormatting();
+
+  return (Formatted == IntegerLiteral)
+             ? std::nullopt
+             : std::make_optional<std::string>(std::move(Formatted));
+}
+
+std::pair<tooling::Replacements, unsigned>
+NumericLiteralCaseFixer::process(const Environment &Env,
+                                 const FormatStyle &Style) {
+  switch (Style.Language) {
+  case FormatStyle::LK_C:
+  case FormatStyle::LK_Cpp:
+  case FormatStyle::LK_ObjC:
+  case FormatStyle::LK_CSharp:
+  case FormatStyle::LK_Java:
+  case FormatStyle::LK_JavaScript:
+    break;
+  default:
+    return {};
+  }
+
+  const auto &CaseStyle = Style.NumericLiteralCase;
+
+  const FormatStyle::NumericLiteralCaseStyle no_case_style{};
+  const bool SkipCaseFormatting = CaseStyle == no_case_style;
+
+  if (SkipCaseFormatting)
+    return {};
+
+  const FormatParameters Transforms{Style.Language, CaseStyle};
+
+  const auto &SourceMgr = Env.getSourceManager();
+  AffectedRangeManager AffectedRangeMgr(SourceMgr, Env.getCharRanges());
+
+  const auto ID = Env.getFileID();
+  const auto LangOpts = getFormattingLangOpts(Style);
+  Lexer Lex(ID, SourceMgr.getBufferOrFake(ID), SourceMgr, LangOpts);
+  Lex.SetCommentRetentionState(true);
+
+  Token Tok;
+  tooling::Replacements Result;
+  bool Skip = false;
+
+  while (!Lex.LexFromRawLexer(Tok)) {
+    // Skip tokens that are too small to contain a formattable literal.
+    auto Length = Tok.getLength();
+    if (Length < 2)
+      continue;
+
+    // Service clang-format off/on comments.
+    auto Location = Tok.getLocation();
+    auto Text = StringRef(SourceMgr.getCharacterData(Location), Length);
+    if (Tok.is(tok::comment)) {
+      if (isClangFormatOff(Text))
+        Skip = true;
+      else if (isClangFormatOn(Text))
+        Skip = false;
+      continue;
+    }
+
+    if (Skip || Tok.isNot(tok::numeric_constant) ||
+        !AffectedRangeMgr.affectsCharSourceRange(
+            CharSourceRange::getCharRange(Location, Tok.getEndLoc()))) {
+      continue;
+    }
+
+    const auto Formatted =
+        QuickNumericalConstantParser(Text, Transforms).formatIfNeeded();
+    if (Formatted) {
+      assert(*Formatted != Text && "QuickNumericalConstantParser returned an "
+                                   "unchanged value instead of nullopt");
+      cantFail(Result.add(
+          tooling::Replacement(SourceMgr, Location, Length, *Formatted)));
+    }
+  }
+
+  return {Result, 0};
+}
+
+} // namespace format
+} // namespace clang
diff --git a/clang/lib/Format/NumericLiteralCaseFixer.h b/clang/lib/Format/NumericLiteralCaseFixer.h
new file mode 100644
index 0000000000000..265d7343c468b
--- /dev/null
+++ b/clang/lib/Format/NumericLiteralCaseFixer.h
@@ -0,0 +1,32 @@
+//===--- NumericLiteralCaseFixer.h -------------------------*- C++ -*-===//
+//
+// Part of the LLVM Projec...
[truncated]
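
The suffix handling in the patch above (the `-1` / `0` / `1` mapping in `getTransform` and the reserved-suffix check in `matchesReservedSuffix`) can be approximated outside clang-format as follows. This is a simplified sketch under stated assumptions, not the patch's code: `pickTransform`, `isReservedSuffix` and `recaseSuffix` are hypothetical names, and only decimal integer literals are handled.

```cpp
#include <algorithm>
#include <array>
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>

using CharTransform = char (*)(char);

static char keepCase(char c) { return c; }
static char toLowerCase(char c) {
  return static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
}
static char toUpperCase(char c) {
  return static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
}

// Maps a -1 / 0 / 1 setting to a character transform; any other value
// leaves characters unchanged.
CharTransform pickTransform(int setting) {
  switch (setting) {
  case -1:
    return toLowerCase;
  case 1:
    return toUpperCase;
  default:
    return keepCase;
  }
}

// Same suffix list as the patch: these are case-sensitive, so recasing
// them would change the meaning of the literal.
bool isReservedSuffix(std::string_view suffix) {
  static constexpr std::array<std::string_view, 11> Reserved = {
      "h", "min", "s", "ms", "us", "ns", "il", "i", "if", "d", "y"};
  return std::find(Reserved.begin(), Reserved.end(), suffix) != Reserved.end();
}

// Recases the suffix of a decimal integer literal, e.g. "10UL" -> "10ul".
// Reserved suffixes and suffixes starting with '_' are left alone.
std::string recaseSuffix(std::string literal, int suffixCase) {
  std::size_t pos = 0;
  while (pos < literal.size() &&
         (std::isdigit(static_cast<unsigned char>(literal[pos])) ||
          literal[pos] == '\''))
    ++pos;

  const std::string_view suffix(literal.data() + pos, literal.size() - pos);
  if (suffix.empty() || suffix.front() == '_' || isReservedSuffix(suffix))
    return literal;

  const auto first = literal.begin() + static_cast<std::ptrdiff_t>(pos);
  std::transform(first, literal.end(), first, pickTransform(suffixCase));
  return literal;
}
```

Under these assumptions, `recaseSuffix("10UL", -1)` gives `10ul`, while `recaseSuffix("10min", 1)` and `recaseSuffix("5_km", 1)` return their input unchanged.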

@llvmbot
Member

llvmbot commented Jul 31, 2025

@llvm/pr-subscribers-clang-format

Author: Andy MacGregor (30Wedge)

new file mode 100644
index 0000000000000..88adaf83fe381
--- /dev/null
+++ b/clang/lib/Format/NumericLiteralCaseFixer.cpp
@@ -0,0 +1,368 @@
+//===--- NumericLiteralCaseFixer.cpp -----------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+///
+/// \file
+/// This file implements NumericLiteralCaseFixer that standardizes character
+/// case within numeric literal constants.
+///
+//===----------------------------------------------------------------------===//
+
+#include "NumericLiteralCaseFixer.h"
+
+#include "llvm/ADT/StringExtras.h"
+
+#include <algorithm>
+
+namespace clang {
+namespace format {
+
+using CharTransformFn = char (*)(char C);
+namespace {
+
+/// @brief Collection of std::transform unary operations for each part of a
+/// numeric literal.
+struct FormatParameters {
+  FormatParameters(FormatStyle::LanguageKind Language,
+                   const FormatStyle::NumericLiteralCaseStyle &CaseStyle);
+
+  CharTransformFn Prefix;
+  CharTransformFn HexDigit;
+  CharTransformFn FloatExponentSeparator;
+  CharTransformFn Suffix;
+
+  char Separator;
+};
+
+/// @brief Parse a single numeric constant from text into ranges that are
+/// appropriate for applying NumericLiteralCaseStyle rules.
+class QuickNumericalConstantParser {
+public:
+  QuickNumericalConstantParser(const StringRef &IntegerLiteral,
+                               const FormatParameters &Transforms);
+
+  /// @brief Reformats the numeric constant if needed.
+  /// Calling this method invalidates the object's state.
+  /// @return std::nullopt if no reformatting is required. std::optional<>
+  /// containing the reformatted string otherwise.
+  std::optional<std::string> formatIfNeeded() &&;
+
+private:
+  const StringRef &IntegerLiteral;
+  const FormatParameters &Transforms;
+
+  std::string Formatted;
+
+  std::string::iterator PrefixBegin;
+  std::string::iterator PrefixEnd;
+  std::string::iterator HexDigitBegin;
+  std::string::iterator HexDigitEnd;
+  std::string::iterator FloatExponentSeparatorBegin;
+  std::string::iterator FloatExponentSeparatorEnd;
+  std::string::iterator SuffixBegin;
+  std::string::iterator SuffixEnd;
+
+  void parse();
+  void applyFormatting();
+};
+
+} // namespace
+
+static char noOpTransform(char C) { return C; }
+
+static CharTransformFn getTransform(int8_t config_value) {
+  switch (config_value) {
+  case -1:
+    return llvm::toLower;
+  case 1:
+    return llvm::toUpper;
+  default:
+    return noOpTransform;
+  }
+}
+
+/// @brief Test if Suffix matches a C++ literal reserved by the library.
+/// Matches against all suffixes reserved in the C++23 standard
+static bool matchesReservedSuffix(StringRef Suffix) {
+  static const std::set<StringRef> ReservedSuffixes = {
+      "h", "min", "s", "ms", "us", "ns", "il", "i", "if", "d", "y",
+  };
+
+  return ReservedSuffixes.find(Suffix) != ReservedSuffixes.end();
+}
+
+FormatParameters::FormatParameters(
+    FormatStyle::LanguageKind Language,
+    const FormatStyle::NumericLiteralCaseStyle &CaseStyle)
+    : Prefix(getTransform(CaseStyle.PrefixCase)),
+      HexDigit(getTransform(CaseStyle.HexDigitCase)),
+      FloatExponentSeparator(
+          getTransform(CaseStyle.FloatExponentSeparatorCase)),
+      Suffix(getTransform(CaseStyle.SuffixCase)) {
+  switch (Language) {
+  case FormatStyle::LK_CSharp:
+  case FormatStyle::LK_Java:
+  case FormatStyle::LK_JavaScript:
+    Separator = '_';
+    break;
+  case FormatStyle::LK_C:
+  case FormatStyle::LK_Cpp:
+  case FormatStyle::LK_ObjC:
+  default:
+    Separator = '\'';
+  }
+}
+
+QuickNumericalConstantParser::QuickNumericalConstantParser(
+    const StringRef &IntegerLiteral, const FormatParameters &Transforms)
+    : IntegerLiteral(IntegerLiteral), Transforms(Transforms),
+      Formatted(IntegerLiteral), PrefixBegin(Formatted.begin()),
+      PrefixEnd(Formatted.begin()), HexDigitBegin(Formatted.begin()),
+      HexDigitEnd(Formatted.begin()),
+      FloatExponentSeparatorBegin(Formatted.begin()),
+      FloatExponentSeparatorEnd(Formatted.begin()),
+      SuffixBegin(Formatted.begin()), SuffixEnd(Formatted.begin()) {}
+
+void QuickNumericalConstantParser::parse() {
+  auto Cur = Formatted.begin();
+  auto End = Formatted.cend();
+
+  bool IsHex = false;
+  bool IsFloat = false;
+
+  // Find the range that contains the prefix.
+  PrefixBegin = Cur;
+  if (*Cur != '0') {
+  } else {
+    ++Cur;
+    const char C = *Cur;
+    switch (C) {
+    case 'x':
+    case 'X':
+      IsHex = true;
+      ++Cur;
+      break;
+    case 'b':
+    case 'B':
+      ++Cur;
+      break;
+    case 'o':
+    case 'O':
+      // JavaScript uses 0o as the octal prefix.
+      ++Cur;
+      break;
+    default:
+      break;
+    }
+  }
+  PrefixEnd = Cur;
+
+  // Find the range that contains hex digits.
+  HexDigitBegin = Cur;
+  if (IsHex) {
+    while (Cur != End) {
+      const char C = *Cur;
+      if (llvm::isHexDigit(C)) {
+      } else if (C == Transforms.Separator) {
+      } else if (C == '.') {
+        IsFloat = true;
+      } else {
+        break;
+      }
+      ++Cur;
+    }
+  }
+  HexDigitEnd = Cur;
+  if (Cur == End)
+    return;
+
+  // Find the range that contains a floating point exponent separator.
+  // Hex digits have already been scanned through the decimal point.
+  // Decimal/octal/binary literals must fast forward through the decimal first.
+  if (!IsHex) {
+    while (Cur != End) {
+      const char C = *Cur;
+      if (llvm::isDigit(C)) {
+      } else if (C == Transforms.Separator) {
+      } else if (C == '.') {
+        IsFloat = true;
+      } else {
+        break;
+      }
+      ++Cur;
+    }
+  }
+
+  const char LSep = IsHex ? 'p' : 'e';
+  const char USep = IsHex ? 'P' : 'E';
+  // The next character of a floating point literal will either be the
+  // separator, or the start of a suffix.
+  FloatExponentSeparatorBegin = Cur;
+  if (IsFloat && Cur != End) {
+    const char C = *Cur;
+    if ((C == LSep) || (C == USep))
+      ++Cur;
+  }
+  FloatExponentSeparatorEnd = Cur;
+  if (Cur == End)
+    return;
+
+  // Fast forward through the exponent part of a floating point literal.
+  if (!IsFloat) {
+  } else if (FloatExponentSeparatorBegin == FloatExponentSeparatorEnd) {
+  } else {
+    while (Cur != End) {
+      const char C = *Cur;
+      if (llvm::isDigit(C)) {
+      } else if (C == '+') {
+      } else if (C == '-') {
+      } else {
+        break;
+      }
+      ++Cur;
+    }
+  }
+  if (Cur == End)
+    return;
+
+  // Find the range containing a suffix if any.
+  SuffixBegin = Cur;
+  size_t const SuffixLen = End - Cur;
+  StringRef suffix(&(*SuffixBegin), SuffixLen);
+  if (!matchesReservedSuffix(suffix)) {
+    while (Cur != End) {
+      const char C = *Cur;
+      if (C == '_') {
+        // In C++, it is idiomatic, but NOT standard to define user-defined
+        // literals with a leading '_'. Omit user defined literals from
+        // transformation.
+        break;
+      } else {
+      }
+      ++Cur;
+    }
+  }
+  SuffixEnd = Cur;
+}
+
+void QuickNumericalConstantParser::applyFormatting() {
+
+  auto Start = Formatted.cbegin();
+  auto End = Formatted.cend();
+
+  assert((Start <= PrefixBegin) && (End >= PrefixBegin) &&
+         "PrefixBegin is out of bounds");
+  assert((Start <= PrefixEnd) && (End >= PrefixEnd) &&
+         "PrefixEnd is out of bounds");
+  assert((Start <= HexDigitBegin) && (End >= HexDigitBegin) &&
+         "HexDigitBegin is out of bounds");
+  assert((Start <= HexDigitEnd) && (End >= HexDigitEnd) &&
+         "HexDigitEnd is out of bounds");
+  assert((Start <= FloatExponentSeparatorBegin) &&
+         (End >= FloatExponentSeparatorBegin) &&
+         "FloatExponentSeparatorBegin is out of bounds");
+  assert((Start <= FloatExponentSeparatorEnd) &&
+         (End >= FloatExponentSeparatorEnd) &&
+         "FloatExponentSeparatorEnd is out of bounds");
+  assert((Start <= SuffixBegin) && (End >= SuffixBegin) &&
+         "SuffixBegin is out of bounds");
+  assert((Start <= SuffixEnd) && (End >= SuffixEnd) &&
+         "SuffixEnd is out of bounds");
+
+  std::transform(PrefixBegin, PrefixEnd, PrefixBegin, Transforms.Prefix);
+  std::transform(HexDigitBegin, HexDigitEnd, HexDigitBegin,
+                 Transforms.HexDigit);
+  std::transform(FloatExponentSeparatorBegin, FloatExponentSeparatorEnd,
+                 FloatExponentSeparatorBegin,
+                 Transforms.FloatExponentSeparator);
+  std::transform(SuffixBegin, SuffixEnd, SuffixBegin, Transforms.Suffix);
+}
+
+std::optional<std::string> QuickNumericalConstantParser::formatIfNeeded() && {
+  parse();
+  applyFormatting();
+
+  return (Formatted == IntegerLiteral)
+             ? std::nullopt
+             : std::make_optional<std::string>(std::move(Formatted));
+}
+
+std::pair<tooling::Replacements, unsigned>
+NumericLiteralCaseFixer::process(const Environment &Env,
+                                 const FormatStyle &Style) {
+  switch (Style.Language) {
+  case FormatStyle::LK_C:
+  case FormatStyle::LK_Cpp:
+  case FormatStyle::LK_ObjC:
+  case FormatStyle::LK_CSharp:
+  case FormatStyle::LK_Java:
+  case FormatStyle::LK_JavaScript:
+    break;
+  default:
+    return {};
+  }
+
+  const auto &CaseStyle = Style.NumericLiteralCase;
+
+  const FormatStyle::NumericLiteralCaseStyle no_case_style{};
+  const bool SkipCaseFormatting = CaseStyle == no_case_style;
+
+  if (SkipCaseFormatting)
+    return {};
+
+  const FormatParameters Transforms{Style.Language, CaseStyle};
+
+  const auto &SourceMgr = Env.getSourceManager();
+  AffectedRangeManager AffectedRangeMgr(SourceMgr, Env.getCharRanges());
+
+  const auto ID = Env.getFileID();
+  const auto LangOpts = getFormattingLangOpts(Style);
+  Lexer Lex(ID, SourceMgr.getBufferOrFake(ID), SourceMgr, LangOpts);
+  Lex.SetCommentRetentionState(true);
+
+  Token Tok;
+  tooling::Replacements Result;
+  bool Skip = false;
+
+  while (!Lex.LexFromRawLexer(Tok)) {
+    // Skip tokens that are too small to contain a formattable literal.
+    auto Length = Tok.getLength();
+    if (Length < 2)
+      continue;
+
+    // Service clang-format off/on comments.
+    auto Location = Tok.getLocation();
+    auto Text = StringRef(SourceMgr.getCharacterData(Location), Length);
+    if (Tok.is(tok::comment)) {
+      if (isClangFormatOff(Text))
+        Skip = true;
+      else if (isClangFormatOn(Text))
+        Skip = false;
+      continue;
+    }
+
+    if (Skip || Tok.isNot(tok::numeric_constant) ||
+        !AffectedRangeMgr.affectsCharSourceRange(
+            CharSourceRange::getCharRange(Location, Tok.getEndLoc()))) {
+      continue;
+    }
+
+    const auto Formatted =
+        QuickNumericalConstantParser(Text, Transforms).formatIfNeeded();
+    if (Formatted) {
+      assert(*Formatted != Text && "QuickNumericalConstantParser returned an "
+                                   "unchanged value instead of nullopt");
+      cantFail(Result.add(
+          tooling::Replacement(SourceMgr, Location, Length, *Formatted)));
+    }
+  }
+
+  return {Result, 0};
+}
+
+} // namespace format
+} // namespace clang
diff --git a/clang/lib/Format/NumericLiteralCaseFixer.h b/clang/lib/Format/NumericLiteralCaseFixer.h
new file mode 100644
index 0000000000000..265d7343c468b
--- /dev/null
+++ b/clang/lib/Format/NumericLiteralCaseFixer.h
@@ -0,0 +1,32 @@
+//===--- NumericLiteralCaseFixer.h -------------------------*- C++ -*-===//
+//
+// Part of the LLVM Projec...
[truncated]

Contributor

@JustinStitt JustinStitt left a comment

I left a few comments, most of them nits or typos. This isn't a code area I've reviewed much, hence the minimal code comments.

Looks great though!

@JustinStitt
Contributor

JustinStitt commented Jul 31, 2025

Looking at the CI it seems the clang/test/Format/docs_updated.test test failed. This may be due to incorrect formatting of your style option in ClangFormatStyleOptions.rst but I am not actually sure. Check it out, though.

Contributor

@HazardyKnusperkeks HazardyKnusperkeks left a comment

A bit to do, but I like the proposed feature.

@HazardyKnusperkeks
Contributor

Looking at the CI it seems the clang/test/Format/docs_updated.test test failed. This may be due to incorrect formatting of your style option in ClangFormatStyleOptions.rst but I am not actually sure. Check it out, though.

Most likely the docs were not generated with the script, but manually, otherwise it wouldn't mismatch with version 21 vs 22.

@30Wedge
Contributor Author

30Wedge commented Aug 2, 2025

Most likely the docs were not generated with the script, but manually

Yup! I manually edited ClangFormatStyleOptions.rst in the first commit. Much easier to have a script do it for you 🙂

In the fixup, this file is regenerated with: python clang/docs/tools/dump_format_style.py

Contributor

@JustinStitt JustinStitt left a comment

Again, can't give any comprehensive code critiques but you addressed my nits and such so LGTM.

@github-actions

github-actions bot commented Aug 3, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@owenca
Contributor

owenca commented Aug 3, 2025

Please wait for @HazardyKnusperkeks and @mydeveloperday.

@owenca owenca requested a review from mydeveloperday August 3, 2025 08:38
Contributor

@HazardyKnusperkeks HazardyKnusperkeks left a comment

I'm not finished reviewing the changes, but must leave now. ;)

@owenca
Contributor

owenca commented Aug 3, 2025

I suggest NumericLiteralCase, Prefix, HexDigit, ExponentLetter, and Suffix for the option names and Leave, Lower, and Upper for the enum values. For example:

NumericLiteralCase:
  Prefix:         Lower # 0x, 0b, etc.
  HexDigit:       Upper # ABCDEF
  ExponentLetter: Lower # e and p
  Suffix:         Lower # ull, bf16, etc.
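
To make the effect of these options concrete, here is a minimal, self-contained sketch (not the PR's implementation) that applies `Prefix`, `HexDigit`, and `Suffix` settings to a hexadecimal integer literal. The `LetterCase` enum, the `NumericLiteralCase` struct, and the `formatHexInteger` and `applyCase` functions are hypothetical names chosen for illustration. `ExponentLetter`, non-hex literals, and user-defined literal suffixes are deliberately left out.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical stand-ins for the proposed enum values and option names.
enum class LetterCase { Leave, Lower, Upper };

struct NumericLiteralCase {
  LetterCase Prefix = LetterCase::Leave;
  LetterCase HexDigit = LetterCase::Leave;
  LetterCase Suffix = LetterCase::Leave;
};

// Change the case of S[Begin, End) in place; Leave keeps the text as is.
static void applyCase(std::string &S, std::size_t Begin, std::size_t End,
                      LetterCase Case) {
  assert(Begin <= End && End <= S.size() && "range out of bounds");
  for (std::size_t I = Begin; I < End; ++I) {
    const auto C = static_cast<unsigned char>(S[I]);
    if (Case == LetterCase::Lower)
      S[I] = static_cast<char>(std::tolower(C));
    else if (Case == LetterCase::Upper)
      S[I] = static_cast<char>(std::toupper(C));
  }
}

// Simplified: only hexadecimal integer literals are rewritten.
std::string formatHexInteger(std::string Literal,
                             const NumericLiteralCase &Style) {
  if (Literal.size() < 3 || Literal[0] != '0' ||
      (Literal[1] != 'x' && Literal[1] != 'X'))
    return Literal; // Not a hex literal; leave it untouched.

  // The digit run ends at the first character that is neither a hex digit
  // nor the C++ digit separator.
  std::size_t DigitsEnd = 2;
  while (DigitsEnd < Literal.size() &&
         (std::isxdigit(static_cast<unsigned char>(Literal[DigitsEnd])) ||
          Literal[DigitsEnd] == '\''))
    ++DigitsEnd;

  applyCase(Literal, 0, 2, Style.Prefix);                      // "0X"
  applyCase(Literal, 2, DigitsEnd, Style.HexDigit);            // digits
  applyCase(Literal, DigitsEnd, Literal.size(), Style.Suffix); // "Ull"
  return Literal;
}
```

With `Prefix` and `Suffix` set to `Lower` and `HexDigit` set to `Upper`, this sketch turns `0XdEaDbEeFUll` into `0xDEADBEEFull`; with every field left at `Leave`, the literal is returned unchanged.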

@HazardyKnusperkeks
Contributor

I suggest NumericLiteralCase, Prefix, HexDigit, ExponentLetter, and Suffix for the option names and Leave, Lower, and Upper for the enum values. For example:

NumericLiteralCase:
  Prefix:         Lower # 0x, 0b, etc.
  HexDigit:       Upper # ABCDEF
  ExponentLetter: Lower # e and p
  Suffix:         Lower # ull, bf16, etc.

That is way better.

@30Wedge
Contributor Author

30Wedge commented Aug 4, 2025

I suggest NumericLiteralCase, Prefix, HexDigit, ExponentLetter, and Suffix for the option names and Leave, Lower, and Upper for the enum values.

I agree. So much more concise.

Contributor

@owenca owenca left a comment

It seems that we don't need to add a separate formatting pass for this new option as changing the case of letters in numeric literals has no impact on any existing passes. IMO, the best place to handle this is in FormatTokenLexer::getNextToken(). For example:

--- a/clang/lib/Format/FormatTokenLexer.cpp
+++ b/clang/lib/Format/FormatTokenLexer.cpp
@@ -1313,6 +1313,9 @@ FormatToken *FormatTokenLexer::getNextToken() {
     }
     WhitespaceLength += Text.size();
     readRawToken(*FormatTok);
+    if (FormatTok->Finalized || FormatTok->isNot(tok::numeric_constant))
+      continue;
+    // Handle Style.NumericLiteralCase here.
   }
 
   if (FormatTok->is(tok::unknown))

@30Wedge
Contributor Author

30Wedge commented Aug 5, 2025

IMO, the best place to handle this is in FormatTokenLexer::getNextToken().

I see you have much more experience in this part of the codebase, but I have some hangups because I don't understand the implications of doing this move myself. Here's what I'm thinking; are these valid concerns?

The FormatTokenLexer class seems like it does the work of parsing a file into tokens for use downstream by all of the other formatters. What are the implications of running NumericLiteralCaseFixer or any other reformatting at this stage for all consumers? Do consumers of lexed FormatTokens assume they are receiving a faithful representation of the underlying file? Are there architecture issues regarding separation of concerns that come up if we do formatting directly in the lexing stage?

Separately, I think that all classes that subclass TokenAnalyzer and use TokenAnalyzer::process() would wind up calling FormatTokenLexer::lex() -- which would end up running NumericLiteralCaseFixer reformatting redundantly for each other pass. Please correct me if I'm reading this wrong.
I could see how that is still lower overhead than adding a totally separate pass, but it still seems odd to run the same reformatting function many separate times in unrelated passes.
Maybe it would be better to add NumericLiteralCaseFixer into some other existing pass instead?

@owenca
Contributor

owenca commented Aug 6, 2025

IMO, the best place to handle this is in FormatTokenLexer::getNextToken().

I see you have much more experience in this part of the codebase, but I have some hangups because I don't understand the implications of doing this move myself. Here's what I'm thinking; are these valid concerns?

The FormatTokenLexer class seems like it does the work of parsing a file into tokens for use downstream by all of the other formatters. What are the implications of running NumericLiteralCaseFixer or any other reformatting at this stage for all consumers? Do consumers of lexed FormatTokens assume they are receiving a faithful representation of the underlying file? Are there architecture issues regarding separation of concerns that come up if we do formatting directly in the lexing stage?

Separately, I think that all classes that subclass TokenAnalyzer and use TokenAnalyzer::process() would wind up calling FormatTokenLexer::lex() -- which would end up running NumericLiteralCaseFixer reformatting redundantly for each other pass. Please correct me if I'm reading this wrong. I could see how that is still lower overhead than adding a totally separate pass, but it still seems odd to run the same reformatting function many separate times in unrelated passes. Maybe it would be better to add NumericLiteralCaseFixer into some other existing pass instead?

You are absolutely right! I was wrong about handling NumericLiteralCase in FormatTokenLexer and totally agree with you that it'd be better to do that in an existing pass.

@30Wedge
Contributor Author

30Wedge commented Aug 7, 2025

Cool, thanks for hearing me out! I am working on handling NumericLiteralCase in the same pass as IntegerLiteralSeparatorFixer; that seemed natural since they both only modify single numeric_constant tokens. I won't have time to get to it until next week.
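
As a rough illustration of why sharing one pass is natural when each rewrite only touches a single numeric token, the sketch below walks a token list once and chains several independent per-token rewrites. The `Token` struct, the `Rewrite` alias, and `runNumericLiteralPass` are hypothetical stand-ins; they are not clang-format's types and not the code that was eventually merged.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified token; clang-format's real tokens carry much more.
struct Token {
  bool IsNumericConstant;
  std::string Text;
};

// A rewrite maps one literal's text to its (possibly unchanged) new text.
using Rewrite = std::function<std::string(const std::string &)>;

// Walk the tokens once and chain every rewrite on each numeric constant.
// Returns (token index, replacement text) for tokens whose text changed.
std::vector<std::pair<std::size_t, std::string>>
runNumericLiteralPass(const std::vector<Token> &Tokens,
                      const std::vector<Rewrite> &Rewrites) {
  std::vector<std::pair<std::size_t, std::string>> Replacements;
  for (std::size_t I = 0; I < Tokens.size(); ++I) {
    if (!Tokens[I].IsNumericConstant)
      continue; // Non-numeric tokens are never modified by this pass.
    std::string Text = Tokens[I].Text;
    for (const Rewrite &R : Rewrites) {
      assert(R && "rewrite must be callable");
      Text = R(Text);
    }
    if (Text != Tokens[I].Text)
      Replacements.emplace_back(I, std::move(Text));
  }
  return Replacements;
}
```

In this shape, a separator rewrite and a case rewrite would simply be two entries in `Rewrites`, so the source is scanned once and each literal produces at most one replacement.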

@30Wedge 30Wedge requested a review from owenca August 26, 2025 00:23
Contributor

@mydeveloperday mydeveloperday left a comment

nice!

@30Wedge 30Wedge requested a review from owenca August 27, 2025 02:12
Contributor

@owenca owenca left a comment

Other compiler warnings:

clang/unittests/Format/NumericLiteralCaseTest.cpp:65:23: warning: unused variable 'K' [-Wunused-variable]
   65 |   constexpr StringRef K{"k = 0x0;"};
      |                       ^
clang/unittests/Format/NumericLiteralCaseTest.cpp:66:23: warning: unused variable 'L' [-Wunused-variable]
   66 |   constexpr StringRef L{"l = 0xA;"};
      |                       ^

Contributor

@owenca owenca left a comment

You need to run e.g. ninja clean if you missed compiler warnings.

@30Wedge 30Wedge requested a review from owenca September 9, 2025 20:40
Contributor

@owenca owenca left a comment

LGTM except a few missed nits.

@owenca owenca changed the title from "[clang-format] Add an option to format integer and float literal case" to "[clang-format] Add an option to format numeric literal case" Sep 10, 2025
@30Wedge 30Wedge force-pushed the format-integer-case branch from 9d1fd48 to ad46ad8 September 10, 2025 13:50
@30Wedge
Contributor Author

30Wedge commented Sep 10, 2025

Addressed missed nits #151590 (comment), squashed commits and rebased onto main. Thank you all for the review comments.

@owenca owenca removed the clang Clang issues not falling into any other category label Sep 11, 2025
@llvmbot llvmbot added the clang Clang issues not falling into any other category label Sep 12, 2025
@owenca owenca removed the clang Clang issues not falling into any other category label Sep 12, 2025
@owenca owenca merged commit 220d705 into llvm:main Sep 12, 2025
9 of 11 checks passed
@github-actions

@30Wedge Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!
