Skip to content

Conversation

kgerlich
Copy link

… macro

This adds a new configuration option SpaceBeforeUnderscoreParens to control spacing between underscore and opening parenthesis. This is specifically designed for the gettext internationalization macro '_()' commonly used in GNU projects like GDB.

The option:

  • Defaults to true for LLVM style (preserving existing behavior)
  • Defaults to false for GNU style (removes space before '_(')
  • Only affects single underscore tokens '_', not other identifiers
  • Leaves all other spacing rules unchanged

Examples:
GNU style with SpaceBeforeUnderscoreParens=false:
printf(("Hello")); // No space before '('
my_func (arg); // Space before other functions preserved

LLVM style with SpaceBeforeUnderscoreParens=true:
printf(_("Hello")); // Standard spacing rules apply

This addresses the common pattern in GNU software where gettext messages use '_()' without spaces, improving consistency with GNU coding standards.

Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot
Copy link
Member

llvmbot commented Sep 20, 2025

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-format

Author: None (kgerlich)

Changes

… macro

This adds a new configuration option SpaceBeforeUnderscoreParens to control spacing between underscore and opening parenthesis. This is specifically designed for the gettext internationalization macro '_()' commonly used in GNU projects like GDB.

The option:

  • Defaults to true for LLVM style (preserving existing behavior)
  • Defaults to false for GNU style (removes space before '_(')
  • Only affects single underscore tokens '_', not other identifiers
  • Leaves all other spacing rules unchanged

Examples:
GNU style with SpaceBeforeUnderscoreParens=false:
printf(("Hello")); // No space before '('
my_func (arg); // Space before other functions preserved

LLVM style with SpaceBeforeUnderscoreParens=true:
printf(_("Hello")); // Standard spacing rules apply

This addresses the common pattern in GNU software where gettext messages use '_()' without spaces, improving consistency with GNU coding standards.


Full diff: https://github.com/llvm/llvm-project/pull/159925.diff

4 Files Affected:

  • (modified) clang/include/clang/Format/Format.h (+10)
  • (modified) clang/lib/Format/Format.cpp (+4)
  • (modified) clang/lib/Format/TokenAnnotator.cpp (+6)
  • (modified) clang/unittests/Format/FormatTest.cpp (+36)
diff --git a/clang/include/clang/Format/Format.h b/clang/include/clang/Format/Format.h
index 342fefcfc408c..1643c575fcbba 100644
--- a/clang/include/clang/Format/Format.h
+++ b/clang/include/clang/Format/Format.h
@@ -4667,6 +4667,16 @@ struct FormatStyle {
   /// \version 7
   bool SpaceBeforeInheritanceColon;
 
+  /// If ``false``, spaces will be removed between underscore and an opening
+  /// parenthesis. This is specifically for the gettext macro ``_()`` commonly
+  /// used in GNU projects.
+  /// \code
+  ///    true:                                  false:
+  ///    _ (message);                   vs.     _(message);
+  /// \endcode
+  /// \version 19
+  bool SpaceBeforeUnderscoreParens;
+
   /// If ``true``, a space will be added before a JSON colon. For other
   /// languages, e.g. JavaScript, use ``SpacesInContainerLiterals`` instead.
   /// \code
diff --git a/clang/lib/Format/Format.cpp b/clang/lib/Format/Format.cpp
index 24a36d97a6fa9..9181e9a9ecd26 100644
--- a/clang/lib/Format/Format.cpp
+++ b/clang/lib/Format/Format.cpp
@@ -1219,6 +1219,8 @@ template <> struct MappingTraits<FormatStyle> {
                    Style.SpaceBeforeCtorInitializerColon);
     IO.mapOptional("SpaceBeforeInheritanceColon",
                    Style.SpaceBeforeInheritanceColon);
+    IO.mapOptional("SpaceBeforeUnderscoreParens",
+                   Style.SpaceBeforeUnderscoreParens);
     IO.mapOptional("SpaceBeforeJsonColon", Style.SpaceBeforeJsonColon);
     IO.mapOptional("SpaceBeforeParens", Style.SpaceBeforeParens);
     IO.mapOptional("SpaceBeforeParensOptions", Style.SpaceBeforeParensOptions);
@@ -1713,6 +1715,7 @@ FormatStyle getLLVMStyle(FormatStyle::LanguageKind Language) {
   LLVMStyle.SpaceBeforeCpp11BracedList = false;
   LLVMStyle.SpaceBeforeCtorInitializerColon = true;
   LLVMStyle.SpaceBeforeInheritanceColon = true;
+  LLVMStyle.SpaceBeforeUnderscoreParens = true;
   LLVMStyle.SpaceBeforeJsonColon = false;
   LLVMStyle.SpaceBeforeParens = FormatStyle::SBPO_ControlStatements;
   LLVMStyle.SpaceBeforeParensOptions = {};
@@ -2044,6 +2047,7 @@ FormatStyle getGNUStyle() {
   Style.FixNamespaceComments = false;
   Style.KeepFormFeed = true;
   Style.SpaceBeforeParens = FormatStyle::SBPO_Always;
+  Style.SpaceBeforeUnderscoreParens = false;
   return Style;
 }
 
diff --git a/clang/lib/Format/TokenAnnotator.cpp b/clang/lib/Format/TokenAnnotator.cpp
index d97f56751ea69..9af90184d157e 100644
--- a/clang/lib/Format/TokenAnnotator.cpp
+++ b/clang/lib/Format/TokenAnnotator.cpp
@@ -4901,6 +4901,12 @@ bool TokenAnnotator::spaceRequiredBetween(const AnnotatedLine &Line,
     // Handle builtins like identifiers.
     if (Line.Type != LT_PreprocessorDirective &&
         (Left.Tok.getIdentifierInfo() || Left.is(tok::r_paren))) {
+      // Check for special case: single underscore token (gettext macro)
+      if (Left.Tok.getIdentifierInfo() && !Style.SpaceBeforeUnderscoreParens) {
+        StringRef TokenText = Left.TokenText;
+        if (TokenText == "_")
+          return false;
+      }
       return spaceRequiredBeforeParens(Right);
     }
     return false;
diff --git a/clang/unittests/Format/FormatTest.cpp b/clang/unittests/Format/FormatTest.cpp
index d9db06667d802..4ca1f8fbfa363 100644
--- a/clang/unittests/Format/FormatTest.cpp
+++ b/clang/unittests/Format/FormatTest.cpp
@@ -17852,6 +17852,42 @@ TEST_F(FormatTest, ConfigurableSpacesInParens) {
                Spaces);
 }
 
+TEST_F(FormatTest, SpaceBeforeUnderscoreParens) {
+  // Test with SpaceBeforeParens = Always to clearly show the difference
+  FormatStyle Style = getLLVMStyle();
+  Style.SpaceBeforeParens = FormatStyle::SBPO_Always;
+
+  // Default LLVM style should have SpaceBeforeUnderscoreParens = true
+  EXPECT_TRUE(Style.SpaceBeforeUnderscoreParens);
+  verifyFormat("func (arg);", Style);       // All functions should have space
+  verifyFormat("my_func (arg);", Style);    // All functions should have space  
+  verifyFormat("_ (message);", Style);      // Single underscore should have space when SpaceBeforeUnderscoreParens=true
+  verifyFormat("underscore_ (param);", Style); // Other underscores should have space
+
+  // Now test with SpaceBeforeUnderscoreParens = false but SpaceBeforeParens = Always
+  Style.SpaceBeforeUnderscoreParens = false;
+  verifyFormat("func (arg);", Style);       // Non-underscore functions should still have space
+  verifyFormat("my_func (arg);", Style);    // Functions with underscores should still have space
+  verifyFormat("_(message);", Style);       // Single underscore (gettext macro) should NOT have space
+  verifyFormat("underscore_ (param);", Style); // Other underscores should still have space
+  verifyFormat("_private_func (data);", Style); // Functions starting with underscore but not single _ should have space
+
+  // Test GNU style (should have SpaceBeforeUnderscoreParens = false by default)
+  FormatStyle GNUStyle = getGNUStyle();
+  EXPECT_FALSE(GNUStyle.SpaceBeforeUnderscoreParens);
+  EXPECT_EQ(GNUStyle.SpaceBeforeParens, FormatStyle::SBPO_Always); // GNU style should have SpaceBeforeParens = Always
+  verifyFormat("func (arg);", GNUStyle);    // GNU style has SpaceBeforeParens = Always
+  verifyFormat("my_func (arg);", GNUStyle); // Functions with underscores should have space
+  verifyFormat("_(message);", GNUStyle);    // Single underscore (gettext macro) should NOT have space
+  verifyFormat("_private_func (data);", GNUStyle); // Other functions should have space
+
+  // Test mixed scenarios with GNU style
+  verifyFormat("printf (_(\"Hello\"));\n"
+               "func (arg);\n"
+               "_(\"World\");",
+               GNUStyle);
+}
+
 TEST_F(FormatTest, ConfigurableSpacesInSquareBrackets) {
   verifyFormat("int a[5];");
   verifyFormat("a[3] += 42;");

@kgerlich
Copy link
Author

I have not had the pleasure to contribute to LLVM before and I'm new to the rules. I work on GDB open source and I would like to get this obstacle solved. Thanks.

@owenca @mydeveloperday - Could you please review this clang-format enhancement?

This adds support for GNU gettext macro spacing that would benefit projects like GDB.

Copy link

github-actions bot commented Sep 20, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@kgerlich kgerlich force-pushed the feature/space-before-underscore-parens branch 2 times, most recently from 0898bd6 to fc3192b Compare September 20, 2025 19:44
@llvmbot llvmbot added the clang Clang issues not falling into any other category label Sep 20, 2025
@kgerlich kgerlich force-pushed the feature/space-before-underscore-parens branch from 296b17a to 83bcaeb Compare September 20, 2025 20:12
/// _ (message); vs. _(message);
/// \endcode
/// \version 19
bool SpaceBeforeUnderscoreParens;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
bool SpaceBeforeUnderscoreParens;
bool SpaceBetweenUnderscoreParens;

A bit more concise?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for reviewing this. renamed.

@@ -4901,6 +4901,12 @@ bool TokenAnnotator::spaceRequiredBetween(const AnnotatedLine &Line,
// Handle builtins like identifiers.
if (Line.Type != LT_PreprocessorDirective &&
(Left.Tok.getIdentifierInfo() || Left.is(tok::r_paren))) {
// Check for special case: single underscore token (gettext macro)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Check for special case: single underscore token (gettext macro)
// Check for special case: single underscore token (gettext macro).

Comments end in full stop.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed all the comments.

Comment on lines 4905 to 4907
if (Left.Tok.getIdentifierInfo() && !Style.SpaceBeforeUnderscoreParens) {
StringRef TokenText = Left.TokenText;
if (TokenText == "_")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (Left.Tok.getIdentifierInfo() && !Style.SpaceBeforeUnderscoreParens) {
StringRef TokenText = Left.TokenText;
if (TokenText == "_")
if (Left.Tok.getIdentifierInfo() && !Style.SpaceBeforeUnderscoreParens && Left.TokenText == "_")

I'd merge that.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

merged into one line with wrap.

@@ -17852,6 +17852,59 @@ TEST_F(FormatTest, ConfigurableSpacesInParens) {
Spaces);
}

TEST_F(FormatTest, SpaceBeforeUnderscoreParens) {
// Test with SpaceBeforeParens = Always to clearly show the difference
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can drop all of the test comments.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all dropped.

@kgerlich kgerlich force-pushed the feature/space-before-underscore-parens branch from 83bcaeb to 2d82763 Compare September 22, 2025 05:39
@owenca
Copy link
Contributor

owenca commented Sep 22, 2025

In general, we want the new option to meet the requirements. Can you link to the relevant section in GNU coding standards? It seems that this can be supported without a new option.

@kgerlich kgerlich force-pushed the feature/space-before-underscore-parens branch from 2d82763 to c28a884 Compare September 22, 2025 07:17
@klausgerlicher
Copy link

In general, we want the new option to meet the requirements. Can you link to the relevant section in GNU coding standards? It seems that this can be supported without a new option.

Hi @owenca thanks for providing feedback so quickly.

So in general I read from this that in GNU style mode, this could generally avoid separating "_(" and we wouldn not need an option. Could you please explain though what you mean with meet the requirements here? I haven't had any exposure to LLVM policies, just trying to fix some behaviour which was seen over a few years.

@owenca
Copy link
Contributor

owenca commented Sep 22, 2025

In general, we want the new option to meet the requirements. Can you link to the relevant section in GNU coding standards? It seems that this can be supported without a new option.

Hi @owenca thanks for providing feedback so quickly.

So in general I read from this that in GNU style mode, this could generally avoid separating "_(" and we wouldn not need an option. Could you please explain though what you mean with meet the requirements here? I haven't had any exposure to LLVM policies, just trying to fix some behaviour which was seen over a few years.

If click the link, you will see the requirements at the end of the section.

@klausgerlicher
Copy link

In general, we want the new option to meet the requirements. Can you link to the relevant section in GNU coding standards? It seems that this can be supported without a new option.

Hi @owenca thanks for providing feedback so quickly.
So in general I read from this that in GNU style mode, this could generally avoid separating "_(" and we wouldn not need an option. Could you please explain though what you mean with meet the requirements here? I haven't had any exposure to LLVM policies, just trying to fix some behaviour which was seen over a few years.

If click the link, you will see the requirements at the end of the section.

I read up on this a bit:

I could not find a good indication of this in the GNU coding docs.

However in the gettext() documentation the underscore macro is described. https://www.gnu.org/software/gettext/manual/html_node/Mark-Keywords.html

Many packages use ‘_’ (a simple underline) as a keyword, and write ‘_("Translatable string")’ instead of ‘gettext ("Translatable string")’. Further, the coding rule, from GNU standards, wanting that there is a space between the keyword and the opening parenthesis is relaxed, in practice, for this particular usage. So, the textual overhead per translatable string is reduced to only three characters: the underline and the two parentheses. However, even if GNU gettext uses this convention internally, it does not offer it officially. The real, genuine keyword is truly ‘gettext’ indeed. It is fairly easy for those wanting to use ‘_’ instead of ‘gettext’ to declare:

So this would indicate that in GNU style we could do this w/o adding an option.

Copy link
Contributor

@HazardyKnusperkeks HazardyKnusperkeks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you want it to do without an option? Just never adding a space between _ and ( might not be the thing somebody else wants.

/// _ (message); vs. _(message);
/// \endcode
/// \version 19
bool SpaceBetweenUnderscoreParens;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please resort.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resorted, thanks.

GDB would have two out of the three requirements @owenca mentioned to allow for a new option. I'm trying to see if in the GDB coding style docs, the _() macro can be described which would then meet all reqs.

…t macro

This adds a new configuration option SpaceBetweenUnderscoreParens to control
spacing between underscore and opening parenthesis. This is specifically
designed for the gettext internationalization macro '_()' commonly used in
GNU projects like GDB.

The option:
- Defaults to true for LLVM style (preserving existing behavior)
- Defaults to false for GNU style (removes space before '_(')
- Only affects single underscore tokens '_', not other identifiers
- Leaves all other spacing rules unchanged

Examples:
  GNU style with SpaceBetweenUnderscoreParens=false:
    printf(_("Hello"));     // No space before '_('
    my_func (arg);            // Space before other functions preserved

  LLVM style with SpaceBetweenUnderscoreParens=true:
    printf(_ ("Hello"));     // Standard spacing rules apply

This addresses the common pattern in GNU software where gettext messages
use '_()' without spaces, improving consistency with GNU coding standards.
@kgerlich kgerlich force-pushed the feature/space-before-underscore-parens branch from c28a884 to 03b6ccc Compare September 22, 2025 10:36
@owenca
Copy link
Contributor

owenca commented Sep 23, 2025

How do you want it to do without an option? Just never adding a space between _ and ( might not be the thing somebody else wants.

If this were in the GNU coding standards, I might agree to adding some special handling for GNU without adding a new option. Regardless, I'm inclined to oppose an option this niche.

@kgerlich you can make clang-format leave _( alone by using the OneLineFormatOffRegex option? For example:

OneLineFormatOffRegex: ^_$

@simark
Copy link

simark commented Sep 23, 2025

How do you want it to do without an option? Just never adding a space between _ and ( might not be the thing somebody else wants.

If this were in the GNU coding standards, I might agree to adding some special handling for GNU without adding a new option. Regardless, I'm inclined to oppose an option this niche.

@kgerlich you can make clang-format leave _( alone by using the OneLineFormatOffRegex option? For example:

OneLineFormatOffRegex: ^_$

Hmm, but that would turn off formatting entirely for those lines? Ideally, we would still want other aspects of formatting to work.

This exception to the "space before parenthesis" applies to pretty much all GNU projects, so I don't think it's that niche. For instance:

If it really helps, we can try to get the rule into the GNU coding standards (I was thinking on this page).

@klausgerlicher
Copy link

I've spoken to one of the GNU maintainers (Bruno Haible) and he tells me that this behaviour for GNU is well documented:

This convention is documented in multiple places in the GNU gettext
documentation [1][2][3][4].

> This omission causes issues with clang-format, which treats _() as
> a regular function call and consistently inserts a space between _ and (
> during formatting. While minor, this behavior is inconvenient for
> projects following GNU conventions.

[2] actually documents the omission of the space in this context:
  "Many packages use ‘_’ (a simple underline) as a keyword, and write
   ‘_("Translatable string")’ instead of ‘gettext ("Translatable string")’.
   Further, the coding rule, from GNU standards, wanting that there is
   a space between the keyword and the opening parenthesis is relaxed,
   in practice, for this particular usage."

This has been documented like this for more than 20 years.

...there are more than 500000
occurrences of this pattern in the Debian GNU/Linux source code? [5]

[1] https://www.gnu.org/software/gettext/manual/html_node/Overview.html
[2] https://www.gnu.org/software/gettext/manual/html_node/Mark-Keywords.html
[3] https://www.gnu.org/software/gettext/manual/html_node/Libraries.html
[4] https://www.gnu.org/software/gettext/manual/html_node/C.html
[5] https://codesearch.debian.net/search?q=+_%28%22&literal=1

So we have all the three requirements that you stated:

  • be used in a project of significant size (have dozens of contributors).
  • have a publicly accessible style guide.
  • have a person willing to contribute and maintain patches: I would be willing to stand in for this.

@owenca, would this be enough to convince you to add this option?

@collinfunk
Copy link

Adding a +1 for a change like this.

Echoing Bruno's thoughts that this has been common practice for a long time, it probably just hasn't been reported since most GNU projects, including the ones I commit to, do not use code formatters.

Style.SpaceBeforeCtorInitializerColon);
IO.mapOptional("SpaceBeforeInheritanceColon",
Style.SpaceBeforeInheritanceColon);
IO.mapOptional("SpaceBetweenUnderscoreParens",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resort here (and below) also.

@owenca
Copy link
Contributor

owenca commented Sep 24, 2025

If this were in the GNU coding standards, I might agree to adding some special handling for GNU without adding a new option. Regardless, I'm inclined to oppose an option this niche.
@kgerlich you can make clang-format leave _( alone by using the OneLineFormatOffRegex option? For example:

OneLineFormatOffRegex: ^_$

Hmm, but that would turn off formatting entirely for those lines? Ideally, we would still want other aspects of formatting to work.

Yes, that's what that option is for.

This exception to the "space before parenthesis" applies to pretty much all GNU projects, so I don't think it's that niche. For instance:

It's niche or too specific because it would only apply to a single underscore as the function name.

If it really helps, we can try to get the rule into the GNU coding standards (I was thinking on this page).

It should go in the Formatting section I linked above, and specifically the paragraph below:

We find it easier to read a program when it has spaces before the open-parentheses and after the commas.
Especially after the commas.

So we have all the three requirements that you stated:

  • be used in a project of significant size (have dozens of contributors).

This is met by default as GNU is one of the predefined styles clang-format supports.

  • have a publicly accessible style guide.

See above. After it's updated, we can discuss how to support the behavior. I still prefer making OneLineFormatOffRegex: ^_$ the default for GNU to adding an exception to SBPO_Always like the following:

--- a/clang/lib/Format/TokenAnnotator.cpp
+++ b/clang/lib/Format/TokenAnnotator.cpp
@@ -4503,8 +4503,15 @@ unsigned TokenAnnotator::splitPenalty(const AnnotatedLine &Line,
 }
 
 bool TokenAnnotator::spaceRequiredBeforeParens(const FormatToken &Right) const {
-  if (Style.SpaceBeforeParens == FormatStyle::SBPO_Always)
+  if (Style.SpaceBeforeParens == FormatStyle::SBPO_Always) {
+    if (Style.BasedOnStyle.equals_insensitive("gnu")) {
+      const auto *Left = Right.Previous;
+      assert(Left);
+      if (Left->TokenText == "_")
+        return false;
+    }
     return true;
+  }
   if (Right.is(TT_OverloadedOperatorLParen) &&
       Style.SpaceBeforeParensOptions.AfterOverloadedOperator) {
     return true;

@mydeveloperday can you also chime in?

@mydeveloperday
Copy link
Contributor

Just because people use BasedOnStyle: GNU doesn't mean they are a gnu project it just means that's the closest

Any change like this should have a option just to prevent us from breaking someone somewhere

It can be on by default in gnu but should be Leave if we don't currently do anything

@simark
Copy link

simark commented Oct 1, 2025

Here's an idea for an oddly-specific setting, but less oddly-specific than the proposed one: a setting that is a list of identifiers that are exceptions to the "space before parenthesis" rule, which we'd use like:

SpaceBeforeIdentifierExceptions:
  - "_"

This has the advantage that _ is not hardcoded in clang-format. If we want to omit the space for another identifier, we can. For instance, the gettext doc also suggests N_ as an alternative, so some people may want to handle it the same way. Finally, by being configurable, there is some chance that this will be useful to someone else out there.

@owenca
Copy link
Contributor

owenca commented Oct 5, 2025

I would add SBPO_GNU:

SpaceBeforeParens:
  GNU # same as Always except when preceded by an underscore as the function name

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang Clang issues not falling into any other category clang-format
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants