-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[clang-format] Add SpaceBeforeUnderscoreParens option for GNU gettext… #159925
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[clang-format] Add SpaceBeforeUnderscoreParens option for GNU gettext… #159925
Conversation
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be notified. If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers. If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums. |
@llvm/pr-subscribers-clang @llvm/pr-subscribers-clang-format Author: None (kgerlich) Changes… macro This adds a new configuration option SpaceBeforeUnderscoreParens to control spacing between underscore and opening parenthesis. This is specifically designed for the gettext internationalization macro '_()' commonly used in GNU projects like GDB. The option:
Examples: LLVM style with SpaceBeforeUnderscoreParens=true: This addresses the common pattern in GNU software where gettext messages use '_()' without spaces, improving consistency with GNU coding standards. Full diff: https://github.com/llvm/llvm-project/pull/159925.diff 4 Files Affected:
diff --git a/clang/include/clang/Format/Format.h b/clang/include/clang/Format/Format.h
index 342fefcfc408c..1643c575fcbba 100644
--- a/clang/include/clang/Format/Format.h
+++ b/clang/include/clang/Format/Format.h
@@ -4667,6 +4667,16 @@ struct FormatStyle {
/// \version 7
bool SpaceBeforeInheritanceColon;
+ /// If ``false``, spaces will be removed between underscore and an opening
+ /// parenthesis. This is specifically for the gettext macro ``_()`` commonly
+ /// used in GNU projects.
+ /// \code
+ /// true: false:
+ /// _ (message); vs. _(message);
+ /// \endcode
+ /// \version 19
+ bool SpaceBeforeUnderscoreParens;
+
/// If ``true``, a space will be added before a JSON colon. For other
/// languages, e.g. JavaScript, use ``SpacesInContainerLiterals`` instead.
/// \code
diff --git a/clang/lib/Format/Format.cpp b/clang/lib/Format/Format.cpp
index 24a36d97a6fa9..9181e9a9ecd26 100644
--- a/clang/lib/Format/Format.cpp
+++ b/clang/lib/Format/Format.cpp
@@ -1219,6 +1219,8 @@ template <> struct MappingTraits<FormatStyle> {
Style.SpaceBeforeCtorInitializerColon);
IO.mapOptional("SpaceBeforeInheritanceColon",
Style.SpaceBeforeInheritanceColon);
+ IO.mapOptional("SpaceBeforeUnderscoreParens",
+ Style.SpaceBeforeUnderscoreParens);
IO.mapOptional("SpaceBeforeJsonColon", Style.SpaceBeforeJsonColon);
IO.mapOptional("SpaceBeforeParens", Style.SpaceBeforeParens);
IO.mapOptional("SpaceBeforeParensOptions", Style.SpaceBeforeParensOptions);
@@ -1713,6 +1715,7 @@ FormatStyle getLLVMStyle(FormatStyle::LanguageKind Language) {
LLVMStyle.SpaceBeforeCpp11BracedList = false;
LLVMStyle.SpaceBeforeCtorInitializerColon = true;
LLVMStyle.SpaceBeforeInheritanceColon = true;
+ LLVMStyle.SpaceBeforeUnderscoreParens = true;
LLVMStyle.SpaceBeforeJsonColon = false;
LLVMStyle.SpaceBeforeParens = FormatStyle::SBPO_ControlStatements;
LLVMStyle.SpaceBeforeParensOptions = {};
@@ -2044,6 +2047,7 @@ FormatStyle getGNUStyle() {
Style.FixNamespaceComments = false;
Style.KeepFormFeed = true;
Style.SpaceBeforeParens = FormatStyle::SBPO_Always;
+ Style.SpaceBeforeUnderscoreParens = false;
return Style;
}
diff --git a/clang/lib/Format/TokenAnnotator.cpp b/clang/lib/Format/TokenAnnotator.cpp
index d97f56751ea69..9af90184d157e 100644
--- a/clang/lib/Format/TokenAnnotator.cpp
+++ b/clang/lib/Format/TokenAnnotator.cpp
@@ -4901,6 +4901,12 @@ bool TokenAnnotator::spaceRequiredBetween(const AnnotatedLine &Line,
// Handle builtins like identifiers.
if (Line.Type != LT_PreprocessorDirective &&
(Left.Tok.getIdentifierInfo() || Left.is(tok::r_paren))) {
+ // Check for special case: single underscore token (gettext macro)
+ if (Left.Tok.getIdentifierInfo() && !Style.SpaceBeforeUnderscoreParens) {
+ StringRef TokenText = Left.TokenText;
+ if (TokenText == "_")
+ return false;
+ }
return spaceRequiredBeforeParens(Right);
}
return false;
diff --git a/clang/unittests/Format/FormatTest.cpp b/clang/unittests/Format/FormatTest.cpp
index d9db06667d802..4ca1f8fbfa363 100644
--- a/clang/unittests/Format/FormatTest.cpp
+++ b/clang/unittests/Format/FormatTest.cpp
@@ -17852,6 +17852,42 @@ TEST_F(FormatTest, ConfigurableSpacesInParens) {
Spaces);
}
+TEST_F(FormatTest, SpaceBeforeUnderscoreParens) {
+ // Test with SpaceBeforeParens = Always to clearly show the difference
+ FormatStyle Style = getLLVMStyle();
+ Style.SpaceBeforeParens = FormatStyle::SBPO_Always;
+
+ // Default LLVM style should have SpaceBeforeUnderscoreParens = true
+ EXPECT_TRUE(Style.SpaceBeforeUnderscoreParens);
+ verifyFormat("func (arg);", Style); // All functions should have space
+ verifyFormat("my_func (arg);", Style); // All functions should have space
+ verifyFormat("_ (message);", Style); // Single underscore should have space when SpaceBeforeUnderscoreParens=true
+ verifyFormat("underscore_ (param);", Style); // Other underscores should have space
+
+ // Now test with SpaceBeforeUnderscoreParens = false but SpaceBeforeParens = Always
+ Style.SpaceBeforeUnderscoreParens = false;
+ verifyFormat("func (arg);", Style); // Non-underscore functions should still have space
+ verifyFormat("my_func (arg);", Style); // Functions with underscores should still have space
+ verifyFormat("_(message);", Style); // Single underscore (gettext macro) should NOT have space
+ verifyFormat("underscore_ (param);", Style); // Other underscores should still have space
+ verifyFormat("_private_func (data);", Style); // Functions starting with underscore but not single _ should have space
+
+ // Test GNU style (should have SpaceBeforeUnderscoreParens = false by default)
+ FormatStyle GNUStyle = getGNUStyle();
+ EXPECT_FALSE(GNUStyle.SpaceBeforeUnderscoreParens);
+ EXPECT_EQ(GNUStyle.SpaceBeforeParens, FormatStyle::SBPO_Always); // GNU style should have SpaceBeforeParens = Always
+ verifyFormat("func (arg);", GNUStyle); // GNU style has SpaceBeforeParens = Always
+ verifyFormat("my_func (arg);", GNUStyle); // Functions with underscores should have space
+ verifyFormat("_(message);", GNUStyle); // Single underscore (gettext macro) should NOT have space
+ verifyFormat("_private_func (data);", GNUStyle); // Other functions should have space
+
+ // Test mixed scenarios with GNU style
+ verifyFormat("printf (_(\"Hello\"));\n"
+ "func (arg);\n"
+ "_(\"World\");",
+ GNUStyle);
+}
+
TEST_F(FormatTest, ConfigurableSpacesInSquareBrackets) {
verifyFormat("int a[5];");
verifyFormat("a[3] += 42;");
|
I have not had the pleasure to contribute to LLVM before and I'm new to the rules. I work on GDB open source and I would like to get this obstacle solved. Thanks. @owenca @mydeveloperday - Could you please review this clang-format enhancement? This adds support for GNU gettext macro spacing that would benefit projects like GDB. |
✅ With the latest revision this PR passed the C/C++ code formatter. |
0898bd6
to
fc3192b
Compare
296b17a
to
83bcaeb
Compare
clang/include/clang/Format/Format.h
Outdated
/// _ (message); vs. _(message); | ||
/// \endcode | ||
/// \version 19 | ||
bool SpaceBeforeUnderscoreParens; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bool SpaceBeforeUnderscoreParens; | |
bool SpaceBetweenUnderscoreParens; |
A bit more concise?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for reviewing this. renamed.
clang/lib/Format/TokenAnnotator.cpp
Outdated
@@ -4901,6 +4901,12 @@ bool TokenAnnotator::spaceRequiredBetween(const AnnotatedLine &Line, | |||
// Handle builtins like identifiers. | |||
if (Line.Type != LT_PreprocessorDirective && | |||
(Left.Tok.getIdentifierInfo() || Left.is(tok::r_paren))) { | |||
// Check for special case: single underscore token (gettext macro) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Check for special case: single underscore token (gettext macro) | |
// Check for special case: single underscore token (gettext macro). |
Comments end in full stop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
addressed all the comments.
clang/lib/Format/TokenAnnotator.cpp
Outdated
if (Left.Tok.getIdentifierInfo() && !Style.SpaceBeforeUnderscoreParens) { | ||
StringRef TokenText = Left.TokenText; | ||
if (TokenText == "_") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (Left.Tok.getIdentifierInfo() && !Style.SpaceBeforeUnderscoreParens) { | |
StringRef TokenText = Left.TokenText; | |
if (TokenText == "_") | |
if (Left.Tok.getIdentifierInfo() && !Style.SpaceBeforeUnderscoreParens && Left.TokenText == "_") |
I'd merge that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
merged into one line with wrap.
@@ -17852,6 +17852,59 @@ TEST_F(FormatTest, ConfigurableSpacesInParens) { | |||
Spaces); | |||
} | |||
|
|||
TEST_F(FormatTest, SpaceBeforeUnderscoreParens) { | |||
// Test with SpaceBeforeParens = Always to clearly show the difference |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can drop all of the test comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all dropped.
83bcaeb
to
2d82763
Compare
In general, we want the new option to meet the requirements. Can you link to the relevant section in GNU coding standards? It seems that this can be supported without a new option. |
2d82763
to
c28a884
Compare
Hi @owenca thanks for providing feedback so quickly. So in general I read from this that in GNU style mode, this could generally avoid separating "_(" and we wouldn not need an option. Could you please explain though what you mean with meet the requirements here? I haven't had any exposure to LLVM policies, just trying to fix some behaviour which was seen over a few years. |
If click the link, you will see the requirements at the end of the section. |
I read up on this a bit: I could not find a good indication of this in the GNU coding docs. However in the gettext() documentation the underscore macro is described. https://www.gnu.org/software/gettext/manual/html_node/Mark-Keywords.html
So this would indicate that in GNU style we could do this w/o adding an option. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do you want it to do without an option? Just never adding a space between _
and (
might not be the thing somebody else wants.
clang/include/clang/Format/Format.h
Outdated
/// _ (message); vs. _(message); | ||
/// \endcode | ||
/// \version 19 | ||
bool SpaceBetweenUnderscoreParens; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please resort.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
resorted, thanks.
GDB would have two out of the three requirements @owenca mentioned to allow for a new option. I'm trying to see if in the GDB coding style docs, the _() macro can be described which would then meet all reqs.
…t macro This adds a new configuration option SpaceBetweenUnderscoreParens to control spacing between underscore and opening parenthesis. This is specifically designed for the gettext internationalization macro '_()' commonly used in GNU projects like GDB. The option: - Defaults to true for LLVM style (preserving existing behavior) - Defaults to false for GNU style (removes space before '_(') - Only affects single underscore tokens '_', not other identifiers - Leaves all other spacing rules unchanged Examples: GNU style with SpaceBetweenUnderscoreParens=false: printf(_("Hello")); // No space before '_(' my_func (arg); // Space before other functions preserved LLVM style with SpaceBetweenUnderscoreParens=true: printf(_ ("Hello")); // Standard spacing rules apply This addresses the common pattern in GNU software where gettext messages use '_()' without spaces, improving consistency with GNU coding standards.
c28a884
to
03b6ccc
Compare
If this were in the GNU coding standards, I might agree to adding some special handling for GNU without adding a new option. Regardless, I'm inclined to oppose an option this niche. @kgerlich you can make clang-format leave
|
Hmm, but that would turn off formatting entirely for those lines? Ideally, we would still want other aspects of formatting to work. This exception to the "space before parenthesis" applies to pretty much all GNU projects, so I don't think it's that niche. For instance: If it really helps, we can try to get the rule into the GNU coding standards (I was thinking on this page). |
I've spoken to one of the GNU maintainers (Bruno Haible) and he tells me that this behaviour for GNU is well documented:
So we have all the three requirements that you stated:
@owenca, would this be enough to convince you to add this option? |
Adding a +1 for a change like this. Echoing Bruno's thoughts that this has been common practice for a long time, it probably just hasn't been reported since most GNU projects, including the ones I commit to, do not use code formatters. |
Style.SpaceBeforeCtorInitializerColon); | ||
IO.mapOptional("SpaceBeforeInheritanceColon", | ||
Style.SpaceBeforeInheritanceColon); | ||
IO.mapOptional("SpaceBetweenUnderscoreParens", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resort here (and below) also.
Yes, that's what that option is for.
It's niche or too specific because it would only apply to a single underscore as the function name.
It should go in the Formatting section I linked above, and specifically the paragraph below:
This is met by default as GNU is one of the predefined styles clang-format supports.
See above. After it's updated, we can discuss how to support the behavior. I still prefer making --- a/clang/lib/Format/TokenAnnotator.cpp
+++ b/clang/lib/Format/TokenAnnotator.cpp
@@ -4503,8 +4503,15 @@ unsigned TokenAnnotator::splitPenalty(const AnnotatedLine &Line,
}
bool TokenAnnotator::spaceRequiredBeforeParens(const FormatToken &Right) const {
- if (Style.SpaceBeforeParens == FormatStyle::SBPO_Always)
+ if (Style.SpaceBeforeParens == FormatStyle::SBPO_Always) {
+ if (Style.BasedOnStyle.equals_insensitive("gnu")) {
+ const auto *Left = Right.Previous;
+ assert(Left);
+ if (Left->TokenText == "_")
+ return false;
+ }
return true;
+ }
if (Right.is(TT_OverloadedOperatorLParen) &&
Style.SpaceBeforeParensOptions.AfterOverloadedOperator) {
return true; @mydeveloperday can you also chime in? |
Just because people use BasedOnStyle: GNU doesn't mean they are a gnu project it just means that's the closest Any change like this should have a option just to prevent us from breaking someone somewhere It can be on by default in gnu but should be Leave if we don't currently do anything |
Here's an idea for an oddly-specific setting, but less oddly-specific than the proposed one: a setting that is a list of identifiers that are exceptions to the "space before parenthesis" rule, which we'd use like:
This has the advantage that |
I would add
|
… macro
This adds a new configuration option SpaceBeforeUnderscoreParens to control spacing between underscore and opening parenthesis. This is specifically designed for the gettext internationalization macro '_()' commonly used in GNU projects like GDB.
The option:
Examples:
GNU style with SpaceBeforeUnderscoreParens=false:
printf(("Hello")); // No space before '('
my_func (arg); // Space before other functions preserved
LLVM style with SpaceBeforeUnderscoreParens=true:
printf(_("Hello")); // Standard spacing rules apply
This addresses the common pattern in GNU software where gettext messages use '_()' without spaces, improving consistency with GNU coding standards.