-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[clang-format] clang-format-ignore: Add support for double asterisk patterns #110560
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@llvm/pr-subscribers-clang-format Author: Ameer J (ameerj) ChangesAdds closes #110160 Full diff: https://github.com/llvm/llvm-project/pull/110560.diff 2 Files Affected:
diff --git a/clang/lib/Format/MatchFilePath.cpp b/clang/lib/Format/MatchFilePath.cpp
index 062b334dcdd8fd..3d7861499b9a46 100644
--- a/clang/lib/Format/MatchFilePath.cpp
+++ b/clang/lib/Format/MatchFilePath.cpp
@@ -49,11 +49,29 @@ bool matchFilePath(StringRef Pattern, StringRef FilePath) {
return false;
break;
case '*': {
- while (++I < EOP && Pattern[I] == '*') { // Skip consecutive stars.
+ if (I + 1 < EOP && Pattern[I + 1] == '*') {
+ // Handle '**' pattern
+ while (++I < EOP && Pattern[I] == '*') { // Skip consecutive stars.
+ }
+ if (I == EOP)
+ return true; // '**' at the end matches everything
+ if (Pattern[I] == Separator) {
+ // Try to match the rest of the pattern without consuming the
+ // separator for the case where we want to match "zero" directories
+ // e.g. "a/**/b" matches "a/b"
+ if (matchFilePath(Pattern.substr(I + 1), FilePath.substr(J)))
+ return true;
+ }
+ while (J < End) {
+ if (matchFilePath(Pattern.substr(I), FilePath.substr(J)))
+ return true;
+ ++J;
+ }
+ return false;
}
const auto K = FilePath.find(Separator, J); // Index of next `Separator`.
const bool NoMoreSeparatorsInFilePath = K == StringRef::npos;
- if (I == EOP) // `Pattern` ends with a star.
+ if (++I == EOP) // `Pattern` ends with a star.
return NoMoreSeparatorsInFilePath;
// `Pattern` ends with a lone backslash.
if (Pattern[I] == '\\' && ++I == EOP)
diff --git a/clang/unittests/Format/MatchFilePathTest.cpp b/clang/unittests/Format/MatchFilePathTest.cpp
index 28f665635718e5..a6df090a802128 100644
--- a/clang/unittests/Format/MatchFilePathTest.cpp
+++ b/clang/unittests/Format/MatchFilePathTest.cpp
@@ -164,6 +164,40 @@ TEST_F(MatchFilePathTest, Path) {
EXPECT_FALSE(match("foo\\", R"(foo*\)"));
}
+TEST_F(MatchFilePathTest, DoubleAsterisk) {
+ EXPECT_TRUE(match("a/b/c/d.cpp", "**b**"));
+ EXPECT_TRUE(match("a/b/c/d.cpp", "**/b/**"));
+ EXPECT_TRUE(match("a/b/c/d_e.cpp", "**d_*"));
+ EXPECT_TRUE(match("a/b/c/d_e.cpp", "**/d_*"));
+ EXPECT_TRUE(match("a/b/c/d_e.cpp", "**d_**"));
+ EXPECT_TRUE(match("a/b/c/d_e.cpp", "**/d_**"));
+
+ EXPECT_TRUE(match("a/b/c/d_e.cpp", "**/b/c/**"));
+
+ EXPECT_TRUE(match("a/b/c/d_e.cpp", "a/**/b/c/d_e.cpp"));
+ EXPECT_TRUE(match("a/b/c/d_e.cpp", "a/**/c/d_e.cpp"));
+ EXPECT_TRUE(match("a/b/c/d_e.cpp", "a/**/b/**/d_e.cpp"));
+ EXPECT_TRUE(match("a/b/c/d_e.cpp", "**/b/**/d_e.cpp"));
+ EXPECT_TRUE(match("a/b/c/d_e.cpp", "a/**/**/b/**"));
+
+ EXPECT_FALSE(match("a/b/c/d_e.cpp", "**/d"));
+ EXPECT_FALSE(match("a/b/c/d_e.cpp", "**/b/d"));
+ EXPECT_FALSE(match("a/b/c/d_e.cpp", "**/b/d/**"));
+ EXPECT_FALSE(match("a/b/c/d_e.cpp", "**/b/c/"));
+
+ // Multiple consecutive asterisks are treated as **
+ EXPECT_TRUE(match("a/b/c/d.cpp", "***b****"));
+ EXPECT_TRUE(match("a/b/c/d.cpp", "****/b/***"));
+ EXPECT_TRUE(match("a/b/c/d_e.cpp", "***d_**"));
+ EXPECT_TRUE(match("a/b/c/d_e.cpp", "****/d_*"));
+ EXPECT_TRUE(match("a/b/c/d_e.cpp", "***/b/c/*****"));
+
+ EXPECT_FALSE(match("a/b/c/d_e.cpp", "*****/d"));
+ EXPECT_FALSE(match("a/b/c/d_e.cpp", "***/b/d"));
+ EXPECT_FALSE(match("a/b/c/d_e.cpp", "*****/b/d/***"));
+ EXPECT_FALSE(match("a/b/c/d_e.cpp", "***/b/c"));
+}
+
} // namespace
} // namespace format
} // namespace clang
|
| EXPECT_FALSE(match("a/b/c/d_e.cpp", "*****/d")); | ||
| EXPECT_FALSE(match("a/b/c/d_e.cpp", "***/b/d")); | ||
| EXPECT_FALSE(match("a/b/c/d_e.cpp", "*****/b/d/***")); | ||
| EXPECT_FALSE(match("a/b/c/d_e.cpp", "***/b/c")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should there be some tests with repetition?
project/tests/lib/tests/module/test_abc.cpp
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we are doing this an saying "Its like .gitignore" should we support
/*
!/foo/bar
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should there be some tests with repetition?
I'm not sure I understand what you are asking.
Can you provide examples of the path and the ignore pattern that you expect to be problematic?
if we are doing this an saying "Its like .gitignore" should we support
I am only indicating that the ** glob pattern is implemented in .gitignore, and we follow the same rules they do for this specific pattern.
I do not think the current implementation could easily support theese glob patterns the same way .gitignore does.
My understanding is that there is some "state" where a lower level ! negates an earlier match/ignore?
I think the current implementation only considers one pattern in the file at a time, in isolation of all other lines in the .clang-format-ignore file.
@owenca may understand better and have an opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we are doing this an saying "Its like .gitignore" should we support
I am only indicating that the
**glob pattern is implemented in .gitignore, and we follow the same rules they do for this specific pattern.
When I added .clang-format-ignore, I chose a POSIX glob-like model and intentionally stayed clear of the .gitignore model, which is, IMO, confusing and overcomplicated. I prefer to not add ** now to partially match .gitignore unless there's a good reason and strong demand for it. There's also a possibility of regression as ** is currently matched the same way as a single *.
I think the current implementation only considers one pattern in the file at a time, in isolation of all other lines in the .clang-format-ignore file.
Yes, the current behavior is to check the patterns in the .clang-format-ignore file one pattern at a time until a match is found.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unless there's a good reason
This came up as I have been working on adding support for clang-format within AMD's PAL repository and found clang-format-ignore to be extremely limiting.
There are 2 key reasons I am adding this feature:
- Simplifying Subdirectory Exclusion:
**allows for easy exclusion of entire subdirectories. Currently you need to specify each depth level in a subdirectory individually. - Pattern-Based File Ignoring:
**simplifies ignoring specific file patterns that may appear at any level within a directory tree, which is especially useful when the exact depth of these files can vary
These two cases came up in the PAL repo in the following manner:
- There are various
importedlibraries added throughout the codebase. We would need to ignore all of the imported subdirs in their entirety - Various files exist throughout the project that should always be excluded. Generated files, shader files, etc.
The current way to ignore these kinds of patterns is something like this:
imported/*
imported/*/*
imported/*/*/*
imported/*/*/*/*
imported/*/*/*/*/*
imported/*/*/*/*/*/*
...
src/*/*/ignore_me.cpp
src/*/*/*/ignore_me.cpp
src/*/*/*/*/ignore_me.cpp
src/*/*/*/*/*/ignore_me.cpp
....
Which can be simplified into
imported/**
src/**/ignore_me.cpp
and strong demand for it.
How do we gauge how much of a demand there is for this feature?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unless there's a good reason
This came up as I have been working on adding support for clang-format within AMD's PAL repository and found clang-format-ignore to be extremely limiting.
How is it extremely limited? Just because it doesn't support ** in the middle of the pathnames? How did you manage before clang-format gained the .clang-format-ignore functionality?
There are 2 key reasons I am adding this feature:
- Simplifying Subdirectory Exclusion:
**allows for easy exclusion of entire subdirectories. Currently you need to specify each depth level in a subdirectory individually.
We have DisableFormat for that.
- Pattern-Based File Ignoring:
**simplifies ignoring specific file patterns that may appear at any level within a directory tree, which is especially useful when the exact depth of these files can vary
I was conscious of this use case, but you can do something like find . -name dont-format-me >> .clang-format-ignore even though it's less than ideal.
and strong demand for it.
How do we gauge how much of a demand there is for this feature?
See #52975 (comment).
If we were to add ** to match gitignore, we should also handle it at the beginning and the end of a pathname.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How did you manage before clang-format gained the .clang-format-ignore functionality?
PAL has not used clang-format at all, my contributions have been an effort to support the PAL formatting style so we can start to use the tool.
- Simplifying Subdirectory Exclusion:
**allows for easy exclusion of entire subdirectories. Currently you need to specify each depth level in a subdirectory individually.We have
DisableFormatfor that.
I was not aware of DisableFormat, it's not as intuitive as having everything in the ignore file but it can be a compromise if you're completely opposed to adding the ** change.
- Pattern-Based File Ignoring:
**simplifies ignoring specific file patterns that may appear at any level within a directory tree, which is especially useful when the exact depth of these files can varyI was conscious of this use case, but you can do something like
find . -name dont-format-me >> .clang-format-ignoreeven though it's less than ideal.
Like you said, this is not ideal as it relies on anyone adding/moving files that may need to be ignored to regenerate the clang-format-ignore file.
** could handle these cases without a need for manual intervention.
If we were to add
**to match gitignore, we should also handle it at the beginning and the end of a pathname.
Are you seeing a gap in the unit tests I have? I think I have these cases covered but can add more specific unit tests if you have an example.
|
@mydeveloperday @owenca |
|
@HazardyKnusperkeks Can I get your eyes on this PR please? |
| case '*': { | ||
| while (++I < EOP && Pattern[I] == '*') { // Skip consecutive stars. | ||
| if (I + 1 < EOP && Pattern[I + 1] == '*') { | ||
| // Handle '**' pattern |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // Handle '**' pattern | |
| // Handle '**' pattern. |
Always end in full stop.
| while (++I < EOP && Pattern[I] == '*') { // Skip consecutive stars. | ||
| if (I + 1 < EOP && Pattern[I + 1] == '*') { | ||
| // Handle '**' pattern | ||
| while (++I < EOP && Pattern[I] == '*') { // Skip consecutive stars. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds like a find_if.
|
I want to express support for this feature. Just today I attempted to integrate .clang-format-ignore and was really confused by fact that it cannot ignore recursively. Project I wanted to integrate it to has a lot of generated files which is all over the place. It's very inconvenient have to use that pattern for every directory I want to ignore:
Just having single .clang-format-ignore file with |
|
See #121404. |
|
Closing in favor of #121404 |
Adds
**support for.clang-format-ignoreto support similar pattern matching to .gitignorecloses #110160