Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
22 changes: 20 additions & 2 deletions clang/lib/Format/MatchFilePath.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -49,11 +49,29 @@ bool matchFilePath(StringRef Pattern, StringRef FilePath) {
return false;
break;
case '*': {
while (++I < EOP && Pattern[I] == '*') { // Skip consecutive stars.
if (I + 1 < EOP && Pattern[I + 1] == '*') {
// Handle '**' pattern
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Handle '**' pattern
// Handle '**' pattern.

Always end in full stop.

while (++I < EOP && Pattern[I] == '*') { // Skip consecutive stars.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like a find_if.

}
if (I == EOP)
return true; // '**' at the end matches everything
if (Pattern[I] == Separator) {
// Try to match the rest of the pattern without consuming the
// separator for the case where we want to match "zero" directories
// e.g. "a/**/b" matches "a/b"
if (matchFilePath(Pattern.substr(I + 1), FilePath.substr(J)))
return true;
}
while (J < End) {
if (matchFilePath(Pattern.substr(I), FilePath.substr(J)))
return true;
++J;
}
return false;
}
const auto K = FilePath.find(Separator, J); // Index of next `Separator`.
const bool NoMoreSeparatorsInFilePath = K == StringRef::npos;
if (I == EOP) // `Pattern` ends with a star.
if (++I == EOP) // `Pattern` ends with a star.
return NoMoreSeparatorsInFilePath;
// `Pattern` ends with a lone backslash.
if (Pattern[I] == '\\' && ++I == EOP)
Expand Down
34 changes: 34 additions & 0 deletions clang/unittests/Format/MatchFilePathTest.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -164,6 +164,40 @@ TEST_F(MatchFilePathTest, Path) {
EXPECT_FALSE(match("foo\\", R"(foo*\)"));
}

TEST_F(MatchFilePathTest, DoubleAsterisk) {
EXPECT_TRUE(match("a/b/c/d.cpp", "**b**"));
EXPECT_TRUE(match("a/b/c/d.cpp", "**/b/**"));
EXPECT_TRUE(match("a/b/c/d_e.cpp", "**d_*"));
EXPECT_TRUE(match("a/b/c/d_e.cpp", "**/d_*"));
EXPECT_TRUE(match("a/b/c/d_e.cpp", "**d_**"));
EXPECT_TRUE(match("a/b/c/d_e.cpp", "**/d_**"));

EXPECT_TRUE(match("a/b/c/d_e.cpp", "**/b/c/**"));

EXPECT_TRUE(match("a/b/c/d_e.cpp", "a/**/b/c/d_e.cpp"));
EXPECT_TRUE(match("a/b/c/d_e.cpp", "a/**/c/d_e.cpp"));
EXPECT_TRUE(match("a/b/c/d_e.cpp", "a/**/b/**/d_e.cpp"));
EXPECT_TRUE(match("a/b/c/d_e.cpp", "**/b/**/d_e.cpp"));
EXPECT_TRUE(match("a/b/c/d_e.cpp", "a/**/**/b/**"));

EXPECT_FALSE(match("a/b/c/d_e.cpp", "**/d"));
EXPECT_FALSE(match("a/b/c/d_e.cpp", "**/b/d"));
EXPECT_FALSE(match("a/b/c/d_e.cpp", "**/b/d/**"));
EXPECT_FALSE(match("a/b/c/d_e.cpp", "**/b/c/"));

// Multiple consecutive asterisks are treated as **
EXPECT_TRUE(match("a/b/c/d.cpp", "***b****"));
EXPECT_TRUE(match("a/b/c/d.cpp", "****/b/***"));
EXPECT_TRUE(match("a/b/c/d_e.cpp", "***d_**"));
EXPECT_TRUE(match("a/b/c/d_e.cpp", "****/d_*"));
EXPECT_TRUE(match("a/b/c/d_e.cpp", "***/b/c/*****"));

EXPECT_FALSE(match("a/b/c/d_e.cpp", "*****/d"));
EXPECT_FALSE(match("a/b/c/d_e.cpp", "***/b/d"));
EXPECT_FALSE(match("a/b/c/d_e.cpp", "*****/b/d/***"));
EXPECT_FALSE(match("a/b/c/d_e.cpp", "***/b/c"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should there be some tests with repetition?

project/tests/lib/tests/module/test_abc.cpp

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we are doing this an saying "Its like .gitignore" should we support

/*
!/foo/bar

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should there be some tests with repetition?

I'm not sure I understand what you are asking.
Can you provide examples of the path and the ignore pattern that you expect to be problematic?

if we are doing this an saying "Its like .gitignore" should we support

I am only indicating that the ** glob pattern is implemented in .gitignore, and we follow the same rules they do for this specific pattern.

I do not think the current implementation could easily support theese glob patterns the same way .gitignore does.
My understanding is that there is some "state" where a lower level ! negates an earlier match/ignore?
I think the current implementation only considers one pattern in the file at a time, in isolation of all other lines in the .clang-format-ignore file.

@owenca may understand better and have an opinion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we are doing this an saying "Its like .gitignore" should we support

I am only indicating that the ** glob pattern is implemented in .gitignore, and we follow the same rules they do for this specific pattern.

When I added .clang-format-ignore, I chose a POSIX glob-like model and intentionally stayed clear of the .gitignore model, which is, IMO, confusing and overcomplicated. I prefer to not add ** now to partially match .gitignore unless there's a good reason and strong demand for it. There's also a possibility of regression as ** is currently matched the same way as a single *.

I think the current implementation only considers one pattern in the file at a time, in isolation of all other lines in the .clang-format-ignore file.

Yes, the current behavior is to check the patterns in the .clang-format-ignore file one pattern at a time until a match is found.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unless there's a good reason

This came up as I have been working on adding support for clang-format within AMD's PAL repository and found clang-format-ignore to be extremely limiting.

There are 2 key reasons I am adding this feature:

  • Simplifying Subdirectory Exclusion: ** allows for easy exclusion of entire subdirectories. Currently you need to specify each depth level in a subdirectory individually.
  • Pattern-Based File Ignoring: ** simplifies ignoring specific file patterns that may appear at any level within a directory tree, which is especially useful when the exact depth of these files can vary

These two cases came up in the PAL repo in the following manner:

  • There are various imported libraries added throughout the codebase. We would need to ignore all of the imported subdirs in their entirety
  • Various files exist throughout the project that should always be excluded. Generated files, shader files, etc.

The current way to ignore these kinds of patterns is something like this:

imported/*
imported/*/*
imported/*/*/*
imported/*/*/*/*
imported/*/*/*/*/*
imported/*/*/*/*/*/*
...

src/*/*/ignore_me.cpp
src/*/*/*/ignore_me.cpp
src/*/*/*/*/ignore_me.cpp
src/*/*/*/*/*/ignore_me.cpp
....

Which can be simplified into

imported/**
src/**/ignore_me.cpp

and strong demand for it.

How do we gauge how much of a demand there is for this feature?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unless there's a good reason

This came up as I have been working on adding support for clang-format within AMD's PAL repository and found clang-format-ignore to be extremely limiting.

How is it extremely limited? Just because it doesn't support ** in the middle of the pathnames? How did you manage before clang-format gained the .clang-format-ignore functionality?

There are 2 key reasons I am adding this feature:

  • Simplifying Subdirectory Exclusion: ** allows for easy exclusion of entire subdirectories. Currently you need to specify each depth level in a subdirectory individually.

We have DisableFormat for that.

  • Pattern-Based File Ignoring: ** simplifies ignoring specific file patterns that may appear at any level within a directory tree, which is especially useful when the exact depth of these files can vary

I was conscious of this use case, but you can do something like find . -name dont-format-me >> .clang-format-ignore even though it's less than ideal.

and strong demand for it.

How do we gauge how much of a demand there is for this feature?

See #52975 (comment).

If we were to add ** to match gitignore, we should also handle it at the beginning and the end of a pathname.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How did you manage before clang-format gained the .clang-format-ignore functionality?

PAL has not used clang-format at all, my contributions have been an effort to support the PAL formatting style so we can start to use the tool.

  • Simplifying Subdirectory Exclusion: ** allows for easy exclusion of entire subdirectories. Currently you need to specify each depth level in a subdirectory individually.

We have DisableFormat for that.

I was not aware of DisableFormat, it's not as intuitive as having everything in the ignore file but it can be a compromise if you're completely opposed to adding the ** change.

  • Pattern-Based File Ignoring: ** simplifies ignoring specific file patterns that may appear at any level within a directory tree, which is especially useful when the exact depth of these files can vary

I was conscious of this use case, but you can do something like find . -name dont-format-me >> .clang-format-ignore even though it's less than ideal.

Like you said, this is not ideal as it relies on anyone adding/moving files that may need to be ignored to regenerate the clang-format-ignore file.

** could handle these cases without a need for manual intervention.

If we were to add ** to match gitignore, we should also handle it at the beginning and the end of a pathname.

Are you seeing a gap in the unit tests I have? I think I have these cases covered but can add more specific unit tests if you have an example.

}

} // namespace
} // namespace format
} // namespace clang
Loading