-
Notifications
You must be signed in to change notification settings - Fork 15.4k
[clang][include-cleaner]skip stdlib recogntion only when there are defintion with body in main file. #95797
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…fintion with body in main file. When a definition is being located, standard library function recognition is skipped if the function has a definition with body in the main file
|
@llvm/pr-subscribers-clang-tools-extra Author: Congcong Cai (HerrCai0907) ChangesWhen a definition is being located, standard library function recognition is skipped if the function has a definition with body in the main file Full diff: https://github.com/llvm/llvm-project/pull/95797.diff 2 Files Affected:
diff --git a/clang-tools-extra/include-cleaner/lib/LocateSymbol.cpp b/clang-tools-extra/include-cleaner/lib/LocateSymbol.cpp
index 9148d36a5038f..2ddc807871af1 100644
--- a/clang-tools-extra/include-cleaner/lib/LocateSymbol.cpp
+++ b/clang-tools-extra/include-cleaner/lib/LocateSymbol.cpp
@@ -13,6 +13,7 @@
#include "clang/AST/DeclCXX.h"
#include "clang/AST/DeclTemplate.h"
#include "clang/Tooling/Inclusions/StandardLibrary.h"
+#include "llvm/ADT/STLExtras.h"
#include "llvm/Support/Casting.h"
#include "llvm/Support/raw_ostream.h"
#include <utility>
@@ -39,13 +40,16 @@ Hints declHints(const Decl *D) {
}
std::vector<Hinted<SymbolLocation>> locateDecl(const Decl &D) {
- std::vector<Hinted<SymbolLocation>> Result;
- // FIXME: Should we also provide physical locations?
- if (auto SS = tooling::stdlib::Recognizer()(&D)) {
- Result.push_back({*SS, Hints::CompleteSymbol});
- if (!D.hasBody())
- return Result;
- }
+ SourceManager &SM = D.getASTContext().getSourceManager();
+ bool HasBodyInMainFile = llvm::any_of(D.redecls(), [&](Decl *Redecl) {
+ return Redecl->hasBody() && SM.isInMainFile(Redecl->getLocation());
+ });
+ // if decl has body and in main file, we don't need to do further search.
+ if (!HasBodyInMainFile)
+ // FIXME: Should we also provide physical locations?
+ if (auto SS = tooling::stdlib::Recognizer()(&D))
+ return {{*SS, Hints::CompleteSymbol}};
+
// FIXME: Signal foreign decls, e.g. a forward declaration not owned by a
// library. Some useful signals could be derived by checking the DeclContext.
// Most incidental forward decls look like:
@@ -53,6 +57,7 @@ std::vector<Hinted<SymbolLocation>> locateDecl(const Decl &D) {
// class SourceManager; // likely an incidental forward decl.
// namespace my_own_ns {}
// }
+ std::vector<Hinted<SymbolLocation>> Result;
for (auto *Redecl : D.redecls())
Result.push_back({Redecl->getLocation(), declHints(Redecl)});
return Result;
diff --git a/clang-tools-extra/include-cleaner/unittests/FindHeadersTest.cpp b/clang-tools-extra/include-cleaner/unittests/FindHeadersTest.cpp
index fdcbf25fd628c..f1eadd113dead 100644
--- a/clang-tools-extra/include-cleaner/unittests/FindHeadersTest.cpp
+++ b/clang-tools-extra/include-cleaner/unittests/FindHeadersTest.cpp
@@ -631,12 +631,9 @@ TEST_F(HeadersForSymbolTest, StandardHeaders) {
TEST_F(HeadersForSymbolTest, NonStandardHeaders) {
Inputs.Code = "void assert() {}";
buildAST();
- EXPECT_THAT(
- headersFor("assert"),
- // Respect the ordering from the stdlib mapping.
- UnorderedElementsAre(physicalHeader("input.mm"),
- tooling::stdlib::Header::named("<cassert>"),
- tooling::stdlib::Header::named("<assert.h>")));
+ EXPECT_THAT(headersFor("assert"),
+ // Respect the ordering from the stdlib mapping.
+ UnorderedElementsAre(physicalHeader("input.mm")));
}
TEST_F(HeadersForSymbolTest, ExporterNoNameMatch) {
|
kadircet
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hi @HerrCai0907 !
Left some detailed comments in the patch. Due to timezone differences, I highly suspect this will be fixed quickly and we have some extra delays for getting changes into releases (and I am going to be OOO next week).
Hence while you're addressing the issues here I am going to revert the original commit, to make sure include-cleaner doesn't stay with this regression for a longer period.
| SourceManager &SM = D.getASTContext().getSourceManager(); | ||
| bool HasBodyInMainFile = llvm::any_of(D.redecls(), [&](Decl *Redecl) { | ||
| return Redecl->hasBody() && SM.isInMainFile(Redecl->getLocation()); | ||
| }); | ||
| // if decl has body and in main file, we don't need to do further search. | ||
| if (!HasBodyInMainFile) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't think it's actually "right" layering-wise to perform any sourcelocation <-> fileid mappings here. it's subtle and expensive, especially for every-redecl of stdlib symbols, which might be forward declared many times in STL (but also for any other symbol).
Hence we actually have logic down the line that maps sourcelocations to files (it's also expensive today, but is in a single place and takes care of edge cases, like macros). at least this layering ensures that if we want to introduce caching/optimizations one day, we can rely on having certain separation between different kinds of mappings.
Moreover, I don't follow the check around a redeclaration having body only in the main file. I feel like you're focusing on a very specific issue you have and not solving the issue in general. e.g. what if the user only has forward declaration of the symbol in the main file? why don't we want to suppress in that case? or what if the user has body/declaration in a different header file, which also seems pretty common.
I think if you want to improve this case, you really need to address // FIXME: Should we also provide physical locations?, which is definitely doable, we just need to make sure ranking is still done accordingly.
I am still strongly arguing that we should never insert a custom header if there's a name-collision with a stdlib symbol. Hence we should make sure all physical locations for a stdlib symbol is downranked, i'd recommend stripping completesymbol signal from these symbols.
You also need to consider the macro-side of this. you're addressing this situation for decls, but the user might as well have #define assert(X) X in their main file (or some of the other headers). we need to make sure this same logic also works for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. This patch is a temporary fix to avoid breaking it for a long time. I think avoiding to include decl which already in the same file is always fine.
And I agree we need to consider more about how to handle this case elegantly because std library use more and more short name function which make it harder to avoid to use the same name.
I think more about this issue today. In my opinion, the basic requirements should be:
- If any decl (with or without body) is defined in user written header, we should use user written header
- If any decl is match stdlib, we should use stdlib
- the other we just use what we can find.
So the priority should be: decl in user written file > stdlib recognition > other found.
WDYT? If you also agree this high level feature request, then we can think about implement more detail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
glad to hear that we're aligned here!
decl in user written file > stdlib recognition > other found. makes sense to me in theory, but i don't think it's implementable in practice. moreover i don't understand what the 3rd category around "other" means.
telling apart a "physical" stdlib header (e.g. <__utility/pair.h>) from a user header (e.g. <my_stdlib/pair.h>) isn't possible. hence when something goes wrong, we'll always "insert" the wrong header (implementation detail, rather than public header for a stdlib symbol).
Hence I'd suggest going with:
- "virtual" stdlib header (the one provided by
clang::tooling::stdlib::Header) - any other physical location, providing the decl
This means we might do the wrong thing when the user has their own stdlib implementation (mixing your own symbols with standard library is a recipe for disaster, i don't think we need to care about that state). But we'll at least stop complaining once they include the "right" header. Moreover if need be we can improve handling for this case by turning of virtual stdlib support completely, when we detect we're working in an environment with a custom stdlib implementaiton.
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
telling apart a "physical" stdlib header (e.g. <__utility/pair.h>) from a user header (e.g. <my_stdlib/pair.h>) isn't possible. hence when something goes wrong, we'll always "insert" the wrong header (implementation detail, rather than public header for a stdlib symbol).
I aggree it is not possible to identifier the user-defined stdlib with stdlib. But it is possible to identifier user-defined function with stdlib function which may have totally difference meanings. e.g. log to print something and log in <cmath>
We may distinguish them by quota include or angle brackets? Since normally c++ developer will only use #include <> for system header file but include "" for the other file.
When a definition is being located, standard library function recognition is skipped if the function has a definition with body in the main file