Skip to content

Conversation

@HerrCai0907
Copy link
Contributor

When a definition is being located, standard library function recognition is skipped if the function has a definition with body in the main file

…fintion with body in main file.

When a definition is being located, standard library function recognition is skipped if the function has a definition with body in the main file
@llvmbot
Copy link
Member

llvmbot commented Jun 17, 2024

@llvm/pr-subscribers-clang-tools-extra

Author: Congcong Cai (HerrCai0907)

Changes

When a definition is being located, standard library function recognition is skipped if the function has a definition with body in the main file


Full diff: https://github.com/llvm/llvm-project/pull/95797.diff

2 Files Affected:

  • (modified) clang-tools-extra/include-cleaner/lib/LocateSymbol.cpp (+12-7)
  • (modified) clang-tools-extra/include-cleaner/unittests/FindHeadersTest.cpp (+3-6)
diff --git a/clang-tools-extra/include-cleaner/lib/LocateSymbol.cpp b/clang-tools-extra/include-cleaner/lib/LocateSymbol.cpp
index 9148d36a5038f..2ddc807871af1 100644
--- a/clang-tools-extra/include-cleaner/lib/LocateSymbol.cpp
+++ b/clang-tools-extra/include-cleaner/lib/LocateSymbol.cpp
@@ -13,6 +13,7 @@
 #include "clang/AST/DeclCXX.h"
 #include "clang/AST/DeclTemplate.h"
 #include "clang/Tooling/Inclusions/StandardLibrary.h"
+#include "llvm/ADT/STLExtras.h"
 #include "llvm/Support/Casting.h"
 #include "llvm/Support/raw_ostream.h"
 #include <utility>
@@ -39,13 +40,16 @@ Hints declHints(const Decl *D) {
 }
 
 std::vector<Hinted<SymbolLocation>> locateDecl(const Decl &D) {
-  std::vector<Hinted<SymbolLocation>> Result;
-  // FIXME: Should we also provide physical locations?
-  if (auto SS = tooling::stdlib::Recognizer()(&D)) {
-    Result.push_back({*SS, Hints::CompleteSymbol});
-    if (!D.hasBody())
-      return Result;
-  }
+  SourceManager &SM = D.getASTContext().getSourceManager();
+  bool HasBodyInMainFile = llvm::any_of(D.redecls(), [&](Decl *Redecl) {
+    return Redecl->hasBody() && SM.isInMainFile(Redecl->getLocation());
+  });
+  // if decl has body and in main file, we don't need to do further search.
+  if (!HasBodyInMainFile)
+    // FIXME: Should we also provide physical locations?
+    if (auto SS = tooling::stdlib::Recognizer()(&D))
+      return {{*SS, Hints::CompleteSymbol}};
+
   // FIXME: Signal foreign decls, e.g. a forward declaration not owned by a
   // library. Some useful signals could be derived by checking the DeclContext.
   // Most incidental forward decls look like:
@@ -53,6 +57,7 @@ std::vector<Hinted<SymbolLocation>> locateDecl(const Decl &D) {
   //   class SourceManager; // likely an incidental forward decl.
   //   namespace my_own_ns {}
   //   }
+  std::vector<Hinted<SymbolLocation>> Result;
   for (auto *Redecl : D.redecls())
     Result.push_back({Redecl->getLocation(), declHints(Redecl)});
   return Result;
diff --git a/clang-tools-extra/include-cleaner/unittests/FindHeadersTest.cpp b/clang-tools-extra/include-cleaner/unittests/FindHeadersTest.cpp
index fdcbf25fd628c..f1eadd113dead 100644
--- a/clang-tools-extra/include-cleaner/unittests/FindHeadersTest.cpp
+++ b/clang-tools-extra/include-cleaner/unittests/FindHeadersTest.cpp
@@ -631,12 +631,9 @@ TEST_F(HeadersForSymbolTest, StandardHeaders) {
 TEST_F(HeadersForSymbolTest, NonStandardHeaders) {
   Inputs.Code = "void assert() {}";
   buildAST();
-  EXPECT_THAT(
-      headersFor("assert"),
-      // Respect the ordering from the stdlib mapping.
-      UnorderedElementsAre(physicalHeader("input.mm"),
-                           tooling::stdlib::Header::named("<cassert>"),
-                           tooling::stdlib::Header::named("<assert.h>")));
+  EXPECT_THAT(headersFor("assert"),
+              // Respect the ordering from the stdlib mapping.
+              UnorderedElementsAre(physicalHeader("input.mm")));
 }
 
 TEST_F(HeadersForSymbolTest, ExporterNoNameMatch) {

HerrCai0907 referenced this pull request Jun 17, 2024
… as standard library functions in misc-include-cleaner (#94923)

Fixes: #93335
For decl with body, we should provide physical locations also. Because
it may be the function which have the same name as std library.
Copy link
Member

@kadircet kadircet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi @HerrCai0907 !

Left some detailed comments in the patch. Due to timezone differences, I highly suspect this will be fixed quickly and we have some extra delays for getting changes into releases (and I am going to be OOO next week).
Hence while you're addressing the issues here I am going to revert the original commit, to make sure include-cleaner doesn't stay with this regression for a longer period.

Comment on lines +43 to +48
SourceManager &SM = D.getASTContext().getSourceManager();
bool HasBodyInMainFile = llvm::any_of(D.redecls(), [&](Decl *Redecl) {
return Redecl->hasBody() && SM.isInMainFile(Redecl->getLocation());
});
// if decl has body and in main file, we don't need to do further search.
if (!HasBodyInMainFile)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't think it's actually "right" layering-wise to perform any sourcelocation <-> fileid mappings here. it's subtle and expensive, especially for every-redecl of stdlib symbols, which might be forward declared many times in STL (but also for any other symbol).

Hence we actually have logic down the line that maps sourcelocations to files (it's also expensive today, but is in a single place and takes care of edge cases, like macros). at least this layering ensures that if we want to introduce caching/optimizations one day, we can rely on having certain separation between different kinds of mappings.

Moreover, I don't follow the check around a redeclaration having body only in the main file. I feel like you're focusing on a very specific issue you have and not solving the issue in general. e.g. what if the user only has forward declaration of the symbol in the main file? why don't we want to suppress in that case? or what if the user has body/declaration in a different header file, which also seems pretty common.

I think if you want to improve this case, you really need to address // FIXME: Should we also provide physical locations?, which is definitely doable, we just need to make sure ranking is still done accordingly.

I am still strongly arguing that we should never insert a custom header if there's a name-collision with a stdlib symbol. Hence we should make sure all physical locations for a stdlib symbol is downranked, i'd recommend stripping completesymbol signal from these symbols.

You also need to consider the macro-side of this. you're addressing this situation for decls, but the user might as well have #define assert(X) X in their main file (or some of the other headers). we need to make sure this same logic also works for that.

Copy link
Contributor Author

@HerrCai0907 HerrCai0907 Jun 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. This patch is a temporary fix to avoid breaking it for a long time. I think avoiding to include decl which already in the same file is always fine.

And I agree we need to consider more about how to handle this case elegantly because std library use more and more short name function which make it harder to avoid to use the same name.

I think more about this issue today. In my opinion, the basic requirements should be:

  1. If any decl (with or without body) is defined in user written header, we should use user written header
  2. If any decl is match stdlib, we should use stdlib
  3. the other we just use what we can find.

So the priority should be: decl in user written file > stdlib recognition > other found.

WDYT? If you also agree this high level feature request, then we can think about implement more detail.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

glad to hear that we're aligned here!

decl in user written file > stdlib recognition > other found. makes sense to me in theory, but i don't think it's implementable in practice. moreover i don't understand what the 3rd category around "other" means.

telling apart a "physical" stdlib header (e.g. <__utility/pair.h>) from a user header (e.g. <my_stdlib/pair.h>) isn't possible. hence when something goes wrong, we'll always "insert" the wrong header (implementation detail, rather than public header for a stdlib symbol).

Hence I'd suggest going with:

  1. "virtual" stdlib header (the one provided by clang::tooling::stdlib::Header)
  2. any other physical location, providing the decl

This means we might do the wrong thing when the user has their own stdlib implementation (mixing your own symbols with standard library is a recipe for disaster, i don't think we need to care about that state). But we'll at least stop complaining once they include the "right" header. Moreover if need be we can improve handling for this case by turning of virtual stdlib support completely, when we detect we're working in an environment with a custom stdlib implementaiton.

WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

telling apart a "physical" stdlib header (e.g. <__utility/pair.h>) from a user header (e.g. <my_stdlib/pair.h>) isn't possible. hence when something goes wrong, we'll always "insert" the wrong header (implementation detail, rather than public header for a stdlib symbol).

I aggree it is not possible to identifier the user-defined stdlib with stdlib. But it is possible to identifier user-defined function with stdlib function which may have totally difference meanings. e.g. log to print something and log in <cmath>

We may distinguish them by quota include or angle brackets? Since normally c++ developer will only use #include <> for system header file but include "" for the other file.

@HerrCai0907 HerrCai0907 deleted the fix/94923 branch November 18, 2024 09:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants