[clang][ssaf] Introduce entity abstraction for SSAF #169131
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,44 @@ | ||
| //===- ASTMapping.h - AST to SSAF Entity mapping ----------------*- C++ -*-===// | ||
| // | ||
| // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. | ||
| // See https://llvm.org/LICENSE.txt for license information. | ||
| // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception | ||
| // | ||
| //===----------------------------------------------------------------------===// | ||
|
|
||
| #ifndef LLVM_CLANG_ANALYSIS_SCALABLE_ASTENTITYMAPPING_H | ||
| #define LLVM_CLANG_ANALYSIS_SCALABLE_ASTENTITYMAPPING_H | ||
|
|
||
| #include "clang/Analysis/Scalable/Model/EntityName.h" | ||
| #include "clang/AST/Decl.h" | ||
| #include "llvm/ADT/StringRef.h" | ||
| #include <optional> | ||
|
|
||
| namespace clang::ssaf { | ||
|
|
||
|
Collaborator
For both of these lookup functions -- I expect these to be used heavily in analysis, so it would be beneficial if they had shorter names. Currently, the names encode a lot of the (existing) type information. Is that necessary or helpful? Could you instead go with a simpler scheme like Separately: constructing these is typically expensive, so we use a cache. Consider including a cache object in this library as well. I think that will be the correct choice for most use cases.
Contributor
Author
Sounds good, let me update the names. Re: caching - I imagine you are totally right and we will do that. Since we are building everything from scratch, I would prefer to hold off on such optimizations until we understand the use cases a little better, though. For example - I imagine the cache should probably keep |
||
| /// Maps a declaration to an EntityName. | ||
| /// | ||
| /// Supported declaration types for entity mapping: | ||
| /// - Functions and methods | ||
| /// - Global Variables | ||
| /// - Function parameters | ||
| /// - Struct/class/union type definitions | ||
| /// - Struct/class/union fields | ||
steakhal marked this conversation as resolved.
|
||
| /// | ||
| /// Implicit declarations and compiler builtins are not mapped. | ||
| /// | ||
| /// \param D The declaration to map. Must not be null. | ||
| /// | ||
| /// \return An EntityName if the declaration can be mapped, std::nullopt otherwise. | ||
| std::optional<EntityName> getLocalEntityNameForDecl(const Decl* D); | ||
|
||
|
|
||
| /// Maps a function return type to an EntityName. | ||
|
||
| /// | ||
| /// \param FD The function declaration. Must not be null. | ||
| /// | ||
| /// \return An EntityName for the function's return type. | ||
| std::optional<EntityName> getLocalEntityNameForFunctionReturn(const FunctionDecl* FD); | ||
|
|
||
| } // namespace clang::ssaf | ||
|
|
||
| #endif // LLVM_CLANG_ANALYSIS_SCALABLE_ASTENTITYMAPPING_H | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,115 @@ | ||
| //===- BuildNamespace.h -----------------------------------------*- C++ -*-===// | ||
| // | ||
| // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. | ||
| // See https://llvm.org/LICENSE.txt for license information. | ||
| // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception | ||
| // | ||
| //===----------------------------------------------------------------------===// | ||
| // | ||
| // This file defines BuildNamespace and NestedBuildNamespace classes that | ||
| // represent build namespaces in the Scalable Static Analysis Framework. | ||
| // | ||
| // Build namespaces provide an abstraction for grouping program entities (such | ||
| // as those in a shared library or compilation unit) to enable analysis of | ||
| // software projects constructed from individual components. | ||
| // | ||
| //===----------------------------------------------------------------------===// | ||
|
|
||
| #ifndef LLVM_CLANG_ANALYSIS_SCALABLE_MODEL_BUILDNAMESPACE_H | ||
| #define LLVM_CLANG_ANALYSIS_SCALABLE_MODEL_BUILDNAMESPACE_H | ||
|
|
||
| #include "llvm/ADT/STLExtras.h" | ||
| #include "llvm/ADT/StringRef.h" | ||
| #include <optional> | ||
| #include <string> | ||
| #include <vector> | ||
|
|
||
| namespace clang::ssaf { | ||
|
|
||
| enum class BuildNamespaceKind : unsigned short { | ||
| CompilationUnit, | ||
| LinkUnit | ||
| }; | ||
|
|
||
| llvm::StringRef toString(BuildNamespaceKind BNK); | ||
steakhal marked this conversation as resolved.
|
||
|
|
||
| std::optional<BuildNamespaceKind> parseBuildNamespaceKind(llvm::StringRef Str); | ||
|
|
||
| /// Represents a single namespace in the build process. | ||
| /// | ||
| /// A BuildNamespace groups program entities, such as those belonging to a | ||
| /// compilation unit or link unit (e.g., a shared library). Each namespace has a | ||
| /// kind (CompilationUnit or LinkUnit) and a unique identifier name within that | ||
| /// kind. | ||
| /// | ||
| /// BuildNamespaces can be composed into NestedBuildNamespace to represent | ||
| /// hierarchical namespace structures that model how software is constructed from | ||
| /// its components. | ||
| class BuildNamespace { | ||
| BuildNamespaceKind Kind; | ||
| std::string Name; | ||
|
|
||
| auto asTuple() const { return std::tie(Kind, Name); } | ||
|
|
||
| public: | ||
| BuildNamespace(BuildNamespaceKind Kind, llvm::StringRef Name) | ||
| : Kind(Kind), Name(Name.str()) {} | ||
|
|
||
| static BuildNamespace makeTU(llvm::StringRef CompilationId); | ||
|
|
||
| bool operator==(const BuildNamespace& Other) const; | ||
| bool operator!=(const BuildNamespace& Other) const; | ||
| bool operator<(const BuildNamespace& Other) const; | ||
|
|
||
| friend class SerializationFormat; | ||
| }; | ||
|
|
||
| /// Represents a hierarchical sequence of build namespaces. | ||
| /// | ||
| /// A NestedBuildNamespace captures namespace qualification for program entities | ||
| /// by maintaining an ordered sequence of BuildNamespace steps. This models how | ||
| /// entities are organized through multiple steps of the build process, such as | ||
| /// first being part of a compilation unit, then incorporated into a link unit. | ||
| /// | ||
| /// For example, an entity might be qualified by a compilation unit namespace | ||
| /// followed by a shared library namespace. | ||
| class NestedBuildNamespace { | ||
| friend class SerializationFormat; | ||
|
|
||
| std::vector<BuildNamespace> Namespaces; | ||
|
|
||
| public: | ||
| NestedBuildNamespace() = default; | ||
|
|
||
| explicit NestedBuildNamespace(const std::vector<BuildNamespace>& Namespaces) | ||
| : Namespaces(Namespaces) {} | ||
|
|
||
| explicit NestedBuildNamespace(const BuildNamespace& N) { | ||
| Namespaces.push_back(N); | ||
| } | ||
|
|
||
| static NestedBuildNamespace makeTU(llvm::StringRef CompilationId); | ||
|
||
|
|
||
| /// Creates a new NestedBuildNamespace by appending additional namespace. | ||
| /// | ||
| /// \param Namespace The namespace to append. | ||
| NestedBuildNamespace makeQualified(NestedBuildNamespace Namespace) const { | ||
| auto Copy = *this; | ||
| Copy.Namespaces.reserve(Copy.Namespaces.size() + Namespace.Namespaces.size()); | ||
| llvm::append_range(Copy.Namespaces, Namespace.Namespaces); | ||
| return Copy; | ||
| } | ||
|
|
||
| bool empty() const; | ||
|
|
||
| bool operator==(const NestedBuildNamespace& Other) const; | ||
| bool operator!=(const NestedBuildNamespace& Other) const; | ||
| bool operator<(const NestedBuildNamespace& Other) const; | ||
|
|
||
| friend class JSONWriter; | ||
| friend class LinkUnitResolution; | ||
| }; | ||
|
|
||
| } // namespace clang::ssaf | ||
|
|
||
| #endif // LLVM_CLANG_ANALYSIS_SCALABLE_MODEL_BUILDNAMESPACE_H | ||
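The qualification step described in the `NestedBuildNamespace` comments (a compilation unit first, then a link unit) can be illustrated with a minimal standalone sketch. It uses only the standard library; `Kind`, `Namespace`, `NestedNamespace`, `qualifyExample`, and the names `a.cpp` and `libfoo.so` are simplified stand-ins invented for illustration, not the classes or identifiers from this PR.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified stand-ins for BuildNamespaceKind / BuildNamespace (hypothetical).
enum class Kind : unsigned short { CompilationUnit, LinkUnit };

struct Namespace {
  Kind K;
  std::string Name;
};

// Stand-in for NestedBuildNamespace: an ordered sequence of build steps.
struct NestedNamespace {
  std::vector<Namespace> Steps;

  // Returns a copy of *this with the steps of Extra appended, following the
  // same copy-then-append shape as makeQualified in the header above.
  NestedNamespace makeQualified(const NestedNamespace &Extra) const {
    NestedNamespace Copy = *this;
    Copy.Steps.reserve(Copy.Steps.size() + Extra.Steps.size());
    Copy.Steps.insert(Copy.Steps.end(), Extra.Steps.begin(),
                      Extra.Steps.end());
    assert(Copy.Steps.size() == Steps.size() + Extra.Steps.size());
    return Copy;
  }
};

// Qualifies a compilation-unit namespace with a link-unit namespace.
// The unit names are made up for the example.
inline NestedNamespace qualifyExample() {
  NestedNamespace TU;
  TU.Steps.push_back({Kind::CompilationUnit, "a.cpp"});
  NestedNamespace Lib;
  Lib.Steps.push_back({Kind::LinkUnit, "libfoo.so"});
  return TU.makeQualified(Lib);
}
```

Because `makeQualified` returns a copy, the original compilation-unit namespace stays unchanged and can be reused when the same unit is linked into several link units.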
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,50 @@ | ||
| //===- EntityName.h ---------------------------------------------*- C++ -*-===// | ||
| // | ||
| // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. | ||
| // See https://llvm.org/LICENSE.txt for license information. | ||
| // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception | ||
| // | ||
| //===----------------------------------------------------------------------===// | ||
|
|
||
| #ifndef LLVM_CLANG_ANALYSIS_SCALABLE_MODEL_ENTITYNAME_H | ||
| #define LLVM_CLANG_ANALYSIS_SCALABLE_MODEL_ENTITYNAME_H | ||
|
|
||
| #include "clang/Analysis/Scalable/Model/BuildNamespace.h" | ||
| #include "llvm/ADT/SmallString.h" | ||
| #include "llvm/ADT/StringRef.h" | ||
| #include <string> | ||
|
|
||
| namespace clang::ssaf { | ||
|
|
||
| /// Uniquely identifies an entity in a program. | ||
|
Collaborator
I have some concerns about how certain entities will be uniquely identified. There are lots of edge cases as a result of the compilation model of C and C++. Specifically, the same entity might be forward declared in multiple compilation units (in entirely different header files). Do we consider those to be the same entity or not? Or do we only care about complete types? We do not need to answer all of these questions, but I think we should start planning around these scenarios early and document the behavior via some tests.
Contributor
Author
There is indeed a fair amount of complexity in relating entities. |
||
| /// | ||
| /// EntityName provides a globally unique identifier for program entities that remains | ||
| /// stable across compilation boundaries. This enables whole-program analysis to track | ||
| /// and relate entities across separately compiled translation units. | ||
| class EntityName { | ||
|
Collaborator
nit: Consider
Contributor
Author
Yes. Please see: |
||
| std::string USR; | ||
|
Collaborator
I think it would be helpful to explain the roles of these fields. Perhaps copy some of the information from the PR description. To the casual reader "USR" won't have much meaning and since the type is string that won't inform them either. Additionally, the need for the suffix won't be obvious. Similarly, the role of Alternatively, provide a detailed explanation on the class comments or the constructor.
Contributor
Author
I would actually prefer to keep the implementation details opaque from the users with the idea that we might want to change them later. I've added comments saying that. In theory I could make the constructor private and declare the |
||
| llvm::SmallString<16> Suffix; | ||
steakhal marked this conversation as resolved.
|
||
| NestedBuildNamespace Namespace; | ||
|
Collaborator
I'm afraid of the size implications of including this. USRs are already large, but including an arbitrarily large vector as well, in each ID, threatens to make this unusable for large scale application. Does the entity name need the full vector, or could something like a unique 64-bit ID suffice?
Contributor
Author
My plan is to actually use 64-bit IDs to represent entities in most contexts. A follow-up PR introduces that: I imagine The compilation unit IDs and link unit IDs that the namespaces use as names will be supplied to the framework by external tools. We imagine the implementation will need to be space-efficient, and while this API doesn't impose such a restriction, it is unlikely that we could use long strings. With the current API design (with For the summary analysis part, when we might need to represent all entities of a program in memory at once, having even a single instance of EntityName for each entity in memory might require too much memory for analysis of large programs, and we might need a different representation than |
||
|
|
||
| auto asTuple() const { return std::tie(USR, Suffix, Namespace); } | ||
|
|
||
| public: | ||
| EntityName(llvm::StringRef USR, llvm::StringRef Suffix, | ||
| NestedBuildNamespace Namespace); | ||
|
|
||
| bool operator==(const EntityName& Other) const; | ||
| bool operator!=(const EntityName& Other) const; | ||
| bool operator<(const EntityName& Other) const; | ||
|
|
||
| /// Creates a new EntityName with additional build namespace qualification. | ||
| /// | ||
| /// \param Namespace The namespace steps to append to this entity's namespace. | ||
| EntityName makeQualified(NestedBuildNamespace Namespace) const; | ||
|
|
||
| friend class LinkUnitResolution; | ||
| friend class SerializationFormat; | ||
| }; | ||
|
|
||
| } // namespace clang::ssaf | ||
|
|
||
| #endif // LLVM_CLANG_ANALYSIS_SCALABLE_MODEL_ENTITYNAME_H | ||
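The review thread above raises the idea of representing entities by 64-bit IDs in most contexts. As a rough illustration only, and not the design from the follow-up PR, the sketch below shows how the `asTuple()`-based ordering lets a name act as a map key in a small interning table. `Name`, `NameTable`, `intern`, `internDemo`, and the flattened `Namespace` string are all hypothetical simplifications, and the USR strings are placeholders.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <tuple>

// Simplified stand-in for EntityName; the nested namespace is flattened
// into a single string for brevity (an assumption of this sketch).
struct Name {
  std::string USR;
  std::string Suffix;
  std::string Namespace;

  // Same idea as asTuple() in the diff: compare all fields lexicographically.
  auto asTuple() const { return std::tie(USR, Suffix, Namespace); }
  bool operator<(const Name &O) const { return asTuple() < O.asTuple(); }
  bool operator==(const Name &O) const { return asTuple() == O.asTuple(); }
};

// Hypothetical interning table: one dense 64-bit ID per distinct name.
class NameTable {
  std::map<Name, std::uint64_t> Ids;

public:
  // Returns the existing ID for N, or assigns the next unused one.
  std::uint64_t intern(const Name &N) {
    return Ids.try_emplace(N, Ids.size()).first->second;
  }
  std::size_t size() const { return Ids.size(); }
};

// Small usage example with placeholder USR strings.
inline void internDemo() {
  NameTable T;
  Name Fn{"usr-of-foo", "", "a.cpp"};
  Name Param{"usr-of-foo", "1", "a.cpp"};
  assert(T.intern(Fn) == 0);
  assert(T.intern(Param) == 1);
  assert(T.intern(Fn) == 0); // already interned, same ID comes back
  assert(T.size() == 2);
}
```

With a table like this, the full name is stored once per entity and everything else carries the integer, which is one possible answer to the size concern; whether the real framework does this is not established by the diff shown here.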
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,82 @@ | ||
| //===- ASTMapping.cpp - AST to SSAF Entity mapping --------------*- C++ -*-===// | ||
| // | ||
| // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. | ||
| // See https://llvm.org/LICENSE.txt for license information. | ||
| // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception | ||
| // | ||
| //===----------------------------------------------------------------------===// | ||
| // | ||
| // This file implements utilities for mapping AST declarations to SSAF entities. | ||
| // | ||
| //===----------------------------------------------------------------------===// | ||
|
|
||
| #include "clang/Analysis/Scalable/ASTEntityMapping.h" | ||
| #include "clang/AST/Decl.h" | ||
| #include "clang/Analysis/Scalable/Model/BuildNamespace.h" | ||
| #include "clang/Index/USRGeneration.h" | ||
| #include "llvm/ADT/SmallString.h" | ||
|
|
||
| namespace clang::ssaf { | ||
|
|
||
| std::optional<EntityName> getLocalEntityNameForDecl(const Decl* D) { | ||
| if (!D) | ||
| return std::nullopt; | ||
|
|
||
| if (D->isImplicit()) | ||
| return std::nullopt; | ||
|
|
||
| if (isa<FunctionDecl>(D) && cast<FunctionDecl>(D)->getBuiltinID()) | ||
| return std::nullopt; | ||
|
|
||
| if (!isa<FunctionDecl, ParmVarDecl, VarDecl, FieldDecl, RecordDecl>(D)) | ||
| return std::nullopt; | ||
|
|
||
| llvm::SmallString<16> Suffix; | ||
| const Decl *USRDecl = D; | ||
|
|
||
| // For parameters, use the parent function's USR with parameter index as suffix | ||
| if (const auto * PVD = dyn_cast<ParmVarDecl>(D)) { | ||
| const auto *FD = dyn_cast_or_null<FunctionDecl>(PVD->getParentFunctionOrMethod()); | ||
| if (!FD) | ||
| return std::nullopt; | ||
| USRDecl = FD; | ||
|
|
||
| const auto ParamIdx = PVD->getFunctionScopeIndex(); | ||
| llvm::raw_svector_ostream OS(Suffix); | ||
| // Parameter uses function's USR with 1-based index as suffix | ||
| OS << (ParamIdx + 1); | ||
| } | ||
|
|
||
| llvm::SmallString<128> USRBuf; | ||
| if (clang::index::generateUSRForDecl(USRDecl, USRBuf)) { | ||
|
||
| return std::nullopt; | ||
| } | ||
|
|
||
| if (USRBuf.empty()) | ||
| return std::nullopt; | ||
|
|
||
| return EntityName(USRBuf.str(), Suffix, {}); | ||
| } | ||
|
|
||
| std::optional<EntityName> getLocalEntityNameForFunctionReturn(const FunctionDecl* FD) { | ||
| if (!FD) | ||
| return std::nullopt; | ||
|
|
||
| if (FD->isImplicit()) | ||
| return std::nullopt; | ||
|
|
||
| if (FD->getBuiltinID()) | ||
| return std::nullopt; | ||
|
|
||
| llvm::SmallString<128> USRBuf; | ||
| if (clang::index::generateUSRForDecl(FD, USRBuf)) { | ||
| return std::nullopt; | ||
| } | ||
|
|
||
| if (USRBuf.empty()) | ||
| return std::nullopt; | ||
|
|
||
| return EntityName(USRBuf.str(), /*Suffix=*/"0", /*Namespace=*/{}); | ||
| } | ||
|
|
||
| } // namespace clang::ssaf | ||
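The suffix convention in the implementation above (empty suffix for the declaration itself, `"0"` for a function's return value, and a 1-based index for each parameter) can be shown without Clang. In this sketch `LocalName`, `nameForDecl`, `nameForReturn`, and `nameForParam` are hypothetical helpers, and the USR is a plain placeholder string rather than real `generateUSRForDecl` output.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>

// Hypothetical (USR, suffix) pair standing in for EntityName.
using LocalName = std::pair<std::string, std::string>;

// The declaration itself: USR with an empty suffix.
inline std::optional<LocalName> nameForDecl(const std::string &USR) {
  if (USR.empty())
    return std::nullopt; // mirrors the "empty USR" bail-out in the diff
  return LocalName{USR, ""};
}

// The function's return value: parent function's USR with suffix "0".
inline std::optional<LocalName> nameForReturn(const std::string &FnUSR) {
  if (FnUSR.empty())
    return std::nullopt;
  return LocalName{FnUSR, "0"};
}

// A parameter: parent function's USR with a 1-based index as suffix.
// Idx is the 0-based position of the parameter in the function signature.
inline std::optional<LocalName> nameForParam(const std::string &FnUSR,
                                             unsigned Idx) {
  if (FnUSR.empty())
    return std::nullopt;
  return LocalName{FnUSR, std::to_string(Idx + 1)};
}
```

Since parameter suffixes start at 1, the function, its return value, and each of its parameters get distinct names even though they share one USR.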
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,19 @@ | ||
| set(LLVM_LINK_COMPONENTS | ||
| Support | ||
| ) | ||
|
|
||
| add_clang_library(clangAnalysisScalable | ||
| ASTEntityMapping.cpp | ||
| Model/BuildNamespace.cpp | ||
| Model/EntityName.cpp | ||
|
|
||
| LINK_LIBS | ||
| clangAST | ||
| clangASTMatchers | ||
| clangBasic | ||
| clangIndex | ||
| clangLex | ||
| clangFrontend | ||
|
|
||
| DEPENDS | ||
| ) |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,69 @@ | ||
| //===- BuildNamespace.cpp ---------------------------------------*- C++ -*-===// | ||
| // | ||
| // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. | ||
| // See https://llvm.org/LICENSE.txt for license information. | ||
| // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception | ||
| // | ||
| //===----------------------------------------------------------------------===// | ||
|
|
||
| #include "clang/Analysis/Scalable/Model/BuildNamespace.h" | ||
| #include "llvm/Support/ErrorHandling.h" | ||
| #include <tuple> | ||
|
|
||
| namespace clang::ssaf { | ||
|
|
||
| llvm::StringRef toString(BuildNamespaceKind BNK) { | ||
| switch(BNK) { | ||
| case BuildNamespaceKind::CompilationUnit: return "compilation_unit"; | ||
| case BuildNamespaceKind::LinkUnit: return "link_unit"; | ||
| } | ||
| llvm_unreachable("Unknown BuildNamespaceKind"); | ||
| } | ||
|
|
||
| std::optional<BuildNamespaceKind> parseBuildNamespaceKind(llvm::StringRef Str) { | ||
| if (Str == "compilation_unit") | ||
| return BuildNamespaceKind::CompilationUnit; | ||
| if (Str == "link_unit") | ||
| return BuildNamespaceKind::LinkUnit; | ||
| return std::nullopt; | ||
| } | ||
|
|
||
| BuildNamespace BuildNamespace::makeTU(llvm::StringRef CompilationId) { | ||
| return BuildNamespace{BuildNamespaceKind::CompilationUnit, CompilationId.str()}; | ||
| } | ||
|
|
||
| bool BuildNamespace::operator==(const BuildNamespace& Other) const { | ||
| return asTuple() == Other.asTuple(); | ||
| } | ||
|
|
||
| bool BuildNamespace::operator!=(const BuildNamespace& Other) const { | ||
| return !(*this == Other); | ||
| } | ||
|
|
||
| bool BuildNamespace::operator<(const BuildNamespace& Other) const { | ||
| return asTuple() < Other.asTuple(); | ||
| } | ||
|
|
||
| NestedBuildNamespace NestedBuildNamespace::makeTU(llvm::StringRef CompilationId) { | ||
| NestedBuildNamespace Result; | ||
| Result.Namespaces.push_back(BuildNamespace::makeTU(CompilationId)); | ||
| return Result; | ||
| } | ||
|
|
||
| bool NestedBuildNamespace::empty() const { | ||
| return Namespaces.empty(); | ||
| } | ||
|
|
||
| bool NestedBuildNamespace::operator==(const NestedBuildNamespace& Other) const { | ||
| return Namespaces == Other.Namespaces; | ||
| } | ||
|
|
||
| bool NestedBuildNamespace::operator!=(const NestedBuildNamespace& Other) const { | ||
| return !(*this == Other); | ||
| } | ||
|
|
||
| bool NestedBuildNamespace::operator<(const NestedBuildNamespace& Other) const { | ||
| return Namespaces < Other.Namespaces; | ||
| } | ||
|
|
||
| } // namespace clang::ssaf |
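`toString` and `parseBuildNamespaceKind` above are meant to be inverses of each other. A standalone sketch of that round-trip property follows; it swaps `llvm::StringRef` for `std::string_view` and uses a stand-in enum `Kind`, so `kindToString` and `parseKind` are illustrative names, not the PR's functions.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Stand-in for BuildNamespaceKind (hypothetical simplification).
enum class Kind : unsigned short { CompilationUnit, LinkUnit };

// Maps each enumerator to the same spelling used in the diff above.
inline std::string_view kindToString(Kind K) {
  switch (K) {
  case Kind::CompilationUnit:
    return "compilation_unit";
  case Kind::LinkUnit:
    return "link_unit";
  }
  return ""; // not reached for valid enumerators
}

// Inverse mapping; unknown spellings yield std::nullopt.
inline std::optional<Kind> parseKind(std::string_view Str) {
  if (Str == "compilation_unit")
    return Kind::CompilationUnit;
  if (Str == "link_unit")
    return Kind::LinkUnit;
  return std::nullopt;
}

// Checks that parsing the printed form gives back the original value.
inline bool roundTrips(Kind K) {
  std::optional<Kind> Parsed = parseKind(kindToString(K));
  return Parsed.has_value() && *Parsed == K;
}
```

A unit test that loops over every enumerator and checks `roundTrips` would catch a new kind added to one function but not the other.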
Nit: the header name and the comment do not match.