Changes from 19 commits
30 commits
61f84a4
[clang][ssaf] Introduce entity abstraction for SSAF
jkorous-apple Nov 21, 2025
d1f0e79
[clang][ssaf] Use nested namespace definitions
jkorous-apple Dec 3, 2025
8da5617
[clang][ssaf] Return StringRef from toString(BuildNamespaceKind)
jkorous-apple Dec 3, 2025
0859de9
[clang][ssaf] Fix header guards
jkorous-apple Dec 3, 2025
3ad2d3d
[clang][ssaf] Optimize NestedBuildNamespace::makeQualified
jkorous-apple Dec 3, 2025
6b840ce
[clang][ssaf] Add doc comments to makeQualified methods
jkorous-apple Dec 3, 2025
7467cf7
[clang][ssaf] Add asTuple helper to EntityName and BuildNamespace
jkorous-apple Dec 5, 2025
5750baf
[clang][ssaf] Simplify AST node type check with isa<>
jkorous-apple Dec 5, 2025
c25b2bf
[clang][ssaf] Add default value param names
jkorous-apple Dec 5, 2025
325f74d
[clang][ssaf][NFC] Make test helper to return canonical decl
jkorous-apple Dec 5, 2025
ffc5104
[clang][ssaf] Make a test assert more specific
jkorous-apple Dec 5, 2025
d488921
[clang][ssaf][NFC] Refactor implicit ctor decl search in a test
jkorous-apple Dec 5, 2025
df0e696
[clang][ssaf][NFC] Make a test assertion more specific
jkorous-apple Dec 5, 2025
f7d033b
[clang][ssaf][NFC] Add assertion to a test
jkorous-apple Dec 5, 2025
a7c5aa7
[clang][ssaf][NFC] Improve redeclaration entity name test
jkorous-apple Dec 5, 2025
2c8c699
[clang][ssaf][NFC] Refactor redeclaration tests
jkorous-apple Dec 5, 2025
5b514e9
[clang][ssaf][NFC] Add new testcases for function parameter entity name
jkorous-apple Dec 5, 2025
ba89f06
[clang][ssaf][NFC] Sort test source files in CMakeLists.txt
jkorous-apple Dec 5, 2025
8bde909
[clang][ssaf] Add doc comments for BuildNamespace
jkorous-apple Dec 6, 2025
4cab5a1
[clang][ssaf][NFC] Improve doc for getLocalEntityNameForFunctionReturn
jkorous-apple Dec 8, 2025
30ac699
[clang][ssaf] Shorten names of EntityName factory functions
jkorous-apple Dec 8, 2025
97c31ee
[clang][ssaf][NFC] Fix test for entity name of implicit declaration
jkorous-apple Dec 8, 2025
140e84f
[clang][ssaf][NFC] Rename and document makeTU in BuildNamespace
jkorous-apple Dec 8, 2025
a6807ee
[clang][ssaf][NFC] Add comment on implementation of EntityName being …
jkorous-apple Dec 8, 2025
2fb3d37
[clang][ssaf][NFC] Fix header comment
jkorous-apple Dec 8, 2025
db34258
[clang][ssaf] Remove braces from single-line if block
jkorous-apple Dec 8, 2025
691482c
[clang][ssaf] Apply clang-format
jkorous-apple Dec 8, 2025
0bf64a4
[clang][ssaf][NFC] Label C++ snippets in unit tests
jkorous-apple Dec 8, 2025
e8d8805
[clang][ssaf][NFC] clang-format tests
jkorous-apple Dec 8, 2025
8889e79
Merge branch 'main' into ssaf
jkorous-apple Dec 8, 2025
44 changes: 44 additions & 0 deletions clang/include/clang/Analysis/Scalable/ASTEntityMapping.h
@@ -0,0 +1,44 @@
//===- ASTMapping.h - AST to SSAF Entity mapping ----------------*- C++ -*-===//
Collaborator

Nit: the header name and the comment do not match.

//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

#ifndef LLVM_CLANG_ANALYSIS_SCALABLE_ASTENTITYMAPPING_H
#define LLVM_CLANG_ANALYSIS_SCALABLE_ASTENTITYMAPPING_H

#include "clang/Analysis/Scalable/Model/EntityName.h"
#include "clang/AST/Decl.h"
#include "llvm/ADT/StringRef.h"
#include <optional>

namespace clang::ssaf {

Collaborator

For both of these lookup functions -- I expect these to be used heavily in analysis, so it would be beneficial if they were shorter names. Currently, the names encode a lot of the (existing) type information. Is that necessary/helpful? Could you instead go with a simpler scheme like getEntity and getReturnEntity?

Separately: constructing these is typically expensive, so we use a cache. Consider including a cache object in this library as well. I think that will be the correct choice for most use cases.

Contributor Author

Sounds good, let me update the names.

Re: caching: I imagine you are totally right and we will do that. Since we are building everything from scratch, though, I would prefer to hold off on such optimizations until we understand the use cases a little better. For example, I imagine the cache should probably keep a map<Decl *, EntityID>, but entity IDs are not even introduced in this PR yet.
WDYT?

/// Maps a declaration to an EntityName.
///
/// Supported declaration types for entity mapping:
/// - Functions and methods
/// - Global Variables
/// - Function parameters
/// - Struct/class/union type definitions
/// - Struct/class/union fields
///
/// Implicit declarations and compiler builtins are not mapped.
///
/// \param D The declaration to map. Must not be null.
///
/// \return An EntityName if the declaration can be mapped, std::nullopt otherwise.
std::optional<EntityName> getLocalEntityNameForDecl(const Decl* D);
Collaborator

What does the "Local" refer to in the name?

Contributor Author

The names got shortened to getEntityName and getEntityNameForReturn.

The word "Local" referred to the fact that the resulting name would be TU-local, i.e., without BuildNamespace qualification.


/// Maps a function return type to an EntityName.
Collaborator

I think it would help to spell out in more detail what you mean/why this specialization is necessary. It's not really the type that's being identified, or you could just separately identify the type. It's specifically the return entity of this function.

Contributor Author

That's an interesting question.

I've tweaked the doc comment a bit but I suspect I am not fully answering your question.

It might be better to hold off on adding this overload until I can show its use. WDYT?

///
/// \param FD The function declaration. Must not be null.
///
/// \return An EntityName for the function's return type.
std::optional<EntityName> getLocalEntityNameForFunctionReturn(const FunctionDecl* FD);

} // namespace clang::ssaf

#endif // LLVM_CLANG_ANALYSIS_SCALABLE_ASTENTITYMAPPING_H
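
The caching idea raised in the review thread on this header could be sketched roughly as below. This is only an illustration under stated assumptions: `FakeDecl` and `EntityNameCache` are hypothetical stand-ins that are not part of this PR, the cached value is a plain string rather than an `EntityName` or an entity ID, and the lookup function is passed in as a callback so the snippet compiles without any Clang headers.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Stand-in for clang::Decl; only its address is used as the cache key.
struct FakeDecl {};

// Hypothetical memoizing wrapper around a name lookup; not part of this PR.
class EntityNameCache {
public:
  using Compute = std::function<std::optional<std::string>(const FakeDecl *)>;

  explicit EntityNameCache(Compute Fn) : Fn(std::move(Fn)) {
    assert(this->Fn && "a compute callback is required");
  }

  // Returns the cached result, computing it on the first request only.
  // Failed lookups (std::nullopt) are cached as well.
  const std::optional<std::string> &get(const FakeDecl *D) {
    auto It = Cache.find(D);
    if (It == Cache.end())
      It = Cache.emplace(D, Fn(D)).first;
    return It->second;
  }

private:
  Compute Fn;
  std::unordered_map<const FakeDecl *, std::optional<std::string>> Cache;
};
```

Keying on the declaration pointer assumes callers always pass the same (for example, canonical) declaration for a given entity; whether that holds is a design question the thread leaves open.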
115 changes: 115 additions & 0 deletions clang/include/clang/Analysis/Scalable/Model/BuildNamespace.h
@@ -0,0 +1,115 @@
//===- BuildNamespace.h -----------------------------------------*- C++ -*-===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// This file defines BuildNamespace and NestedBuildNamespace classes that
// represent build namespaces in the Scalable Static Analysis Framework.
//
// Build namespaces provide an abstraction for grouping program entities (such
// as those in a shared library or compilation unit) to enable analysis of
// software projects constructed from individual components.
//
//===----------------------------------------------------------------------===//

#ifndef LLVM_CLANG_ANALYSIS_SCALABLE_MODEL_BUILDNAMESPACE_H
#define LLVM_CLANG_ANALYSIS_SCALABLE_MODEL_BUILDNAMESPACE_H

#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/StringRef.h"
#include <optional>
#include <string>
#include <tuple>
#include <vector>

namespace clang::ssaf {

enum class BuildNamespaceKind : unsigned short {
CompilationUnit,
LinkUnit
};

llvm::StringRef toString(BuildNamespaceKind BNK);

std::optional<BuildNamespaceKind> parseBuildNamespaceKind(llvm::StringRef Str);

/// Represents a single namespace in the build process.
///
/// A BuildNamespace groups program entities, such as those belonging to a
/// compilation unit or link unit (e.g., a shared library). Each namespace has a
/// kind (CompilationUnit or LinkUnit) and a unique identifier name within that
/// kind.
///
/// BuildNamespaces can be composed into NestedBuildNamespace to represent
/// hierarchical namespace structures that model how software is constructed from
/// its components.
class BuildNamespace {
BuildNamespaceKind Kind;
std::string Name;

auto asTuple() const { return std::tie(Kind, Name); }

public:
BuildNamespace(BuildNamespaceKind Kind, llvm::StringRef Name)
: Kind(Kind), Name(Name.str()) {}

static BuildNamespace makeTU(llvm::StringRef CompilationId);

bool operator==(const BuildNamespace& Other) const;
bool operator!=(const BuildNamespace& Other) const;
bool operator<(const BuildNamespace& Other) const;

friend class SerializationFormat;
};

/// Represents a hierarchical sequence of build namespaces.
///
/// A NestedBuildNamespace captures namespace qualification for program entities
/// by maintaining an ordered sequence of BuildNamespace steps. This models how
/// entities are organized through multiple steps of the build process, such as
/// first being part of a compilation unit, then incorporated into a link unit.
///
/// For example, an entity might be qualified by a compilation unit namespace
/// followed by a shared library namespace.
class NestedBuildNamespace {
friend class SerializationFormat;

std::vector<BuildNamespace> Namespaces;

public:
NestedBuildNamespace() = default;

explicit NestedBuildNamespace(const std::vector<BuildNamespace>& Namespaces)
: Namespaces(Namespaces) {}

explicit NestedBuildNamespace(const BuildNamespace& N) {
Namespaces.push_back(N);
}

static NestedBuildNamespace makeTU(llvm::StringRef CompilationId);
Collaborator

Please comment.


/// Creates a new NestedBuildNamespace by appending additional namespaces.
///
/// \param Namespace The namespace to append.
NestedBuildNamespace makeQualified(NestedBuildNamespace Namespace) const {
auto Copy = *this;
Copy.Namespaces.reserve(Copy.Namespaces.size() + Namespace.Namespaces.size());
llvm::append_range(Copy.Namespaces, Namespace.Namespaces);
return Copy;
}

bool empty() const;

bool operator==(const NestedBuildNamespace& Other) const;
bool operator!=(const NestedBuildNamespace& Other) const;
bool operator<(const NestedBuildNamespace& Other) const;

friend class JSONWriter;
friend class LinkUnitResolution;
};

} // namespace clang::ssaf

#endif // LLVM_CLANG_ANALYSIS_SCALABLE_MODEL_BUILDNAMESPACE_H
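
To make the "compilation unit, then link unit" qualification from the NestedBuildNamespace comment concrete, here is a minimal standalone sketch. `Kind`, `Step`, `Nested`, and `qualifyExample` are simplified stand-ins invented for this illustration, and the names "a.cpp" and "libfoo.so" are made-up identifiers; the real classes keep their members private and use `llvm::append_range`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified stand-ins for BuildNamespaceKind / BuildNamespace.
enum class Kind { CompilationUnit, LinkUnit };

struct Step {
  Kind K;
  std::string Name;
};

// Simplified stand-in for NestedBuildNamespace.
struct Nested {
  std::vector<Step> Steps;

  // Mirrors makeQualified: returns a copy with Extra's steps appended,
  // leaving *this unchanged.
  Nested makeQualified(const Nested &Extra) const {
    Nested Copy = *this;
    Copy.Steps.insert(Copy.Steps.end(), Extra.Steps.begin(),
                      Extra.Steps.end());
    return Copy;
  }
};

// Builds a two-step qualification: compilation unit first, link unit second.
inline Nested qualifyExample() {
  Nested TU;
  TU.Steps.push_back({Kind::CompilationUnit, "a.cpp"});
  Nested Lib;
  Lib.Steps.push_back({Kind::LinkUnit, "libfoo.so"});
  assert(TU.Steps.size() == 1 && Lib.Steps.size() == 1);
  return TU.makeQualified(Lib);
}
```

The order of steps matters for both equality and ordering, since the comparison operators in this PR compare the underlying vectors element by element.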
50 changes: 50 additions & 0 deletions clang/include/clang/Analysis/Scalable/Model/EntityName.h
@@ -0,0 +1,50 @@
//===- EntityName.h ---------------------------------------------*- C++ -*-===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

#ifndef LLVM_CLANG_ANALYSIS_SCALABLE_MODEL_ENTITYNAME_H
#define LLVM_CLANG_ANALYSIS_SCALABLE_MODEL_ENTITYNAME_H

#include "clang/Analysis/Scalable/Model/BuildNamespace.h"
#include "llvm/ADT/SmallString.h"
#include "llvm/ADT/StringRef.h"
#include <string>
#include <tuple>

namespace clang::ssaf {

/// Uniquely identifies an entity in a program.
Collaborator

I have some concerns about how certain entities will be uniquely identified. There are lots of edge cases as the result of the compilation model of C and C++. Specifically, the same entity might be forward declared in multiple compilation units (in entirely different header files). Do we consider those to be the same entity or not? Or do we only care about complete types? We do not need to answer all of these questions but I think we should start planning around these scenarios early and document the behavior via some test.

Contributor Author

There is a fair amount of complexity in relating entities indeed.
At this point we are just laying the foundations that we will use in the future (hopefully soon) to implement entity linking and resolution. We know that we will need more data than just EntityName for that. We are also aware that until we actually implement it, this is only our best guess at what will be necessary. If we later find out that EntityName is missing something, we will enhance it. The right place to document, implement and test all that logic will be the patches that introduce the entity linker and the APIs for lookup in analysis result data.

///
/// EntityName provides a globally unique identifier for program entities that remains
/// stable across compilation boundaries. This enables whole-program analysis to track
/// and relate entities across separately compiled translation units.
class EntityName {
Collaborator

nit: Consider EntityID -- it's shorter and (IMO) more intuitive. I think of "name" more for circumstances where the identifier is readable. But, it's a matter of taste.

Contributor Author

Yes. Please see:
#169131 (comment)

std::string USR;
Collaborator

I think it would be helpful to explain the roles of these fields. Perhaps copy some of the information from the PR description. To the casual reader "USR" won't have much meaning and since the type is string that won't inform them either. Additionally, the need for the suffix won't be obvious. Similarly, the role of Namespace in distinguishing between otherwise identical entities.

Alternatively, provide a detailed explanation on the class comments or the constructor.

Contributor Author

I would actually prefer to keep the implementation details opaque from the users with the idea that we might want to change them later. I've added comments saying that.

In theory, I could make the constructor private and declare getEntityName, etc. in ASTEntityMapping.h as friends. I would rather not make this a hard restriction until I get clarity on how this would work with custom serialization formats, where deserialization will likely need to use the constructor again.

llvm::SmallString<16> Suffix;
NestedBuildNamespace Namespace;
Collaborator

I'm afraid of the size implications of including this. USRs are already large, but including an arbitrarily large vector as well, in each ID, threatens to make this unusable for large scale application.

Does the entity name need the full vector, or could something like a unique 64-bit ID suffice?

Contributor Author
jkorous-apple Dec 6, 2025

My plan is to actually use 64-bit IDs to represent entities in most contexts.

A follow-up PR introduces that:
jkorous-apple#1

I imagine EntityName will mostly be used for linking and for mapping entities back to the AST which we need for source code rewriting tools.

The compilation unit IDs and link unit IDs that the namespaces use as names will be supplied to the framework by external tools. We imagine the implementation will need to be space-efficient, and while this API doesn't impose such a restriction, it is unlikely that we could use long strings.

With the current API design (with EntityIdTable), during summary extraction, we would still have an instance of EntityName for each entity contributing a fact or being referred to by a summary. I plan to factor out the common namespace prefix with compilation unit id and have individual entity names unqualified.

For the summary analysis part, where we might need to represent all entities of a program in memory at once, having even a single instance of EntityName for each entity might require too much memory for the analysis of large programs, and we might need a different representation than EntityName, EntityId and EntityIdTable.


auto asTuple() const { return std::tie(USR, Suffix, Namespace); }

public:
EntityName(llvm::StringRef USR, llvm::StringRef Suffix,
NestedBuildNamespace Namespace);

bool operator==(const EntityName& Other) const;
bool operator!=(const EntityName& Other) const;
bool operator<(const EntityName& Other) const;

/// Creates a new EntityName with additional build namespace qualification.
///
/// \param Namespace The namespace steps to append to this entity's namespace.
EntityName makeQualified(NestedBuildNamespace Namespace) const;

friend class LinkUnitResolution;
friend class SerializationFormat;
};

} // namespace clang::ssaf

#endif // LLVM_CLANG_ANALYSIS_SCALABLE_MODEL_ENTITYNAME_H
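
The review thread on this header mentions representing entities by 64-bit IDs in most contexts. One way such interning could work is sketched below. This is a guess at the general shape only, not the follow-up PR's design: `Name` and `IdTable` are hypothetical stand-ins, and the namespace is flattened to a single string to keep the snippet self-contained.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Stand-in for EntityName: USR, suffix, and a flattened namespace string.
struct Name {
  std::string USR, Suffix, Namespace;

  // Lexicographic ordering over all fields, in the style of asTuple().
  bool operator<(const Name &O) const {
    return std::tie(USR, Suffix, Namespace) <
           std::tie(O.USR, O.Suffix, O.Namespace);
  }
};

// Hypothetical interning table: each distinct name gets one 64-bit ID.
class IdTable {
  std::map<Name, std::uint64_t> Ids;
  std::vector<Name> Names; // Names[Id] is the name behind Id.

public:
  // Returns the existing ID for N, or assigns the next free one.
  std::uint64_t intern(const Name &N) {
    auto [It, Inserted] = Ids.try_emplace(N, Names.size());
    if (Inserted)
      Names.push_back(N);
    return It->second;
  }

  // Maps an ID back to its full name.
  const Name &lookup(std::uint64_t Id) const {
    assert(Id < Names.size() && "unknown entity ID");
    return Names[Id];
  }
};
```

This sketch stores every name twice for simplicity, which runs against the memory concern raised in the thread; a real table would presumably store each name once.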
1 change: 1 addition & 0 deletions clang/lib/Analysis/CMakeLists.txt
@@ -50,3 +50,4 @@ add_clang_library(clangAnalysis
add_subdirectory(plugins)
add_subdirectory(FlowSensitive)
add_subdirectory(LifetimeSafety)
add_subdirectory(Scalable)
82 changes: 82 additions & 0 deletions clang/lib/Analysis/Scalable/ASTEntityMapping.cpp
@@ -0,0 +1,82 @@
//===- ASTEntityMapping.cpp - AST to SSAF Entity mapping --------*- C++ -*-===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// This file implements utilities for mapping AST declarations to SSAF entities.
//
//===----------------------------------------------------------------------===//

#include "clang/Analysis/Scalable/ASTEntityMapping.h"
#include "clang/AST/Decl.h"
#include "clang/Analysis/Scalable/Model/BuildNamespace.h"
#include "clang/Index/USRGeneration.h"
#include "llvm/ADT/SmallString.h"
#include "llvm/Support/raw_ostream.h"

namespace clang::ssaf {

std::optional<EntityName> getLocalEntityNameForDecl(const Decl* D) {
if (!D)
return std::nullopt;

if (D->isImplicit())
return std::nullopt;

if (isa<FunctionDecl>(D) && cast<FunctionDecl>(D)->getBuiltinID())
return std::nullopt;

if (!isa<FunctionDecl, ParmVarDecl, VarDecl, FieldDecl, RecordDecl>(D))
return std::nullopt;

llvm::SmallString<16> Suffix;
const Decl *USRDecl = D;

// For parameters, use the parent function's USR with parameter index as suffix
if (const auto * PVD = dyn_cast<ParmVarDecl>(D)) {
const auto *FD = dyn_cast_or_null<FunctionDecl>(PVD->getParentFunctionOrMethod());
if (!FD)
return std::nullopt;
USRDecl = FD;

const auto ParamIdx = PVD->getFunctionScopeIndex();
llvm::raw_svector_ostream OS(Suffix);
// Parameter uses function's USR with 1-based index as suffix
OS << (ParamIdx + 1);
}

llvm::SmallString<128> USRBuf;
if (clang::index::generateUSRForDecl(USRDecl, USRBuf)) {
Collaborator

Nit: braces could be dropped here.

return std::nullopt;
}

if (USRBuf.empty())
return std::nullopt;

return EntityName(USRBuf.str(), Suffix, /*Namespace=*/{});
}

std::optional<EntityName> getLocalEntityNameForFunctionReturn(const FunctionDecl* FD) {
if (!FD)
return std::nullopt;

if (FD->isImplicit())
return std::nullopt;

if (FD->getBuiltinID())
return std::nullopt;

llvm::SmallString<128> USRBuf;
// generateUSRForDecl returns true when no usable USR was generated.
if (clang::index::generateUSRForDecl(FD, USRBuf))
return std::nullopt;

if (USRBuf.empty())
return std::nullopt;

return EntityName(USRBuf.str(), /*Suffix=*/"0", /*Namespace=*/{});
}

} // namespace clang::ssaf
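
The suffix scheme used by the two functions above can be summarized with a small standalone sketch: a function maps to its own USR with an empty suffix, its return entity reuses that USR with suffix "0", and each parameter reuses it with its 1-based index. The helper names and the `UsrAndSuffix` alias are hypothetical, and any USR string passed in is treated as an opaque placeholder rather than real `generateUSRForDecl` output.

```cpp
#include <cassert>
#include <string>
#include <utility>

// (USR, suffix) pair standing in for the first two EntityName fields.
using UsrAndSuffix = std::pair<std::string, std::string>;

// The function itself: its own USR, empty suffix.
inline UsrAndSuffix nameForFunction(const std::string &FnUSR) {
  assert(!FnUSR.empty() && "the mapping rejects empty USRs");
  return {FnUSR, ""};
}

// The return entity: the function's USR with suffix "0".
inline UsrAndSuffix nameForReturn(const std::string &FnUSR) {
  assert(!FnUSR.empty() && "the mapping rejects empty USRs");
  return {FnUSR, "0"};
}

// A parameter: the function's USR with the 1-based parameter index.
inline UsrAndSuffix nameForParam(const std::string &FnUSR,
                                 unsigned ZeroBasedIdx) {
  assert(!FnUSR.empty() && "the mapping rejects empty USRs");
  return {FnUSR, std::to_string(ZeroBasedIdx + 1)};
}
```

Because parameter suffixes start at "1", the return entity's "0" can never collide with a parameter of the same function.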
19 changes: 19 additions & 0 deletions clang/lib/Analysis/Scalable/CMakeLists.txt
@@ -0,0 +1,19 @@
set(LLVM_LINK_COMPONENTS
Support
)

add_clang_library(clangAnalysisScalable
ASTEntityMapping.cpp
Model/BuildNamespace.cpp
Model/EntityName.cpp

LINK_LIBS
clangAST
clangASTMatchers
clangBasic
clangIndex
clangLex
clangFrontend

DEPENDS
)
69 changes: 69 additions & 0 deletions clang/lib/Analysis/Scalable/Model/BuildNamespace.cpp
@@ -0,0 +1,69 @@
//===- BuildNamespace.cpp ---------------------------------------*- C++ -*-===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

#include "clang/Analysis/Scalable/Model/BuildNamespace.h"
#include "llvm/Support/ErrorHandling.h"
#include <tuple>

namespace clang::ssaf {

llvm::StringRef toString(BuildNamespaceKind BNK) {
switch(BNK) {
case BuildNamespaceKind::CompilationUnit: return "compilation_unit";
case BuildNamespaceKind::LinkUnit: return "link_unit";
}
llvm_unreachable("Unknown BuildNamespaceKind");
}

std::optional<BuildNamespaceKind> parseBuildNamespaceKind(llvm::StringRef Str) {
if (Str == "compilation_unit")
return BuildNamespaceKind::CompilationUnit;
if (Str == "link_unit")
return BuildNamespaceKind::LinkUnit;
return std::nullopt;
}

BuildNamespace BuildNamespace::makeTU(llvm::StringRef CompilationId) {
return BuildNamespace{BuildNamespaceKind::CompilationUnit, CompilationId};
}

bool BuildNamespace::operator==(const BuildNamespace& Other) const {
return asTuple() == Other.asTuple();
}

bool BuildNamespace::operator!=(const BuildNamespace& Other) const {
return !(*this == Other);
}

bool BuildNamespace::operator<(const BuildNamespace& Other) const {
return asTuple() < Other.asTuple();
}

NestedBuildNamespace NestedBuildNamespace::makeTU(llvm::StringRef CompilationId) {
NestedBuildNamespace Result;
Result.Namespaces.push_back(BuildNamespace::makeTU(CompilationId));
return Result;
}

bool NestedBuildNamespace::empty() const {
return Namespaces.empty();
}

bool NestedBuildNamespace::operator==(const NestedBuildNamespace& Other) const {
return Namespaces == Other.Namespaces;
}

bool NestedBuildNamespace::operator!=(const NestedBuildNamespace& Other) const {
return !(*this == Other);
}

bool NestedBuildNamespace::operator<(const NestedBuildNamespace& Other) const {
return Namespaces < Other.Namespaces;
}

} // namespace clang::ssaf
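
Since `toString` and `parseBuildNamespaceKind` are meant to be inverses, a round-trip check is a natural property to test. The sketch below re-creates both functions over `std::string_view` so it builds without LLVM headers; `roundTrips` is a hypothetical helper, and the fallback after the switch stands in for `llvm_unreachable`.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

enum class BuildNamespaceKind : unsigned short { CompilationUnit, LinkUnit };

// Standalone copy of toString, using std::string_view instead of StringRef.
inline std::string_view toString(BuildNamespaceKind K) {
  switch (K) {
  case BuildNamespaceKind::CompilationUnit:
    return "compilation_unit";
  case BuildNamespaceKind::LinkUnit:
    return "link_unit";
  }
  assert(false && "unknown BuildNamespaceKind");
  return {};
}

// Standalone copy of parseBuildNamespaceKind.
inline std::optional<BuildNamespaceKind>
parseBuildNamespaceKind(std::string_view Str) {
  if (Str == "compilation_unit")
    return BuildNamespaceKind::CompilationUnit;
  if (Str == "link_unit")
    return BuildNamespaceKind::LinkUnit;
  return std::nullopt;
}

// True if parsing the printed form of K yields K again.
inline bool roundTrips(BuildNamespaceKind K) {
  auto Parsed = parseBuildNamespaceKind(toString(K));
  return Parsed.has_value() && *Parsed == K;
}
```

A test that loops over every enumerator with `roundTrips` would catch a kind added to the enum and to `toString` but forgotten in the parser.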