Merged
Changes from 6 commits
Commits
30 commits
61f84a4
[clang][ssaf] Introduce entity abstraction for SSAF
jkorous-apple Nov 21, 2025
d1f0e79
[clang][ssaf] Use nested namespace definitions
jkorous-apple Dec 3, 2025
8da5617
[clang][ssaf] Return StringRef from toString(BuildNamespaceKind)
jkorous-apple Dec 3, 2025
0859de9
[clang][ssaf] Fix header guards
jkorous-apple Dec 3, 2025
3ad2d3d
[clang][ssaf] Optimize NestedBuildNamespace::makeQualified
jkorous-apple Dec 3, 2025
6b840ce
[clang][ssaf] Add doc comments to makeQualified methods
jkorous-apple Dec 3, 2025
7467cf7
[clang][ssaf] Add asTuple helper to EntityName and BuildNamespace
jkorous-apple Dec 5, 2025
5750baf
[clang][ssaf] Simplify AST node type check with isa<>
jkorous-apple Dec 5, 2025
c25b2bf
[clang][ssaf] Add default value param names
jkorous-apple Dec 5, 2025
325f74d
[clang][ssaf][NFC] Make test helper to return canonical decl
jkorous-apple Dec 5, 2025
ffc5104
[clang][ssaf] Make a test assert more specific
jkorous-apple Dec 5, 2025
d488921
[clang][ssaf][NFC] Refactor implicit ctor decl search in a test
jkorous-apple Dec 5, 2025
df0e696
[clang][ssaf][NFC] Make a test assertion more specific
jkorous-apple Dec 5, 2025
f7d033b
[clang][ssaf][NFC] Add assertion to a test
jkorous-apple Dec 5, 2025
a7c5aa7
[clang][ssaf][NFC] Improve redeclaration entity name test
jkorous-apple Dec 5, 2025
2c8c699
[clang][ssaf][NFC] Refactor redeclaration tests
jkorous-apple Dec 5, 2025
5b514e9
[clang][ssaf][NFC] Add new testcases for function parameter entity name
jkorous-apple Dec 5, 2025
ba89f06
[clang][ssaf][NFC] Sort test source files in CMakeLists.txt
jkorous-apple Dec 5, 2025
8bde909
[clang][ssaf] Add doc comments for BuildNamespace
jkorous-apple Dec 6, 2025
4cab5a1
[clang][ssaf][NFC] Improve doc for getLocalEntityNameForFunctionReturn
jkorous-apple Dec 8, 2025
30ac699
[clang][ssaf] Shorten names of EntityName factory functions
jkorous-apple Dec 8, 2025
97c31ee
[clang][ssaf][NFC] Fix test for entity name of implicit declaration
jkorous-apple Dec 8, 2025
140e84f
[clang][ssaf][NFC] Rename and document makeTU in BuildNamespace
jkorous-apple Dec 8, 2025
a6807ee
[clang][ssaf][NFC] Add comment on implementation of EntityName being …
jkorous-apple Dec 8, 2025
2fb3d37
[clang][ssaf][NFC] Fix header comment
jkorous-apple Dec 8, 2025
db34258
[clang][ssaf] Remove braces from single-line if block
jkorous-apple Dec 8, 2025
691482c
[clang][ssaf] Apply clang-format
jkorous-apple Dec 8, 2025
0bf64a4
[clang][ssaf][NFC] Label C++ snippets in unit tests
jkorous-apple Dec 8, 2025
e8d8805
[clang][ssaf][NFC] clang-format tests
jkorous-apple Dec 8, 2025
8889e79
Merge branch 'main' into ssaf
jkorous-apple Dec 8, 2025
44 changes: 44 additions & 0 deletions clang/include/clang/Analysis/Scalable/ASTEntityMapping.h
@@ -0,0 +1,44 @@
//===- ASTMapping.h - AST to SSAF Entity mapping ----------------*- C++ -*-===//
Collaborator

Nit: the header name and the comment do not match.

//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

#ifndef LLVM_CLANG_ANALYSIS_SCALABLE_ASTENTITYMAPPING_H
#define LLVM_CLANG_ANALYSIS_SCALABLE_ASTENTITYMAPPING_H

#include "clang/Analysis/Scalable/Model/EntityName.h"
#include "clang/AST/Decl.h"
#include "llvm/ADT/StringRef.h"
#include <optional>

namespace clang::ssaf {

/// Maps a declaration to an EntityName.
///
/// Supported declaration types for entity mapping:
/// - Functions and methods
/// - Global variables
/// - Function parameters
/// - Struct/class/union type definitions
/// - Struct/class/union fields
///
/// Implicit declarations and compiler builtins are not mapped.
///
/// \param D The declaration to map. Must not be null.
///
/// \return An EntityName if the declaration can be mapped, std::nullopt otherwise.
std::optional<EntityName> getLocalEntityNameForDecl(const Decl* D);
Collaborator

What does the "Local" refer to in the name?

Contributor Author

The names got shortened to getEntityName and getEntityNameForReturn.

The word "Local" referred to the fact that the resulting name would be TU-local, i.e., without BuildNamespace qualification.


/// Maps a function return type to an EntityName.
///
/// \param FD The function declaration. Must not be null.
///
/// \return An EntityName for the function's return type, or std::nullopt if
/// it cannot be mapped.
std::optional<EntityName> getLocalEntityNameForFunctionReturn(const FunctionDecl* FD);

} // namespace clang::ssaf

#endif // LLVM_CLANG_ANALYSIS_SCALABLE_ASTENTITYMAPPING_H
86 changes: 86 additions & 0 deletions clang/include/clang/Analysis/Scalable/Model/BuildNamespace.h
@@ -0,0 +1,86 @@
//===- BuildNamespace.h -----------------------------------------*- C++ -*-===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

#ifndef LLVM_CLANG_ANALYSIS_SCALABLE_MODEL_BUILDNAMESPACE_H
#define LLVM_CLANG_ANALYSIS_SCALABLE_MODEL_BUILDNAMESPACE_H

#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/StringRef.h"
#include <optional>
#include <string>
#include <vector>

namespace clang::ssaf {

enum class BuildNamespaceKind : unsigned short {
  CompilationUnit,
  LinkUnit
};

llvm::StringRef toString(BuildNamespaceKind BNK);

std::optional<BuildNamespaceKind> parseBuildNamespaceKind(llvm::StringRef Str);

/// Represents a single step in the build process.
Collaborator

Could you expand this comment? It's not immediately obvious how a step in the build process relates to the notion of "namespace". Alternatively (or maybe additionally), consider adding some background explanation to the beginning of the file.

class BuildNamespace {
  BuildNamespaceKind Kind;
  std::string Name;

public:
  BuildNamespace(BuildNamespaceKind Kind, llvm::StringRef Name)
      : Kind(Kind), Name(Name.str()) {}

  static BuildNamespace makeTU(llvm::StringRef CompilationId);

  bool operator==(const BuildNamespace& Other) const;
  bool operator!=(const BuildNamespace& Other) const;
  bool operator<(const BuildNamespace& Other) const;

  friend class SerializationFormat;
};

/// Represents a sequence of steps in the build process.
Collaborator

Is it important to preserve the information about which entities belong to the same step? Or could NestedBuildNamespace, instead of being a separate type, just be the result of merging some BuildNamespaces?

Contributor Author

Let me preface this by saying that the namespace design will possibly evolve when we actually implement entity linking. To some degree this is just an educated guess.

We could just use std::vector<BuildNamespace> instead of introducing another class, but I expect that we will have operations on that type, and having this class will let the interfaces enforce type correctness and be self-descriptive. Alternatively, we could implement BuildNamespace as std::vector<std::pair<BuildNamespaceKind, std::string>> and sink the per-element logic into its implementation.

So, yes, I can imagine having only a single type, but could you please elaborate on why you would prefer that?

Collaborator

I do not have a strong preference or reason, other than that I feel this would make the implementation a bit more concise, and I usually prefer the more concise form until there is a need to split functionality out.

Contributor Author

I imagine BuildNamespace covers namespace identification and will be used in the interface and as an implementation detail (at least initially) of NestedBuildNamespace. NestedBuildNamespace would cover the "nested" part by providing additional operations.

I imagine there will be interfaces and declarations where being able to distinguish between a single namespace level and some number of nested namespaces will improve the clarity of the code.

I suggest we keep the current design and improve it once we see the use-cases. Would you be ok with that?

class NestedBuildNamespace {
  friend class SerializationFormat;

  std::vector<BuildNamespace> Namespaces;

public:
  NestedBuildNamespace() = default;

  explicit NestedBuildNamespace(const std::vector<BuildNamespace>& Namespaces)
      : Namespaces(Namespaces) {}

  explicit NestedBuildNamespace(const BuildNamespace& N) {
    Namespaces.push_back(N);
  }

  static NestedBuildNamespace makeTU(llvm::StringRef CompilationId);
Collaborator

Please comment.


  /// Creates a new NestedBuildNamespace by appending the given namespaces to
  /// the ones in this object. This object is not modified.
  ///
  /// \param Namespace The namespaces to append.
  NestedBuildNamespace makeQualified(NestedBuildNamespace Namespace) const {
    auto Copy = *this;
    Copy.Namespaces.reserve(Copy.Namespaces.size() +
                            Namespace.Namespaces.size());
    llvm::append_range(Copy.Namespaces, Namespace.Namespaces);
    return Copy;
  }

  bool empty() const;

  bool operator==(const NestedBuildNamespace& Other) const;
  bool operator!=(const NestedBuildNamespace& Other) const;
  bool operator<(const NestedBuildNamespace& Other) const;

  friend class JSONWriter;
  friend class LinkUnitResolution;
};

} // namespace clang::ssaf

#endif // LLVM_CLANG_ANALYSIS_SCALABLE_MODEL_BUILDNAMESPACE_H
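The BuildNamespace.h API can be exercised without LLVM. The following is a minimal sketch, not the SSAF implementation: it assumes `std::string_view` in place of `llvm::StringRef`, plain structs with public members in place of the classes' private fields and friend declarations, and an `assert` standing in for `llvm_unreachable`. It shows the `toString`/`parseBuildNamespaceKind` round trip and that `makeQualified` returns a new object with the argument's steps appended after the receiver's own, leaving the receiver unchanged.

```cpp
// Hypothetical, LLVM-free model of BuildNamespaceKind, BuildNamespace and
// NestedBuildNamespace::makeQualified from BuildNamespace.h.
#include <cassert>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

enum class BuildNamespaceKind : unsigned short { CompilationUnit, LinkUnit };

std::string_view toString(BuildNamespaceKind BNK) {
  switch (BNK) {
  case BuildNamespaceKind::CompilationUnit:
    return "compilation_unit";
  case BuildNamespaceKind::LinkUnit:
    return "link_unit";
  }
  // Stand-in for llvm_unreachable: only reached with an invalid enumerator.
  assert(false && "Unknown BuildNamespaceKind");
  return {};
}

// Inverse of toString; unknown spellings yield std::nullopt.
std::optional<BuildNamespaceKind> parseBuildNamespaceKind(std::string_view Str) {
  if (Str == "compilation_unit")
    return BuildNamespaceKind::CompilationUnit;
  if (Str == "link_unit")
    return BuildNamespaceKind::LinkUnit;
  return std::nullopt;
}

// One build step: its kind plus an identifier such as a compilation id.
struct BuildNamespace {
  BuildNamespaceKind Kind;
  std::string Name;

  bool operator==(const BuildNamespace &Other) const {
    return Kind == Other.Kind && Name == Other.Name;
  }
};

// An ordered chain of build steps, innermost first.
struct NestedBuildNamespace {
  std::vector<BuildNamespace> Namespaces;

  // Returns a copy of *this with the steps of `Namespace` appended.
  NestedBuildNamespace makeQualified(const NestedBuildNamespace &Namespace) const {
    NestedBuildNamespace Copy = *this;
    Copy.Namespaces.insert(Copy.Namespaces.end(), Namespace.Namespaces.begin(),
                           Namespace.Namespaces.end());
    return Copy;
  }
};
```

Keeping `NestedBuildNamespace` as its own type, rather than a bare `std::vector<BuildNamespace>`, lets a signature state whether it expects a single build step or a chain of them, which is the point the author makes in the review thread above.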
48 changes: 48 additions & 0 deletions clang/include/clang/Analysis/Scalable/Model/EntityName.h
@@ -0,0 +1,48 @@
//===- EntityName.h ---------------------------------------------*- C++ -*-===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

#ifndef LLVM_CLANG_ANALYSIS_SCALABLE_MODEL_ENTITYNAME_H
#define LLVM_CLANG_ANALYSIS_SCALABLE_MODEL_ENTITYNAME_H

#include "clang/Analysis/Scalable/Model/BuildNamespace.h"
#include "llvm/ADT/SmallString.h"
#include "llvm/ADT/StringRef.h"
#include <string>

namespace clang::ssaf {

/// Uniquely identifies an entity in a program.
Collaborator

I have some concerns about how certain entities will be uniquely identified. There are lots of edge cases as a result of the compilation model of C and C++. Specifically, the same entity might be forward declared in multiple compilation units (in entirely different header files). Do we consider those to be the same entity or not? Or do we only care about complete types? We do not need to answer all of these questions, but I think we should start planning around these scenarios early and document the behavior via tests.

Contributor Author

There is indeed a fair amount of complexity in relating entities.
At this point we are just creating foundations that we will use in the future (hopefully soon) to implement entity linking and resolution. We know that we will need more data than just EntityName for that. We are also aware that until we actually implement it, this is only our best guess of what will be necessary. If we later find out the EntityName is missing something, we will enhance it. The right place to document, implement and test all that logic will be in the patches that introduce entity linker and the APIs for lookup in analysis result data.

///
/// EntityName provides a globally unique identifier for program entities that remains
/// stable across compilation boundaries. This enables whole-program analysis to track
/// and relate entities across separately compiled translation units.
class EntityName {
  std::string USR;
  llvm::SmallString<16> Suffix;
  NestedBuildNamespace Namespace;

public:
  EntityName(llvm::StringRef USR, llvm::StringRef Suffix,
             NestedBuildNamespace Namespace);

  bool operator==(const EntityName& Other) const;
  bool operator!=(const EntityName& Other) const;
  bool operator<(const EntityName& Other) const;

  /// Creates a new EntityName with additional build namespace qualification.
  ///
  /// \param Namespace The namespace steps to append to this entity's namespace.
  EntityName makeQualified(NestedBuildNamespace Namespace) const;

  friend class LinkUnitResolution;
  friend class SerializationFormat;
};

} // namespace clang::ssaf

#endif // LLVM_CLANG_ANALYSIS_SCALABLE_MODEL_ENTITYNAME_H
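To illustrate how a TU-local name (empty namespace, as returned by the mapping functions) relates to a qualified one, here is a hypothetical LLVM-free model of `EntityName`. It assumes each build step can be reduced to a `(kind, name)` string pair, that every name carries a non-empty USR (the mapping code returns `std::nullopt` otherwise), and that all comparisons go through a tuple of the three fields, in the spirit of the `asTuple` helper named in the commit list; the real class may differ.

```cpp
// Hypothetical model of EntityName = (USR, Suffix, Namespace).
#include <cassert>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

using NamespaceStep = std::pair<std::string, std::string>; // (kind, name)
using NestedNamespace = std::vector<NamespaceStep>;

class EntityName {
  std::string USR;
  std::string Suffix;
  NestedNamespace Namespace;

  // A single tuple view used by ==, != and <, so all three operators look at
  // the same fields in the same order.
  std::tuple<const std::string &, const std::string &, const NestedNamespace &>
  asTuple() const {
    return std::tie(USR, Suffix, Namespace);
  }

public:
  EntityName(std::string USR, std::string Suffix, NestedNamespace Namespace)
      : USR(std::move(USR)), Suffix(std::move(Suffix)),
        Namespace(std::move(Namespace)) {
    assert(!this->USR.empty() && "an EntityName needs a USR");
  }

  bool operator==(const EntityName &Other) const {
    return asTuple() == Other.asTuple();
  }
  bool operator!=(const EntityName &Other) const { return !(*this == Other); }
  bool operator<(const EntityName &Other) const {
    return asTuple() < Other.asTuple();
  }

  // Returns a copy whose namespace has `Extra` appended; *this is unchanged.
  EntityName makeQualified(const NestedNamespace &Extra) const {
    EntityName Copy = *this;
    Copy.Namespace.insert(Copy.Namespace.end(), Extra.begin(), Extra.end());
    return Copy;
  }
};
```

In the checks, `"c:@F@foo#"` is only an illustrative USR string. Qualifying the same TU-local name with the same compilation unit twice yields equal names, while the local and qualified forms compare unequal.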
1 change: 1 addition & 0 deletions clang/lib/Analysis/CMakeLists.txt
@@ -50,3 +50,4 @@ add_clang_library(clangAnalysis
add_subdirectory(plugins)
add_subdirectory(FlowSensitive)
add_subdirectory(LifetimeSafety)
add_subdirectory(Scalable)
83 changes: 83 additions & 0 deletions clang/lib/Analysis/Scalable/ASTEntityMapping.cpp
@@ -0,0 +1,83 @@
//===- ASTMapping.cpp - AST to SSAF Entity mapping --------------*- C++ -*-===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// This file implements utilities for mapping AST declarations to SSAF entities.
//
//===----------------------------------------------------------------------===//

#include "clang/Analysis/Scalable/ASTEntityMapping.h"
#include "clang/AST/Decl.h"
#include "clang/Analysis/Scalable/Model/BuildNamespace.h"
#include "clang/Index/USRGeneration.h"
#include "llvm/ADT/SmallString.h"

namespace clang::ssaf {

std::optional<EntityName> getLocalEntityNameForDecl(const Decl* D) {
  if (!D)
    return std::nullopt;

  if (D->isImplicit())
    return std::nullopt;

  if (isa<FunctionDecl>(D) && cast<FunctionDecl>(D)->getBuiltinID())
    return std::nullopt;

  if (!isa<FunctionDecl>(D) && !isa<ParmVarDecl>(D) && !isa<VarDecl>(D) &&
      !isa<FieldDecl>(D) && !isa<RecordDecl>(D))
    return std::nullopt;

  llvm::SmallString<16> Suffix;
  const Decl *USRDecl = D;

  // For parameters, use the parent function's USR with the parameter index as
  // the suffix.
  if (const auto *PVD = dyn_cast<ParmVarDecl>(D)) {
    const auto *FD =
        dyn_cast_or_null<FunctionDecl>(PVD->getParentFunctionOrMethod());
    if (!FD)
      return std::nullopt;
    USRDecl = FD;

    const auto ParamIdx = PVD->getFunctionScopeIndex();
    llvm::raw_svector_ostream OS(Suffix);
    // Parameter uses function's USR with 1-based index as suffix
    OS << (ParamIdx + 1);
  }

  llvm::SmallString<128> USRBuf;
  if (clang::index::generateUSRForDecl(USRDecl, USRBuf)) {
Collaborator

Nit: braces could be dropped here.

    return std::nullopt;
  }

  if (USRBuf.empty())
    return std::nullopt;

  return EntityName(USRBuf.str(), Suffix, {});
}

std::optional<EntityName>
getLocalEntityNameForFunctionReturn(const FunctionDecl* FD) {
  if (!FD)
    return std::nullopt;

  if (FD->isImplicit())
    return std::nullopt;

  if (FD->getBuiltinID())
    return std::nullopt;

  llvm::SmallString<128> USRBuf;
  if (clang::index::generateUSRForDecl(FD, USRBuf)) {
    return std::nullopt;
  }

  if (USRBuf.empty())
    return std::nullopt;

  return EntityName(USRBuf.str(), "0", {});
}

} // namespace clang::ssaf
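The suffix scheme in ASTEntityMapping.cpp can be summarized in a few lines. In this sketch the names `FunctionPart` and `entitySuffix` are hypothetical (they are not part of the patch): a function, its return value and its parameters all share the function's USR and differ only in the suffix, which is empty for the function itself, `"0"` for the return value, and the 1-based parameter index for each parameter, where the input index is 0-based as returned by `ParmVarDecl::getFunctionScopeIndex()`.

```cpp
// Hypothetical helper mirroring the suffix choices in ASTEntityMapping.cpp.
#include <cassert>
#include <string>

// Which part of a function an EntityName refers to.
enum class FunctionPart { Function, Return, Parameter };

// ParamIdx is 0-based and only meaningful for FunctionPart::Parameter.
std::string entitySuffix(FunctionPart Part, unsigned ParamIdx = 0) {
  switch (Part) {
  case FunctionPart::Function:
    return ""; // the function's own entity keeps an empty suffix
  case FunctionPart::Return:
    return "0"; // fixed suffix used by getLocalEntityNameForFunctionReturn
  case FunctionPart::Parameter:
    return std::to_string(ParamIdx + 1); // 1-based parameter index
  }
  assert(false && "Unknown FunctionPart");
  return "";
}
```

Starting parameter suffixes at 1 is what keeps them from colliding with the `"0"` reserved for the return value.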
19 changes: 19 additions & 0 deletions clang/lib/Analysis/Scalable/CMakeLists.txt
@@ -0,0 +1,19 @@
set(LLVM_LINK_COMPONENTS
  Support
  )

add_clang_library(clangAnalysisScalable
  ASTEntityMapping.cpp
  Model/BuildNamespace.cpp
  Model/EntityName.cpp

  LINK_LIBS
  clangAST
  clangASTMatchers
  clangBasic
  clangIndex
  clangLex
  clangFrontend

  DEPENDS
  )
70 changes: 70 additions & 0 deletions clang/lib/Analysis/Scalable/Model/BuildNamespace.cpp
@@ -0,0 +1,70 @@
//===- BuildNamespace.cpp ---------------------------------------*- C++ -*-===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

#include "clang/Analysis/Scalable/Model/BuildNamespace.h"
#include "llvm/Support/ErrorHandling.h"

namespace clang::ssaf {

llvm::StringRef toString(BuildNamespaceKind BNK) {
  switch (BNK) {
  case BuildNamespaceKind::CompilationUnit:
    return "compilation_unit";
  case BuildNamespaceKind::LinkUnit:
    return "link_unit";
  }
  llvm_unreachable("Unknown BuildNamespaceKind");
}

std::optional<BuildNamespaceKind> parseBuildNamespaceKind(llvm::StringRef Str) {
  if (Str == "compilation_unit")
    return BuildNamespaceKind::CompilationUnit;
  if (Str == "link_unit")
    return BuildNamespaceKind::LinkUnit;
  return std::nullopt;
}

BuildNamespace BuildNamespace::makeTU(llvm::StringRef CompilationId) {
  return BuildNamespace{BuildNamespaceKind::CompilationUnit,
                        CompilationId.str()};
}

bool BuildNamespace::operator==(const BuildNamespace& Other) const {
  return Kind == Other.Kind && Name == Other.Name;
}

bool BuildNamespace::operator!=(const BuildNamespace& Other) const {
  return !(*this == Other);
}

bool BuildNamespace::operator<(const BuildNamespace& Other) const {
  if (Kind != Other.Kind)
    return Kind < Other.Kind;
  return Name < Other.Name;
}

NestedBuildNamespace NestedBuildNamespace::makeTU(llvm::StringRef CompilationId) {
  NestedBuildNamespace Result;
  Result.Namespaces.push_back(BuildNamespace::makeTU(CompilationId));
  return Result;
}

bool NestedBuildNamespace::empty() const {
  return Namespaces.empty();
}

bool NestedBuildNamespace::operator==(const NestedBuildNamespace& Other) const {
  return Namespaces == Other.Namespaces;
}

bool NestedBuildNamespace::operator!=(const NestedBuildNamespace& Other) const {
  return !(*this == Other);
}

bool NestedBuildNamespace::operator<(const NestedBuildNamespace& Other) const {
  return Namespaces < Other.Namespaces;
}

} // namespace clang::ssaf
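The comparison operators in BuildNamespace.cpp define a total order that can be checked in isolation. The following standalone sketch assumes plain structs instead of the SSAF classes; it expresses `BuildNamespace::operator<` with `std::tie`, which compares `Kind` first and `Name` second just like the hand-written version, and relies on `std::vector`'s lexicographic `operator<` exactly as `NestedBuildNamespace::operator<` does.

```cpp
// Hypothetical, LLVM-free model of the ordering in BuildNamespace.cpp.
#include <cassert> // used by the checks that exercise this sketch
#include <string>
#include <tuple>
#include <vector>

enum class BuildNamespaceKind : unsigned short { CompilationUnit, LinkUnit };

struct BuildNamespace {
  BuildNamespaceKind Kind;
  std::string Name;

  // Equivalent to: if (Kind != Other.Kind) return Kind < Other.Kind;
  //                return Name < Other.Name;
  bool operator<(const BuildNamespace &Other) const {
    return std::tie(Kind, Name) < std::tie(Other.Kind, Other.Name);
  }
  bool operator==(const BuildNamespace &Other) const {
    return std::tie(Kind, Name) == std::tie(Other.Kind, Other.Name);
  }
};

// std::vector's operator< is std::lexicographical_compare over the steps,
// which is what NestedBuildNamespace::operator< delegates to.
using NestedBuildNamespace = std::vector<BuildNamespace>;
```

Two consequences follow: every `CompilationUnit` step sorts before every `LinkUnit` step regardless of its name, and a nested namespace sorts before any longer namespace of which it is a prefix.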