Skip to content

Conversation

@jkorous-apple
Copy link
Contributor

Add core abstractions for identifying program entities across compilation and link unit boundaries in the Scalable Static Analysis Framework (SSAF).

Introduces three key components:

  • BuildNamespace: Represents build artifacts (compilation units, link units)
  • EntityName: Globally unique entity identifiers across compilation boundaries
  • AST mapping: Functions to map Clang AST declarations to EntityNames

Entity identification uses Unified Symbol Resolution (USR) as the underlying mechanism, with extensions for sub-entities (parameters, return values) via suffixes. The abstraction allows whole-program analysis by providing stable identifiers that persist across separately compiled translation units.

Add core abstractions for identifying program entities across compilation
and link unit boundaries in the Scalable Static Analysis Framework (SSAF).

Introduces three key components:
- BuildNamespace: Represents build artifacts (compilation units, link units)
- EntityName: Globally unique entity identifiers across compilation boundaries
- AST mapping: Functions to map Clang AST declarations to EntityNames

Entity identification uses Unified Symbol Resolution (USR) as the underlying
mechanism, with extensions for sub-entities (parameters, return values) via
suffixes. The abstraction allows whole-program analysis by providing stable
identifiers that persist across separately compiled translation units.
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:analysis labels Nov 22, 2025
@llvmbot
Copy link
Member

llvmbot commented Nov 22, 2025

@llvm/pr-subscribers-clang

Author: Jan Korous (jkorous-apple)

Changes

Add core abstractions for identifying program entities across compilation and link unit boundaries in the Scalable Static Analysis Framework (SSAF).

Introduces three key components:

  • BuildNamespace: Represents build artifacts (compilation units, link units)
  • EntityName: Globally unique entity identifiers across compilation boundaries
  • AST mapping: Functions to map Clang AST declarations to EntityNames

Entity identification uses Unified Symbol Resolution (USR) as the underlying mechanism, with extensions for sub-entities (parameters, return values) via suffixes. The abstraction allows whole-program analysis by providing stable identifiers that persist across separately compiled translation units.


Patch is 30.31 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/169131.diff

13 Files Affected:

  • (added) clang/include/clang/Analysis/Scalable/ASTEntityMapping.h (+46)
  • (added) clang/include/clang/Analysis/Scalable/Model/BuildNamespace.h (+84)
  • (added) clang/include/clang/Analysis/Scalable/Model/EntityName.h (+47)
  • (modified) clang/lib/Analysis/CMakeLists.txt (+1)
  • (added) clang/lib/Analysis/Scalable/ASTEntityMapping.cpp (+85)
  • (added) clang/lib/Analysis/Scalable/CMakeLists.txt (+19)
  • (added) clang/lib/Analysis/Scalable/Model/BuildNamespace.cpp (+72)
  • (added) clang/lib/Analysis/Scalable/Model/EntityName.cpp (+44)
  • (modified) clang/unittests/Analysis/CMakeLists.txt (+1)
  • (added) clang/unittests/Analysis/Scalable/ASTEntityMappingTest.cpp (+343)
  • (added) clang/unittests/Analysis/Scalable/BuildNamespaceTest.cpp (+99)
  • (added) clang/unittests/Analysis/Scalable/CMakeLists.txt (+18)
  • (added) clang/unittests/Analysis/Scalable/EntityNameTest.cpp (+62)
diff --git a/clang/include/clang/Analysis/Scalable/ASTEntityMapping.h b/clang/include/clang/Analysis/Scalable/ASTEntityMapping.h
new file mode 100644
index 0000000000000..a137e8b741821
--- /dev/null
+++ b/clang/include/clang/Analysis/Scalable/ASTEntityMapping.h
@@ -0,0 +1,46 @@
+//===- ASTMapping.h - AST to SSAF Entity mapping ----------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_CLANG_ANALYSIS_SCALABLE_ASTMAPPING_H
+#define LLVM_CLANG_ANALYSIS_SCALABLE_ASTMAPPING_H
+
+#include "clang/Analysis/Scalable/Model/EntityName.h"
+#include "clang/AST/Decl.h"
+#include "llvm/ADT/StringRef.h"
+#include <optional>
+
+namespace clang {
+namespace ssaf {
+
+/// Maps a declaration to an EntityName.
+///
+/// Supported declaration types for entity mapping:
+/// - Functions and methods
+/// - Global Variables
+/// - Function parameters
+/// - Struct/class/union type definitions
+/// - Struct/class/union fields
+///
+/// Implicit declarations and compiler builtins are not mapped.
+///
+/// \param D The declaration to map. Must not be null.
+///
+/// \return An EntityName if the declaration can be mapped, std::nullopt otherwise.
+std::optional<EntityName> getLocalEntityNameForDecl(const Decl* D);
+
+/// Maps a function return type to an EntityName.
+///
+/// \param FD The function declaration. Must not be null.
+///
+/// \return An EntityName for the function's return type.
+std::optional<EntityName> getLocalEntityNameForFunctionReturn(const FunctionDecl* FD);
+
+} // namespace ssaf
+} // namespace clang
+
+#endif // LLVM_CLANG_ANALYSIS_SCALABLE_ASTMAPPING_H
diff --git a/clang/include/clang/Analysis/Scalable/Model/BuildNamespace.h b/clang/include/clang/Analysis/Scalable/Model/BuildNamespace.h
new file mode 100644
index 0000000000000..c4bf7146e461f
--- /dev/null
+++ b/clang/include/clang/Analysis/Scalable/Model/BuildNamespace.h
@@ -0,0 +1,84 @@
+//===- BuildNamespace.h -----------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_CLANG_ANALYSIS_SCALABLE_BUILD_NAMESPACE_H
+#define LLVM_CLANG_ANALYSIS_SCALABLE_BUILD_NAMESPACE_H
+
+#include "llvm/ADT/StringRef.h"
+#include <optional>
+#include <string>
+#include <vector>
+
+namespace clang {
+namespace ssaf {
+
+enum class BuildNamespaceKind : unsigned short {
+  CompilationUnit,
+  LinkUnit
+};
+
+std::string toString(BuildNamespaceKind BNK);
+
+std::optional<BuildNamespaceKind> parseBuildNamespaceKind(llvm::StringRef Str);
+
+/// Represents a single step in the build process.
+class BuildNamespace {
+  BuildNamespaceKind Kind;
+  std::string Name;
+public:
+  BuildNamespace(BuildNamespaceKind Kind, llvm::StringRef Name)
+    : Kind(Kind), Name(Name.str()) {}
+
+  static BuildNamespace makeTU(llvm::StringRef CompilationId);
+
+  bool operator==(const BuildNamespace& Other) const;
+  bool operator!=(const BuildNamespace& Other) const;
+  bool operator<(const BuildNamespace& Other) const;
+
+  friend class SerializationFormat;
+};
+
+/// Represents a sequence of steps in the build process.
+class NestedBuildNamespace {
+  friend class SerializationFormat;
+
+  std::vector<BuildNamespace> Namespaces;
+
+public:
+  NestedBuildNamespace() = default;
+
+  explicit NestedBuildNamespace(const std::vector<BuildNamespace>& Namespaces)
+    : Namespaces(Namespaces) {}
+
+  explicit NestedBuildNamespace(const BuildNamespace& N) {
+    Namespaces.push_back(N);
+  }
+
+  static NestedBuildNamespace makeTU(llvm::StringRef CompilationId);
+
+  NestedBuildNamespace makeQualified(NestedBuildNamespace Namespace) {
+    auto Copy = *this;
+    for (const auto& N : Namespace.Namespaces)
+      Copy.Namespaces.push_back(N);
+    return Copy;
+  }
+
+  bool empty() const;
+
+  bool operator==(const NestedBuildNamespace& Other) const;
+  bool operator!=(const NestedBuildNamespace& Other) const;
+  bool operator<(const NestedBuildNamespace& Other) const;
+
+  friend class JSONWriter;
+  friend class LinkUnitResolution;
+};
+
+} // namespace ssaf
+} // namespace clang
+
+#endif // LLVM_CLANG_ANALYSIS_SCALABLE_BUILD_NAMESPACE_H
diff --git a/clang/include/clang/Analysis/Scalable/Model/EntityName.h b/clang/include/clang/Analysis/Scalable/Model/EntityName.h
new file mode 100644
index 0000000000000..7f11ef0589bf5
--- /dev/null
+++ b/clang/include/clang/Analysis/Scalable/Model/EntityName.h
@@ -0,0 +1,47 @@
+//===- EntityName.h ---------------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_CLANG_ANALYSIS_SCALABLE_ENTITY_NAME_H
+#define LLVM_CLANG_ANALYSIS_SCALABLE_ENTITY_NAME_H
+
+#include "clang/Analysis/Scalable/Model/BuildNamespace.h"
+#include "llvm/ADT/SmallString.h"
+#include "llvm/ADT/StringRef.h"
+#include <string>
+
+namespace clang {
+namespace ssaf {
+
+/// Uniquely identifies an entity in a program.
+///
+/// EntityName provides a globally unique identifier for program entities that remains
+/// stable across compilation boundaries. This enables whole-program analysis to track
+/// and relate entities across separately compiled translation units.
+class EntityName {
+  std::string USR;
+  llvm::SmallString<16> Suffix;
+  NestedBuildNamespace Namespace;
+
+public:
+  EntityName(llvm::StringRef USR, llvm::StringRef Suffix,
+             NestedBuildNamespace Namespace);
+
+  bool operator==(const EntityName& Other) const;
+  bool operator!=(const EntityName& Other) const;
+  bool operator<(const EntityName& Other) const;
+
+  EntityName makeQualified(NestedBuildNamespace Namespace);
+
+  friend class LinkUnitResolution;
+  friend class SerializationFormat;
+};
+
+} // namespace ssaf
+} // namespace clang
+
+#endif // LLVM_CLANG_ANALYSIS_SCALABLE_ENTITY_NAME_H
diff --git a/clang/lib/Analysis/CMakeLists.txt b/clang/lib/Analysis/CMakeLists.txt
index 1dbd4153d856f..99a2ec684e149 100644
--- a/clang/lib/Analysis/CMakeLists.txt
+++ b/clang/lib/Analysis/CMakeLists.txt
@@ -50,3 +50,4 @@ add_clang_library(clangAnalysis
 add_subdirectory(plugins)
 add_subdirectory(FlowSensitive)
 add_subdirectory(LifetimeSafety)
+add_subdirectory(Scalable)
diff --git a/clang/lib/Analysis/Scalable/ASTEntityMapping.cpp b/clang/lib/Analysis/Scalable/ASTEntityMapping.cpp
new file mode 100644
index 0000000000000..87d05e8aa5dc3
--- /dev/null
+++ b/clang/lib/Analysis/Scalable/ASTEntityMapping.cpp
@@ -0,0 +1,85 @@
+//===- ASTMapping.cpp - AST to SSAF Entity mapping --------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file implements utilities for mapping AST declarations to SSAF entities.
+//
+//===----------------------------------------------------------------------===//
+
+#include "clang/Analysis/Scalable/ASTEntityMapping.h"
+#include "clang/AST/Decl.h"
+#include "clang/Analysis/Scalable/Model/BuildNamespace.h"
+#include "clang/Index/USRGeneration.h"
+#include "llvm/ADT/SmallString.h"
+
+namespace clang {
+namespace ssaf {
+
+std::optional<EntityName> getLocalEntityNameForDecl(const Decl* D) {
+  if (!D)
+    return std::nullopt;
+
+  if (D->isImplicit())
+    return std::nullopt;
+
+  if (isa<FunctionDecl>(D) && cast<FunctionDecl>(D)->getBuiltinID())
+    return std::nullopt;
+
+  if (!isa<FunctionDecl>(D) && !isa<ParmVarDecl>(D) && !isa<VarDecl>(D) &&
+      !isa<FieldDecl>(D) && !isa<RecordDecl>(D))
+    return std::nullopt;
+
+  llvm::SmallString<16> Suffix;
+  const Decl *USRDecl = D;
+
+  // For parameters, use the parent function's USR with parameter index as suffix
+  if (const auto * PVD = dyn_cast<ParmVarDecl>(D)) {
+    const auto *FD = dyn_cast_or_null<FunctionDecl>(PVD->getParentFunctionOrMethod());
+    if (!FD)
+      return std::nullopt;
+    USRDecl = FD;
+
+    const auto ParamIdx = PVD->getFunctionScopeIndex();
+    llvm::raw_svector_ostream OS(Suffix);
+    // Parameter uses function's USR with 1-based index as suffix
+    OS << (ParamIdx + 1);
+  }
+
+  llvm::SmallString<128> USRBuf;
+  if (clang::index::generateUSRForDecl(USRDecl, USRBuf)) {
+    return std::nullopt;
+  }
+
+  if (USRBuf.empty())
+    return std::nullopt;
+
+  return EntityName(USRBuf.str(), Suffix, {});
+}
+
+std::optional<EntityName> getLocalEntityNameForFunctionReturn(const FunctionDecl* FD) {
+  if (!FD)
+    return std::nullopt;
+
+  if (FD->isImplicit())
+    return std::nullopt;
+
+  if (FD->getBuiltinID())
+    return std::nullopt;
+
+  llvm::SmallString<128> USRBuf;
+  if (clang::index::generateUSRForDecl(FD, USRBuf)) {
+    return std::nullopt;
+  }
+
+  if (USRBuf.empty())
+    return std::nullopt;
+
+  return EntityName(USRBuf.str(), "0", {});
+}
+
+} // namespace ssaf
+} // namespace clang
diff --git a/clang/lib/Analysis/Scalable/CMakeLists.txt b/clang/lib/Analysis/Scalable/CMakeLists.txt
new file mode 100644
index 0000000000000..ea4693f102cb2
--- /dev/null
+++ b/clang/lib/Analysis/Scalable/CMakeLists.txt
@@ -0,0 +1,19 @@
+set(LLVM_LINK_COMPONENTS
+  Support
+  )
+
+add_clang_library(clangAnalysisScalable
+  ASTEntityMapping.cpp
+  Model/BuildNamespace.cpp
+  Model/EntityName.cpp
+
+  LINK_LIBS
+  clangAST
+  clangASTMatchers
+  clangBasic
+  clangIndex
+  clangLex
+  clangFrontend
+
+  DEPENDS
+  )
diff --git a/clang/lib/Analysis/Scalable/Model/BuildNamespace.cpp b/clang/lib/Analysis/Scalable/Model/BuildNamespace.cpp
new file mode 100644
index 0000000000000..5284a9a87a33a
--- /dev/null
+++ b/clang/lib/Analysis/Scalable/Model/BuildNamespace.cpp
@@ -0,0 +1,72 @@
+//===- BuildNamespace.cpp ---------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "clang/Analysis/Scalable/Model/BuildNamespace.h"
+#include "llvm/Support/ErrorHandling.h"
+
+namespace clang {
+namespace ssaf {
+
+std::string toString(BuildNamespaceKind BNK) {
+  switch(BNK) {
+    case BuildNamespaceKind::CompilationUnit: return "compilation_unit";
+    case BuildNamespaceKind::LinkUnit: return "link_unit";
+  }
+  llvm_unreachable("Unknown BuildNamespaceKind");
+}
+
+std::optional<BuildNamespaceKind> parseBuildNamespaceKind(llvm::StringRef Str) {
+  if (Str == "compilation_unit")
+    return BuildNamespaceKind::CompilationUnit;
+  if (Str == "link_unit")
+    return BuildNamespaceKind::LinkUnit;
+  return std::nullopt;
+}
+
+BuildNamespace BuildNamespace::makeTU(llvm::StringRef CompilationId) {
+  return BuildNamespace{BuildNamespaceKind::CompilationUnit, CompilationId.str()};
+}
+
+bool BuildNamespace::operator==(const BuildNamespace& Other) const {
+  return Kind == Other.Kind && Name == Other.Name;
+}
+
+bool BuildNamespace::operator!=(const BuildNamespace& Other) const {
+  return !(*this == Other);
+}
+
+bool BuildNamespace::operator<(const BuildNamespace& Other) const {
+  if (Kind != Other.Kind)
+    return Kind < Other.Kind;
+  return Name < Other.Name;
+}
+
+NestedBuildNamespace NestedBuildNamespace::makeTU(llvm::StringRef CompilationId) {
+  NestedBuildNamespace Result;
+  Result.Namespaces.push_back(BuildNamespace::makeTU(CompilationId));
+  return Result;
+}
+
+bool NestedBuildNamespace::empty() const {
+  return Namespaces.empty();
+}
+
+bool NestedBuildNamespace::operator==(const NestedBuildNamespace& Other) const {
+  return Namespaces == Other.Namespaces;
+}
+
+bool NestedBuildNamespace::operator!=(const NestedBuildNamespace& Other) const {
+  return !(*this == Other);
+}
+
+bool NestedBuildNamespace::operator<(const NestedBuildNamespace& Other) const {
+  return Namespaces < Other.Namespaces;
+}
+
+} // namespace ssaf
+} // namespace clang
diff --git a/clang/lib/Analysis/Scalable/Model/EntityName.cpp b/clang/lib/Analysis/Scalable/Model/EntityName.cpp
new file mode 100644
index 0000000000000..3404ecc58fac2
--- /dev/null
+++ b/clang/lib/Analysis/Scalable/Model/EntityName.cpp
@@ -0,0 +1,44 @@
+//===- EntityName.cpp -------------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "clang/Analysis/Scalable/Model/EntityName.h"
+
+namespace clang {
+namespace ssaf {
+
+EntityName::EntityName(llvm::StringRef USR, llvm::StringRef Suffix,
+                       NestedBuildNamespace Namespace)
+  : USR(USR.str()), Suffix(Suffix), Namespace(std::move(Namespace)) {}
+
+bool EntityName::operator==(const EntityName& Other) const {
+  return USR == Other.USR &&
+         Suffix == Other.Suffix &&
+         Namespace == Other.Namespace;
+}
+
+bool EntityName::operator!=(const EntityName& Other) const {
+  return !(*this == Other);
+}
+
+bool EntityName::operator<(const EntityName& Other) const {
+  if (USR != Other.USR)
+    return USR < Other.USR;
+  if (Suffix != Other.Suffix)
+    return Suffix.str() < Other.Suffix.str();
+  return Namespace < Other.Namespace;
+}
+
+EntityName EntityName::makeQualified(NestedBuildNamespace Namespace) {
+  auto Copy = *this;
+  Copy.Namespace = Copy.Namespace.makeQualified(Namespace);
+
+  return Copy;
+}
+
+} // namespace ssaf
+} // namespace clang
diff --git a/clang/unittests/Analysis/CMakeLists.txt b/clang/unittests/Analysis/CMakeLists.txt
index e0acf436b37c7..97e768b11db69 100644
--- a/clang/unittests/Analysis/CMakeLists.txt
+++ b/clang/unittests/Analysis/CMakeLists.txt
@@ -26,3 +26,4 @@ add_clang_unittest(ClangAnalysisTests
   )
 
 add_subdirectory(FlowSensitive)
+add_subdirectory(Scalable)
diff --git a/clang/unittests/Analysis/Scalable/ASTEntityMappingTest.cpp b/clang/unittests/Analysis/Scalable/ASTEntityMappingTest.cpp
new file mode 100644
index 0000000000000..8de0df246cb65
--- /dev/null
+++ b/clang/unittests/Analysis/Scalable/ASTEntityMappingTest.cpp
@@ -0,0 +1,343 @@
+//===- unittests/Analysis/Scalable/ASTEntityMappingTest.cpp --------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "clang/Analysis/Scalable/ASTEntityMapping.h"
+#include "clang/AST/ASTContext.h"
+#include "clang/AST/Decl.h"
+#include "clang/ASTMatchers/ASTMatchFinder.h"
+#include "clang/ASTMatchers/ASTMatchers.h"
+#include "clang/Tooling/Tooling.h"
+#include "gtest/gtest.h"
+
+using namespace clang::ast_matchers;
+
+namespace clang {
+namespace ssaf {
+namespace {
+
+// Helper function to find a declaration by name
+template <typename DeclType>
+const DeclType *findDecl(ASTContext &Ctx, StringRef Name) {
+  auto Matcher = namedDecl(hasName(Name)).bind("decl");
+  auto Matches = match(Matcher, Ctx);
+  if (Matches.empty())
+    return nullptr;
+  return Matches[0].getNodeAs<DeclType>("decl");
+}
+
+TEST(ASTEntityMappingTest, FunctionDecl) {
+  auto AST = tooling::buildASTFromCode("void foo() {}");
+  auto &Ctx = AST->getASTContext();
+
+  const auto *FD = findDecl<FunctionDecl>(Ctx, "foo");
+  ASSERT_NE(FD, nullptr);
+
+  auto EntityName = getLocalEntityNameForDecl(FD);
+  EXPECT_TRUE(EntityName.has_value());
+}
+
+TEST(ASTEntityMappingTest, VarDecl) {
+  auto AST = tooling::buildASTFromCode("int x = 42;");
+  auto &Ctx = AST->getASTContext();
+
+  const auto *VD = findDecl<VarDecl>(Ctx, "x");
+  ASSERT_NE(VD, nullptr);
+
+  auto EntityName = getLocalEntityNameForDecl(VD);
+  EXPECT_TRUE(EntityName.has_value());
+}
+
+TEST(ASTEntityMappingTest, ParmVarDecl) {
+  auto AST = tooling::buildASTFromCode("void foo(int x) {}");
+  auto &Ctx = AST->getASTContext();
+
+  const auto *FD = findDecl<FunctionDecl>(Ctx, "foo");
+  ASSERT_NE(FD, nullptr);
+  ASSERT_GT(FD->param_size(), 0u);
+
+  const auto *PVD = FD->getParamDecl(0);
+  ASSERT_NE(PVD, nullptr);
+
+  auto EntityName = getLocalEntityNameForDecl(PVD);
+  EXPECT_TRUE(EntityName.has_value());
+}
+
+TEST(ASTEntityMappingTest, RecordDecl) {
+  auto AST = tooling::buildASTFromCode("struct S {};");
+  auto &Ctx = AST->getASTContext();
+
+  const auto *RD = findDecl<RecordDecl>(Ctx, "S");
+  ASSERT_NE(RD, nullptr);
+
+  auto EntityName = getLocalEntityNameForDecl(RD);
+  EXPECT_TRUE(EntityName.has_value());
+}
+
+TEST(ASTEntityMappingTest, FieldDecl) {
+  auto AST = tooling::buildASTFromCode("struct S { int field; };");
+  auto &Ctx = AST->getASTContext();
+
+  const auto *FD = findDecl<FieldDecl>(Ctx, "field");
+  ASSERT_NE(FD, nullptr);
+
+  auto EntityName = getLocalEntityNameForDecl(FD);
+  EXPECT_TRUE(EntityName.has_value());
+}
+
+TEST(ASTEntityMappingTest, NullDecl) {
+  auto EntityName = getLocalEntityNameForDecl(nullptr);
+  EXPECT_FALSE(EntityName.has_value());
+}
+
+TEST(ASTEntityMappingTest, ImplicitDecl) {
+  auto AST = tooling::buildASTFromCode(R"(
+    struct S {
+      S() = default;
+    };
+  )", "test.cpp", std::make_shared<PCHContainerOperations>());
+  auto &Ctx = AST->getASTContext();
+
+  const auto *RD = findDecl<CXXRecordDecl>(Ctx, "S");
+  ASSERT_NE(RD, nullptr);
+
+  // Find the implicitly-declared copy constructor
+  for (const auto *Ctor : RD->ctors()) {
+    if (Ctor->isCopyConstructor() && Ctor->isImplicit()) {
+      auto EntityName = getLocalEntityNameForDecl(Ctor);
+      EXPECT_FALSE(EntityName.has_value());
+      return;
+    }
+  }
+}
+
+TEST(ASTEntityMappingTest, BuiltinFunction) {
+  auto AST = tooling::buildASTFromCode(R"(
+    void test() {
+      __builtin_memcpy(0, 0, 0);
+    }
+  )");
+  auto &Ctx = AST->getASTContext();
+
+  // Find the builtin call
+  auto Matcher = callExpr().bind("call");
+  auto Matches = match(Matcher, Ctx);
+  ASSERT_FALSE(Matches.empty());
+
+  const auto *CE = Matches[0].getNodeAs<CallExpr>("call");
+  ASSERT_NE(CE, nullptr);
+
+  const auto *Callee = CE->getDirectCallee();
+  if (Callee && Callee->getBuiltinID()) {
+    auto EntityName = getLocalEntityNameForDecl(Callee);
+    EXPECT_FALSE(EntityName.has_value());
+  }
+}
+
+TEST(ASTEntityMappingTest, UnsupportedDecl) {
+  auto AST = tooling::buildASTFromCode("namespace N {}");
+  auto &Ctx = AST->getASTContext();
+
+  const auto *ND = findDecl<NamespaceDecl>(Ctx, "N");
+  ASSERT_NE(ND, nullptr);
+
+  auto EntityName = getLocalEntityNameForDecl(ND);
+  EXPECT_FALSE(EntityName.has_value());
+}
+
+TEST(ASTEntityMappingTest, FunctionReturn) {
+  auto AST = tooling::buildASTFromCode("int foo() { return 42; }");
+  auto &Ctx = AST->getASTContext();
+
+  const auto *FD = findDecl<FunctionDecl>(Ctx, "foo");
+  ASSERT_NE(FD, nullptr);
+
+  auto EntityName = getLocalEntityNameForFunctionReturn(FD);
+  EXPECT_TRUE(EntityName.has_value());
+}
+
+TEST(ASTEntityMappingTest, FunctionReturnNull) {
+  auto EntityName = getLocalEntityNameForFunctionReturn(nullptr);
+  EXPECT_FALSE(EntityName.has_value());
+}
+
+TEST(ASTEntityMappingTest, FunctionReturnBuiltin) {
+  auto AST = tooling::buildASTFromCode(R"(
+    void test() {
+      __builtin_memcpy(0, 0, 0);
+    }
+  )");
+  auto &Ctx = AST->getASTContext();
+
+  // Find the builtin call
+  auto Matcher = callExpr().bind("call");
+  auto Matches = match(Matcher, Ctx);
+  ASSERT_FALSE(Matches.empty());
+
+  const auto *CE = Matches[0].getNodeAs<CallExpr>("call");
+  ASSERT_NE(CE, nullptr);
+
+  const auto *Callee = CE->getDi...
[truncated]

@llvmbot
Copy link
Member

llvmbot commented Nov 22, 2025

@llvm/pr-subscribers-clang-analysis

Author: Jan Korous (jkorous-apple)

Changes

Add core abstractions for identifying program entities across compilation and link unit boundaries in the Scalable Static Analysis Framework (SSAF).

Introduces three key components:

  • BuildNamespace: Represents build artifacts (compilation units, link units)
  • EntityName: Globally unique entity identifiers across compilation boundaries
  • AST mapping: Functions to map Clang AST declarations to EntityNames

Entity identification uses Unified Symbol Resolution (USR) as the underlying mechanism, with extensions for sub-entities (parameters, return values) via suffixes. The abstraction allows whole-program analysis by providing stable identifiers that persist across separately compiled translation units.


Patch is 30.31 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/169131.diff

13 Files Affected:

  • (added) clang/include/clang/Analysis/Scalable/ASTEntityMapping.h (+46)
  • (added) clang/include/clang/Analysis/Scalable/Model/BuildNamespace.h (+84)
  • (added) clang/include/clang/Analysis/Scalable/Model/EntityName.h (+47)
  • (modified) clang/lib/Analysis/CMakeLists.txt (+1)
  • (added) clang/lib/Analysis/Scalable/ASTEntityMapping.cpp (+85)
  • (added) clang/lib/Analysis/Scalable/CMakeLists.txt (+19)
  • (added) clang/lib/Analysis/Scalable/Model/BuildNamespace.cpp (+72)
  • (added) clang/lib/Analysis/Scalable/Model/EntityName.cpp (+44)
  • (modified) clang/unittests/Analysis/CMakeLists.txt (+1)
  • (added) clang/unittests/Analysis/Scalable/ASTEntityMappingTest.cpp (+343)
  • (added) clang/unittests/Analysis/Scalable/BuildNamespaceTest.cpp (+99)
  • (added) clang/unittests/Analysis/Scalable/CMakeLists.txt (+18)
  • (added) clang/unittests/Analysis/Scalable/EntityNameTest.cpp (+62)
diff --git a/clang/include/clang/Analysis/Scalable/ASTEntityMapping.h b/clang/include/clang/Analysis/Scalable/ASTEntityMapping.h
new file mode 100644
index 0000000000000..a137e8b741821
--- /dev/null
+++ b/clang/include/clang/Analysis/Scalable/ASTEntityMapping.h
@@ -0,0 +1,46 @@
+//===- ASTMapping.h - AST to SSAF Entity mapping ----------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_CLANG_ANALYSIS_SCALABLE_ASTMAPPING_H
+#define LLVM_CLANG_ANALYSIS_SCALABLE_ASTMAPPING_H
+
+#include "clang/Analysis/Scalable/Model/EntityName.h"
+#include "clang/AST/Decl.h"
+#include "llvm/ADT/StringRef.h"
+#include <optional>
+
+namespace clang {
+namespace ssaf {
+
+/// Maps a declaration to an EntityName.
+///
+/// Supported declaration types for entity mapping:
+/// - Functions and methods
+/// - Global Variables
+/// - Function parameters
+/// - Struct/class/union type definitions
+/// - Struct/class/union fields
+///
+/// Implicit declarations and compiler builtins are not mapped.
+///
+/// \param D The declaration to map. Must not be null.
+///
+/// \return An EntityName if the declaration can be mapped, std::nullopt otherwise.
+std::optional<EntityName> getLocalEntityNameForDecl(const Decl* D);
+
+/// Maps a function return type to an EntityName.
+///
+/// \param FD The function declaration. Must not be null.
+///
+/// \return An EntityName for the function's return type.
+std::optional<EntityName> getLocalEntityNameForFunctionReturn(const FunctionDecl* FD);
+
+} // namespace ssaf
+} // namespace clang
+
+#endif // LLVM_CLANG_ANALYSIS_SCALABLE_ASTMAPPING_H
diff --git a/clang/include/clang/Analysis/Scalable/Model/BuildNamespace.h b/clang/include/clang/Analysis/Scalable/Model/BuildNamespace.h
new file mode 100644
index 0000000000000..c4bf7146e461f
--- /dev/null
+++ b/clang/include/clang/Analysis/Scalable/Model/BuildNamespace.h
@@ -0,0 +1,84 @@
+//===- BuildNamespace.h -----------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_CLANG_ANALYSIS_SCALABLE_BUILD_NAMESPACE_H
+#define LLVM_CLANG_ANALYSIS_SCALABLE_BUILD_NAMESPACE_H
+
+#include "llvm/ADT/StringRef.h"
+#include <optional>
+#include <string>
+#include <vector>
+
+namespace clang {
+namespace ssaf {
+
+enum class BuildNamespaceKind : unsigned short {
+  CompilationUnit,
+  LinkUnit
+};
+
+std::string toString(BuildNamespaceKind BNK);
+
+std::optional<BuildNamespaceKind> parseBuildNamespaceKind(llvm::StringRef Str);
+
+/// Represents a single step in the build process.
+class BuildNamespace {
+  BuildNamespaceKind Kind;
+  std::string Name;
+public:
+  BuildNamespace(BuildNamespaceKind Kind, llvm::StringRef Name)
+    : Kind(Kind), Name(Name.str()) {}
+
+  static BuildNamespace makeTU(llvm::StringRef CompilationId);
+
+  bool operator==(const BuildNamespace& Other) const;
+  bool operator!=(const BuildNamespace& Other) const;
+  bool operator<(const BuildNamespace& Other) const;
+
+  friend class SerializationFormat;
+};
+
+/// Represents a sequence of steps in the build process.
+class NestedBuildNamespace {
+  friend class SerializationFormat;
+
+  std::vector<BuildNamespace> Namespaces;
+
+public:
+  NestedBuildNamespace() = default;
+
+  explicit NestedBuildNamespace(const std::vector<BuildNamespace>& Namespaces)
+    : Namespaces(Namespaces) {}
+
+  explicit NestedBuildNamespace(const BuildNamespace& N) {
+    Namespaces.push_back(N);
+  }
+
+  static NestedBuildNamespace makeTU(llvm::StringRef CompilationId);
+
+  NestedBuildNamespace makeQualified(NestedBuildNamespace Namespace) {
+    auto Copy = *this;
+    for (const auto& N : Namespace.Namespaces)
+      Copy.Namespaces.push_back(N);
+    return Copy;
+  }
+
+  bool empty() const;
+
+  bool operator==(const NestedBuildNamespace& Other) const;
+  bool operator!=(const NestedBuildNamespace& Other) const;
+  bool operator<(const NestedBuildNamespace& Other) const;
+
+  friend class JSONWriter;
+  friend class LinkUnitResolution;
+};
+
+} // namespace ssaf
+} // namespace clang
+
+#endif // LLVM_CLANG_ANALYSIS_SCALABLE_BUILD_NAMESPACE_H
diff --git a/clang/include/clang/Analysis/Scalable/Model/EntityName.h b/clang/include/clang/Analysis/Scalable/Model/EntityName.h
new file mode 100644
index 0000000000000..7f11ef0589bf5
--- /dev/null
+++ b/clang/include/clang/Analysis/Scalable/Model/EntityName.h
@@ -0,0 +1,47 @@
+//===- EntityName.h ---------------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_CLANG_ANALYSIS_SCALABLE_ENTITY_NAME_H
+#define LLVM_CLANG_ANALYSIS_SCALABLE_ENTITY_NAME_H
+
+#include "clang/Analysis/Scalable/Model/BuildNamespace.h"
+#include "llvm/ADT/SmallString.h"
+#include "llvm/ADT/StringRef.h"
+#include <string>
+
+namespace clang {
+namespace ssaf {
+
+/// Uniquely identifies an entity in a program.
+///
+/// EntityName provides a globally unique identifier for program entities that remains
+/// stable across compilation boundaries. This enables whole-program analysis to track
+/// and relate entities across separately compiled translation units.
+class EntityName {
+  std::string USR;
+  llvm::SmallString<16> Suffix;
+  NestedBuildNamespace Namespace;
+
+public:
+  EntityName(llvm::StringRef USR, llvm::StringRef Suffix,
+             NestedBuildNamespace Namespace);
+
+  bool operator==(const EntityName& Other) const;
+  bool operator!=(const EntityName& Other) const;
+  bool operator<(const EntityName& Other) const;
+
+  EntityName makeQualified(NestedBuildNamespace Namespace);
+
+  friend class LinkUnitResolution;
+  friend class SerializationFormat;
+};
+
+} // namespace ssaf
+} // namespace clang
+
+#endif // LLVM_CLANG_ANALYSIS_SCALABLE_ENTITY_NAME_H
diff --git a/clang/lib/Analysis/CMakeLists.txt b/clang/lib/Analysis/CMakeLists.txt
index 1dbd4153d856f..99a2ec684e149 100644
--- a/clang/lib/Analysis/CMakeLists.txt
+++ b/clang/lib/Analysis/CMakeLists.txt
@@ -50,3 +50,4 @@ add_clang_library(clangAnalysis
 add_subdirectory(plugins)
 add_subdirectory(FlowSensitive)
 add_subdirectory(LifetimeSafety)
+add_subdirectory(Scalable)
diff --git a/clang/lib/Analysis/Scalable/ASTEntityMapping.cpp b/clang/lib/Analysis/Scalable/ASTEntityMapping.cpp
new file mode 100644
index 0000000000000..87d05e8aa5dc3
--- /dev/null
+++ b/clang/lib/Analysis/Scalable/ASTEntityMapping.cpp
@@ -0,0 +1,85 @@
+//===- ASTMapping.cpp - AST to SSAF Entity mapping --------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file implements utilities for mapping AST declarations to SSAF entities.
+//
+//===----------------------------------------------------------------------===//
+
+#include "clang/Analysis/Scalable/ASTEntityMapping.h"
+#include "clang/AST/Decl.h"
+#include "clang/Analysis/Scalable/Model/BuildNamespace.h"
+#include "clang/Index/USRGeneration.h"
+#include "llvm/ADT/SmallString.h"
+
+namespace clang {
+namespace ssaf {
+
+std::optional<EntityName> getLocalEntityNameForDecl(const Decl* D) {
+  if (!D)
+    return std::nullopt;
+
+  if (D->isImplicit())
+    return std::nullopt;
+
+  if (isa<FunctionDecl>(D) && cast<FunctionDecl>(D)->getBuiltinID())
+    return std::nullopt;
+
+  if (!isa<FunctionDecl>(D) && !isa<ParmVarDecl>(D) && !isa<VarDecl>(D) &&
+      !isa<FieldDecl>(D) && !isa<RecordDecl>(D))
+    return std::nullopt;
+
+  llvm::SmallString<16> Suffix;
+  const Decl *USRDecl = D;
+
+  // For parameters, use the parent function's USR with parameter index as suffix
+  if (const auto * PVD = dyn_cast<ParmVarDecl>(D)) {
+    const auto *FD = dyn_cast_or_null<FunctionDecl>(PVD->getParentFunctionOrMethod());
+    if (!FD)
+      return std::nullopt;
+    USRDecl = FD;
+
+    const auto ParamIdx = PVD->getFunctionScopeIndex();
+    llvm::raw_svector_ostream OS(Suffix);
+    // Parameter uses function's USR with 1-based index as suffix
+    OS << (ParamIdx + 1);
+  }
+
+  llvm::SmallString<128> USRBuf;
+  if (clang::index::generateUSRForDecl(USRDecl, USRBuf)) {
+    return std::nullopt;
+  }
+
+  if (USRBuf.empty())
+    return std::nullopt;
+
+  return EntityName(USRBuf.str(), Suffix, {});
+}
+
+std::optional<EntityName> getLocalEntityNameForFunctionReturn(const FunctionDecl* FD) {
+  if (!FD)
+    return std::nullopt;
+
+  if (FD->isImplicit())
+    return std::nullopt;
+
+  if (FD->getBuiltinID())
+    return std::nullopt;
+
+  llvm::SmallString<128> USRBuf;
+  if (clang::index::generateUSRForDecl(FD, USRBuf)) {
+    return std::nullopt;
+  }
+
+  if (USRBuf.empty())
+    return std::nullopt;
+
+  return EntityName(USRBuf.str(), "0", {});
+}
+
+} // namespace ssaf
+} // namespace clang
diff --git a/clang/lib/Analysis/Scalable/CMakeLists.txt b/clang/lib/Analysis/Scalable/CMakeLists.txt
new file mode 100644
index 0000000000000..ea4693f102cb2
--- /dev/null
+++ b/clang/lib/Analysis/Scalable/CMakeLists.txt
@@ -0,0 +1,19 @@
+set(LLVM_LINK_COMPONENTS
+  Support
+  )
+
+add_clang_library(clangAnalysisScalable
+  ASTEntityMapping.cpp
+  Model/BuildNamespace.cpp
+  Model/EntityName.cpp
+
+  LINK_LIBS
+  clangAST
+  clangASTMatchers
+  clangBasic
+  clangIndex
+  clangLex
+  clangFrontend
+
+  DEPENDS
+  )
diff --git a/clang/lib/Analysis/Scalable/Model/BuildNamespace.cpp b/clang/lib/Analysis/Scalable/Model/BuildNamespace.cpp
new file mode 100644
index 0000000000000..5284a9a87a33a
--- /dev/null
+++ b/clang/lib/Analysis/Scalable/Model/BuildNamespace.cpp
@@ -0,0 +1,72 @@
+//===- BuildNamespace.cpp ---------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "clang/Analysis/Scalable/Model/BuildNamespace.h"
+#include "llvm/Support/ErrorHandling.h"
+
+namespace clang {
+namespace ssaf {
+
+std::string toString(BuildNamespaceKind BNK) {
+  switch(BNK) {
+    case BuildNamespaceKind::CompilationUnit: return "compilation_unit";
+    case BuildNamespaceKind::LinkUnit: return "link_unit";
+  }
+  llvm_unreachable("Unknown BuildNamespaceKind");
+}
+
+std::optional<BuildNamespaceKind> parseBuildNamespaceKind(llvm::StringRef Str) {
+  if (Str == "compilation_unit")
+    return BuildNamespaceKind::CompilationUnit;
+  if (Str == "link_unit")
+    return BuildNamespaceKind::LinkUnit;
+  return std::nullopt;
+}
+
+BuildNamespace BuildNamespace::makeTU(llvm::StringRef CompilationId) {
+  return BuildNamespace{BuildNamespaceKind::CompilationUnit, CompilationId.str()};
+}
+
+bool BuildNamespace::operator==(const BuildNamespace& Other) const {
+  return Kind == Other.Kind && Name == Other.Name;
+}
+
+bool BuildNamespace::operator!=(const BuildNamespace& Other) const {
+  return !(*this == Other);
+}
+
+bool BuildNamespace::operator<(const BuildNamespace& Other) const {
+  if (Kind != Other.Kind)
+    return Kind < Other.Kind;
+  return Name < Other.Name;
+}
+
+NestedBuildNamespace NestedBuildNamespace::makeTU(llvm::StringRef CompilationId) {
+  NestedBuildNamespace Result;
+  Result.Namespaces.push_back(BuildNamespace::makeTU(CompilationId));
+  return Result;
+}
+
+bool NestedBuildNamespace::empty() const {
+  return Namespaces.empty();
+}
+
+bool NestedBuildNamespace::operator==(const NestedBuildNamespace& Other) const {
+  return Namespaces == Other.Namespaces;
+}
+
+bool NestedBuildNamespace::operator!=(const NestedBuildNamespace& Other) const {
+  return !(*this == Other);
+}
+
+bool NestedBuildNamespace::operator<(const NestedBuildNamespace& Other) const {
+  return Namespaces < Other.Namespaces;
+}
+
+} // namespace ssaf
+} // namespace clang
diff --git a/clang/lib/Analysis/Scalable/Model/EntityName.cpp b/clang/lib/Analysis/Scalable/Model/EntityName.cpp
new file mode 100644
index 0000000000000..3404ecc58fac2
--- /dev/null
+++ b/clang/lib/Analysis/Scalable/Model/EntityName.cpp
@@ -0,0 +1,44 @@
+//===- EntityName.cpp -------------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "clang/Analysis/Scalable/Model/EntityName.h"
+
+namespace clang {
+namespace ssaf {
+
+EntityName::EntityName(llvm::StringRef USR, llvm::StringRef Suffix,
+                       NestedBuildNamespace Namespace)
+  : USR(USR.str()), Suffix(Suffix), Namespace(std::move(Namespace)) {}
+
+bool EntityName::operator==(const EntityName& Other) const {
+  return USR == Other.USR &&
+         Suffix == Other.Suffix &&
+         Namespace == Other.Namespace;
+}
+
+bool EntityName::operator!=(const EntityName& Other) const {
+  return !(*this == Other);
+}
+
+bool EntityName::operator<(const EntityName& Other) const {
+  if (USR != Other.USR)
+    return USR < Other.USR;
+  if (Suffix != Other.Suffix)
+    return Suffix.str() < Other.Suffix.str();
+  return Namespace < Other.Namespace;
+}
+
+EntityName EntityName::makeQualified(NestedBuildNamespace Namespace) {
+  auto Copy = *this;
+  Copy.Namespace = Copy.Namespace.makeQualified(Namespace);
+
+  return Copy;
+}
+
+} // namespace ssaf
+} // namespace clang
diff --git a/clang/unittests/Analysis/CMakeLists.txt b/clang/unittests/Analysis/CMakeLists.txt
index e0acf436b37c7..97e768b11db69 100644
--- a/clang/unittests/Analysis/CMakeLists.txt
+++ b/clang/unittests/Analysis/CMakeLists.txt
@@ -26,3 +26,4 @@ add_clang_unittest(ClangAnalysisTests
   )
 
 add_subdirectory(FlowSensitive)
+add_subdirectory(Scalable)
diff --git a/clang/unittests/Analysis/Scalable/ASTEntityMappingTest.cpp b/clang/unittests/Analysis/Scalable/ASTEntityMappingTest.cpp
new file mode 100644
index 0000000000000..8de0df246cb65
--- /dev/null
+++ b/clang/unittests/Analysis/Scalable/ASTEntityMappingTest.cpp
@@ -0,0 +1,343 @@
+//===- unittests/Analysis/Scalable/ASTEntityMappingTest.cpp --------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "clang/Analysis/Scalable/ASTEntityMapping.h"
+#include "clang/AST/ASTContext.h"
+#include "clang/AST/Decl.h"
+#include "clang/ASTMatchers/ASTMatchFinder.h"
+#include "clang/ASTMatchers/ASTMatchers.h"
+#include "clang/Tooling/Tooling.h"
+#include "gtest/gtest.h"
+
+using namespace clang::ast_matchers;
+
+namespace clang {
+namespace ssaf {
+namespace {
+
+// Helper function to find a declaration by name
+template <typename DeclType>
+const DeclType *findDecl(ASTContext &Ctx, StringRef Name) {
+  auto Matcher = namedDecl(hasName(Name)).bind("decl");
+  auto Matches = match(Matcher, Ctx);
+  if (Matches.empty())
+    return nullptr;
+  return Matches[0].getNodeAs<DeclType>("decl");
+}
+
+TEST(ASTEntityMappingTest, FunctionDecl) {
+  auto AST = tooling::buildASTFromCode("void foo() {}");
+  auto &Ctx = AST->getASTContext();
+
+  const auto *FD = findDecl<FunctionDecl>(Ctx, "foo");
+  ASSERT_NE(FD, nullptr);
+
+  auto EntityName = getLocalEntityNameForDecl(FD);
+  EXPECT_TRUE(EntityName.has_value());
+}
+
+TEST(ASTEntityMappingTest, VarDecl) {
+  auto AST = tooling::buildASTFromCode("int x = 42;");
+  auto &Ctx = AST->getASTContext();
+
+  const auto *VD = findDecl<VarDecl>(Ctx, "x");
+  ASSERT_NE(VD, nullptr);
+
+  auto EntityName = getLocalEntityNameForDecl(VD);
+  EXPECT_TRUE(EntityName.has_value());
+}
+
+TEST(ASTEntityMappingTest, ParmVarDecl) {
+  auto AST = tooling::buildASTFromCode("void foo(int x) {}");
+  auto &Ctx = AST->getASTContext();
+
+  const auto *FD = findDecl<FunctionDecl>(Ctx, "foo");
+  ASSERT_NE(FD, nullptr);
+  ASSERT_GT(FD->param_size(), 0u);
+
+  const auto *PVD = FD->getParamDecl(0);
+  ASSERT_NE(PVD, nullptr);
+
+  auto EntityName = getLocalEntityNameForDecl(PVD);
+  EXPECT_TRUE(EntityName.has_value());
+}
+
+TEST(ASTEntityMappingTest, RecordDecl) {
+  auto AST = tooling::buildASTFromCode("struct S {};");
+  auto &Ctx = AST->getASTContext();
+
+  const auto *RD = findDecl<RecordDecl>(Ctx, "S");
+  ASSERT_NE(RD, nullptr);
+
+  auto EntityName = getLocalEntityNameForDecl(RD);
+  EXPECT_TRUE(EntityName.has_value());
+}
+
+TEST(ASTEntityMappingTest, FieldDecl) {
+  auto AST = tooling::buildASTFromCode("struct S { int field; };");
+  auto &Ctx = AST->getASTContext();
+
+  const auto *FD = findDecl<FieldDecl>(Ctx, "field");
+  ASSERT_NE(FD, nullptr);
+
+  auto EntityName = getLocalEntityNameForDecl(FD);
+  EXPECT_TRUE(EntityName.has_value());
+}
+
+TEST(ASTEntityMappingTest, NullDecl) {
+  auto EntityName = getLocalEntityNameForDecl(nullptr);
+  EXPECT_FALSE(EntityName.has_value());
+}
+
+TEST(ASTEntityMappingTest, ImplicitDecl) {
+  auto AST = tooling::buildASTFromCode(R"(
+    struct S {
+      S() = default;
+    };
+  )", "test.cpp", std::make_shared<PCHContainerOperations>());
+  auto &Ctx = AST->getASTContext();
+
+  const auto *RD = findDecl<CXXRecordDecl>(Ctx, "S");
+  ASSERT_NE(RD, nullptr);
+
+  // Find the implicitly-declared copy constructor
+  for (const auto *Ctor : RD->ctors()) {
+    if (Ctor->isCopyConstructor() && Ctor->isImplicit()) {
+      auto EntityName = getLocalEntityNameForDecl(Ctor);
+      EXPECT_FALSE(EntityName.has_value());
+      return;
+    }
+  }
+}
+
+TEST(ASTEntityMappingTest, BuiltinFunction) {
+  auto AST = tooling::buildASTFromCode(R"(
+    void test() {
+      __builtin_memcpy(0, 0, 0);
+    }
+  )");
+  auto &Ctx = AST->getASTContext();
+
+  // Find the builtin call
+  auto Matcher = callExpr().bind("call");
+  auto Matches = match(Matcher, Ctx);
+  ASSERT_FALSE(Matches.empty());
+
+  const auto *CE = Matches[0].getNodeAs<CallExpr>("call");
+  ASSERT_NE(CE, nullptr);
+
+  const auto *Callee = CE->getDirectCallee();
+  if (Callee && Callee->getBuiltinID()) {
+    auto EntityName = getLocalEntityNameForDecl(Callee);
+    EXPECT_FALSE(EntityName.has_value());
+  }
+}
+
+TEST(ASTEntityMappingTest, UnsupportedDecl) {
+  auto AST = tooling::buildASTFromCode("namespace N {}");
+  auto &Ctx = AST->getASTContext();
+
+  const auto *ND = findDecl<NamespaceDecl>(Ctx, "N");
+  ASSERT_NE(ND, nullptr);
+
+  auto EntityName = getLocalEntityNameForDecl(ND);
+  EXPECT_FALSE(EntityName.has_value());
+}
+
+TEST(ASTEntityMappingTest, FunctionReturn) {
+  auto AST = tooling::buildASTFromCode("int foo() { return 42; }");
+  auto &Ctx = AST->getASTContext();
+
+  const auto *FD = findDecl<FunctionDecl>(Ctx, "foo");
+  ASSERT_NE(FD, nullptr);
+
+  auto EntityName = getLocalEntityNameForFunctionReturn(FD);
+  EXPECT_TRUE(EntityName.has_value());
+}
+
+TEST(ASTEntityMappingTest, FunctionReturnNull) {
+  auto EntityName = getLocalEntityNameForFunctionReturn(nullptr);
+  EXPECT_FALSE(EntityName.has_value());
+}
+
+TEST(ASTEntityMappingTest, FunctionReturnBuiltin) {
+  auto AST = tooling::buildASTFromCode(R"(
+    void test() {
+      __builtin_memcpy(0, 0, 0);
+    }
+  )");
+  auto &Ctx = AST->getASTContext();
+
+  // Find the builtin call
+  auto Matcher = callExpr().bind("call");
+  auto Matches = match(Matcher, Ctx);
+  ASSERT_FALSE(Matches.empty());
+
+  const auto *CE = Matches[0].getNodeAs<CallExpr>("call");
+  ASSERT_NE(CE, nullptr);
+
+  const auto *Callee = CE->getDi...
[truncated]

@github-actions
Copy link

github-actions bot commented Nov 22, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@github-actions
Copy link

github-actions bot commented Nov 22, 2025

🐧 Linux x64 Test Results

  • 111876 tests passed
  • 4484 tests skipped

✅ The build succeeded and all tests passed.

/// \param D The declaration to map. Must not be null.
///
/// \return An EntityName if the declaration can be mapped, std::nullopt otherwise.
std::optional<EntityName> getLocalEntityNameForDecl(const Decl* D);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does the "Local" refer to in the name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The names got shortened to getEntityName and getEntityNameForReturn.

The word "Local" referred to the fact that the resulting name would be TU-local, i.e., without BuildNamespace qualification.

LinkUnit
};

std::string toString(BuildNamespaceKind BNK);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: could this return a StringRef?

friend class SerializationFormat;
};

/// Represents a sequence of steps in the build process.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it important to preserve the information what entities belong to the same step or could a NestedBuildNamespace be not a different type just the result of merging some BuildNamespaces?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me preface this by saying that the namespace design will possibly evolve when we actually implement entity linking. To some degree this is just an educated guess.

We could just use std::vector<BuildNamespace> instead of introducing another class but I expect that we will have operations on that type and having this class will allow the interfaces to enforce type correctness and to be self-descriptive. Alternatively, we could have BuildNamespace implemented as std::vector<std::pair<BuildNamespaceKind, std::string>> and sink the per-element logic to its implementation.

So, yes, I can imagine there's only a single type but can you please elaborate on why would you prefer that?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not have a strong preference or reason other than I feel like this would make the implementation a bit more concise and I usually prefer the more concise form until there is a need to split functionality out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I imagine BuildNamespace covers namespace identification and will be used in the interface and as an implementation detail (at least initially) of NestedBuildNamespace. NestedBuildNamespace would cover the "nested" part by providing additional operations.

I imagine there will be interfaces and declarations where having the option to distinguish between them containing a single namespace level or some number of nested namespaces will improve clarity of the code.

I suggest we keep the current design and improve it once we see the use-cases. Would you be ok with that?

Comment on lines 224 to 227
const auto *Decl1 = Matches[0].getNodeAs<FunctionDecl>("decl");
const auto *Decl2 = Matches[1].getNodeAs<FunctionDecl>("decl");
ASSERT_NE(Decl1, nullptr);
ASSERT_NE(Decl2, nullptr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it would make sense to explicitly check that Decl1 has no body, but Decl2 does.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the test logic to check that ALL redeclarations get the same entity name.

Comment on lines 268 to 269
// Use recordDecl(isStruct()) to avoid matching implicit typedefs
auto Matcher = recordDecl(hasName("S"), isStruct()).bind("decl");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't get this comment. I see no typedefs or usings in the example.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

Comment on lines 301 to 302
ASSERT_GT(Func1->param_size(), 0u);
ASSERT_GT(Func2->param_size(), 0u);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we check exact parameter counts here with EQ?

Copy link
Contributor Author

@jkorous-apple jkorous-apple Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can. Although we don't strictly care about how many parameters are there. We are just making sure we can subsequently look up the first one.


static NestedBuildNamespace makeTU(llvm::StringRef CompilationId);

NestedBuildNamespace makeQualified(NestedBuildNamespace Namespace) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should document what this does.
That said, this could be a const member function, right?
If it was, that would also give a hint about how to use this API.

@usx95 usx95 self-requested a review November 26, 2025 13:59
Copy link
Collaborator

@ymand ymand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm really excited to see these patches start to land! I reviewed the header files, focusing on the API. I'll try to get back to implementation and tests soon, but wanted to get these out in the meantime.


std::optional<BuildNamespaceKind> parseBuildNamespaceKind(llvm::StringRef Str);

/// Represents a single step in the build process.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you expand this comment? It's not immediately obvious how a step in the build process relates to the notion of "namespace". Alternatively (or maybe additionally), consider adding some background explanation to the beginning of the file.

Namespaces.push_back(N);
}

static NestedBuildNamespace makeTU(llvm::StringRef CompilationId);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please comment.

/// stable across compilation boundaries. This enables whole-program analysis to track
/// and relate entities across separately compiled translation units.
class EntityName {
std::string USR;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be helpful to explain the roles of these fields. Perhaps copy some of the information from the PR description. To the casual reader "USR" won't have much meaning and since the type is string that won't inform them either. Additionally, the need for the suffix won't be obvious. Similarly, the role of Namespace in distinguishing between otherwise identical entities.

Alternatively, provide a detailed explanation on the class comments or the constructor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would actually prefer to keep the implementation details opaque from the users with the idea that we might want to change them later. I've added comments saying that.

In theory I could make the constructor private and declare the getEntityName, etc. in ASTEntityMapping.h as friend-s. I was going to not make this a hard restriction until I get clarity on details of how this would work with custom serialization formats where deserialization will likely need to use the constructor again.

class EntityName {
std::string USR;
llvm::SmallString<16> Suffix;
NestedBuildNamespace Namespace;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid of the size implications of including this. USRs are already large, but including an arbitrarily large vector as well, in each ID, threatens to make this unusable for large scale application.

Does the entity name need the full vector, or could something like a unique 64-bit ID suffice?

Copy link
Contributor Author

@jkorous-apple jkorous-apple Dec 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My plan is to actually use 64-bit IDs to represent entities in most contexts.

A follow-up PR introduces that:
jkorous-apple#1

I imagine EntityName will mostly be used for linking and for mapping entities back to the AST which we need for source code rewriting tools.

The compilation unit IDs and link unit IDs that the namespaces use as names will be supplied to the framework by external tools. We imagine the implementation will need to be space-efficient and while this API doesn't impose such restriction, it is unlikely that we could use long strings.

With the current API design (with EntityIdTable), during summary extraction, we would still have an instance of EntityName for each entity contributing a fact or being referred to by a summary. I plan to factor out the common namespace prefix with compilation unit id and have individual entity names unqualified.

For summary analysis part, when we might need to represent all entities of a program in memory at once, having even a single instance of EntityName for each entity in memory might require too much memory for analysis of large programs and we might need a different representation than EntityName, EntityId and EntityIdTable.

/// \return An EntityName if the declaration can be mapped, std::nullopt otherwise.
std::optional<EntityName> getLocalEntityNameForDecl(const Decl* D);

/// Maps a function return type to an EntityName.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would help to spell out in more detail what you mean/why this specialization is necessary. It's not really the type that's being identified, or you could just separately identify the type. It's specifically the return entity of this function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's an interesting question.

I've tweaked the doc comment a bit but I suspect I am not fully answering your question.

I can imagine that waiting with adding this overload until I can show its use might be better. WDYT?


namespace clang {
namespace ssaf {

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For both of these lookup functions -- I expect these to be used heavily in analysis, so it would be beneficial if they were shorter names. Currently, the names encode a lot of the (existing) type information. Is that necessary/helpful? Could you instead go with a simpler scheme like getEntity and getReturnEntity?

Separately: constructing these is typically expensive, so we use a cache. Consider including a cache object in this library as well. I think that will be the correct choice for most use cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, let me update the names.

Re:caching - I imagine you are totally right and we will do that. Since we are building everything from scratch, I would prefer to wait with such optimizations until we understand the use cases a little better though. For example - I imagine the cache should probably keep map<Decl *, EntityID> while entity IDs are not even introduced in this PR yet.
WDYT?

/// EntityName provides a globally unique identifier for program entities that remains
/// stable across compilation boundaries. This enables whole-program analysis to track
/// and relate entities across separately compiled translation units.
class EntityName {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Consider EntityID -- its shorter and (IMO) more intuitive. I think of "name" more for circumstances where the identifier is readable. But, it's a matter of taste.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Please see:
#169131 (comment)

@@ -0,0 +1,44 @@
//===- ASTMapping.h - AST to SSAF Entity mapping ----------------*- C++ -*-===//
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: header name and the comment does not match.

}

llvm::SmallString<128> USRBuf;
if (clang::index::generateUSRForDecl(USRDecl, USRBuf)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: braces could be dropped here.

@ymand ymand self-requested a review December 5, 2025 15:11
@jkorous-apple
Copy link
Contributor Author

jkorous-apple commented Dec 6, 2025

Thank you all for great feedback!
I have unfortunately been distracted with other tasks this week but I am working my way though all the comments.
EDIT: I am going through the comments in a non-linear fashion. I will get to all of them though.

Copy link
Contributor

@steakhal steakhal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good enough for me.
I'd suggest everyone to look at the remaining comments and act if it's still relevant, or resolve the comment otherwise.

Copy link
Collaborator

@Xazax-hun Xazax-hun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good to me, I have one concern that I shared inline but it should not be blocking.

#include <string>

namespace clang::ssaf {
/// Uniquely identifies an entity in a program.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some concerns about how certain entities will be uniquely identified. There are lots of edge cases as the result of the compilation model of C and C++. Specifically, the same entity might be forward declared in multiple compilation units (in entirely different header files). Do we consider those to be the same entity or not? Or do we only care about complete types? We do not need to answer all of these questions but I think we should start planning around these scenarios early and document the behavior via some test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a fair amount of complexity in relating entities indeed.
At this point we are just creating foundations that we will use in the future (hopefully soon) to implement entity linking and resolution. We know that we will need more data than just EntityName for that. We are also aware that until we actually implement it, this is only our best guess of what will be necessary. If we later find out the EntityName is missing something, we will enhance it. The right place to document, implement and test all that logic will be in the patches that introduce entity linker and the APIs for lookup in analysis result data.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

clang:analysis clang Clang issues not falling into any other category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants