Skip to content

Conversation

@balazske
Copy link
Collaborator

@balazske balazske commented Jul 2, 2025

A new checker for checking if terminating zero is missing from a string. There is an existing unix.cstring.NotNullTerminated checker that looks similar but checks just if a non-string like object is passed to a function. The new checker tests if really the terminating zero is missing. This is the initial version, it can be improved to handle string (or memory) copy functions.

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:static analyzer labels Jul 2, 2025
@llvmbot
Copy link
Member

llvmbot commented Jul 2, 2025

@llvm/pr-subscribers-clang-static-analyzer-1

Author: Balázs Kéri (balazske)

Changes

A new checker for checking if terminating zero is missing from a string. There is an existing unix.cstring.NotNullTerminated checker that looks similar but checks just if a non-string like object is passed to a function. The new checker tests if really the terminating zero is missing. This is the initial version, it can be improved to handle string (or memory) copy functions.


Patch is 21.65 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/146664.diff

9 Files Affected:

  • (modified) clang/docs/analyzer/checkers.rst (+52)
  • (modified) clang/include/clang/StaticAnalyzer/Checkers/Checkers.td (+28)
  • (modified) clang/lib/StaticAnalyzer/Checkers/CMakeLists.txt (+1)
  • (added) clang/lib/StaticAnalyzer/Checkers/MissingTerminatingZeroChecker.cpp (+295)
  • (modified) clang/test/Analysis/analyzer-config.c (+2)
  • (modified) clang/test/Analysis/analyzer-enabled-checkers.c (+1)
  • (added) clang/test/Analysis/cstring-missingterminatingzero-ignore.c (+27)
  • (added) clang/test/Analysis/cstring-missingterminatingzero.c (+91)
  • (modified) clang/test/Analysis/std-c-library-functions-arg-enabled-checkers.c (+1)
diff --git a/clang/docs/analyzer/checkers.rst b/clang/docs/analyzer/checkers.rst
index 26c5028e04955..16265a9742f16 100644
--- a/clang/docs/analyzer/checkers.rst
+++ b/clang/docs/analyzer/checkers.rst
@@ -2098,6 +2098,58 @@ Check the size argument passed into C string functions for common erroneous patt
      // warn: potential buffer overflow
  }
 
+.. _unix-cstring-MissingTerminatingZero:
+
+unix.cstring.MissingTerminatingZero (C)
+"""""""""""""""""""""""""""""""""""""""
+Check for string arguments passed to C library functions where the terminating
+zero is missing.
+
+The checker can only follow initializations with constant values and assignment
+of constant values to string elements.
+
+.. code-block:: c
+
+ int test1() {
+   char buf[4] = {1, 2, 3, 4};
+   return strlen(buf); // warn
+ }
+
+ int test2() {
+   char buf[] = "abcd";
+   buf[4] = 'e';
+   return strlen(buf); // warn
+ }
+
+ int test3() {
+   char buf[4];
+   buf[3] = 100;
+   return strlen(buf + 3); // warn
+ }
+
+**Options**
+
+By default the checker assumes that any parameter of type ``const char *`` to a
+global C system function should be a null-terminated string. Additionally there
+is a list of exceptions which are identified by the function name and parameter
+index. This list is called "ignore list" and contains these default values:
+(``stpncpy``, 1), (``strncat``, 1), (``strncmp``, 0), (``strncmp``, 1),
+(``strncpy``, 1), (``strndup``, 0), (``strnlen``, 0)
+These functions are ignored because they have a length parameter and can work
+with non-null terminated strings. The list can be changed by the following
+options:
+
+* ``OmitDefaultIgnoreFunctions`` (boolean). If true, the default ignore list is
+  cleared. (Independently of ``IgnoreFunctionArgs`` contains values or not.)
+
+* ``IgnoreFunctionArgs``  (string). Can be used to add functions to the ignore
+  list. It should contain entries in form of "<function name> <parameter index>"
+  separated by comma. These values are added to the ignore list. For example
+  ``strlen 0, strcpy 0, strcpy 1`` adds ``strlen`` and ``strcpy`` (both
+  parameters) to the ignore list. A function name can be used at most 2 times
+  (with different parameter values). Default value of the option is an empty
+  string.
+
 .. _unix-cstring-NotNullTerminated:
 
 unix.cstring.NotNullTerminated (C)
diff --git a/clang/include/clang/StaticAnalyzer/Checkers/Checkers.td b/clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
index 2a96df80d1001..31a67c2992edf 100644
--- a/clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
+++ b/clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
@@ -480,6 +480,34 @@ def CStringSyntaxChecker : Checker<"BadSizeArg">,
   Dependencies<[CStringModeling]>,
   Documentation<HasDocumentation>;
 
+def MissingTerminatingZeroChecker
+    : Checker<"MissingTerminatingZero">,
+      HelpText<
+          "Check for string arguments passed to C library functions where the "
+          "terminating zero is missing">,
+      CheckerOptions<
+          [CmdLineOption<
+               Boolean, "OmitDefaultIgnoreFunctions",
+               "The checker checks by default all 'const char *' arguments "
+               "of system library C functions, except for a built-in list "
+               "of functions. If this parameter is set to true, this ignore "
+               "list is not used, but functions can be still specified by "
+               "the 'IgnoreFunctionArgs' option.",
+               "false", Released>,
+           CmdLineOption<
+               String, "IgnoreFunctionArgs",
+               "Specifies a list of functions to ignore by the checker. This "
+               "list should contain comma-separated values in the format "
+               "'<name> <parm>' where <name> is the function name and "
+               "<parm> is the zero-based index of the parameter (with "
+               "'const char *' type). The same function can be specified "
+               "with different parameters at most 2 times. These functions "
+               "are added to the default ignore list, unless "
+               "'OmitDefaultIgnoreFunctions' is true.",
+               "", Released>,
+]>,
+      Documentation<HasDocumentation>;
+
 } // end "unix.cstring"
 
 let ParentPackage = CStringAlpha in {
diff --git a/clang/lib/StaticAnalyzer/Checkers/CMakeLists.txt b/clang/lib/StaticAnalyzer/Checkers/CMakeLists.txt
index 22dd3f0374849..d4960431402c2 100644
--- a/clang/lib/StaticAnalyzer/Checkers/CMakeLists.txt
+++ b/clang/lib/StaticAnalyzer/Checkers/CMakeLists.txt
@@ -64,6 +64,7 @@ add_clang_library(clangStaticAnalyzerCheckers
   MallocChecker.cpp
   MallocSizeofChecker.cpp
   MismatchedIteratorChecker.cpp
+  MissingTerminatingZeroChecker.cpp
   MmapWriteExecChecker.cpp
   MIGChecker.cpp
   MoveChecker.cpp
diff --git a/clang/lib/StaticAnalyzer/Checkers/MissingTerminatingZeroChecker.cpp b/clang/lib/StaticAnalyzer/Checkers/MissingTerminatingZeroChecker.cpp
new file mode 100644
index 0000000000000..292f0c809ac1a
--- /dev/null
+++ b/clang/lib/StaticAnalyzer/Checkers/MissingTerminatingZeroChecker.cpp
@@ -0,0 +1,295 @@
+//=== MissingTerminatingZeroChecker.cpp -------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// Check for string arguments passed to C library functions where the
+// terminating zero is missing.
+//
+//===----------------------------------------------------------------------===//
+
+#include "clang/StaticAnalyzer/Checkers/BuiltinCheckerRegistration.h"
+#include "clang/StaticAnalyzer/Core/Checker.h"
+#include "clang/StaticAnalyzer/Core/CheckerManager.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/CallDescription.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/DynamicExtent.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/MemRegion.h"
+#include "llvm/ADT/BitVector.h"
+#include <sstream>
+
+using namespace clang;
+using namespace ento;
+
+namespace {
+
+struct StringData {
+  const MemRegion *StrRegion;
+  int64_t StrLength;
+  unsigned int Offset;
+  const llvm::BitVector *NonNullData;
+};
+
+class MissingTerminatingZeroChecker
+    : public Checker<check::Bind, check::PreCall> {
+public:
+  void checkBind(SVal L, SVal V, const Stmt *S, CheckerContext &C) const;
+  void checkPreCall(const CallEvent &Call, CheckerContext &C) const;
+
+  void initOptions(bool NoDefaultIgnore, StringRef IgnoreList);
+
+private:
+  const BugType BT{this, "Missing terminating zero"};
+
+  using IgnoreEntry = std::pair<int, int>;
+  /// Functions (identified by name only) to ignore.
+  /// The entry stores a parameter index, or -1.
+  llvm::StringMap<IgnoreEntry> FunctionsToIgnore = {
+      {"stpncpy", {1, -1}}, {"strncat", {1, -1}}, {"strncmp", {0, 1}},
+      {"strncpy", {1, -1}}, {"strndup", {0, -1}}, {"strnlen", {0, -1}},
+  };
+
+  bool checkArg(unsigned int ArgI, CheckerContext &C,
+                const CallEvent &Call) const;
+  bool getStringData(StringData &DataOut, ProgramStateRef State,
+                     SValBuilder &SVB, const MemRegion *StrReg) const;
+  ProgramStateRef setStringData(ProgramStateRef State, Loc L,
+                                const llvm::BitVector &NonNullData) const;
+  void reportBug(ExplodedNode *N, const Expr *E, CheckerContext &C,
+                 const char Msg[]) const;
+};
+
+} // namespace
+
+namespace llvm {
+template <> struct FoldingSetTrait<llvm::BitVector> {
+  static inline void Profile(llvm::BitVector X, FoldingSetNodeID &ID) {
+    ID.AddInteger(X.size());
+    for (unsigned int I = 0; I < X.size(); ++I)
+      ID.AddBoolean(X[I]);
+  }
+};
+} // end namespace llvm
+
+// Contains for a "string" (character array) region if elements are known to be
+// non-zero. The bit vector is indexed by the array element index and is true
+// if the element is known to be non-zero. Size of the vector does not
+// correspond to the extent of the memory region (can be smaller), the missing
+// elements are considered to be false.
+// (A value of 'false' means that the string element is zero or unknown.)
+REGISTER_MAP_WITH_PROGRAMSTATE(NonNullnessData, const MemRegion *,
+                               llvm::BitVector)
+
+void MissingTerminatingZeroChecker::checkBind(SVal L, SVal V, const Stmt *S,
+                                              CheckerContext &C) const {
+  ProgramStateRef State = C.getState();
+  const MemRegion *MR = L.getAsRegion();
+
+  if (const ElementRegion *ER = dyn_cast_or_null<ElementRegion>(MR)) {
+    // Assign value to the index of an array.
+    // Check for applicable array type.
+    QualType ElemType = ER->getValueType().getCanonicalType();
+    if (!ElemType->isCharType())
+      return;
+    if (C.getASTContext().getTypeSizeInChars(ElemType).getQuantity() != 1)
+      return;
+
+    RegionRawOffset ROffset = ER->getAsArrayOffset();
+    unsigned int Index = ROffset.getOffset().getQuantity();
+
+    // If the checker has data about the value to bind, use this information.
+    // Otherwise try to get it from the analyzer.
+    auto GetKnownToBeNonNull = [this, State, &C](SVal V) -> ConditionTruthVal {
+      StringData ExistingSData;
+      if (getStringData(ExistingSData, State, C.getSValBuilder(),
+                        V.getAsRegion()) &&
+          ExistingSData.Offset < ExistingSData.NonNullData->size()) {
+        return ExistingSData.NonNullData->test(ExistingSData.Offset);
+      } else {
+        return State->isNonNull(V);
+      }
+    };
+    ConditionTruthVal VKnownToBeNonNull = GetKnownToBeNonNull(V);
+
+    if (const llvm::BitVector *NNData =
+            State->get<NonNullnessData>(ROffset.getRegion())) {
+      // Update existing data.
+      unsigned int NewSize =
+          Index < NNData->size() ? NNData->size() : Index + 1;
+      // Only extend the vector with 'true' value.
+      if (NewSize > NNData->size() && !VKnownToBeNonNull.isConstrainedTrue())
+        return;
+      llvm::BitVector NNData1(NewSize);
+      NNData1 |= *NNData;
+      NNData1[Index] = VKnownToBeNonNull.isConstrainedTrue();
+      State = State->set<NonNullnessData>(ROffset.getRegion(), NNData1);
+    } else {
+      // Only add new data if 'true' is found.
+      if (!VKnownToBeNonNull.isConstrainedTrue())
+        return;
+      llvm::BitVector NNData1(Index + 1);
+      NNData1[Index] = true;
+      State = State->set<NonNullnessData>(ROffset.getRegion(), NNData1);
+    }
+  } else if (const TypedValueRegion *TR =
+                 dyn_cast_or_null<TypedValueRegion>(MR)) {
+    // Initialize a region with compound value from list or string literal.
+    QualType Type = TR->getValueType().getCanonicalType();
+    if (!Type->isArrayType())
+      return;
+    if (!Type->castAsArrayTypeUnsafe()->getElementType()->isCharType())
+      return;
+
+    if (auto CVal = V.getAs<nonloc::CompoundVal>()) {
+      llvm::BitVector NNData;
+      for (auto Val = CVal->begin(); Val != CVal->end(); ++Val)
+        NNData.push_back(State->isNonNull(*Val).isConstrainedTrue());
+      State = State->set<NonNullnessData>(MR, NNData);
+    } else if (auto MRV = V.getAs<loc::MemRegionVal>()) {
+      if (auto *StrReg = MRV->stripCasts()->getAs<StringRegion>()) {
+        StringRef Str = StrReg->getStringLiteral()->getString();
+        size_t StrL = Str.size();
+        llvm::BitVector NNData(StrL + 1, true);
+        for (unsigned int I = 0; I < StrL; ++I)
+          if (Str[I] == 0)
+            NNData.reset(I);
+        NNData.reset(StrL);
+        State = State->set<NonNullnessData>(MR, NNData);
+      }
+    }
+  }
+  C.addTransition(State);
+}
+
+void MissingTerminatingZeroChecker::checkPreCall(const CallEvent &Call,
+                                                 CheckerContext &C) const {
+  if (!Call.isInSystemHeader() || !Call.isGlobalCFunction())
+    return;
+
+  auto Ignore = [this](StringRef Name) -> IgnoreEntry {
+    auto FIgnore = FunctionsToIgnore.find(Name);
+    if (FIgnore != FunctionsToIgnore.end())
+      return FIgnore->getValue();
+    return IgnoreEntry{-1, -1};
+  }(cast<NamedDecl>(Call.getDecl())->getNameAsString());
+
+  for (const auto &[ArgI, ParmD] : enumerate(Call.parameters())) {
+    QualType ArgTy = ParmD->getType();
+    if (!ArgTy->isPointerType())
+      continue;
+    QualType ArgPointeeTy = ArgTy->getPointeeType();
+    if (!ArgPointeeTy->isCharType() || !ArgPointeeTy.isConstQualified())
+      continue;
+    if (static_cast<int>(ArgI) == Ignore.first ||
+        static_cast<int>(ArgI) == Ignore.second)
+      continue;
+
+    if (checkArg(ArgI, C, Call))
+      return;
+  }
+}
+
+bool MissingTerminatingZeroChecker::checkArg(unsigned int ArgI,
+                                             CheckerContext &C,
+                                             const CallEvent &Call) const {
+  SVal StrArgVal = Call.getArgSVal(ArgI);
+
+  StringData SData;
+  if (!getStringData(SData, C.getState(), C.getSValBuilder(),
+                     StrArgVal.getAsRegion()))
+    return false;
+
+  // Check if all elements in the string are known to be non-zero.
+  // The stored bitvector can have a smaller size.
+  if (SData.NonNullData->size() < SData.StrLength)
+    return false;
+  for (int64_t I = SData.Offset; I < SData.StrLength; ++I)
+    if (!SData.NonNullData->test(I))
+      return false;
+
+  reportBug(C.getPredecessor(), Call.getArgExpr(ArgI), C,
+            "String contains no terminating zero; At this place a "
+            "null-terminated string is expected");
+  return true;
+}
+
+bool MissingTerminatingZeroChecker::getStringData(
+    StringData &DataOut, ProgramStateRef State, SValBuilder &SVB,
+    const MemRegion *StrReg) const {
+  if (!StrReg)
+    return false;
+
+  unsigned int Offset = 0;
+  if (const auto *ElemReg = dyn_cast_or_null<ElementRegion>(StrReg)) {
+    RegionRawOffset ROffset = ElemReg->getAsArrayOffset();
+    StrReg = ROffset.getRegion();
+    if (!StrReg)
+      return false;
+    Offset = ROffset.getOffset().getQuantity();
+  }
+
+  const llvm::BitVector *NNData = State->get<NonNullnessData>(StrReg);
+  if (!NNData)
+    return false;
+
+  DefinedOrUnknownSVal Extent = getDynamicExtent(State, StrReg, SVB);
+  if (Extent.isUnknown())
+    return false;
+  const llvm::APSInt *KnownExtent = SVB.getKnownValue(State, Extent);
+  if (!KnownExtent)
+    return false;
+
+  DataOut.StrRegion = StrReg;
+  DataOut.StrLength = KnownExtent->getExtValue();
+  DataOut.Offset = Offset;
+  DataOut.NonNullData = NNData;
+
+  return true;
+}
+
+void MissingTerminatingZeroChecker::reportBug(ExplodedNode *N, const Expr *E,
+                                              CheckerContext &C,
+                                              const char Msg[]) const {
+  auto R = std::make_unique<PathSensitiveBugReport>(BT, Msg, N);
+  bugreporter::trackExpressionValue(N, E, *R);
+  C.emitReport(std::move(R));
+}
+
+void MissingTerminatingZeroChecker::initOptions(bool NoDefaultIgnore,
+                                                StringRef IgnoreList) {
+  if (NoDefaultIgnore)
+    FunctionsToIgnore.clear();
+  std::istringstream IgnoreInput{std::string(IgnoreList)};
+  std::array<char, 100> I;
+  while (IgnoreInput.getline(&I[0], 100, ';')) {
+    std::istringstream FunctionInput{std::string(&I[0])};
+    std::string FName;
+    int FArg;
+    if (FunctionInput >> FName >> FArg) {
+      IgnoreEntry &E =
+          FunctionsToIgnore.insert({FName, {-1, -1}}).first->getValue();
+      if (E.first == -1)
+        E.first = FArg;
+      else if (E.second == -1)
+        E.second = FArg;
+    }
+  }
+}
+
+void ento::registerMissingTerminatingZeroChecker(CheckerManager &Mgr) {
+  auto *Checker = Mgr.registerChecker<MissingTerminatingZeroChecker>();
+  bool NoDefaultIgnore = Mgr.getAnalyzerOptions().getCheckerBooleanOption(
+      Checker, "OmitDefaultIgnoreFunctions");
+  StringRef IgnoreList = Mgr.getAnalyzerOptions().getCheckerStringOption(
+      Checker, "IgnoreFunctionArgs");
+  Checker->initOptions(NoDefaultIgnore, IgnoreList);
+}
+
+bool ento::shouldRegisterMissingTerminatingZeroChecker(
+    const CheckerManager &Mgr) {
+  // Check if the char type has size of 8 bits (to avoid indexing differences)?
+  return true;
+}
diff --git a/clang/test/Analysis/analyzer-config.c b/clang/test/Analysis/analyzer-config.c
index 7936273415ad4..d477987e51642 100644
--- a/clang/test/Analysis/analyzer-config.c
+++ b/clang/test/Analysis/analyzer-config.c
@@ -137,6 +137,8 @@
 // CHECK-NEXT: unix.StdCLibraryFunctions:DisplayLoadedSummaries = false
 // CHECK-NEXT: unix.StdCLibraryFunctions:ModelPOSIX = true
 // CHECK-NEXT: unix.Stream:Pedantic = false
+// CHECK-NEXT: unix.cstring.MissingTerminatingZero:IgnoreFunctionArgs = ""
+// CHECK-NEXT: unix.cstring.MissingTerminatingZero:OmitDefaultIgnoreFunctions = false
 // CHECK-NEXT: unroll-loops = false
 // CHECK-NEXT: verbose-report-filename = false
 // CHECK-NEXT: widen-loops = false
diff --git a/clang/test/Analysis/analyzer-enabled-checkers.c b/clang/test/Analysis/analyzer-enabled-checkers.c
index 66b9be9795f12..127bd3298c76c 100644
--- a/clang/test/Analysis/analyzer-enabled-checkers.c
+++ b/clang/test/Analysis/analyzer-enabled-checkers.c
@@ -57,6 +57,7 @@
 // CHECK-NEXT: unix.StdCLibraryFunctions
 // CHECK-NEXT: unix.Vfork
 // CHECK-NEXT: unix.cstring.BadSizeArg
+// CHECK-NEXT: unix.cstring.MissingTerminatingZero
 // CHECK-NEXT: unix.cstring.NotNullTerminated
 // CHECK-NEXT: unix.cstring.NullArg
 
diff --git a/clang/test/Analysis/cstring-missingterminatingzero-ignore.c b/clang/test/Analysis/cstring-missingterminatingzero-ignore.c
new file mode 100644
index 0000000000000..8b96bf81c55cd
--- /dev/null
+++ b/clang/test/Analysis/cstring-missingterminatingzero-ignore.c
@@ -0,0 +1,27 @@
+// RUN: %clang_analyze_cc1 -analyzer-checker=unix.cstring.MissingTerminatingZero \
+// RUN:   -analyzer-config unix.cstring.MissingTerminatingZero:IgnoreFunctionArgs='strlen 0;strcpy 1' -verify=all,ignore %s
+// RUN: %clang_analyze_cc1 -analyzer-checker=unix.cstring.MissingTerminatingZero \
+// RUN:   -analyzer-config unix.cstring.MissingTerminatingZero:IgnoreFunctionArgs='strlen 0;strcpy 1' \
+// RUN:   -analyzer-config unix.cstring.MissingTerminatingZero:OmitDefaultIgnoreFunctions=true -verify=all,omitdefault %s
+
+#include "Inputs/system-header-simulator.h"
+
+size_t test1(int i) {
+  char buf[1] = {1};
+  return strlen(buf);
+}
+
+void test2(char *dst) {
+  char src[1] = {1};
+  strcpy(dst, src);
+}
+
+int test3() {
+  const char buf[1] = {1};
+  return execl("path", buf, 4); // all-warning{{String contains no terminating zero}}
+}
+
+void test4(char *dst) {
+  char src[3] = {1, 2, 3};
+  strncpy(dst, src, 3); // omitdefault-warning{{String contains no terminating zero}}
+}
diff --git a/clang/test/Analysis/cstring-missingterminatingzero.c b/clang/test/Analysis/cstring-missingterminatingzero.c
new file mode 100644
index 0000000000000..d0ae6cb19db87
--- /dev/null
+++ b/clang/test/Analysis/cstring-missingterminatingzero.c
@@ -0,0 +1,91 @@
+// RUN: %clang_analyze_cc1 -analyzer-checker=unix.cstring.MissingTerminatingZero -verify %s
+
+#include "Inputs/system-header-simulator.h"
+
+void clang_analyzer_eval(int);
+
+size_t test_init_compound(int i) {
+  char src1[6] = {1,2,3,4,5,6};
+  char src2[6] = {1,2,3,0,5,6};
+  switch (i) {
+  case 1:
+    return strlen(src1); // expected-warning{{String contains no terminating zero}}
+  case 2:
+    return strlen(src1 + 1); // expected-warning{{String contains no terminating zero}}
+  case 3:
+    return strlen(src2);
+  case 4:
+    return strlen(src2 + 4); // expected-warning{{String contains no terminating zero}}
+  case 5:
+    return strlen(src2 + 3);
+  }
+  src1[5] = 0;
+  return strlen(src1);
+}
+
+typedef char CHAR;
+
+size_t test_init_literal(int i) {
+  CHAR src1[] = "abcdef";
+  int l = strlen(src1);
+  src1[6] = '.';
+  src1[3] = 0;
+  switch (i) {
+  case 1:
+    return strlen(src1);
+  case 2:
+    return strlen(src1 + 4); // expected-warning{{String contains no terminating zero}}
+  }
+  return l;
+}
+
+size_t test_init_assign(int i, char a) {
+  char src[6];
+  src[1] = '1';
+  src[2] = '2';
+  src[4] = '4';
+  src[5] = '5';
+
+  switch (i) {
+  case 0:
+    return ...
[truncated]

@llvmbot
Copy link
Member

llvmbot commented Jul 2, 2025

@llvm/pr-subscribers-clang

Author: Balázs Kéri (balazske)

Changes

A new checker for checking if terminating zero is missing from a string. There is an existing unix.cstring.NotNullTerminated checker that looks similar but checks just if a non-string like object is passed to a function. The new checker tests if really the terminating zero is missing. This is the initial version, it can be improved to handle string (or memory) copy functions.


Patch is 21.65 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/146664.diff

9 Files Affected:

  • (modified) clang/docs/analyzer/checkers.rst (+52)
  • (modified) clang/include/clang/StaticAnalyzer/Checkers/Checkers.td (+28)
  • (modified) clang/lib/StaticAnalyzer/Checkers/CMakeLists.txt (+1)
  • (added) clang/lib/StaticAnalyzer/Checkers/MissingTerminatingZeroChecker.cpp (+295)
  • (modified) clang/test/Analysis/analyzer-config.c (+2)
  • (modified) clang/test/Analysis/analyzer-enabled-checkers.c (+1)
  • (added) clang/test/Analysis/cstring-missingterminatingzero-ignore.c (+27)
  • (added) clang/test/Analysis/cstring-missingterminatingzero.c (+91)
  • (modified) clang/test/Analysis/std-c-library-functions-arg-enabled-checkers.c (+1)
diff --git a/clang/docs/analyzer/checkers.rst b/clang/docs/analyzer/checkers.rst
index 26c5028e04955..16265a9742f16 100644
--- a/clang/docs/analyzer/checkers.rst
+++ b/clang/docs/analyzer/checkers.rst
@@ -2098,6 +2098,58 @@ Check the size argument passed into C string functions for common erroneous patt
      // warn: potential buffer overflow
  }
 
+.. _unix-cstring-MissingTerminatingZero:
+
+unix.cstring.MissingTerminatingZero (C)
+"""""""""""""""""""""""""""""""""""""""
+Check for string arguments passed to C library functions where the terminating
+zero is missing.
+
+The checker can only follow initializations with constant values and assignment
+of constant values to string elements.
+
+.. code-block:: c
+
+ int test1() {
+   char buf[4] = {1, 2, 3, 4};
+   return strlen(buf); // warn
+ }
+
+ int test2() {
+   char buf[] = "abcd";
+   buf[4] = 'e';
+   return strlen(buf); // warn
+ }
+
+ int test3() {
+   char buf[4];
+   buf[3] = 100;
+   return strlen(buf + 3); // warn
+ }
+
+**Options**
+
+By default the checker assumes that any parameter of type ``const char *`` to a
+global C system function should be a null-terminated string. Additionally there
+is a list of exceptions which are identified by the function name and parameter
+index. This list is called "ignore list" and contains these default values:
+(``stpncpy``, 1), (``strncat``, 1), (``strncmp``, 0), (``strncmp``, 1),
+(``strncpy``, 1), (``strndup``, 0), (``strnlen``, 0)
+These functions are ignored because they have a length parameter and can work
+with non-null terminated strings. The list can be changed by the following
+options:
+
+* ``OmitDefaultIgnoreFunctions`` (boolean). If true, the default ignore list is
+  cleared. (Independently of ``IgnoreFunctionArgs`` contains values or not.)
+
+* ``IgnoreFunctionArgs``  (string). Can be used to add functions to the ignore
+  list. It should contain entries in form of "<function name> <parameter index>"
+  separated by comma. These values are added to the ignore list. For example
+  ``strlen 0, strcpy 0, strcpy 1`` adds ``strlen`` and ``strcpy`` (both
+  parameters) to the ignore list. A function name can be used at most 2 times
+  (with different parameter values). Default value of the option is an empty
+  string.
+
 .. _unix-cstring-NotNullTerminated:
 
 unix.cstring.NotNullTerminated (C)
diff --git a/clang/include/clang/StaticAnalyzer/Checkers/Checkers.td b/clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
index 2a96df80d1001..31a67c2992edf 100644
--- a/clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
+++ b/clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
@@ -480,6 +480,34 @@ def CStringSyntaxChecker : Checker<"BadSizeArg">,
   Dependencies<[CStringModeling]>,
   Documentation<HasDocumentation>;
 
+def MissingTerminatingZeroChecker
+    : Checker<"MissingTerminatingZero">,
+      HelpText<
+          "Check for string arguments passed to C library functions where the "
+          "terminating zero is missing">,
+      CheckerOptions<
+          [CmdLineOption<
+               Boolean, "OmitDefaultIgnoreFunctions",
+               "The checker checks by default all 'const char *' arguments "
+               "of system library C functions, except for a built-in list "
+               "of functions. If this parameter is set to true, this ignore "
+               "list is not used, but functions can be still specified by "
+               "the 'IgnoreFunctionArgs' option.",
+               "false", Released>,
+           CmdLineOption<
+               String, "IgnoreFunctionArgs",
+               "Specifies a list of functions to ignore by the checker. This "
+               "list should contain comma-separated values in the format "
+               "'<name> <parm>' where <name> is the function name and "
+               "<parm> is the zero-based index of the parameter (with "
+               "'const char *' type). The same function can be specified "
+               "with different parameters at most 2 times. These functions "
+               "are added to the default ignore list, unless "
+               "'OmitDefaultIgnoreFunctions' is true.",
+               "", Released>,
+]>,
+      Documentation<HasDocumentation>;
+
 } // end "unix.cstring"
 
 let ParentPackage = CStringAlpha in {
diff --git a/clang/lib/StaticAnalyzer/Checkers/CMakeLists.txt b/clang/lib/StaticAnalyzer/Checkers/CMakeLists.txt
index 22dd3f0374849..d4960431402c2 100644
--- a/clang/lib/StaticAnalyzer/Checkers/CMakeLists.txt
+++ b/clang/lib/StaticAnalyzer/Checkers/CMakeLists.txt
@@ -64,6 +64,7 @@ add_clang_library(clangStaticAnalyzerCheckers
   MallocChecker.cpp
   MallocSizeofChecker.cpp
   MismatchedIteratorChecker.cpp
+  MissingTerminatingZeroChecker.cpp
   MmapWriteExecChecker.cpp
   MIGChecker.cpp
   MoveChecker.cpp
diff --git a/clang/lib/StaticAnalyzer/Checkers/MissingTerminatingZeroChecker.cpp b/clang/lib/StaticAnalyzer/Checkers/MissingTerminatingZeroChecker.cpp
new file mode 100644
index 0000000000000..292f0c809ac1a
--- /dev/null
+++ b/clang/lib/StaticAnalyzer/Checkers/MissingTerminatingZeroChecker.cpp
@@ -0,0 +1,295 @@
+//=== MissingTerminatingZeroChecker.cpp -------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// Check for string arguments passed to C library functions where the
+// terminating zero is missing.
+//
+//===----------------------------------------------------------------------===//
+
+#include "clang/StaticAnalyzer/Checkers/BuiltinCheckerRegistration.h"
+#include "clang/StaticAnalyzer/Core/Checker.h"
+#include "clang/StaticAnalyzer/Core/CheckerManager.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/CallDescription.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/DynamicExtent.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/MemRegion.h"
+#include "llvm/ADT/BitVector.h"
+#include <sstream>
+
+using namespace clang;
+using namespace ento;
+
+namespace {
+
+struct StringData {
+  const MemRegion *StrRegion;
+  int64_t StrLength;
+  unsigned int Offset;
+  const llvm::BitVector *NonNullData;
+};
+
+class MissingTerminatingZeroChecker
+    : public Checker<check::Bind, check::PreCall> {
+public:
+  void checkBind(SVal L, SVal V, const Stmt *S, CheckerContext &C) const;
+  void checkPreCall(const CallEvent &Call, CheckerContext &C) const;
+
+  void initOptions(bool NoDefaultIgnore, StringRef IgnoreList);
+
+private:
+  const BugType BT{this, "Missing terminating zero"};
+
+  using IgnoreEntry = std::pair<int, int>;
+  /// Functions (identified by name only) to ignore.
+  /// The entry stores a parameter index, or -1.
+  llvm::StringMap<IgnoreEntry> FunctionsToIgnore = {
+      {"stpncpy", {1, -1}}, {"strncat", {1, -1}}, {"strncmp", {0, 1}},
+      {"strncpy", {1, -1}}, {"strndup", {0, -1}}, {"strnlen", {0, -1}},
+  };
+
+  bool checkArg(unsigned int ArgI, CheckerContext &C,
+                const CallEvent &Call) const;
+  bool getStringData(StringData &DataOut, ProgramStateRef State,
+                     SValBuilder &SVB, const MemRegion *StrReg) const;
+  ProgramStateRef setStringData(ProgramStateRef State, Loc L,
+                                const llvm::BitVector &NonNullData) const;
+  void reportBug(ExplodedNode *N, const Expr *E, CheckerContext &C,
+                 const char Msg[]) const;
+};
+
+} // namespace
+
+namespace llvm {
+template <> struct FoldingSetTrait<llvm::BitVector> {
+  static inline void Profile(llvm::BitVector X, FoldingSetNodeID &ID) {
+    ID.AddInteger(X.size());
+    for (unsigned int I = 0; I < X.size(); ++I)
+      ID.AddBoolean(X[I]);
+  }
+};
+} // end namespace llvm
+
+// Contains for a "string" (character array) region if elements are known to be
+// non-zero. The bit vector is indexed by the array element index and is true
+// if the element is known to be non-zero. Size of the vector does not
+// correspond to the extent of the memory region (can be smaller), the missing
+// elements are considered to be false.
+// (A value of 'false' means that the string element is zero or unknown.)
+REGISTER_MAP_WITH_PROGRAMSTATE(NonNullnessData, const MemRegion *,
+                               llvm::BitVector)
+
+void MissingTerminatingZeroChecker::checkBind(SVal L, SVal V, const Stmt *S,
+                                              CheckerContext &C) const {
+  ProgramStateRef State = C.getState();
+  const MemRegion *MR = L.getAsRegion();
+
+  if (const ElementRegion *ER = dyn_cast_or_null<ElementRegion>(MR)) {
+    // Assign value to the index of an array.
+    // Check for applicable array type.
+    QualType ElemType = ER->getValueType().getCanonicalType();
+    if (!ElemType->isCharType())
+      return;
+    if (C.getASTContext().getTypeSizeInChars(ElemType).getQuantity() != 1)
+      return;
+
+    RegionRawOffset ROffset = ER->getAsArrayOffset();
+    unsigned int Index = ROffset.getOffset().getQuantity();
+
+    // If the checker has data about the value to bind, use this information.
+    // Otherwise try to get it from the analyzer.
+    auto GetKnownToBeNonNull = [this, State, &C](SVal V) -> ConditionTruthVal {
+      StringData ExistingSData;
+      if (getStringData(ExistingSData, State, C.getSValBuilder(),
+                        V.getAsRegion()) &&
+          ExistingSData.Offset < ExistingSData.NonNullData->size()) {
+        return ExistingSData.NonNullData->test(ExistingSData.Offset);
+      } else {
+        return State->isNonNull(V);
+      }
+    };
+    ConditionTruthVal VKnownToBeNonNull = GetKnownToBeNonNull(V);
+
+    if (const llvm::BitVector *NNData =
+            State->get<NonNullnessData>(ROffset.getRegion())) {
+      // Update existing data.
+      unsigned int NewSize =
+          Index < NNData->size() ? NNData->size() : Index + 1;
+      // Only extend the vector with 'true' value.
+      if (NewSize > NNData->size() && !VKnownToBeNonNull.isConstrainedTrue())
+        return;
+      llvm::BitVector NNData1(NewSize);
+      NNData1 |= *NNData;
+      NNData1[Index] = VKnownToBeNonNull.isConstrainedTrue();
+      State = State->set<NonNullnessData>(ROffset.getRegion(), NNData1);
+    } else {
+      // Only add new data if 'true' is found.
+      if (!VKnownToBeNonNull.isConstrainedTrue())
+        return;
+      llvm::BitVector NNData1(Index + 1);
+      NNData1[Index] = true;
+      State = State->set<NonNullnessData>(ROffset.getRegion(), NNData1);
+    }
+  } else if (const TypedValueRegion *TR =
+                 dyn_cast_or_null<TypedValueRegion>(MR)) {
+    // Initialize a region with compound value from list or string literal.
+    QualType Type = TR->getValueType().getCanonicalType();
+    if (!Type->isArrayType())
+      return;
+    if (!Type->castAsArrayTypeUnsafe()->getElementType()->isCharType())
+      return;
+
+    if (auto CVal = V.getAs<nonloc::CompoundVal>()) {
+      llvm::BitVector NNData;
+      for (auto Val = CVal->begin(); Val != CVal->end(); ++Val)
+        NNData.push_back(State->isNonNull(*Val).isConstrainedTrue());
+      State = State->set<NonNullnessData>(MR, NNData);
+    } else if (auto MRV = V.getAs<loc::MemRegionVal>()) {
+      if (auto *StrReg = MRV->stripCasts()->getAs<StringRegion>()) {
+        StringRef Str = StrReg->getStringLiteral()->getString();
+        size_t StrL = Str.size();
+        llvm::BitVector NNData(StrL + 1, true);
+        for (unsigned int I = 0; I < StrL; ++I)
+          if (Str[I] == 0)
+            NNData.reset(I);
+        NNData.reset(StrL);
+        State = State->set<NonNullnessData>(MR, NNData);
+      }
+    }
+  }
+  C.addTransition(State);
+}
+
+void MissingTerminatingZeroChecker::checkPreCall(const CallEvent &Call,
+                                                 CheckerContext &C) const {
+  if (!Call.isInSystemHeader() || !Call.isGlobalCFunction())
+    return;
+
+  auto Ignore = [this](StringRef Name) -> IgnoreEntry {
+    auto FIgnore = FunctionsToIgnore.find(Name);
+    if (FIgnore != FunctionsToIgnore.end())
+      return FIgnore->getValue();
+    return IgnoreEntry{-1, -1};
+  }(cast<NamedDecl>(Call.getDecl())->getNameAsString());
+
+  for (const auto &[ArgI, ParmD] : enumerate(Call.parameters())) {
+    QualType ArgTy = ParmD->getType();
+    if (!ArgTy->isPointerType())
+      continue;
+    QualType ArgPointeeTy = ArgTy->getPointeeType();
+    if (!ArgPointeeTy->isCharType() || !ArgPointeeTy.isConstQualified())
+      continue;
+    if (static_cast<int>(ArgI) == Ignore.first ||
+        static_cast<int>(ArgI) == Ignore.second)
+      continue;
+
+    if (checkArg(ArgI, C, Call))
+      return;
+  }
+}
+
+bool MissingTerminatingZeroChecker::checkArg(unsigned int ArgI,
+                                             CheckerContext &C,
+                                             const CallEvent &Call) const {
+  SVal StrArgVal = Call.getArgSVal(ArgI);
+
+  StringData SData;
+  if (!getStringData(SData, C.getState(), C.getSValBuilder(),
+                     StrArgVal.getAsRegion()))
+    return false;
+
+  // Check if all elements in the string are known to be non-zero.
+  // The stored bitvector can have a smaller size.
+  if (SData.NonNullData->size() < SData.StrLength)
+    return false;
+  for (int64_t I = SData.Offset; I < SData.StrLength; ++I)
+    if (!SData.NonNullData->test(I))
+      return false;
+
+  reportBug(C.getPredecessor(), Call.getArgExpr(ArgI), C,
+            "String contains no terminating zero; At this place a "
+            "null-terminated string is expected");
+  return true;
+}
+
+bool MissingTerminatingZeroChecker::getStringData(
+    StringData &DataOut, ProgramStateRef State, SValBuilder &SVB,
+    const MemRegion *StrReg) const {
+  if (!StrReg)
+    return false;
+
+  unsigned int Offset = 0;
+  if (const auto *ElemReg = dyn_cast_or_null<ElementRegion>(StrReg)) {
+    RegionRawOffset ROffset = ElemReg->getAsArrayOffset();
+    StrReg = ROffset.getRegion();
+    if (!StrReg)
+      return false;
+    Offset = ROffset.getOffset().getQuantity();
+  }
+
+  const llvm::BitVector *NNData = State->get<NonNullnessData>(StrReg);
+  if (!NNData)
+    return false;
+
+  DefinedOrUnknownSVal Extent = getDynamicExtent(State, StrReg, SVB);
+  if (Extent.isUnknown())
+    return false;
+  const llvm::APSInt *KnownExtent = SVB.getKnownValue(State, Extent);
+  if (!KnownExtent)
+    return false;
+
+  DataOut.StrRegion = StrReg;
+  DataOut.StrLength = KnownExtent->getExtValue();
+  DataOut.Offset = Offset;
+  DataOut.NonNullData = NNData;
+
+  return true;
+}
+
+void MissingTerminatingZeroChecker::reportBug(ExplodedNode *N, const Expr *E,
+                                              CheckerContext &C,
+                                              const char Msg[]) const {
+  auto R = std::make_unique<PathSensitiveBugReport>(BT, Msg, N);
+  bugreporter::trackExpressionValue(N, E, *R);
+  C.emitReport(std::move(R));
+}
+
+void MissingTerminatingZeroChecker::initOptions(bool NoDefaultIgnore,
+                                                StringRef IgnoreList) {
+  if (NoDefaultIgnore)
+    FunctionsToIgnore.clear();
+  std::istringstream IgnoreInput{std::string(IgnoreList)};
+  std::array<char, 100> I;
+  while (IgnoreInput.getline(&I[0], 100, ';')) {
+    std::istringstream FunctionInput{std::string(&I[0])};
+    std::string FName;
+    int FArg;
+    if (FunctionInput >> FName >> FArg) {
+      IgnoreEntry &E =
+          FunctionsToIgnore.insert({FName, {-1, -1}}).first->getValue();
+      if (E.first == -1)
+        E.first = FArg;
+      else if (E.second == -1)
+        E.second = FArg;
+    }
+  }
+}
+
+void ento::registerMissingTerminatingZeroChecker(CheckerManager &Mgr) {
+  auto *Checker = Mgr.registerChecker<MissingTerminatingZeroChecker>();
+  bool NoDefaultIgnore = Mgr.getAnalyzerOptions().getCheckerBooleanOption(
+      Checker, "OmitDefaultIgnoreFunctions");
+  StringRef IgnoreList = Mgr.getAnalyzerOptions().getCheckerStringOption(
+      Checker, "IgnoreFunctionArgs");
+  Checker->initOptions(NoDefaultIgnore, IgnoreList);
+}
+
+bool ento::shouldRegisterMissingTerminatingZeroChecker(
+    const CheckerManager &Mgr) {
+  // Check if the char type has size of 8 bits (to avoid indexing differences)?
+  return true;
+}
diff --git a/clang/test/Analysis/analyzer-config.c b/clang/test/Analysis/analyzer-config.c
index 7936273415ad4..d477987e51642 100644
--- a/clang/test/Analysis/analyzer-config.c
+++ b/clang/test/Analysis/analyzer-config.c
@@ -137,6 +137,8 @@
 // CHECK-NEXT: unix.StdCLibraryFunctions:DisplayLoadedSummaries = false
 // CHECK-NEXT: unix.StdCLibraryFunctions:ModelPOSIX = true
 // CHECK-NEXT: unix.Stream:Pedantic = false
+// CHECK-NEXT: unix.cstring.MissingTerminatingZero:IgnoreFunctionArgs = ""
+// CHECK-NEXT: unix.cstring.MissingTerminatingZero:OmitDefaultIgnoreFunctions = false
 // CHECK-NEXT: unroll-loops = false
 // CHECK-NEXT: verbose-report-filename = false
 // CHECK-NEXT: widen-loops = false
diff --git a/clang/test/Analysis/analyzer-enabled-checkers.c b/clang/test/Analysis/analyzer-enabled-checkers.c
index 66b9be9795f12..127bd3298c76c 100644
--- a/clang/test/Analysis/analyzer-enabled-checkers.c
+++ b/clang/test/Analysis/analyzer-enabled-checkers.c
@@ -57,6 +57,7 @@
 // CHECK-NEXT: unix.StdCLibraryFunctions
 // CHECK-NEXT: unix.Vfork
 // CHECK-NEXT: unix.cstring.BadSizeArg
+// CHECK-NEXT: unix.cstring.MissingTerminatingZero
 // CHECK-NEXT: unix.cstring.NotNullTerminated
 // CHECK-NEXT: unix.cstring.NullArg
 
diff --git a/clang/test/Analysis/cstring-missingterminatingzero-ignore.c b/clang/test/Analysis/cstring-missingterminatingzero-ignore.c
new file mode 100644
index 0000000000000..8b96bf81c55cd
--- /dev/null
+++ b/clang/test/Analysis/cstring-missingterminatingzero-ignore.c
@@ -0,0 +1,27 @@
+// RUN: %clang_analyze_cc1 -analyzer-checker=unix.cstring.MissingTerminatingZero \
+// RUN:   -analyzer-config unix.cstring.MissingTerminatingZero:IgnoreFunctionArgs='strlen 0;strcpy 1' -verify=all,ignore %s
+// RUN: %clang_analyze_cc1 -analyzer-checker=unix.cstring.MissingTerminatingZero \
+// RUN:   -analyzer-config unix.cstring.MissingTerminatingZero:IgnoreFunctionArgs='strlen 0;strcpy 1' \
+// RUN:   -analyzer-config unix.cstring.MissingTerminatingZero:OmitDefaultIgnoreFunctions=true -verify=all,omitdefault %s
+
+#include "Inputs/system-header-simulator.h"
+
+size_t test1(int i) {
+  char buf[1] = {1};
+  return strlen(buf);
+}
+
+void test2(char *dst) {
+  char src[1] = {1};
+  strcpy(dst, src);
+}
+
+int test3() {
+  const char buf[1] = {1};
+  return execl("path", buf, 4); // all-warning{{String contains no terminating zero}}
+}
+
+void test4(char *dst) {
+  char src[3] = {1, 2, 3};
+  strncpy(dst, src, 3); // omitdefault-warning{{String contains no terminating zero}}
+}
diff --git a/clang/test/Analysis/cstring-missingterminatingzero.c b/clang/test/Analysis/cstring-missingterminatingzero.c
new file mode 100644
index 0000000000000..d0ae6cb19db87
--- /dev/null
+++ b/clang/test/Analysis/cstring-missingterminatingzero.c
@@ -0,0 +1,91 @@
+// RUN: %clang_analyze_cc1 -analyzer-checker=unix.cstring.MissingTerminatingZero -verify %s
+
+#include "Inputs/system-header-simulator.h"
+
+void clang_analyzer_eval(int);
+
+size_t test_init_compound(int i) {
+  char src1[6] = {1,2,3,4,5,6};
+  char src2[6] = {1,2,3,0,5,6};
+  switch (i) {
+  case 1:
+    return strlen(src1); // expected-warning{{String contains no terminating zero}}
+  case 2:
+    return strlen(src1 + 1); // expected-warning{{String contains no terminating zero}}
+  case 3:
+    return strlen(src2);
+  case 4:
+    return strlen(src2 + 4); // expected-warning{{String contains no terminating zero}}
+  case 5:
+    return strlen(src2 + 3);
+  }
+  src1[5] = 0;
+  return strlen(src1);
+}
+
+typedef char CHAR;
+
+size_t test_init_literal(int i) {
+  CHAR src1[] = "abcdef";
+  int l = strlen(src1);
+  src1[6] = '.';
+  src1[3] = 0;
+  switch (i) {
+  case 1:
+    return strlen(src1);
+  case 2:
+    return strlen(src1 + 4); // expected-warning{{String contains no terminating zero}}
+  }
+  return l;
+}
+
+size_t test_init_assign(int i, char a) {
+  char src[6];
+  src[1] = '1';
+  src[2] = '2';
+  src[4] = '4';
+  src[5] = '5';
+
+  switch (i) {
+  case 0:
+    return ...
[truncated]

@balazske balazske marked this pull request as draft July 2, 2025 13:12
@balazske
Copy link
Collaborator Author

balazske commented Jul 2, 2025

Pointer escape and region changes are not handled yet.

Copy link
Contributor

@NagyDonat NagyDonat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an initial review I added some high-level remarks. I will dig into the technical details once these are discussed and answered.

return strlen(src1);
}

typedef char CHAR;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you use this typedef?

Comment on lines +83 to +84
REGISTER_MAP_WITH_PROGRAMSTATE(NonNullnessData, const MemRegion *,
llvm::BitVector)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you introduce this map instead of relying on the standard RegionStore? Unless there is a really compelling reason, we shouldn't duplicate the same data in two different locations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought the CStringChecker has the CStringLength trait that maps lvalues to the symbolic length of the null-terminated string. In other words, CStringLength represents the position of the null-terminator.
Have you thought of reusing the same trait for representing this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found that CStringLength is only set when a string literal is used at initialization and is not updated at later changes (if value is assigned to array element). Other ("compound") initializations can be handled but we must use a special value if there is not a terminating zero. Cases like char a[5] = {1, 0, 1, 0, 1} are still difficult to handle in this way. I plan to add support for functions like memcpy (if both argument strings and the size are known to the checker the non-null-data can be updated). These changes would be more difficult to handle if only the length of the strings is known.

Another difference is that the checker stores positions of known nonzero characters, not of a known zero (at the end). A bug is reported only if the whole string contains known nonzero values. In this way we should not get false positives from the checker (or probably for other reasons like escape issues).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you introduce this map instead of relying on the standard RegionStore? Unless there is a really compelling reason, we shouldn't duplicate the same data in two different locations.

If I remember my first idea was to only check the affected (element) regions and it would be a simple checker. But somehow I did not find a way to iterate over the array elements (or subregions). Probably array initialization did not appear in the element regions in all cases too. Additionally string copy functions can be more difficult to handle in that way. (If copy is added it should be in a different checker, probably CStringChecker.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or would it be better to "get" (or create?) an element region for all elements of the string when for example a strlen is called?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@NagyDonat What's your opinion about the iterBindings api? I don't think you reflected on that potentially promising path forward - instead of creating many ElementRegions.

It didn't catch my interest at first glance, but now that I think about it it's indeed promising -- especially if it is combined with getDefaultBinding (or perhaps if it is extended to return the default bindings directly). At first I felt that this "iterate over the internal representation" approach was too low-level (and therefore inelegant), but it is the natural approach to avoid performance issues and the superfluous use of heavyweight ElementRegions. (And in some sense it's elegant to iterate over what is in some sense the "most structured" representation of our knowledge.)

I think that the main drawback of this approach is that it's more difficult than just creating the many ElementRegions. It is not impossibly difficult (it would be feasible to implement it in the foreseeable future), but we should choose it only if we see concrete use cases where this iterBindings-based approach would work, while the naive ElementRegion would fail.

Right now as far as I see we can only detect non-null-terminated strings if they are constructed by characterwise operations (because we don't model memcpy et al), and I'd guess that these primitive operations are mostly used for constructing very short strings where the naive ElementRegion-based approach is sufficient.

However, if we had proper modeling for memcpy and similar functions, then it would be possible to report more complex errors e.g. where memcpy can overwrite a trailing zero -- and for that it would be worth to implement this checker based on iterBindings.


Also, a technical note: as far as I see iterBindings iterates over all bindings everywhere in the memory (with for each [Region, Cluster] in the store { for each [Key, Val] in Cluster { ... } }). Do I understand it correctly that for this application we would introduce a modified variant of iterBindings that only iterates over a single Cluster (the parts of the relevant memory block).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It didn't catch my interest at first glance, but now that I think about it it's indeed promising -- especially if it is combined with getDefaultBinding (or perhaps if it is extended to return the default bindings directly). At first I felt that this "iterate over the internal representation" approach was too low-level (and therefore inelegant), but it is the natural approach to avoid performance issues and the superfluous use of heavyweight ElementRegions. (And in some sense it's elegant to iterate over what is in some sense the "most structured" representation of our knowledge.)

This matches my understanding. I agree that this feels like the natural way of doing this.

I think that the main drawback of this approach is that it's more difficult than just creating the many ElementRegions. It is not impossibly difficult (it would be feasible to implement it in the foreseeable future), but we should choose it only if we see concrete use cases where this iterBindings-based approach would work, while the naive ElementRegion would fail.

The concern I have that we trash the MemRegion allocator with MemRegions that we never will see/use again. Not a particularly big deal in practice, but to me it's a pretty big red flag that outweighs the concerns about iterBindings and nods me to explore that one before compromising.


Also, a technical note: as far as I see iterBindings iterates over all bindings everywhere in the memory (with for each [Region, Cluster] in the store { for each [Key, Val] in Cluster { ... } }). Do I understand it correctly that for this application we would introduce a modified variant of iterBindings that only iterates over a single Cluster (the parts of the relevant memory block).

Yes. In the current form of iterBindings is not particularly useful. Exposing all clusters seems to be a very broad contract. If there is indeed a need for such an operation, that should be split. Iterating only a single cluster is probably a much more useful API for most applications.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is one Achilles heel of the RegionStore. Copying an object uses LCVs, which can only be bound as a default binding, which doesn't have an extent. It's correct that LCVs are default bindings. The problem is that default bindings doesn't have an extent, aka. end/length.

It would be really nice to eliminate this limitation...

There is a lengthy discussion of the proposal that we at Sonar believe would work, and we actually have a really early draft prototype of what I described there. Unfortunately, we never got back to finishing it.

Is this prototype available for public review? I would definitely like to have a look at it to understand the outline and challenges of this task. (Depending on the details, I may consider working on this goal -- but obviously I wouldn't want to start that without checking the prior work and ensuring alignment with your efforts.)

We discussed this internally, and we agreed that we should share the feature branch of the RegionStore prototype to foster progress in the area to achieve potential breakthroughs. Please give me some time to figure out the best format to share this. Ideally, it should be part of the repository in some way. Ping me privately if I forgot to come back to this.

Copy link
Contributor

@NagyDonat NagyDonat Aug 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, a technical note: as far as I see iterBindings iterates over all bindings everywhere in the memory (with for each [Region, Cluster] in the store { for each [Key, Val] in Cluster { ... } }). Do I understand it correctly that for this application we would introduce a modified variant of iterBindings that only iterates over a single Cluster (the parts of the relevant memory block).

Yes. In the current form of iterBindings is not particularly useful. Exposing all clusters seems to be a very broad contract. If there is indeed a need for such an operation, that should be split. Iterating only a single cluster is probably a much more useful API for most applications.

I realized that RegionBindingsReg inherits publicly from llvm::ImmutableMap<const MemRegion *, ClusterBindings>, so its there is no need to implement a new method because the inherited lookup method can be used to acquire a single ClusterBindings object which corresponds to a certain base region. (Note that ClusterBindings is an alias for llvm::ImmutableMap<BindingKey, SVal> so it can be easily traversed.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI The region store with extent prototype was shared in https://discourse.llvm.org/t/rfc-regionstore/70954/24.

Comment on lines +2134 to +2137
is a list of exceptions which are identified by the function name and parameter
index. This list is called "ignore list" and contains these default values:
(``stpncpy``, 1), (``strncat``, 1), (``strncmp``, 0), (``strncmp``, 1),
(``strncpy``, 1), (``strndup``, 0), (``strnlen``, 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you tracking the parameter indices on the exception list? Do you know about any significant builtin functions where one parameter is an exception (doesn't need to be null-terminated) while another parameter must be null-terminated?

If not, then it would be much simpler to just track the function names.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Functions like strncpy have a destination that should be null-terminated and a source string that can be not null-terminated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The destination parameter is a non-const char * while (at least according to the documentation) your checker only checks the const char * parameters of the C library functions.

Comment on lines +2145 to +2151
* ``IgnoreFunctionArgs`` (string). Can be used to add functions to the ignore
list. It should contain entries in form of "<function name> <parameter index>"
separated by comma. These values are added to the ignore list. For example
``strlen 0, strcpy 0, strcpy 1`` adds ``strlen`` and ``strcpy`` (both
parameters) to the ignore list. A function name can be used at most 2 times
(with different parameter values). Default value of the option is an empty
string.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you do keep the logic that parameters are excluded separately, then please get rid of the limitation that "a function name can be used at most 2 times" -- arbitrary constraints this have no place in modern software.

(It would be somewhat understandable in a project written in C, but here you can easily use SmallVector instead of std::pair so there is no reason to limit the number of arguments that can be specified.)

@steakhal
Copy link
Contributor

Sorry about my late reply. It took me a while to get back to this one.

@balazske
Copy link
Collaborator Author

The two patches (#149106 and this) are a different solution for the problem. I started the other because it looks more difficult to make this checker work if memory invalidations happen, probably many cases should be handled to maintain the checker internal data. The other solution has different problems (possible use of many ElementRegion's, no good ability to handle memory copy functions), a decision is needed which is better to continue.

@steakhal
Copy link
Contributor

I'm using my personal account here because that's what I used this thread so far. Consider this from @balazs-benics-sonarsource.

We have patches downstream in a related subject that we believe is not in our best interest to upstream right now. This may/hopefully change in the future. As I was involved in those patches, I don't think I can give a fair and unbiased review without risking intellectual property rights, or nodding the PR in a direction where it would incidentally make the maintenance of our downstream patches more challenging than they already are.
I think I shared most if not all what I could. I resign from reviewing this and the sibling patch of this PR.

I'm of course open to answer questions, but I don't plan to look actively at the content of these patches, except if the question is technical and references it.

I'll leave the rest of the review to @NagyDonat and @Xazax-hun

@balazske
Copy link
Collaborator Author

Probably for this checker another approach can work: Store only if a string is known to be null-terminated, or known to contain no zero at all (probably other state like "likely does not contain terminating zero" or "likely contains zero only at the end" can be used). Additionally string length should be maintained more accurately (in CStringChecker) and strlen should return the "string length" value. When handling this data in all string and memory manipulation functions simple cases of passing not null-terminated string to a function could be found.

@NagyDonat
Copy link
Contributor

Probably for this checker another approach can work: Store only if a string is known to be null-terminated, or known to contain no zero at all (probably other state like "likely does not contain terminating zero" or "likely contains zero only at the end" can be used). Additionally string length should be maintained more accurately (in CStringChecker) and strlen should return the "string length" value. When handling this data in all string and memory manipulation functions simple cases of passing not null-terminated string to a function could be found.

I see the logic behind this approach, and you're right that this could be quick solution for the simple cases, but I fear that the devil is in the details and it would be difficult to ensure that this string-length-based information remains consistent with the RegionStore. For this reason I would strongly prefer an approach that would rely on the RegionStore and e.g. investigate the ClusterBindings to understand the contents (and terminated/unterminated status) of a string.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

clang:static analyzer clang Clang issues not falling into any other category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants