Skip to content

Conversation

alejandro-alvarez-sonarsource
Copy link
Contributor

  • ArrayBoundChecker now directly models index accesses on ConstantArrayType, enabling more detection of out-of-bounds (OOB) accesses for arrays with known constant sizes.
  • Improved support for multi-dimensional arrays.

By default, positive OOB accesses of "fake" Flexible Array Members (FAM) are silenced, and do not introduce any assumption on the index beyond it not being negative.
The optionsecurity.ArrayBound:EnableFakeFlexibleArrayWarn allows to opt-in for warnings in these cases (without sinking).

With this patch we can now warn on OOB access even if we know nothing about the allocation, since the type suffices.

Additionally, this also allows to report OOB accesses on subarrays. For instance:

int array[10][10];

array[0][10];

Before this change, ArrayBoundChecker would not report an OOB, since only the byte offset is taken into account:
0 * 10 + 10 = 10, which is less than the allocated 100 bytes. This is, nonetheless, UB per the standard.

CPP-6852, CPP-6956

- `ArrayBoundChecker` now directly models index accesses on
  `ConstantArrayType`, enabling more detection of out-of-bounds (OOB)
  accesses for arrays with known constant sizes.
- Improved support for multi-dimensional arrays.

By default, positive OOB accesses of "fake" Flexible Array Members
(FAM) are silenced, and do not introduce any assumption on the index
beyond it not being negative.
The option`security.ArrayBound:EnableFakeFlexibleArrayWarn` allows
to opt-in for warnings in these cases (without sinking).

With this patch we can now warn on OOB access even if we know
nothing about the allocation, since the type suffices.

Additionally, this also allows to report OOB accesses on
subarrays. For instance:

```cpp
int array[10][10];

array[0][10];
```

Before this change, `ArrayBoundChecker` would not report an OOB,
since only the byte offset is taken into account:
0 * 10 + 10 = 10, which is less than the allocated 100 bytes.
This is, nonetheless, UB per the standard.
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:static analyzer labels Sep 17, 2025
@llvmbot
Copy link
Member

llvmbot commented Sep 17, 2025

@llvm/pr-subscribers-clang-static-analyzer-1

@llvm/pr-subscribers-clang

Author: Alejandro Álvarez Ayllón (alejandro-alvarez-sonarsource)

Changes
  • ArrayBoundChecker now directly models index accesses on ConstantArrayType, enabling more detection of out-of-bounds (OOB) accesses for arrays with known constant sizes.
  • Improved support for multi-dimensional arrays.

By default, positive OOB accesses of "fake" Flexible Array Members (FAM) are silenced, and do not introduce any assumption on the index beyond it not being negative.
The optionsecurity.ArrayBound:EnableFakeFlexibleArrayWarn allows to opt-in for warnings in these cases (without sinking).

With this patch we can now warn on OOB access even if we know nothing about the allocation, since the type suffices.

Additionally, this also allows to report OOB accesses on subarrays. For instance:

int array[10][10];

array[0][10];

Before this change, ArrayBoundChecker would not report an OOB, since only the byte offset is taken into account:
0 * 10 + 10 = 10, which is less than the allocated 100 bytes. This is, nonetheless, UB per the standard.

CPP-6852, CPP-6956


Patch is 41.36 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/159357.diff

7 Files Affected:

  • (modified) clang/include/clang/StaticAnalyzer/Checkers/Checkers.td (+12-3)
  • (modified) clang/lib/StaticAnalyzer/Checkers/ArrayBoundChecker.cpp (+395-41)
  • (modified) clang/test/Analysis/ArrayBound/assumption-reporting.c (+82-8)
  • (modified) clang/test/Analysis/ArrayBound/brief-tests.c (+186-5)
  • (modified) clang/test/Analysis/ArrayBound/cplusplus.cpp (+41-3)
  • (modified) clang/test/Analysis/ArrayBound/verbose-tests.c (+52-6)
  • (modified) clang/test/Analysis/analyzer-config.c (+1)
diff --git a/clang/include/clang/StaticAnalyzer/Checkers/Checkers.td b/clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
index a1f5bdfb9edf1..5f60b0ee517bb 100644
--- a/clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
+++ b/clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
@@ -915,9 +915,18 @@ def decodeValueOfObjCType : Checker<"decodeValueOfObjCType">,
 
 let ParentPackage = Security in {
 
-  def ArrayBoundChecker : Checker<"ArrayBound">,
-                          HelpText<"Warn about out of bounds access to memory">,
-                          Documentation<HasDocumentation>;
+  def ArrayBoundChecker
+      : Checker<"ArrayBound">,
+        HelpText<"Warn about out of bounds access to memory">,
+        CheckerOptions<[
+          CmdLineOption<Boolean,
+                        "EnableFakeFlexibleArrayWarn",
+                        "Do not silence out-of-bound accesses to arrays of size 1 or 0 that are "
+                        "the last member of a record, or member of an union",
+                        "false",
+                          Released>,
+        ]>,
+        Documentation<HasDocumentation>;
 
   def FloatLoopCounter
       : Checker<"FloatLoopCounter">,
diff --git a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundChecker.cpp b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundChecker.cpp
index 6ad5acd4e76f2..eed48afbf3933 100644
--- a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundChecker.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundChecker.cpp
@@ -103,27 +103,6 @@ class StateUpdateReporter {
 
 private:
   std::string getMessage(PathSensitiveBugReport &BR) const;
-
-  /// Return true if information about the value of `Sym` can put constraints
-  /// on some symbol which is interesting within the bug report `BR`.
-  /// In particular, this returns true when `Sym` is interesting within `BR`;
-  /// but it also returns true if `Sym` is an expression that contains integer
-  /// constants and a single symbolic operand which is interesting (in `BR`).
-  /// We need to use this instead of plain `BR.isInteresting()` because if we
-  /// are analyzing code like
-  ///   int array[10];
-  ///   int f(int arg) {
-  ///     return array[arg] && array[arg + 10];
-  ///   }
-  /// then the byte offsets are `arg * 4` and `(arg + 10) * 4`, which are not
-  /// sub-expressions of each other (but `getSimplifiedOffsets` is smart enough
-  /// to detect this out of bounds access).
-  static bool providesInformationAboutInteresting(SymbolRef Sym,
-                                                  PathSensitiveBugReport &BR);
-  static bool providesInformationAboutInteresting(SVal SV,
-                                                  PathSensitiveBugReport &BR) {
-    return providesInformationAboutInteresting(SV.getAsSymbol(), BR);
-  }
 };
 
 struct Messages {
@@ -157,12 +136,27 @@ class ArrayBoundChecker : public Checker<check::PostStmt<ArraySubscriptExpr>,
   BugType BT{this, "Out-of-bound access"};
   BugType TaintBT{this, "Out-of-bound access", categories::TaintedData};
 
+  enum class ConstantArrayIndexResult {
+    Done, //< Could model the index access based on its type
+    Unknown //< Could not model the array access based on its type
+  };
+
+  // `ConstantArrayType`s have a constant size, so use it to check the access.
+  ConstantArrayIndexResult
+  performCheckArrayTypeIndex(const ArraySubscriptExpr *E,
+                             CheckerContext &C) const;
+
   void performCheck(const Expr *E, CheckerContext &C) const;
 
   void reportOOB(CheckerContext &C, ProgramStateRef ErrorState, Messages Msgs,
                  NonLoc Offset, std::optional<NonLoc> Extent,
                  bool IsTaintBug = false) const;
 
+  void warnFlexibleArrayAccess(CheckerContext &C, ProgramStateRef State,
+                               const ArraySubscriptExpr *E, StringRef Name,
+                               NonLoc Index, nonloc::ConcreteInt ArraySize,
+                               QualType ArrayType) const;
+
   static void markPartsInteresting(PathSensitiveBugReport &BR,
                                    ProgramStateRef ErrorState, NonLoc Val,
                                    bool MarkTaint);
@@ -177,8 +171,12 @@ class ArrayBoundChecker : public Checker<check::PostStmt<ArraySubscriptExpr>,
   static bool isInAddressOf(const Stmt *S, ASTContext &AC);
 
 public:
+  bool EnableFakeFlexibleArrayWarn{false};
+
   void checkPostStmt(const ArraySubscriptExpr *E, CheckerContext &C) const {
-    performCheck(E, C);
+    if (performCheckArrayTypeIndex(E, C) == ConstantArrayIndexResult::Unknown) {
+      performCheck(E, C);
+    }
   }
   void checkPostStmt(const UnaryOperator *E, CheckerContext &C) const {
     if (E->getOpcode() == UO_Deref)
@@ -478,10 +476,8 @@ static Messages getNonTaintMsgs(const ASTContext &ACtx,
           std::string(Buf)};
 }
 
-static Messages getTaintMsgs(const MemSpaceRegion *Space,
-                             const SubRegion *Region, const char *OffsetName,
+static Messages getTaintMsgs(StringRef RegName, const char *OffsetName,
                              bool AlsoMentionUnderflow) {
-  std::string RegName = getRegionName(Space, Region);
   return {formatv("Potential out of bound access to {0} with tainted {1}",
                   RegName, OffsetName),
           formatv("Access of {0} with a tainted {1} that may be {2}too large",
@@ -489,6 +485,43 @@ static Messages getTaintMsgs(const MemSpaceRegion *Space,
                   AlsoMentionUnderflow ? "negative or " : "")};
 }
 
+/// Return true if information about the value of `Sym` can put constraints
+/// on some symbol which is interesting within the bug report `BR`.
+/// In particular, this returns true when `Sym` is interesting within `BR`;
+/// but it also returns true if `Sym` is an expression that contains integer
+/// constants and a single symbolic operand which is interesting (in `BR`).
+/// We need to use this instead of plain `BR.isInteresting()` because if we
+/// are analyzing code like
+///   int array[10];
+///   int f(int arg) {
+///     return array[arg] && array[arg + 10];
+///   }
+/// then the byte offsets are `arg * 4` and `(arg + 10) * 4`, which are not
+/// sub-expressions of each other (but `getSimplifiedOffsets` is smart enough
+/// to detect this out of bounds access).
+static bool providesInformationAboutInteresting(SymbolRef Sym,
+                                                PathSensitiveBugReport &BR) {
+  if (!Sym)
+    return false;
+  for (SymbolRef PartSym : Sym->symbols()) {
+    // The interestingess mark may appear on any layer as we're stripping off
+    // the SymIntExpr, UnarySymExpr etc. layers...
+    if (BR.isInteresting(PartSym))
+      return true;
+    // ...but if both sides of the expression are symbolic, then there is no
+    // practical algorithm to produce separate constraints for the two
+    // operands (from the single combined result).
+    if (isa<SymSymExpr>(PartSym))
+      return false;
+  }
+  return false;
+}
+
+static bool providesInformationAboutInteresting(SVal SV,
+                                                PathSensitiveBugReport &BR) {
+  return providesInformationAboutInteresting(SV.getAsSymbol(), BR);
+}
+
 const NoteTag *StateUpdateReporter::createNoteTag(CheckerContext &C) const {
   // Don't create a note tag if we didn't assume anything:
   if (!AssumedNonNegative && !AssumedUpperBound)
@@ -555,24 +588,309 @@ std::string StateUpdateReporter::getMessage(PathSensitiveBugReport &BR) const {
   return std::string(Out.str());
 }
 
-bool StateUpdateReporter::providesInformationAboutInteresting(
-    SymbolRef Sym, PathSensitiveBugReport &BR) {
-  if (!Sym)
-    return false;
-  for (SymbolRef PartSym : Sym->symbols()) {
-    // The interestingess mark may appear on any layer as we're stripping off
-    // the SymIntExpr, UnarySymExpr etc. layers...
-    if (BR.isInteresting(PartSym))
-      return true;
-    // ...but if both sides of the expression are symbolic, then there is no
-    // practical algorithm to produce separate constraints for the two
-    // operands (from the single combined result).
-    if (isa<SymSymExpr>(PartSym))
+// If the base expression of `expr` refers to a `ConstantArrayType`,
+// return the element type and the array size.
+// Note that "real" Flexible Array Members are `IncompleteArrayType`.
+// For them, this function returns `std::nullopt`.
+static std::optional<std::pair<QualType, nonloc::ConcreteInt>>
+getArrayTypeInfo(SValBuilder &svb, ArraySubscriptExpr const *expr) {
+  auto const *arrayBaseExpr = expr->getBase()->IgnoreParenImpCasts();
+  auto const arrayQualType = arrayBaseExpr->getType().getCanonicalType();
+  if (arrayQualType.isNull() || !arrayQualType->isConstantArrayType()) {
+    return std::nullopt;
+  }
+  auto *constArrayType =
+      dyn_cast<ConstantArrayType>(arrayQualType->getAsArrayTypeUnsafe());
+  if (!constArrayType) {
+    return std::nullopt;
+  }
+  return std::make_pair(
+      constArrayType->getElementType(),
+      // Note that an array size is technically unsigned, but
+      // `compareValueToThreshold` (via `getSimplifiedOffsets`)
+      // will do some arithmetics that could overflow and cause FN. For instance
+      // ```
+      // int array[20];
+      // array[unsigned_index + 21];
+      // ```
+      // `unsigned_index + 21 < 20` is turned into `unsigned_index < 20 - 21`,
+      // and if 20 is unsigned, that will overflow to the biggest possible array
+      // which is always trivially true. We obviously do not want that, so we
+      // need to treat the size as signed.
+      svb.makeIntVal(llvm::APSInt{constArrayType->getSize(), false}));
+}
+
+static Messages getNegativeIndexMessage(StringRef Name,
+                                        nonloc::ConcreteInt ArraySize,
+                                        NonLoc Index) {
+  auto const ArraySizeVal = ArraySize.getValue()->getZExtValue();
+  std::string const IndexStr = [&]() -> std::string {
+    if (auto ConcreteIndex = getConcreteValue(Index);
+      ConcreteIndex.has_value()) {
+      return formatv(" {0}", ConcreteIndex);
+    }
+    return "";
+  }();
+
+  return {formatv("Out of bound access to {0} at a negative index", Name),
+          formatv("Access of {0} containing {1} elements at negative index{2}",
+                  Name, ArraySizeVal, IndexStr)};
+}
+
+static std::string truncateWithEllipsis(StringRef str, size_t maxLength) {
+  if (str.size() <= maxLength)
+    return str.str();
+
+  return (str.substr(0, maxLength - 3) + "...").str();
+}
+
+static Messages getOOBIndexMessage(StringRef Name, NonLoc Index,
+                                   nonloc::ConcreteInt ArraySize,
+                                   QualType ElemType,
+                                   bool AlsoMentionUnderflow) {
+  std::optional<int64_t> IndexN = getConcreteValue(Index);
+  int64_t ExtentN = ArraySize.getValue()->getZExtValue();
+
+  SmallString<256> Buf;
+  llvm::raw_svector_ostream Out(Buf);
+  Out << "Access of " << Name << " at ";
+  if (AlsoMentionUnderflow) {
+    Out << "a negative or overflowing index";
+  } else if (IndexN.has_value()) {
+    Out << "index " << *IndexN;
+  } else {
+    Out << "an overflowing index";
+  }
+
+  const auto ElemTypeStr = truncateWithEllipsis(ElemType.getAsString(), 20);
+
+  Out << ", while it holds only ";
+  if (ExtentN != 1)
+    Out << ExtentN << " '" << ElemTypeStr << "' elements";
+  else
+    Out << "a single " << "'" << ElemTypeStr << "' element";
+
+  return {formatv("Out of bound access to memory {0} {1}",
+                  AlsoMentionUnderflow ? "around" : "after the end of",
+                  Name),
+          std::string(Buf)};
+}
+
+// "True" flexible array members do not specify any size (`int elems[];`)
+// However, some projects use "fake flexible arrays" (aka "struct hack"), where
+// they specify a size of 0 or 1 to work around a compiler limitation.
+// "True" flexible array members are `IncompleteArrayType` and will be skipped
+// by `performCheckArrayTypeIndex`. We need an heuristic to identify "fake" ones.
+static bool isFakeFlexibleArrays(const ArraySubscriptExpr *E) {
+  auto getFieldDecl = [](ArraySubscriptExpr const *array)-> FieldDecl * {
+    const Expr *BaseExpr = array->getBase()->IgnoreParenImpCasts();
+    if (const MemberExpr *ME = dyn_cast<MemberExpr>(BaseExpr)) {
+      return dyn_cast<FieldDecl>(ME->getMemberDecl());
+    }
+    return nullptr;
+  };
+  auto const isLastField = [
+      ](RecordDecl const *Parent, FieldDecl const *Field) {
+    const FieldDecl *LastField = nullptr;
+    for (const FieldDecl *F : Parent->fields()) {
+      LastField = F;
+    }
+
+    return (LastField == Field);
+  };
+  // We expect placeholder constant arrays to have size 0 or 1.
+  auto maybeConstArrayPlaceholder = [](QualType Type) {
+    const ConstantArrayType *CAT =
+        dyn_cast<ConstantArrayType>(Type->getAsArrayTypeUnsafe());
+    return CAT && CAT->getSize().getZExtValue() <= 1;
+  };
+
+  if (auto const *Field = getFieldDecl(E)) {
+    if (!maybeConstArrayPlaceholder(Field->getType()))
       return false;
+
+    const RecordDecl *Parent = Field->getParent();
+    return Parent && (Parent->isUnion() || isLastField(Parent, Field));
   }
+
   return false;
 }
 
+// Generate a representation of `Expr` suitable for diagnosis.
+SmallString<128> ExprReprForDiagnosis(ArraySubscriptExpr const *E,
+                                      CheckerContext &C) {
+  SmallString<128> Buf;
+  llvm::raw_svector_ostream Out(Buf);
+
+  auto const *Base = E->getBase()->IgnoreParenImpCasts();
+  switch (Base->getStmtClass()) {
+  case Stmt::MemberExprClass:
+    Out << "the field '";
+    Out << dyn_cast<MemberExpr>(Base)->getMemberDecl()->getName().str();
+    Out << '\'';
+    break;
+  case Stmt::ArraySubscriptExprClass:
+    Out << "the subarray '";
+    Base->printPretty(Out, nullptr, {C.getLangOpts()});
+    Out << '\'';
+    break;
+  default:
+    Out << '\'';
+    Base->printPretty(Out, nullptr, {C.getLangOpts()});
+    Out << '\'';
+  }
+
+  return Buf;
+}
+
+class StateIndexUpdateReporter {
+  std::string Repr;
+  QualType ElementType;
+  NonLoc Index;
+  nonloc::ConcreteInt ArraySize;
+  bool AssumedNonNegative = false;
+  bool AssumedInBounds = false;
+
+  std::string getMessage(PathSensitiveBugReport &BR) const {
+    SmallString<256> Buf;
+    if (providesInformationAboutInteresting(Index, BR)) {
+      llvm::raw_svector_ostream Out{Buf};
+      Out << "Assuming index is";
+      if (AssumedNonNegative)
+        Out << " non-negative";
+      if (AssumedInBounds) {
+        if (AssumedNonNegative)
+          Out << " and";
+        Out << " less than " << ArraySize.getValue()->getZExtValue() << ", ";
+        Out << "the number of '" << ElementType.getAsString() <<
+            "' elements in ";
+        Out << Repr;
+      }
+    }
+    return std::string{Buf.str()};
+  }
+
+public:
+  StateIndexUpdateReporter(StringRef Repr, QualType ElementType, NonLoc Index,
+                           nonloc::ConcreteInt ArraySize)
+    : Repr(Repr), ElementType{ElementType}, Index{Index}, ArraySize{ArraySize} {
+  }
+
+  void recordNonNegativeAssumption() {
+    AssumedNonNegative = true;
+  }
+
+  void recordInBoundsAssumption() {
+    AssumedInBounds = true;
+  }
+
+  const NoteTag *createNoteTag(CheckerContext &C) const {
+    // Don't create a note tag if we didn't assume anything:
+    if (!AssumedNonNegative && !AssumedInBounds) {
+      return nullptr;
+    }
+
+    return C.getNoteTag(
+        [*this](PathSensitiveBugReport &BR) { return getMessage(BR); });
+  }
+};
+
+// If the array is a `ConstantArrayType`, check the axis size.
+// It returns `ConstantArrayIndexResult::Unknown` if it could not reason about
+// the array access, deferring to the regular check based on the region.
+auto ArrayBoundChecker::performCheckArrayTypeIndex(
+    const ArraySubscriptExpr *E,
+    CheckerContext &C) const -> ConstantArrayIndexResult {
+  auto State = C.getState();
+  SValBuilder &SVB = C.getSValBuilder();
+
+  auto const ArrayInfo = getArrayTypeInfo(SVB, E);
+  auto const Index = SVB.simplifySVal(State, C.getSVal(E->getIdx())).getAs<
+    NonLoc>();
+  if (!ArrayInfo || !Index)
+    return ConstantArrayIndexResult::Unknown;
+
+  auto const &[ArrayType, ArraySize] = *ArrayInfo;
+
+  auto const ExprAsStr = ExprReprForDiagnosis(E, C);
+  bool const IsFakeFAM = isFakeFlexibleArrays(E);
+
+  StateIndexUpdateReporter SUR(ExprAsStr, ArrayType, *Index, ArraySize);
+
+  // Is the index negative?
+  auto [NegativeIndexState, NonNegativeIndexState] =
+      compareValueToThreshold(State, *Index, SVB.makeZeroArrayIndex(), SVB);
+  bool const AlsoMentionUnderflow = (NegativeIndexState != nullptr);
+
+  // Negative is possible
+  if (NegativeIndexState) {
+    // But it can't be
+    if (E->getIdx()->getType()->isUnsignedIntegerOrEnumerationType()) {
+      // And positive isn't possible
+      if (!NonNegativeIndexState) {
+        // The state is broken
+        return ConstantArrayIndexResult::Done;
+      }
+      // As in `performCheck`, we add no assumptions about the index
+    } else if (!NonNegativeIndexState) {
+      // Positive is not possible, this is a bug
+      Messages Msgs = getNegativeIndexMessage(ExprAsStr, ArraySize, *Index);
+      reportOOB(C, NegativeIndexState, Msgs, *Index, ArraySize);
+      return ConstantArrayIndexResult::Done;
+
+    } else {
+      // Both negative and positive are possible, assume positive
+      SUR.recordNonNegativeAssumption();
+    }
+  } else if (!NonNegativeIndexState) {
+    // Broken state
+    return ConstantArrayIndexResult::Done;
+  }
+
+  // The index is greater than 0, is it within bounds?
+  auto [WithinUpperBound, OutOfBounds] =
+      compareValueToThreshold(NonNegativeIndexState, *Index, ArraySize, SVB);
+  if (!WithinUpperBound && !OutOfBounds) {
+    // Invalid state
+    return ConstantArrayIndexResult::Done;
+  }
+
+  if (!WithinUpperBound) {
+    if (isIdiomaticPastTheEndPtr(E, OutOfBounds, *Index, ArraySize, C)) {
+      C.addTransition(OutOfBounds, SUR.createNoteTag(C));
+      return ConstantArrayIndexResult::Done;
+    }
+    // Silence for fake flexible arrays unless explicitly enabled
+    if (!IsFakeFAM) {
+      Messages Msgs = getOOBIndexMessage(ExprAsStr, *Index, ArraySize, ArrayType,
+                                         AlsoMentionUnderflow);
+      reportOOB(C, OutOfBounds, Msgs, *Index, ArraySize);
+    } else if (EnableFakeFlexibleArrayWarn) {
+      warnFlexibleArrayAccess(C, OutOfBounds, E, ExprAsStr, *Index, ArraySize,
+                              ArrayType);
+    }
+    return ConstantArrayIndexResult::Done;
+  }
+
+  // The access might be within range, but it may be tainted
+  if (OutOfBounds && isTainted(OutOfBounds, *Index)) {
+    Messages Msgs = getTaintMsgs(ExprAsStr, "index", AlsoMentionUnderflow);
+    reportOOB(C, OutOfBounds, Msgs, *Index, ArraySize,
+              /*IsTaintBug=*/true);
+  }
+
+  // When "Flexible Array Members" are involved, assume only non-negative
+  // even if we want the warning for OOB FAM access.
+  if (!IsFakeFAM) {
+    if (WithinUpperBound != NonNegativeIndexState)
+      SUR.recordInBoundsAssumption();
+    C.addTransition(WithinUpperBound, SUR.createNoteTag(C));
+  } else
+    C.addTransition(NonNegativeIndexState, SUR.createNoteTag(C));
+
+  return ConstantArrayIndexResult::Done;
+}
+
 void ArrayBoundChecker::performCheck(const Expr *E, CheckerContext &C) const {
   const SVal Location = C.getSVal(E);
 
@@ -708,8 +1026,7 @@ void ArrayBoundChecker::performCheck(const Expr *E, CheckerContext &C) const {
           if (isTainted(State, ASE->getIdx(), C.getLocationContext()))
             OffsetName = "index";
 
-        Messages Msgs =
-            getTaintMsgs(Space, Reg, OffsetName, AlsoMentionUnderflow);
+        Messages Msgs = getTaintMsgs(getRegionName(Space, Reg), OffsetName, AlsoMentionUnderflow);
         reportOOB(C, ExceedsUpperBound, Msgs, ByteOffset, KnownSize,
                   /*IsTaintBug=*/true);
         return;
@@ -785,6 +1102,40 @@ void ArrayBoundChecker::reportOOB(CheckerContext &C, ProgramStateRef ErrorState,
   C.emitReport(std::move(BR));
 }
 
+void ArrayBoundChecker::warnFlexibleArrayAccess(
+    CheckerContext &C, ProgramStateRef State, const ArraySubscriptExpr *E,
+    StringRef Name, NonLoc Index,
+    nonloc::ConcreteInt ArraySize, QualType ElemType) const {
+  ExplodedNode *WarnNo...
[truncated]

Copy link

github-actions bot commented Sep 17, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@steakhal
Copy link
Contributor

Should we warn about flax arrays if -fstrict-flex-arrays is explicitly specified in the clang flags?
https://clang.llvm.org/docs/ClangCommandLineReference.html#cmdoption-clang-fstrict-flex-arrays

@alejandro-alvarez-sonarsource
Copy link
Contributor Author

Should we warn about flax arrays if -fstrict-flex-arrays is explicitly specified in the clang flags? https://clang.llvm.org/docs/ClangCommandLineReference.html#cmdoption-clang-fstrict-flex-arrays

You mean, and replace the option I added to the checker? I suppose it does make sense, if there is already a flag for it.

@steakhal
Copy link
Contributor

Should we warn about flax arrays if -fstrict-flex-arrays is explicitly specified in the clang flags? https://clang.llvm.org/docs/ClangCommandLineReference.html#cmdoption-clang-fstrict-flex-arrays

You mean, and replace the option I added to the checker? I suppose it does make sense, if there is already a flag for it.

It makes sense if the checker has an option to override this thus have precedence.

Copy link
Contributor

@NagyDonat NagyDonat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm very happy to see that you are implementing these enhancements; they were also on my long-term todo list but I didn't know when would I have time for implementing them.

Unfortunately it seems that your commit duplicates several hundred lines of code, instead of reusing the existing code (with suitable generalizations and tweaks). In addition to the maintenance burden, this would introduce inconsistencies in the message format and behavior of the analyzer, so I think we must avoid it. As the original author of this code, I'm happy to help you with integrating your new requirements into the existing logic.

Comment on lines 596 to 598
getArrayTypeInfo(SValBuilder &svb, ArraySubscriptExpr const *expr) {
auto const *arrayBaseExpr = expr->getBase()->IgnoreParenImpCasts();
auto const arrayQualType = arrayBaseExpr->getType().getCanonicalType();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and elsewhere please follow the LLVM Coding Standard and use CamelCase names for variables.

Moreover, I would suggest avoiding auto in situations like this, because IMO spelling out the type would make the code more readable. (The coding standard leaves this question for the judgement of the developer. I tend to use auto only when the type name is already spelled out on the same line – e.g. by a dyn_cast – or it is an iterator or some similar implementation detail type.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the autos and using CamelCase.

Comment on lines 678 to 683
// "True" flexible array members do not specify any size (`int elems[];`)
// However, some projects use "fake flexible arrays" (aka "struct hack"), where
// they specify a size of 0 or 1 to work around a compiler limitation.
// "True" flexible array members are `IncompleteArrayType` and will be skipped
// by `performCheckArrayTypeIndex`. We need an heuristic to identify "fake"
// ones.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The static analyzer already has some heuristics for recognizing flexible array members (including "fake" ones) in MemRegionManager::getStaticSize (which is defined in MemRegion.cpp). Please try to reuse that code instead of reimplementing it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is even Expr::isFlexibleArrayMemberLike, I don't know how I overlooked that, sorry.
Removed the re-invented check. Also, the checker can now leverage the flag mentioned by @steakhal.

@alejandro-alvarez-sonarsource
Copy link
Contributor Author

Unfortunately it seems that your commit duplicates several hundred lines of code

Fair point. I struggled to adapt, for instance, StateUpdateReporter without if-ing my way around, and I opted to split into a different type. Let me think how can I have it handle both indexes and maybe-index-maybe-offset nicely.

@NagyDonat
Copy link
Contributor

NagyDonat commented Sep 18, 2025

EDIT: I noticed the comment above this one only after posting this. I agree that the complexity of the existing code makes it more difficult to avoid code duplication, but it also makes the duplication more problematic, so I still think that we need to struggle against it (just with higher stakes). I wish you good luck with your efforts and I will try to help if I get any good ideas.

By the way, don't forget to rebase onto my very recently merged commit #158639 which consolidates the message generation (unifies getPrecedesMsgs and getExceedsMsgs) and might make it slightly easier to reuse the message generation functions for your "new" check.


@alejandro-alvarez-sonarsource @steakhal What do you think about the code duplication situation? What are your reasons for proposing this implementation?

If I understand correctly the "new" type-based bounds checking differs from the "old" region/extent-based logic in the following areas:

  • it performs calculations with an index instead of a byte offset,
  • it uses a different logic to calculate the array size (which will be compared with the index),
  • in multidimensional arrays (and probably other similar situations?) it has a different (more strict) understanding about the beginning of the array,
  • it reports slightly different messages (I admit that I didn't analyze the differences in detail).

To achieve these differences the current version of the PR duplicates the following components of the code:

  • the "main method" performCheck / performCheckArrayTypeIndex,
  • the helper class StateUpdateReporter / StateIndexUpdateReporter,
  • the various helper functions that produce the Messages.

Among these my rough impression is that the state update handler and message generation helpers are relatively straightforward and they could be unified by introducing a few well-placed additional parameters, boolean flags etc. (or even a BaseStateUpdateReporter class with two subclasses that implement slightly different behavior).

In the "main" methods I see that the "old" and "new" approaches begin with completely different logic for calculating the index/offset value, the base region (i.e. the "full array" to which the accessed location is compared) and the array size. However, after that I find it very unfortunate that lots of complex logic (the deeply nested ifs that check underflow and then overflow, plus past-the-end expressions, taint etc.) is duplicated between the two variants. I hope that it would be possible to factor out a "check whether this index/offset is between zero and this extent value" method which could be shared between the "new" and "old" checks.

I see that there are some additional differences in this latter part of the "main" performCheck methods, but I feel that these are not neccessary differences:

  • I would guess that the new heuristics for better fake-FAM handling would be valuable in the old code as well (although getDynamicExtent has some flexible array member heuristics);
  • I don't immediately see why can the new variant omit my "isObviouslyNonnegative" hack (and if you do have a better solution, then I would be very happy to see it applied in the "old" check as well).

Moreover I feel that we could and perhaps should reduce the amount of duplicated code by removing some parts of the old logic and keeping only the new approach:

  • Perhaps we don't need the complex old logic that handles multidimensional arrays by calculating the offset from the beginning of the whole array (by combining multiple ElementRegions) -- IIUC that logic was introduced when this checker was still using check::Location and check::Bind callbacks that provided less information than the current callbacks.
  • Perhaps we could switch to using indices instead of byte offsets by default -- because this would produce simpler symbolic expressions that would be "more useful" for the rest of the analyzer (e.g. instead of assuming 4 * x < 40 we could assume x < 10 which can also occur as e.g. an if condition).

In this review I'll try to be as constructive as possible and I'm open to investing work into refactoring this checker to get an end result that provides the features that you need with as little code duplication as possible. (Edit: I can also pragmatically accept some code duplication if that yields the overall best code.)

Note that I'm also planning significant changes in this file: I'm planning to reimplement alpha.security.ReturnPtrRange as a separate CheckerFrontend of ArrayBound (which would become a checker family) because ReturnPtrRange and ArrayBound check the exactly same thing (is this location in bounds?) in two different situations (returned pointer value vs accessed location). (In fact, I started to work on this change two days ago, but I'll probably need to restart that work because this PR affects lots of code within ArrayBoundChecker.cpp.) On a longer time horizon, I'm also planning to use the ArrayBound logic as a "backend" for the CStringOutOfBound checker, so perhaps we might also think about separating some core logic into a library-like bounds checking source file.

@alejandro-alvarez-sonarsource
Copy link
Contributor Author

@alejandro-alvarez-sonarsource @steakhal What do you think about the code duplication situation? What are your reasons for proposing this implementation?

I tend to be conservative about existing behavior because

  1. It is working!
  2. We have been relying on it for long, and even a small divergence could turn into hundreds, if not thousands, of new positives and negatives coming and going on existing projects (changes that unfortunately are not caught by lit tests)
  3. Not to mention a slight change in assumptions can cascade to other CSA checkers (both up and downstream) adding and/or removing another big set of raised issues.

Of course this is not news for you!, but this is why I have preferred to keep the surface of the changes small (at the cost of duplication).

Trade-offs, though, and I agree that I have probably lean too far into duplication this time.

If I understand correctly the "new" type-based bounds checking differs from the "old" region/extent-based logic in the following areas:

  • it performs calculations with an index instead of a byte offset,
  • it uses a different logic to calculate the array size (which will be compared with the index),
  • in multidimensional arrays (and probably other similar situations?) it has a different (more strict) understanding about the beginning of the array,

Correct. It relies purely on the type, even if we know little or nothing about the memory region.

  • it reports slightly different messages (I admit that I didn't analyze the differences in detail).

TBH this started a side effect of using indexes (so now "index" is used instead of "offset" in some issues), because I wanted to stay close to existing reports. Some details like using "subarray" to be more specific came later since now we could do it.

To achieve these differences the current version of the PR duplicates the following components of the code:

  • the "main method" performCheck / performCheckArrayTypeIndex,
  • the helper class StateUpdateReporter / StateIndexUpdateReporter,
  • the various helper functions that produce the Messages.

Among these my rough impression is that the state update handler and message generation helpers are relatively straightforward and they could be unified by introducing a few well-placed additional parameters, boolean flags etc. (or even a BaseStateUpdateReporter class with two subclasses that implement slightly different behavior).

As a first approximation, this is where most of the duplication could be removed, yes.

In the "main" methods I see that the "old" and "new" approaches begin with completely different logic for calculating the index/offset value, the base region (i.e. the "full array" to which the accessed location is compared) and the array size. However, after that I find it very unfortunate that lots of complex logic (the deeply nested ifs that check underflow and then overflow, plus past-the-end expressions, taint etc.) is duplicated between the two variants. I hope that it would be possible to factor out a "check whether this index/offset is between zero and this extent value" method which could be shared between the "new" and "old" checks.

Makes sense.

I see that there are some additional differences in this latter part of the "main" performCheck methods, but I feel that these are not neccessary differences:

  • I would guess that the new heuristics for better fake-FAM handling would be valuable in the old code as well (although getDynamicExtent has some flexible array member heuristics);
  • I don't immediately see why can the new variant omit my "isObviouslyNonnegative" hack (and if you do have a better solution, then I would be very happy to see it applied in the "old" check as well).

I didn't quite omit it (because it's great at suppressing FP). getAsCleanArraySubscriptExpr needs memory regions to be laid out in a particular way ("not modified by pointer arithmetic)", and since the new logic doesn't look at the memory region, but at the type as-is in the AST, I inlined the E->getIdx()->getType()->isUnsignedIntegerOrEnumerationType() check.

Moreover I feel that we could and perhaps should reduce the amount of duplicated code by removing some parts of the old logic and keeping only the new approach:

I have to admit this didn't cross my mind.

  • Perhaps we don't need the complex old logic that handles multidimensional arrays by calculating the offset from the beginning of the whole array (by combining multiple ElementRegions) -- IIUC that logic was introduced when this checker was still using check::Location and check::Bind callbacks that provided less information than the current callbacks.
  • Perhaps we could switch to using indices instead of byte offsets by default -- because this would produce simpler symbolic expressions that would be "more useful" for the rest of the analyzer (e.g. instead of assuming 4 * x < 40 we could assume x < 10 which can also occur as e.g. an if condition).

Can we? How about cases such as this lit test?

struct two_bytes convertedArray2(void) {
  // We report this with byte offsets because the offset is not divisible by the element size.
  struct two_bytes a = {0, 0};
  char *p = (char*)&a;
  return *((struct two_bytes*)(p + 7));
  // expected-warning@-1 {{Out of bound access to memory after the end of 'a'}}
  // expected-note@-2 {{Access of 'a' at byte offset 7, while it holds only 2 bytes}}
}

In this review I'll try to be as constructive as possible and I'm open to investing work into refactoring this checker to get an end result that provides the features that you need with as little code duplication as possible. (Edit: I can also pragmatically accept some code duplication if that yields the overall best code.)

Note that I'm also planning significant changes in this file: I'm planning to reimplement alpha.security.ReturnPtrRange as a separate CheckerFrontend of ArrayBound (which would become a checker family) because ReturnPtrRange and ArrayBound check the exactly same thing (is this location in bounds?) in two different situations (returned pointer value vs accessed location). (In fact, I started to work on this change two days ago, but I'll probably need to restart that work because this PR affects lots of code within ArrayBoundChecker.cpp.) On a longer time horizon, I'm also planning to use the ArrayBound logic as a "backend" for the CStringOutOfBound checker, so perhaps we might also think about separating some core logic into a library-like bounds checking source file.

It's quite a coincidence (or perhaps not!). I am also working in parallel in PointerArithChecker which is turning out to be very similar too (except it raises the moment the pointer is formed, and that one-after-the-end is still ok, as long as it is not derefenced). So indeed a "core bound checker" library sounds very interesting.

In fact, I started to work on this change two days ago, but I'll probably need to restart that work because this PR affects lots of code within ArrayBoundChecker.cpp.

Sorry 😅 I am thinking, maybe you go ahead with your work on those changes, since you have already progressed. I don't think I have the bandwidth to move fast enough adapting this PR as not to block you for a few weeks. I am happy to revisit these changes and rebase (or even redo) them on top of yours.

And, by the way, I would also be happy to review those changes once you submit them.

@NagyDonat
Copy link
Contributor

@alejandro-alvarez-sonarsource Thanks for the detailed answers and clarifications! 😄

I tend to be conservative about existing behavior because

1. It is working!

2. We have been relying on it for long, and even a small divergence could turn into hundreds, if not thousands, of new positives and negatives coming and going on existing projects (changes that unfortunately are not caught by lit tests)

3. Not to mention a slight change in assumptions can cascade to other CSA checkers (both up and downstream) adding and/or removing another big set of raised issues.

Of course this is not news for you!, but this is why I have preferred to keep the surface of the changes small (at the cost of duplication).

Understood, and I agree in general. However, note that ArrayBound is a new checker (I brought it out of alpha this year), so I feel that it may be a bit more malleable than the "ancient" checkers like e.g. CallAndMessage.


  • Perhaps we don't need the complex old logic that handles multidimensional arrays by calculating the offset from the beginning of the whole array (by combining multiple ElementRegions) -- IIUC that logic was introduced when this checker was still using check::Location and check::Bind callbacks that provided less information than the current callbacks.
  • Perhaps we could switch to using indices instead of byte offsets by default -- because this would produce simpler symbolic expressions that would be "more useful" for the rest of the analyzer (e.g. instead of assuming 4 * x < 40 we could assume x < 10 which can also occur as e.g. an if condition).

Can we? How about cases such as this lit test?

struct two_bytes convertedArray2(void) {
  // We report this with byte offsets because the offset is not divisible by the element size.
  struct two_bytes a = {0, 0};
  char *p = (char*)&a;
  return *((struct two_bytes*)(p + 7));
  // expected-warning@-1 {{Out of bound access to memory after the end of 'a'}}
  // expected-note@-2 {{Access of 'a' at byte offset 7, while it holds only 2 bytes}}
}

Hmm, we will probably need byte offsets as a fallback option if we need to cover cases like this. (Which is not an absolute requirement IMO -- we can accept not reporting a few rare corner cases if it is compensated by better behavior in the common case.)

[...] It's quite a coincidence (or perhaps not!). I am also working in parallel in PointerArithChecker which is turning out to be very similar too (except it raises the moment the pointer is formed, and that one-after-the-end is still ok, as long as it is not derefenced). So indeed a "core bound checker" library sounds very interesting.

It is nice to hear that you are progressing with PointerArith 😄

As my next goal in this area I will try to design the interface of the "core bound checker" library, which should be general enough to cover all the use cases. I'll try to post a discourse post about this within a few days.

In fact, I started to work on this change two days ago, but I'll probably need to restart that work because this PR affects lots of code within ArrayBoundChecker.cpp.

Sorry 😅 I am thinking, maybe you go ahead with your work on those changes, since you have already progressed. [...]

No problem 😄 collisions are natural in open source development.

Now that this discussion highlighted the question of introducing a library-like interface for bounds checking, I will probably work on that and discard my unfinished change (which would complicate the performCheck and make the separation of the library interface more difficult).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:static analyzer clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants