Skip to content
Original file line number Diff line number Diff line change
Expand Up @@ -259,7 +259,7 @@ class FindUninitializedField {
if (T->getAsStructureType()) {
if (Find(FR))
return true;
} else {
} else if (!I->isUnnamedBitField()){
SVal V = StoreMgr.getBinding(store, loc::MemRegionVal(FR));
if (V.isUndef())
return true;
Expand Down
15 changes: 14 additions & 1 deletion clang/lib/StaticAnalyzer/Core/RegionStore.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -2122,8 +2122,21 @@ SVal RegionStoreManager::getBindingForField(RegionBindingsConstRef B,
if (const std::optional<SVal> &V = B.getDirectBinding(R))
return *V;

// If the containing record was initialized, try to get its constant value.
// UnnamedBitField is always Undefined unless using memory operation such
// as 'memset'.
// For example, for code
// typedef struct {
// int i :2;
// int :30; // unnamed bit-field
// } A;
// A a = {1};
// The bits of the unnamed bit-field in local variable a can be anything.
const FieldDecl *FD = R->getDecl();
if (FD->isUnnamedBitField()) {
return UndefinedVal();
}

// If the containing record was initialized, try to get its constant value.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming the CallAndMessageChecker is patched, do we need this patch here?
I'd rather not touch this code as it's really sensitive. And btw, reading from the Store by default gives you UndefinedVal so I'm not sure what case this helps with. For example, a memset(0) should also zero the padding bytes, thus if we happen to read that padding byte via a char* the Store should still model it and return the correct value instead of handing back UndefinedVal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To solve the false warning problem with unnamed bit-field, patch here is unnecessary.
However, I do not think getBinding returning SymbolVal is the correct result, which is the current behavior when parsing the source as c++. To my understanding, SymbolVal means it is initialized, but somehow the static analyzer cannot infer the value, while UndefinedVal means the value it stores can be anything and reading from it is an UB. Unnamed bit-field is the second case.
I understand patching here may bring influences to other usages, even though the test of check-clang-analysis does not show any. So if you think that current implementation is incorrect but we'd better keep it before fully evaluating the influences, let me leave a FIXME comment here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your understanding of UndefinedVal is correct, unlike with SymbolVal. Symbols (SymbolVal aka. SymExpr) we track values. We may or may not know anything about these symbols (most notably the value range that a symbol can hold). More importantly, we can combine such symbols into making larger symbols, basically embedding the history of the computation that the given variable holds at any given point in time. But this is likely not important here.

This is a critical component, so we don't accept patches without tests. Even tests are not enough to demonstrate correctness, thus we frequently ask for "measurements", or running differential analysis with and without a patch and observing the outcomes of many many real-world projects to have a better picture of what the implications are.

Frequently even doing the correct thing reveals untended other bugs that are actually worse than what we initially wanted to fix, thus effectively preventing us from doing the right thing. Don't worry, this is not the case with the CallAndMessageChecker.

You can propose a FIXME, but without more context it can do more harm than good if put at the wrong place with a misleading content. So to approve that, we will need to do some digging where the Symbol is coming from and why do we have that Symbol instead of Undef there?
Otherwise we are better off not having this FIXME I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can provide some evidence where the UndefinedVal(in c) and the SymbolVal(in c++) are from.
After reverting all my changes in CallAndMessageChecker.cpp and RegionStore.cpp, I use watchpoints to find where the return values of the invoking StoreMgr.getBinding(store, loc::MemRegionVal(FR)) from:

Following is the test code:

// unnamed.c
struct B {
    int i : 2;
    int : 30;  // unnamed bit-field
};

extern void consume_B(struct B);

void bitfield_B_init(void) {
    struct B b1;
    b1.i = 1; // b1 is initialized
    consume_B(b1);
}

When analyzing the test code as c, via clang -cc1 -x c -analyze -analyzer-checker=core unnamed.c, I get the stackframes:

1# RegionStoreManager::getBindingForFieldOrElementCommon(const RegionBindingsRef &, const clang::ento::TypedValueRegion *, QualType) RegionStore.cpp:2312
2# RegionStoreManager::getBindingForField(const RegionBindingsRef &, const clang::ento::FieldRegion *) RegionStore.cpp:2170
3# RegionStoreManager::getBinding(const RegionBindingsRef &, Loc, QualType) RegionStore.cpp:1617
4# RegionStoreManager::getBinding(const void *, Loc, QualType) RegionStore.cpp:711
5# FindUninitializedField::Find(const clang::ento::TypedValueRegion *) CallAndMessageChecker.cpp:263
......
// in FindUninitializedField::Find      
      for (const auto *I : RD->fields()) {
        const FieldRegion *FR = MrMgr.getFieldRegion(I, R);
        FieldChain.push_back(I);
        T = I->getType();
        if (T->getAsStructureType()) {
          if (Find(FR))
            return true;
        } else {
          SVal V = StoreMgr.getBinding(store, loc::MemRegionVal(FR));
          if (V.isUndef())
            return true;
        }
        FieldChain.pop_back();
      }

When I switching to frame 5 and print the FR->getRawMemorySpace(), I get the type clang::ento::StackLocalsSpaceRegion *

While analyzing the test code as c++, via clang -cc1 -x c ++-analyze -analyzer-checker=core unnamed.c, I get the stackframes:

RegionStoreManager::getBindingForFieldOrElementCommon(const RegionBindingsRef &, const clang::ento::TypedValueRegion *, QualType) RegionStore.cpp:2317
RegionStoreManager::getBindingForField(const RegionBindingsRef &, const clang::ento::FieldRegion *) RegionStore.cpp:2170
RegionStoreManager::getBinding(const RegionBindingsRef &, Loc, QualType) RegionStore.cpp:1617
RegionStoreManager::getBinding(const void *, Loc, QualType) RegionStore.cpp:711
FindUninitializedField::Find(const clang::ento::TypedValueRegion *) CallAndMessageChecker.cpp:263

When I switching to frame 5 and print the FR->getRawMemorySpace(), I get the type clang::ento::StackArgumentsSpaceRegion *

The raw memory space of b1 is different in c and c++, I wonder it due to the implicit copy constructor in c++?
And the difference on FR->getRawMemorySpace() finally reaches the if check in RegionStoreManager::getBindingForFieldOrElementCommon and returns a different result.

// line 2288 in RegionStore.cpp
  if (isa<StackLocalsSpaceRegion>(R->getRawMemorySpace())) {
    if (isa<ElementRegion>(R)) {

The last statement in getBindingForFieldOrElementCommon seems a default handle, where the c++ unnamed bit-field gets its binding.

  // All other values are symbolic.
  return svalBuilder.getRegionValueSymbolVal(R);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From reading this, it seems to me that StackArgumentsSpaceRegion is wrong, as this is not an argument.
What was R->dump() in that case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@steakhal I have tested the check-clang-analysis, no more testcases break

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say then let's remove it. CompoundVals and LCVs should behave exactly the same way wrt. the bug reports.
When proposing that change, could you look into git blame to see if we miss something about the history of that continue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The continue line is added in commit 3720e2f, when the function tryBindSmallStruct is added. The commit log illustrates the reason for binding a LazyCompoundVal to simple struct by memberwise copy, but discusses nothing about the unnamed bit-field.
I'll make a new pull-request to fix this soon.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the idea was that a copy in C++ means a member-wise copy. And there anonym bitfields don't count.
Consequently, the padding bits may or may not get copied in a copy operation, thus they wanted to having it Undef.
I don't think it's important. We can drop this part.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the c++ standard, "Unnamed bit-fields are not members and cannot be initialized(https://eel.is/c++draft/class.bit#2.sentence-2)", and "The implicitly-defined copy/move constructor for a non-union class X performs a memberwise copy/move of its bases and members(https://eel.is/c++draft/class.copy.ctor#14.sentence-1)". So strictly speaking, an unnamed bit-field can be anything, since the memberwise copy is not required for unnamed bits.

Consider the case:

struct B {
    int i : 2;
    int : 30; 
};

void bitfield_B_init(void) {
    B b1;
    b1.i = 1; 
    memset(&b1, 0, sizeof(b1));
    B b2 = b1;   
}

The unnamed bits is set after the memset. So for the statement B b2 = b1, the unnamed bitfield in b1 is supposed to be SymbolVal, while the unnamed bitfields in b2 should be UndefinedVal.

Back to the code, neithor keeping nor removing the continue lines in tryBindSmallStruct make a perfect handling on unnamed bit-fields. A perfect rule maybe that unnamed bit-fileds must be undefined after being implicitly-defined constructed. Considering the stackframes I got before and the little benefit it could bring, implementing the rule seems to be compilicated and potentially harmful. So perhaps we should leave it unchanged?

QualType Ty = FD->getType();
const MemRegion* superR = R->getSuperRegion();
if (const auto *VR = dyn_cast<VarRegion>(superR)) {
Expand Down
27 changes: 26 additions & 1 deletion clang/test/Analysis/call-and-message.c
Original file line number Diff line number Diff line change
@@ -1,12 +1,19 @@
// RUN: %clang_analyze_cc1 %s -verify \
// RUN: -analyzer-checker=core \
// RUN: -analyzer-config core.CallAndMessage:ArgPointeeInitializedness=true \
// RUN: -analyzer-config core.CallAndMessage:ArgInitializedness=false \
// RUN: -analyzer-output=plist -o %t.plist
// RUN: cat %t.plist | FileCheck %s

// RUN: %clang_analyze_cc1 %s -verify=no-pointee \
// RUN: -analyzer-checker=core \
// RUN: -analyzer-config core.CallAndMessage:ArgPointeeInitializedness=false
// RUN: -analyzer-config core.CallAndMessage:ArgPointeeInitializedness=false \
// RUN: -analyzer-config core.CallAndMessage:ArgInitializedness=false

// RUN: %clang_analyze_cc1 %s -verify=arg-init \
// RUN: -analyzer-checker=core \
// RUN: -analyzer-config core.CallAndMessage:ArgPointeeInitializedness=false \
// RUN: -analyzer-config core.CallAndMessage:ArgInitializedness=true

// no-pointee-no-diagnostics

Expand All @@ -22,3 +29,21 @@ void pointee_uninit(void) {
// checker, as described in the CallAndMessage comments!
// CHECK: <key>issue_hash_content_of_line_in_context</key>
// CHECK-SAME: <string>97a74322d64dca40aa57303842c745a1</string>

typedef struct {
int i :2;
int :30; // unnamed bit-field
} B;

extern void consume_B(B);

void bitfield_B_init(void) {
B b1;
b1.i = 1; // b1 is initialized
consume_B(b1);
}

void bitfield_B_uninit(void) {
B b2;
consume_B(b2); // arg-init-warning{{Passed-by-value struct argument contains uninitialized data (e.g., field: 'i') [core.CallAndMessage]}}
}
16 changes: 16 additions & 0 deletions clang/test/Analysis/call-and-message.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -169,4 +169,20 @@ void record_uninit() {
// CHECK-SAME: <string>a46bb5c1ee44d4611ffeb13f7f499605</string>
// CHECK: <key>issue_hash_content_of_line_in_context</key>
// CHECK-SAME: <string>e0e0d30ea5a7b2e3a71e1931fa0768a5</string>

struct B{
int i :2;
int :30; // unnamed bit-field
};

void bitfield_B_init(void) {
B b1;
b1.i = 1; // b1 is initialized
consume(b1);
}

void bitfield_B_uninit(void) {
B b2;
consume(b2); // arg-init-warning{{Passed-by-value struct argument contains uninitialized data (e.g., field: 'i') [core.CallAndMessage]}}
}
} // namespace uninit_arg
Loading