Commit e42dcc3

Added state snapshot invariant tests (#5009)
# Description

Resolves #5002

This adds the `stateSnapshotInvariant`. It is intended for expensive checks, such as those that scan the entire BucketList. Periodically, we take a snapshot of ledger state at the end of applyLedger, spin up a background thread, and pass the snapshot to that thread to run the checks. This is expensive from a memory perspective, so it is hidden behind the `INVARIANT_EXTRA_CHECKS` flag. The flag itself is also now only allowed for watchers.

I've replaced the `start` invariant with this snapshot invariant and added a special case on startup where the snapshot invariant will run. I also rewrote the `ArchivedStateConsistency::start` invariant to use this new snapshot interface and not rely on HAS files.

There are two somewhat messy places I'm not sure about.

First, the failure behavior of the invariant. Currently, if a strict invariant fails, I post the failure handling back to the main thread. The function on the main thread will then throw, killing the process. This is basically what the other invariants do, and I think it kills the process in the way we want, but I'm not sure it's the best way to do this.

The other place I'm unsure of is `getInMemorySorobanStateForInvariantCheck`. I want to maintain the apply-time phase invariants we have around `getInMemorySorobanState`, as we shouldn't be reading from it outside of the apply phase. However, we do need to copy it during the commit phase for the invariant. For now I've just made another getter with different asserts, named to suggest that it should only be used for invariants, but it feels like there's a better way to harden this.
# Checklist

- [x] Reviewed the [contributing](https://github.com/stellar/stellar-core/blob/master/CONTRIBUTING.md#submitting-changes) document
- [x] Rebased on top of master (no merge commits)
- [x] Ran `clang-format` v8.0.0 (via `make format` or the Visual Studio extension)
- [x] Compiles
- [x] Ran all tests
- [ ] If change impacts performance, include supporting evidence per the [performance document](https://github.com/stellar/stellar-core/blob/master/performance-eval/performance-eval.md)
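
The flow described above (hand a snapshot to a background thread, skip a run while the previous scan is still in flight, and report failures back to the main thread) can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: `Snapshot`, `SnapshotScanner`, `tryStart`, and `drainFailures` are hypothetical names, and the real implementation posts failure handling to the main thread through stellar-core's own machinery rather than a polled queue.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for an immutable ledger state snapshot.
struct Snapshot
{
    int ledgerSeq;
};

// Runs at most one snapshot check at a time on a background thread.
// tryStart, wait, and drainFailures are meant to be called from one
// "main" thread only.
class SnapshotScanner
{
  public:
    using Check = std::function<std::string(Snapshot const&)>;

    ~SnapshotScanner()
    {
        wait();
    }

    // Returns false (and counts a skip) when the previous scan has not
    // finished yet; otherwise starts a new scan in the background.
    bool
    tryStart(std::shared_ptr<Snapshot const> snap, Check check)
    {
        if (mRunning.exchange(true))
        {
            ++mSkipped;
            return false;
        }
        wait(); // join the previous worker, which has already finished
        mWorker = std::thread([this, snap, check] {
            std::string err = check(*snap);
            if (!err.empty())
            {
                std::lock_guard<std::mutex> guard(mMutex);
                mFailures.push_back(err);
            }
            mRunning.store(false);
        });
        return true;
    }

    // Blocks until the in-flight scan, if any, completes.
    void
    wait()
    {
        if (mWorker.joinable())
        {
            mWorker.join();
        }
    }

    // Hands recorded failures back to the caller, which decides how to
    // react (for example, by throwing on the main thread).
    std::vector<std::string>
    drainFailures()
    {
        std::lock_guard<std::mutex> guard(mMutex);
        std::vector<std::string> out;
        out.swap(mFailures);
        return out;
    }

    int
    skipped() const
    {
        return mSkipped;
    }

  private:
    std::atomic<bool> mRunning{false};
    int mSkipped{0};
    std::mutex mMutex;
    std::vector<std::string> mFailures;
    std::thread mWorker;
};
```

Because the worker only ever sees a shared pointer to a const snapshot, the main thread can keep closing ledgers without synchronizing on ledger state; the cost is the extra memory held by the snapshot while the scan runs.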
2 parents eba663a + c787eb4 commit e42dcc3

22 files changed: +528 -134 lines changed

docs/metrics.md

Lines changed: 1 addition & 0 deletions
@@ -87,6 +87,7 @@ ledger.apply-soroban.max-clusters | counter | maximum number of cluste
 ledger.apply-soroban.stages | counter | number of stages used for parallel apply in a ledger
 ledger.catchup.duration | timer | time between entering LM_CATCHING_UP_STATE and entering LM_SYNCED_STATE
 ledger.invariant.failure | counter | number of times invariants failed
+ledger.invariant.state-snapshot-skipped | counter | number of times state snapshot invariant was skipped due to previous scan still running
 ledger.ledger.close | timer | time to close a ledger (excluding consensus)
 ledger.memory.queued-ledgers | counter | number of ledgers queued in memory for replay
 ledger.metastream.bytes | meter | number of bytes written per ledger into meta-stream

docs/stellar-core_example.cfg

Lines changed: 7 additions & 0 deletions
@@ -516,6 +516,13 @@ INVARIANT_CHECKS = [ "AccountSubEntriesCountIsValid",
 "SponsorshipCountIsValid" ]
 
 
+# STATE_SNAPSHOT_INVARIANT_LEDGER_FREQUENCY (integer) defaults to 300
+# Frequency (in seconds) at which expensive state snapshot invariants should run.
+# State snapshot invariants perform comprehensive checks that scan the entire
+# bucket list, which are too expensive to perform on every ledger.
+STATE_SNAPSHOT_INVARIANT_LEDGER_FREQUENCY=300
+
+
 # MANUAL_CLOSE (true or false) defaults to false
 # Mode for testing. Ledger will only close when stellar-core gets
 # the `manualclose` command
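
One way to read this setting is as a time-based gate: a scan is requested at most once per configured period. The sketch below assumes the value is interpreted in seconds, as the config comment states (the option name itself says "ledger frequency", and this diff does not show how the value is consumed). `PeriodicGate` and `shouldRun` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical gate: reports true at most once per `period`.
class PeriodicGate
{
  public:
    using Clock = std::chrono::steady_clock;

    PeriodicGate(std::chrono::seconds period, Clock::time_point start)
        : mPeriod(period), mLastRun(start)
    {
        assert(period.count() > 0);
    }

    // Returns true when at least one full period has elapsed since the
    // last time this returned true (or since `start`), and records `now`
    // as the new reference point.
    bool
    shouldRun(Clock::time_point now)
    {
        if (now - mLastRun < mPeriod)
        {
            return false;
        }
        mLastRun = now;
        return true;
    }

  private:
    std::chrono::seconds mPeriod;
    Clock::time_point mLastRun;
};
```

Passing the current time in as an argument, instead of reading the clock inside `shouldRun`, keeps the gate deterministic and easy to test.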

src/invariant/ArchivedStateConsistency.cpp

Lines changed: 56 additions & 67 deletions
@@ -3,9 +3,9 @@
 // of this distribution or at http://www.apache.org/licenses/LICENSE-2.0
 
 #include "invariant/ArchivedStateConsistency.h"
-#include "bucket/BucketManager.h"
 #include "bucket/BucketSnapshot.h"
 #include "bucket/BucketSnapshotManager.h"
+#include "bucket/HotArchiveBucket.h"
 #include "bucket/LedgerCmp.h"
 #include "invariant/InvariantManager.h"
 #include "ledger/LedgerManager.h"
@@ -17,100 +17,89 @@
 #include "util/XDRCereal.h"
 #include "util/types.h"
 #include <fmt/format.h>
+#include <vector>
 
 namespace stellar
 {
 ArchivedStateConsistency::ArchivedStateConsistency() : Invariant(true)
 {
 }
 
-std::string
-ArchivedStateConsistency::start(Application& app)
+bool
+ArchivedStateConsistency::usesStateSnapshotInvariant() const
 {
-    releaseAssert(threadIsMain());
-    LogSlowExecution logSlow("ArchivedStateConsistency startup");
+    return true;
+}
 
-    if (!app.getConfig().INVARIANT_EXTRA_CHECKS)
-    {
-        CLOG_INFO(Invariant,
-                  "Skipping ArchivedStateConsistency startup check - "
-                  "INVARIANT_EXTRA_CHECKS is disabled");
-        return std::string{};
-    }
+// This test iterates through both the live and archived bucket lists and checks
+// that no entry is live in both BucketLists simultaneously.
+std::string
+ArchivedStateConsistency::stateSnapshotInvariant(
+    CompleteConstLedgerStatePtr ledgerState,
+    InMemorySorobanState const& inMemorySnapshot)
+{
+    LogSlowExecution logSlow("ArchivedStateConsistency::stateSnapshotInvariant",
+                             LogSlowExecution::Mode::AUTOMATIC_RAII, "took",
+                             std::chrono::seconds(30));
 
-    auto protocolVersion =
-        app.getLedgerManager().getLastClosedLedgerHeader().header.ledgerVersion;
+    auto liveSnapshot = ledgerState->getBucketSnapshot();
+    auto hotArchiveSnapshot = ledgerState->getHotArchiveSnapshot();
+    auto const& header = liveSnapshot->getLedgerHeader();
     if (protocolVersionIsBefore(
-            protocolVersion,
+            header.ledgerVersion,
             LiveBucket::FIRST_PROTOCOL_SUPPORTING_PERSISTENT_EVICTION))
     {
-        CLOG_INFO(Invariant,
-                  "Skipping ArchivedStateConsistency invariant for "
-                  "protocol version {}",
-                  protocolVersion);
         return std::string{};
     }
 
-    CLOG_INFO(Invariant, "Starting ArchivedStateConsistency invariant");
-    auto has = app.getLedgerManager().getLastClosedLedgerHAS();
+    auto const ARCHIVAL_ENTRY_TYPES = std::set{CONTRACT_CODE, CONTRACT_DATA};
 
-    std::map<LedgerKey, LedgerEntry> archived =
-        app.getBucketManager().loadCompleteHotArchiveState(has);
-
-    // Get live snapshot for iterating through buckets
-    auto liveSnapshot = app.getBucketManager()
-                            .getBucketSnapshotManager()
-                            .copySearchableLiveBucketListSnapshot();
-
-    // Track which keys we've already seen in live buckets (from level 0 upward)
-    // to avoid checking duplicates
     UnorderedSet<LedgerKey> seenKeys;
-
-    // Iterate through live buckets from level 0 upward and check if any key
-    // also exists in archived state
-    std::string result;
-    liveSnapshot->loopAllBuckets([&](LiveBucketSnapshot const& bucketSnapshot) {
-        LiveBucketInputIterator it(bucketSnapshot.getRawBucket());
-        while (it && result.empty())
-        {
-            BucketEntry const& e = *it;
-            if (e.type() == LIVEENTRY || e.type() == INITENTRY)
+    std::string errorMsg;
+
+    // For each entry in the Live BucketList, check if we have seen it in a
+    // previous level. If not, this entry is the newest version, so check if it
+    // exists in the Hot Archive.
+    auto checkIfLiveEntryInArchive =
+        [&seenKeys, &errorMsg, &hotArchiveSnapshot](BucketEntry const& be) {
+            if (be.type() == LIVEENTRY || be.type() == INITENTRY)
             {
-                auto key = LedgerEntryKey(e.liveEntry());
+                auto lk = LedgerEntryKey(be.liveEntry());
+                auto [_, wasInserted] = seenKeys.emplace(lk);
 
-                // Skip if we've already seen this key in a more recent
-                // bucket
-                if (seenKeys.find(key) == seenKeys.end())
+                // If this BucketEntry is not shadowed, and the key exists in
+                // the Hot Archive, we have an error.
+                if (wasInserted && hotArchiveSnapshot->load(lk))
                 {
-                    seenKeys.insert(key);
-
-                    // Check if this key also exists in archived state
-                    if (archived.find(key) != archived.end())
-                    {
-                        result = fmt::format(
-                            FMT_STRING(
-                                "ArchivedStateConsistency: Entry with the "
-                                "same key is present in both live and "
-                                "archived state. Key: {}"),
-                            xdrToCerealString(key, "entry_key"));
-                    }
+                    errorMsg = fmt::format(
+                        FMT_STRING("ArchivedStateConsistency invariant failed: "
+                                   "Live entry is present in both live and "
+                                   "archived state: {}"),
+                        xdrToCerealString(lk, "entry_key"));
+                    return Loop::COMPLETE;
                 }
             }
-            else
+            // Mark DEADENTRY as seen, but the key does not exist wrt ledger
+            // state, so we don't need to check the Hot Archive.
+            else if (be.type() == DEADENTRY)
             {
-                seenKeys.insert(e.deadEntry());
+                seenKeys.emplace(be.deadEntry());
             }
-            ++it;
-        }
-        return result.empty() ? Loop::INCOMPLETE : Loop::COMPLETE;
-    });
+            return Loop::INCOMPLETE;
+        };
 
-    if (!result.empty())
+    // We just need to check for Soroban types that are stored in the Hot
+    // Archive
+    for (auto const& type : ARCHIVAL_ENTRY_TYPES)
     {
-        return result;
+        liveSnapshot->scanForEntriesOfType(type, checkIfLiveEntryInArchive);
+        if (!errorMsg.empty())
+        {
+            return errorMsg;
+        }
+        seenKeys.clear();
     }
 
-    CLOG_INFO(Invariant, "ArchivedStateConsistency invariant passed");
     return std::string{};
 }
 
@@ -540,4 +529,4 @@ ArchivedStateConsistency::checkRestoreInvariants(
 
     return std::string{};
 }
-};
+};
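
The shadowing logic in the new `stateSnapshotInvariant` can be illustrated in isolation. The sketch below is a simplified model, not stellar-core code: keys are plain strings, bucket levels are vectors ordered newest first, and `Entry`, `EntryType`, and `findLiveAndArchived` are hypothetical names. The first sighting of a key is its newest version; only a newest version that is live needs to be looked up in the archive.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <unordered_set>
#include <vector>

enum class EntryType
{
    Live,
    Dead
};

struct Entry
{
    EntryType type;
    std::string key;
};

// `levels` is ordered newest first. Returns the first key whose newest
// version is live while the same key is also present in `archive`, or
// std::nullopt when no such key exists.
std::optional<std::string>
findLiveAndArchived(std::vector<std::vector<Entry>> const& levels,
                    std::unordered_set<std::string> const& archive)
{
    std::unordered_set<std::string> seen;
    for (auto const& level : levels)
    {
        for (auto const& entry : level)
        {
            // insert() reports whether the key is new. If it is not, a
            // newer level already holds this key and shadows this entry.
            bool firstSighting = seen.insert(entry.key).second;
            if (entry.type == EntryType::Live && firstSighting &&
                archive.count(entry.key) != 0)
            {
                return entry.key;
            }
            // Dead entries are only recorded as seen: the key does not
            // exist in live state, so the archive is not consulted.
        }
    }
    return std::nullopt;
}
```

Recording dead entries in the seen set matters: without it, an older live version of a deleted key would be mistaken for current state and could produce a false failure.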

src/invariant/ArchivedStateConsistency.h

Lines changed: 5 additions & 1 deletion
@@ -42,6 +42,10 @@ class ArchivedStateConsistency : public Invariant
         UnorderedMap<LedgerKey, LedgerEntry> const& restoredFromLiveState)
         override;
 
-    virtual std::string start(Application& app) override;
+    bool usesStateSnapshotInvariant() const override;
+
+    std::string stateSnapshotInvariant(
+        CompleteConstLedgerStatePtr ledgerState,
+        InMemorySorobanState const& inMemorySnapshot) override;
 };
 }

src/invariant/Invariant.h

Lines changed: 9 additions & 2 deletions
@@ -6,9 +6,9 @@
 
 #include "bucket/BucketSnapshotManager.h"
 #include "bucket/BucketUtils.h"
+#include "ledger/LedgerStateSnapshot.h"
 #include "xdr/Stellar-ledger.h"
 #include <cstdint>
-#include <functional>
 #include <memory>
 #include <string>
 #include <unordered_set>
@@ -84,8 +84,15 @@ class Invariant
         return std::string{};
     }
 
+    virtual bool
+    usesStateSnapshotInvariant() const
+    {
+        return false;
+    }
+
     virtual std::string
-    start(Application& app)
+    stateSnapshotInvariant(CompleteConstLedgerStatePtr ledgerState,
+                           InMemorySorobanState const& inMemorySnapshot)
     {
         return std::string{};
     }
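
The two virtual functions added to `Invariant` form an opt-in hook: the base class declines by default, and a subclass overrides both to take part in snapshot scans. A minimal sketch of that pattern follows, using hypothetical stand-ins (`Snapshot`, `Check`, `NoOverlapCheck`, `CheapCheck`, `runSnapshotChecks`) instead of the real stellar-core types. How the manager actually consults `usesStateSnapshotInvariant` is not shown in this diff, so the dispatch loop is an assumption.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical snapshot: only records whether live and archived state
// overlap.
struct Snapshot
{
    bool overlap;
};

class Check
{
  public:
    virtual ~Check() = default;

    // Checks do not take part in snapshot scans unless they opt in.
    virtual bool
    usesSnapshot() const
    {
        return false;
    }

    // An empty string means the check passed.
    virtual std::string
    onSnapshot(Snapshot const&)
    {
        return std::string{};
    }
};

// Opts in and reports a failure when the snapshot shows an overlap.
class NoOverlapCheck : public Check
{
  public:
    bool
    usesSnapshot() const override
    {
        return true;
    }

    std::string
    onSnapshot(Snapshot const& snap) override
    {
        return snap.overlap ? "overlap detected" : std::string{};
    }
};

// Does not opt in, so its onSnapshot must never be called.
class CheapCheck : public Check
{
  public:
    std::string
    onSnapshot(Snapshot const&) override
    {
        return "should not run";
    }
};

// Runs only the checks that opted in and collects their failures.
std::vector<std::string>
runSnapshotChecks(std::vector<std::shared_ptr<Check>> const& checks,
                  Snapshot const& snap)
{
    std::vector<std::string> failures;
    for (auto const& check : checks)
    {
        if (!check->usesSnapshot())
        {
            continue;
        }
        std::string err = check->onSnapshot(snap);
        if (!err.empty())
        {
            failures.push_back(err);
        }
    }
    return failures;
}
```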

src/invariant/InvariantManager.h

Lines changed: 25 additions & 1 deletion
@@ -15,6 +15,7 @@ class AppConnector;
 class Application;
 class Bucket;
 class Invariant;
+class LedgerManager;
 struct EvictedStateVectors;
 struct LedgerTxnDelta;
 struct Operation;
@@ -60,13 +61,36 @@ class InvariantManager
         UnorderedMap<LedgerKey, LedgerEntry> const& restoredFromArchive,
         UnorderedMap<LedgerKey, LedgerEntry> const& restoredFromLiveState) = 0;
 
+    // This is used for expensive invariants that can't run in a blocking
+    // fashion, such as invariants that require scanning the entire BucketList.
+    // The invariant will periodically run on a background thread against the
+    // given ledger state snapshot. These invariants will only run if
+    // INVARIANT_EXTRA_CHECKS is enabled.
+    virtual void
+    runStateSnapshotInvariant(CompleteConstLedgerStatePtr ledgerState,
+                              InMemorySorobanState const& inMemorySnapshot) = 0;
+
     virtual void registerInvariant(std::shared_ptr<Invariant> invariant) = 0;
 
     virtual void enableInvariant(std::string const& name) = 0;
 
-    virtual void start(Application& app) = 0;
+    virtual void start(LedgerManager const& ledgerManager) = 0;
+
+    virtual bool shouldRunInvariantSnapshot() const = 0;
+
+    // Copy InMemorySorobanState for invariant checking. This is the only
+    // method that can access the private copy constructor of
+    // InMemorySorobanState.
+    virtual std::shared_ptr<InMemorySorobanState const>
+    copyInMemorySorobanStateForInvariant(
+        InMemorySorobanState const& state) const = 0;
 
 #ifdef BUILD_TESTS
+    // Blocks until any running snapshot invariant scan completes. Used in
+    // testing mode when ALWAYS_RUN_SNAPSHOT_FOR_TESTING is true to ensure
+    // scans complete before proceeding to the next ledger.
+    virtual void waitForScanToCompleteForTesting() const = 0;
+
     virtual void snapshotForFuzzer() = 0;
     virtual void resetForFuzzer() = 0;
 #endif // BUILD_TESTS
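
The comment on `copyInMemorySorobanStateForInvariant` describes a common C++ pattern: make the copy constructor private and grant access to a single friend, so an expensive copy can only happen at one audited call site. A minimal sketch under that assumption, using hypothetical `InMemoryState` and `StateCopier` classes (the actual friend relationship inside stellar-core is not shown in this diff):

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <type_traits>

class StateCopier;

// Hypothetical stand-in for a large in-memory state object.
class InMemoryState
{
  public:
    InMemoryState() = default;
    InMemoryState& operator=(InMemoryState const&) = delete;

    void
    put(std::string const& key, int value)
    {
        mEntries[key] = value;
    }

    int
    get(std::string const& key) const
    {
        return mEntries.at(key);
    }

  private:
    // Copying is expensive, so only StateCopier is allowed to do it.
    InMemoryState(InMemoryState const&) = default;
    friend class StateCopier;

    std::map<std::string, int> mEntries;
};

// From outside the class and its friend, the type is not copyable.
static_assert(!std::is_copy_constructible<InMemoryState>::value,
              "copies must go through StateCopier");

class StateCopier
{
  public:
    static std::shared_ptr<InMemoryState const>
    copyForInvariant(InMemoryState const& state)
    {
        // std::make_shared cannot reach the private constructor, so the
        // friend allocates directly with new.
        return std::shared_ptr<InMemoryState const>(new InMemoryState(state));
    }
};
```

Returning a `shared_ptr` to a const object means the background check reads a frozen copy while the original keeps changing.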
