Commit 96ed3eb

BenHuddleston authored and daverigby committed
MB-34017: Optimize warmup - Only warmup prepares from HCS to HPS

For Durability, we have introduced a new LoadPrepare phase at Warmup. That is necessary for loading pending Prepares from disk and inserting them into memory structures (ie, HashTable, CheckpointManager, DurabilityMonitor) for leading them to completion.

Given that we need to re-process only Prepares that have not been completed (ie, Committed or Aborted), then we can safely start the LoadPrepare scan from the HCS (excluded) onward. That's because by definition every Prepare before or at HCS has been completed.

After introducing the LoadPrepare phase (and before this change) we have seen an increase of 100% on the total Warmup runtime. That is because the first implementation of the LoadPrepare phase starts the scan at seqno=0. Thus, the full Warmup performs two full scans of the entire seqno-index. This patch addresses the issue.

We also do not load any prepares when HCS == HPS as every prepare has been completed.

Change-Id: Iaf310fe5d7f508303d05d1f5a9632b9dfcf368a7
Reviewed-on: http://review.couchbase.org/113267
Reviewed-by: James Harrison <[email protected]>
Reviewed-by: Dave Rigby <[email protected]>
Tested-by: Build Bot <[email protected]>
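
The scan window described in the commit message can be illustrated with a small standalone sketch. This is not the ep-engine implementation: `DiskItem`, `ScanResult` and `scanOutstandingPrepares` are hypothetical names, the seqno index is modelled as a sorted vector, and the bookkeeping that the real callback does for Commits and Aborts is left out. It only shows the three rules the patch relies on: skip everything at or before the HCS, stop once the HPS has been passed, and do no work at all when HCS == HPS.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an item read from the seqno index; this is
// not the real ep-engine Item type.
struct DiskItem {
    uint64_t seqno;
    bool pending; // true for a Prepare that has not been completed
};

// Same shape as the LoadPreparedSyncWritesResult struct added by the patch.
struct ScanResult {
    uint64_t itemsVisited = 0;
    uint64_t preparesLoaded = 0;
};

// Visit only the items in the window (hcs, hps] of a seqno-ordered index.
// Assumption: the index is sorted by seqno, as a seqno-index scan would be.
ScanResult scanOutstandingPrepares(const std::vector<DiskItem>& index,
                                   uint64_t hcs,
                                   uint64_t hps) {
    assert(hcs <= hps);
    ScanResult result;
    if (hcs == hps) {
        // Every prepare has been completed, nothing to load.
        return result;
    }
    for (const auto& item : index) {
        if (item.seqno <= hcs) {
            // A real store would start the scan at hcs + 1 instead of
            // skipping; the vector model has no seek operation.
            continue;
        }
        if (item.seqno > hps) {
            // Past the HPS: abort the scan early.
            break;
        }
        ++result.itemsVisited;
        if (item.pending) {
            ++result.preparesLoaded;
        }
    }
    return result;
}
```

With an index holding seqnos 1 to 5, an HCS of 2 and an HPS of 4, only seqnos 3 and 4 are visited, which is the kind of saving the new `warmupItemsVisitedWhilstLoadingPrepares` stat is meant to make observable.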
1 parent a2b7748 commit 96ed3eb

8 files changed, +125 -26 lines changed

engines/ep/src/ep_bucket.cc

Lines changed: 61 additions & 19 deletions
@@ -1366,14 +1366,28 @@ void EPBucket::rollbackUnpersistedItems(VBucket& vb, int64_t rollbackSeqno) {
 // At the end of the scan, all outstanding Prepared items (which did not
 // have a Commit persisted to disk) will be registered with the Durability
 // Monitor.
-void EPBucket::loadPreparedSyncWrites(
+EPBucket::LoadPreparedSyncWritesResult EPBucket::loadPreparedSyncWrites(
         folly::SharedMutex::WriteHolder& vbStateLh, VBucket& vb) {
     /// Disk load callback for scan.
     struct LoadSyncWrites : public StatusCallback<GetValue> {
-        LoadSyncWrites(EPVBucket& vb) : vb(vb) {
+        LoadSyncWrites(EPVBucket& vb, uint64_t highPreparedSeqno)
+            : vb(vb), highPreparedSeqno(highPreparedSeqno) {
         }

         void callback(GetValue& val) override {
+            // Abort the scan early if we have passed the HPS as we don't need
+            // to load any more prepares.
+            if (val.item->getBySeqno() >
+                static_cast<int64_t>(highPreparedSeqno)) {
+                // ENOMEM may seem like an odd status code to abort the scan but
+                // disk backfill to a given seqno also returns ENGINE_ENOMEM
+                // when it has received all the seqnos that it cares about to
+                // abort the scan.
+                setStatus(ENGINE_ENOMEM);
+                return;
+            }
+
+            itemsVisited++;
             if (val.item->isPending()) {
                 // Pending item which was not aborted (deleted). Add to
                 // outstanding Prepare map.
@@ -1392,6 +1406,13 @@ void EPBucket::loadPreparedSyncWrites(

         EPVBucket& vb;

+        // HPS after which we can abort the scan
+        uint64_t highPreparedSeqno = std::numeric_limits<uint64_t>::max();
+
+        // Number of items our callback "visits". Used to validate how many
+        // items we look at when loading SyncWrites.
+        uint64_t itemsVisited = 0;
+
         /// Map of Document key -> outstanding (not yet Committed / Aborted)
         /// prepares.
         std::unordered_map<StoredDocKey, std::unique_ptr<Item>>
@@ -1401,18 +1422,39 @@ void EPBucket::loadPreparedSyncWrites(
     auto& epVb = dynamic_cast<EPVBucket&>(vb);
     const auto start = std::chrono::steady_clock::now();

-    // @TODO MB-34017: We can optimise this by starting the scan at the
-    // high_committed_seqno - all earlier prepares would have been committed
-    // (or were aborted) and only scanning up to the high prepared seqno.
-    uint64_t startSeqno = 0;
-
     // Get the kvStore. Using the RW store as the rollback code that will call
     // this function will modify vbucket_state that will only be reflected in
     // RW store. For warmup case, we don't allow writes at this point in time
     // anyway.
     auto* kvStore = getRWUnderlyingByShard(epVb.getShard()->getId());

-    auto storageCB = std::make_shared<LoadSyncWrites>(epVb);
+    // Need the HPS/HCS so the DurabilityMonitor can be fully resumed
+    auto vbState = kvStore->getVBucketState(epVb.getId());
+    if (!vbState) {
+        throw std::logic_error("EPBucket::loadPreparedSyncWrites: processing " +
+                               epVb.getId().to_string() +
+                               ", but found no vbucket_state");
+    }
+
+    // Insert all outstanding Prepares into the VBucket (HashTable &
+    // DurabilityMonitor).
+    std::vector<queued_item> prepares;
+    if (vbState->highPreparedSeqno == vbState->highCompletedSeqno) {
+        // We don't need to warm up anything for this vBucket as all of our
+        // prepares have been completed, but we do need to create the DM
+        // with our vbucket_state.
+        epVb.loadOutstandingPrepares(vbStateLh, *vbState, std::move(prepares));
+        // No prepares loaded
+        return {0, 0};
+    }
+
+    // We optimise this step by starting the scan at the seqno following the
+    // High Completed Seqno. By definition, all earlier prepares have been
+    // completed (Committed or Aborted).
+    const uint64_t startSeqno = vbState->highCompletedSeqno + 1;
+
+    auto storageCB =
+            std::make_shared<LoadSyncWrites>(epVb, vbState->highPreparedSeqno);

     // Don't expect to find anything already in the HashTable, so use
     // NoLookupCallback.
@@ -1434,11 +1476,17 @@ void EPBucket::loadPreparedSyncWrites(
         EP_LOG_CRITICAL(
                 "EPBucket::loadPreparedSyncWrites: scanCtx is null for {}",
                 epVb.getId());
-        return;
+        // No prepares loaded
+        return {0, 0};
     }

     auto scanResult = kvStore->scan(scanCtx);
-    Expects(scanResult == scan_success);
+
+    // If we abort our scan early due to reaching the HPS then the scan result
+    // will be failure but we will have scanned correctly.
+    if (storageCB->getStatus() != ENGINE_ENOMEM) {
+        Expects(scanResult == scan_success);
+    }

     kvStore->destroyScanContext(scanCtx);

@@ -1451,7 +1499,7 @@ void EPBucket::loadPreparedSyncWrites(

     // Insert all outstanding Prepares into the VBucket (HashTable &
     // DurabilityMonitor).
-    std::vector<queued_item> prepares;
+    prepares.reserve(storageCB->outstandingPrepares.size());
     for (auto& prepare : storageCB->outstandingPrepares) {
         prepares.emplace_back(std::move(prepare.second));
     }
@@ -1461,15 +1509,9 @@ void EPBucket::loadPreparedSyncWrites(
         return a->getBySeqno() < b->getBySeqno();
     });

-    // Need the HPS/HCS so the DurabilityMonitor can be fully resumed
-    auto vbState = kvStore->getVBucketState(epVb.getId());
-    if (!vbState) {
-        throw std::logic_error("EPBucket::loadPreparedSyncWrites: processing " +
-                               epVb.getId().to_string() +
-                               ", but found no vbucket_state");
-    }
-
+    auto numPrepares = prepares.size();
     epVb.loadOutstandingPrepares(vbStateLh, *vbState, std::move(prepares));
+    return {storageCB->itemsVisited, numPrepares};
 }

 ValueFilter EPBucket::getValueFilterForCompressionMode() {

engines/ep/src/ep_bucket.h

Lines changed: 2 additions & 2 deletions
@@ -137,8 +137,8 @@ class EPBucket : public KVBucket {

     void rollbackUnpersistedItems(VBucket& vb, int64_t rollbackSeqno) override;

-    void loadPreparedSyncWrites(folly::SharedMutex::WriteHolder& vbStateLh,
-                                VBucket& vb) override;
+    LoadPreparedSyncWritesResult loadPreparedSyncWrites(
+            folly::SharedMutex::WriteHolder& vbStateLh, VBucket& vb) override;

     /**
      * Returns the ValueFilter to use for KVStore scans, given the bucket

engines/ep/src/ephemeral_bucket.h

Lines changed: 4 additions & 3 deletions
@@ -104,9 +104,10 @@ class EphemeralBucket : public KVBucket {
         // No op
     }

-    void loadPreparedSyncWrites(folly::SharedMutex::WriteHolder& vbStateLh,
-                                VBucket& vb) override {
-        // No op
+    LoadPreparedSyncWritesResult loadPreparedSyncWrites(
+            folly::SharedMutex::WriteHolder& vbStateLh, VBucket& vb) override {
+        // No op, return 0 prepares loaded
+        return {0, 0};
     }

     void notifyNewSeqno(const Vbid vbid, const VBNotifyCtx& notifyCtx) override;

engines/ep/src/kv_bucket_iface.h

Lines changed: 11 additions & 1 deletion
@@ -787,6 +787,14 @@ class KVBucketIface {
      */
     virtual bool isGetAllKeysSupported() const = 0;

+    /**
+     * Result of the loadPreparedSyncWrites function
+     */
+    struct LoadPreparedSyncWritesResult {
+        uint64_t itemsVisited = 0;
+        uint64_t preparesLoaded = 0;
+    };
+
 protected:

     /**
/**
@@ -834,8 +842,10 @@ class KVBucketIface {
      *
      * @param vbStateLh vBucket state lock
      * @param vb vBucket for which we will load SyncWrites
+     *
+     * @returns number of prepares loaded
      */
-    virtual void loadPreparedSyncWrites(
+    virtual LoadPreparedSyncWritesResult loadPreparedSyncWrites(
             folly::SharedMutex::WriteHolder& vbStateLh, VBucket& vb) = 0;

     // During the warmup phase we might want to enable external traffic

engines/ep/src/stats.cc

Lines changed: 2 additions & 0 deletions
@@ -27,6 +27,8 @@
 EPStats::EPStats()
     : warmedUpKeys(0),
       warmedUpValues(0),
+      warmedUpPrepares(0),
+      warmupItemsVisitedWhilstLoadingPrepares(0),
       warmDups(0),
       warmOOM(0),
       warmupMemUsedCap(0),

engines/ep/src/stats.h

Lines changed: 4 additions & 0 deletions
@@ -137,6 +137,10 @@ class EPStats {
     Counter warmedUpKeys;
     //! Number of key-values warmed up during data loading.
     Counter warmedUpValues;
+    //! Number of prepares warmed up.
+    Counter warmedUpPrepares;
+    //! Number of items visited whilst loading prepares
+    Counter warmupItemsVisitedWhilstLoadingPrepares;
     //! Number of warmup failures due to duplicates
     Counter warmDups;
     //! Number of OOM failures at warmup time.

engines/ep/src/warmup.cc

Lines changed: 7 additions & 1 deletion
@@ -1223,7 +1223,13 @@ void Warmup::loadPreparedSyncWrites(uint16_t shardId) {
         // for rollback.
         auto& vb = *(itr->second);
         folly::SharedMutex::WriteHolder vbStateLh(vb.getStateLock());
-        store.loadPreparedSyncWrites(vbStateLh, vb);
+
+        auto result = store.loadPreparedSyncWrites(vbStateLh, vb);
+        store.getEPEngine()
+                .getEpStats()
+                .warmupItemsVisitedWhilstLoadingPrepares += result.itemsVisited;
+        store.getEPEngine().getEpStats().warmedUpPrepares +=
+                result.preparesLoaded;
     }

     if (++threadtask_count == store.vbMap.getNumShards()) {

engines/ep/tests/module_tests/evp_store_warmup_test.cc

Lines changed: 34 additions & 0 deletions
@@ -655,6 +655,13 @@ void DurabilityWarmupTest::testPendingSyncWrite(

         // DurabilityMonitor be tracking the prepare.
         EXPECT_EQ(++numTracked, vb->getDurabilityMonitor().getNumTracked());
+
+        EXPECT_EQ(numTracked,
+                  store->getEPEngine().getEpStats().warmedUpPrepares);
+        EXPECT_EQ(numTracked,
+                  store->getEPEngine()
+                          .getEpStats()
+                          .warmupItemsVisitedWhilstLoadingPrepares);
     }
 }


@@ -731,6 +738,13 @@ void DurabilityWarmupTest::testCommittedSyncWrite(

         // DurabilityMonitor should be empty as no outstanding prepares.
         EXPECT_EQ(--numTracked, vb->getDurabilityMonitor().getNumTracked());
+
+        EXPECT_EQ(numTracked,
+                  store->getEPEngine().getEpStats().warmedUpPrepares);
+        EXPECT_EQ(numTracked,
+                  store->getEPEngine()
+                          .getEpStats()
+                          .warmupItemsVisitedWhilstLoadingPrepares);
     }
 }


@@ -776,6 +790,11 @@ void DurabilityWarmupTest::testCommittedAndPendingSyncWrite(
         setVBucketStateAndRunPersistTask(vbid, vbState);
     }
     resetEngineAndWarmup();
+    EXPECT_EQ(1, store->getEPEngine().getEpStats().warmedUpPrepares);
+    EXPECT_EQ(2,
+              store->getEPEngine()
+                      .getEpStats()
+                      .warmupItemsVisitedWhilstLoadingPrepares);

     // Should load two items into memory - both committed and the pending value.
     // Check the original committed value is inaccessible due to the pending
@@ -859,6 +878,11 @@ TEST_P(DurabilityWarmupTest, AbortedSyncWritePrepareIsNotLoaded) {
         EXPECT_EQ(1, vb->getNumItems());
     }
     resetEngineAndWarmup();
+    EXPECT_EQ(0, store->getEPEngine().getEpStats().warmedUpPrepares);
+    EXPECT_EQ(0,
+              store->getEPEngine()
+                      .getEpStats()
+                      .warmupItemsVisitedWhilstLoadingPrepares);

     // Should load one item into memory - committed value.
     auto vb = engine->getVBucket(vbid);
@@ -898,6 +922,11 @@ TEST_P(DurabilityWarmupTest, ReplicationTopologyMissing) {
             vbid, vbstate, VBStatePersist::VBSTATE_PERSIST_WITH_COMMIT);

     resetEngineAndWarmup();
+    EXPECT_EQ(0, store->getEPEngine().getEpStats().warmedUpPrepares);
+    EXPECT_EQ(0,
+              store->getEPEngine()
+                      .getEpStats()
+                      .warmupItemsVisitedWhilstLoadingPrepares);

     // Check topology is empty.
     auto vb = engine->getKVBucket()->getVBucket(vbid);
@@ -943,6 +972,11 @@ TEST_P(DurabilityWarmupTest, WarmupCommit) {
     // Because we bypassed KVBucket::set the HPS/HCS will be incorrect and fail
     // the pre/post warmup checker, so disable the checker for this test.
     resetEngineAndWarmup().disable();
+    EXPECT_EQ(1, store->getEPEngine().getEpStats().warmedUpPrepares);
+    EXPECT_EQ(1,
+              store->getEPEngine()
+                      .getEpStats()
+                      .warmupItemsVisitedWhilstLoadingPrepares);

     vb = store->getVBucket(vbid);
     ASSERT_TRUE(vb);
