Commit 5e5d224
MB-39292: 1/n set_collections persist current manifest

set_collections only allows 'forward' progress, checking that the new
manifest is a successor of the current manifest; however, after warm-up we
previously had to accept whatever we were given.

This commit updates set_collections so that for persistent buckets the new
manifest is first stored in the database directory, and only on success do
we update from that manifest. Warm-up now brings the manifest back from
storage, so further updates can be validated as valid successors. Ephemeral
buckets just update with no background task.

This patch stores the manifest using Manifest::toJSON and reloads the JSON
using Manifest's existing construction, with no integrity checking.

Change-Id: Ie548e31f56c4847ecf4c0c4ad866544f6bcd2a5c
Reviewed-on: http://review.couchbase.org/c/kv_engine/+/137161
Reviewed-by: Dave Rigby <[email protected]>
Tested-by: Jim Walker <[email protected]>
1 parent fb7e114 commit 5e5d224

29 files changed: +668 −66 lines
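The two-phase flow the commit message describes (return would_block, persist
in the background, then finish the update on the I/O-complete call that
getEngineSpecific distinguishes) can be modelled in miniature. Below is a
self-contained sketch with made-up types - Status and PendingUpdate stand in
for cb::engine_errc and the engine-specific state - not the engine's actual
API:

#include <iostream>
#include <memory>
#include <string>

// Hypothetical status codes standing in for cb::engine_errc.
enum class Status { Success, WouldBlock };

// Stands in for the Manifest pointer parked in engine-specific storage.
struct PendingUpdate {
    std::string manifest;
};

// First call: stash the validated manifest, start persistence, say WouldBlock.
// Second call (re-driven after the background task succeeds): apply it.
Status setCollections(std::unique_ptr<PendingUpdate>& engineSpecific,
                      const std::string& manifest,
                      std::string& applied) {
    if (!engineSpecific) {
        engineSpecific = std::make_unique<PendingUpdate>();
        engineSpecific->manifest = manifest;
        // ... here the real code schedules a PersistManifestTask which
        // notifies I/O-complete once the file is safely on disk ...
        return Status::WouldBlock;
    }
    applied = std::move(engineSpecific->manifest); // I/O-complete path
    engineSpecific.reset(); // allow the next update to start
    return Status::Success;
}

int main() {
    std::unique_ptr<PendingUpdate> engineSpecific;
    std::string current;
    if (setCollections(engineSpecific, R"({"uid":"1"})", current) ==
        Status::WouldBlock) {
        // persistence would run here before the command is re-driven
        setCollections(engineSpecific, R"({"uid":"1"})", current);
    }
    std::cout << "applied: " << current << '\n';
}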

daemon/protocol/mcbp/collections_set_manifest_executor.cc

Lines changed: 2 additions & 2 deletions

@@ -29,6 +29,6 @@ void collections_set_manifest_executor(Cookie& cookie) {
                                          val.size()};
     const auto ret = connection.getBucketEngine().set_collection_manifest(
             &cookie, jsonBuffer);
-    Expects(ret != cb::engine_errc::would_block);
+
     handle_executor_status(cookie, ret);
-}
+}

engines/ep/CMakeLists.txt

Lines changed: 2 additions & 1 deletion

@@ -229,9 +229,10 @@ SET(CONFIG_SOURCE src/configuration.cc
 SET(COLLECTIONS_SOURCE src/collections/collections_types.cc
                        src/collections/eraser_context.cc
                        src/collections/flush.cc
+                       src/collections/kvstore.cc
                        src/collections/manager.cc
                        src/collections/manifest.cc
-                       src/collections/kvstore.cc
+                       src/collections/persist_manifest_task.cc
                        ${CMAKE_CURRENT_BINARY_DIR}/src/collections/events_generated.h
                        ${CMAKE_CURRENT_BINARY_DIR}/src/collections/kvstore_generated.h
                        src/collections/scan_context.cc

engines/ep/src/collections/collections_types.h

Lines changed: 4 additions & 0 deletions

@@ -44,6 +44,10 @@ const char* const ScopeEventDebugTag = "_scope";
 // Couchstore private file name for manifest data
 const char CouchstoreManifest[] = "_local/collections_manifest";
 
+// Name of the file where the manifest will be kept on persistent buckets
+const char* const ManifestFile = "collections.manifest";
+static std::string_view ManifestFileName(ManifestFile);
+
 // Length of the string excluding the zero terminator (i.e. strlen)
 const size_t CouchstoreManifestLen = sizeof(CouchstoreManifest) - 1;

engines/ep/src/collections/manager.cc

Lines changed: 89 additions & 19 deletions

@@ -19,6 +19,7 @@
 #include "bucket_logger.h"
 #include "collections/flush.h"
 #include "collections/manifest.h"
+#include "collections/persist_manifest_task.h"
 #include "collections/vbucket_manifest_handles.h"
 #include "ep_engine.h"
 #include "kv_bucket.h"
@@ -35,11 +36,49 @@ Collections::Manager::Manager() {
 }
 
 cb::engine_error Collections::Manager::update(KVBucket& bucket,
-                                              std::string_view manifest) {
+                                              std::string_view manifestString,
+                                              const void* cookie) {
+    auto lockedUpdateCookie = updateInProgress.wlock();
+    if (*lockedUpdateCookie != nullptr && *lockedUpdateCookie != cookie) {
+        // log this as it's very unexpected, only ever 1 manager
+        return cb::engine_error(
+                cb::engine_errc::too_busy,
+                "An update is already in-progress for another cookie:" +
+                        std::to_string(uintptr_t(*lockedUpdateCookie)));
+    }
+
+    // Now getEngineSpecific - if that is null this is a new command, else
+    // it's the IO complete command
+    void* manifest = bucket.getEPEngine().getEngineSpecific(cookie);
+
+    if (manifest) {
+        // I/O complete path?
+        if (!*lockedUpdateCookie) {
+            // This can occur for a DCP connection, cookie is 'reserved'.
+            EP_LOG_WARN(
+                    "Collections::Manager::update aborted as we have found a "
+                    "manifest:{} but updateInProgress:{}",
+                    manifest,
+                    *lockedUpdateCookie);
+            return cb::engine_error(cb::engine_errc::failed,
+                                    "Collections::Manager::update failure");
+        }
+
+        // Final stage of update now happening, clear the cookie and engine
+        // specific so the next update can start after this one returns.
+        *lockedUpdateCookie = nullptr;
+        bucket.getEPEngine().storeEngineSpecific(cookie, nullptr);
+
+        // Take ownership back of the manifest so it destructs/frees on return
+        std::unique_ptr<Manifest> newManifest(
+                reinterpret_cast<Manifest*>(manifest));
+        return updateFromIOComplete(bucket, std::move(newManifest), cookie);
+    }
+
+    // Construct a new Manifest (ctor will throw if JSON was illegal)
     std::unique_ptr<Manifest> newManifest;
-    // Construct a newManifest (will throw if JSON was illegal)
     try {
-        newManifest = std::make_unique<Manifest>(manifest);
+        newManifest = std::make_unique<Manifest>(manifestString);
     } catch (std::exception& e) {
         EP_LOG_WARN(
                 "Collections::Manager::update can't construct manifest "
@@ -48,35 +87,54 @@ cb::engine_error Collections::Manager::update(KVBucket& bucket,
         return cb::engine_error(
                 cb::engine_errc::invalid_arguments,
                 "Collections::Manager::update manifest json invalid:" +
-                        std::string(manifest));
+                        std::string(manifestString));
     }
 
-    // Get upgrade access to the manifest for the initial part of the update
-    // This gives shared access (other readers allowed) but would block other
-    // attempts to get upgrade access.
+    // Next compare with current
+    // First get an upgrade lock (which is initially read)
+    // Persistence will schedule a task and drop the lock whereas ephemeral will
+    // upgrade from read to write locking and do the update
     auto current = currentManifest.ulock();
-
-    // ignore being told the same manifest
-    if (current->getUid() == newManifest->getUid() &&
-        *newManifest == *current) {
-        return cb::engine_error(
-                cb::engine_errc::success,
-                "Collections::Manager::update ignored matching manifest");
-    }
-
-    // any other uid, and the incoming manifest must be a valid succession
     auto isSuccessorResult = current->isSuccessor(*newManifest);
     if (isSuccessorResult.code() != cb::engine_errc::success) {
        return isSuccessorResult;
    }
 
+    // New manifest is a legal successor, the update is going ahead.
+    // Ephemeral bucket can update now, Persistent bucket on wake-up from a
+    // successful run of the PersistManifestTask.
+    cb::engine_errc status = cb::engine_errc::success;
+    if (!bucket.maybeScheduleManifestPersistence(cookie, newManifest)) {
+        // Ephemeral case, apply immediately
+        return applyNewManifest(bucket, current, std::move(newManifest));
+    } else {
+        *lockedUpdateCookie = cookie;
+        status = cb::engine_errc::would_block;
+    }
+
+    return cb::engine_error(status,
+                            "Collections::Manager::update part one complete");
+}
+
+cb::engine_error Collections::Manager::updateFromIOComplete(
+        KVBucket& bucket,
+        std::unique_ptr<Manifest> newManifest,
+        const void* cookie) {
+    auto current = currentManifest.ulock(); // Will update to newManifest
+    return applyNewManifest(bucket, current, std::move(newManifest));
+}
+
+// common to ephemeral/persistent, this does the update
+cb::engine_error Collections::Manager::applyNewManifest(
+        KVBucket& bucket,
+        folly::Synchronized<Manifest>::UpgradeLockedPtr& current,
+        std::unique_ptr<Manifest> newManifest) {
     auto updated = updateAllVBuckets(bucket, *newManifest);
     if (updated.has_value()) {
         return cb::engine_error(
                 cb::engine_errc::cannot_apply_collections_manifest,
                 "Collections::Manager::update aborted on " +
-                        updated->to_string() +
-                        ", cannot apply:" + std::string(manifest));
+                        updated->to_string() + ", cannot apply to vbuckets");
     }
 
     // Now switch to write locking and change the manifest. The lock is
@@ -255,6 +313,18 @@ void Collections::Manager::addScopeStats(KVBucket& bucket,
     currentManifest.rlock()->addScopeStats(bucket, cookie, add_stat);
 }
 
+void Collections::Manager::warmupLoadManifest(const std::string& dbpath) {
+    auto rv = Collections::PersistManifestTask::tryAndLoad(dbpath);
+    if (rv) {
+        EP_LOG_INFO(
+                "Collections::Manager::warmupLoadManifest: loaded uid:{:#x}",
+                rv->getUid());
+        *currentManifest.wlock() = std::move(*rv);
+    } else {
+        EP_LOG_INFO("Collections::Manager::warmupLoadManifest: nothing loaded");
+    }
+}
+
 /**
  * Perform actions for a completed warmup - currently check if any
  * collections are 'deleting' and require erasing retriggering.
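persist_manifest_task.cc itself is not shown in this view; per the commit
message the task stores Manifest::toJSON and tryAndLoad re-reads the JSON
with no integrity checking. A minimal sketch of that store/load shape - with
the write-to-temp-then-rename step being my assumption, not something the
commit confirms:

#include <filesystem>
#include <fstream>
#include <optional>
#include <sstream>
#include <string>

const char* const ManifestFile = "collections.manifest"; // as added above

// Write the JSON to a temporary file, then rename it into place so a crash
// mid-write cannot leave a truncated collections.manifest behind.
bool persistManifest(const std::string& dbpath, const std::string& json) {
    namespace fs = std::filesystem;
    auto target = fs::path(dbpath) / ManifestFile;
    auto tmp = target;
    tmp += ".tmp";
    {
        std::ofstream out(tmp, std::ios::trunc);
        out << json;
        if (!out) {
            return false;
        }
    }
    std::error_code ec;
    fs::rename(tmp, target, ec); // atomic replace on POSIX filesystems
    return !ec;
}

// tryAndLoad-style read: return the JSON if the file exists (the real code
// then constructs a Collections::Manifest from the string).
std::optional<std::string> tryAndLoad(const std::string& dbpath) {
    std::ifstream in(std::filesystem::path(dbpath) / ManifestFile);
    if (!in) {
        return std::nullopt; // nothing persisted yet, warmup proceeds empty
    }
    std::ostringstream buffer;
    buffer << in.rdbuf();
    return buffer.str();
}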

engines/ep/src/collections/manager.h

Lines changed: 38 additions & 3 deletions

@@ -93,14 +93,20 @@ class Manager {
     /**
      * Update the bucket with the latest JSON collections manifest.
      *
-     * Locks the Manager and prevents concurrent updates, concurrent updates
-     * are failed with TMPFAIL as in reality there should be 1 admin connection.
+     * For persistent buckets, this will store the manifest first and then use
+     * IO complete success to apply the new manifest. For ephemeral buckets
+     * the update is 'immediate'.
+     *
+     * Note that a mutex ensures that this update method works serially, no
+     * concurrent admin updates allowed.
      *
      * @param bucket the bucket receiving a set-collections command.
      * @param manifest the json manifest from a set-collections command.
      * @returns engine_error indicating why the update failed.
      */
-    cb::engine_error update(KVBucket& bucket, std::string_view manifest);
+    cb::engine_error update(KVBucket& bucket,
+                            std::string_view manifest,
+                            const void* cookie);
 
     /**
      * Retrieve the current manifest
@@ -168,6 +174,11 @@ class Manager {
             const void* cookie,
             const AddStatFn& add_stat) const;
 
+    /**
+     * Called from bucket warmup - see if we have a manifest to resume from
+     */
+    void warmupLoadManifest(const std::string& dbpath);
+
     /**
      * Perform actions for a completed warmup - currently check if any
      * collections are 'deleting' and require erasing retriggering.
@@ -215,6 +226,27 @@ class Manager {
     std::optional<Vbid> updateAllVBuckets(KVBucket& bucket,
                                           const Manifest& newManifest);
 
+    /**
+     * This method handles the IO complete path and allows ::update to
+     * correctly call applyNewManifest
+     */
+    cb::engine_error updateFromIOComplete(KVBucket& bucket,
+                                          std::unique_ptr<Manifest> newManifest,
+                                          const void* cookie);
+
+    /**
+     * Final stage of the manifest update is to roll the new manifest out to
+     * the active vbuckets.
+     *
+     * @param bucket The bucket to work on
+     * @param current The locked, current manifest (which will be replaced)
+     * @param newManifest The new manifest to apply
+     */
+    cb::engine_error applyNewManifest(
+            KVBucket& bucket,
+            folly::Synchronized<Manifest>::UpgradeLockedPtr& current,
+            std::unique_ptr<Manifest> newManifest);
+
     /**
      * Get a copy of stats which are relevant at a per-collection level.
      * The copied stats can then be used to format stats for one or more
@@ -294,6 +326,9 @@ class Manager {
 
     /// Store the most recent (current) manifest received
     folly::Synchronized<Manifest> currentManifest;
+
+    /// Serialise updates to the manifest (set_collections core)
+    folly::Synchronized<const void*> updateInProgress{nullptr};
 };
 
 std::ostream& operator<<(std::ostream& os, const Manager& manager);
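currentManifest is a folly::Synchronized, and applyNewManifest takes an
UpgradeLockedPtr ("switch to write locking" in manager.cc). For reference,
the folly upgrade-lock pattern looks roughly like this - a standalone example,
not kv_engine code:

#include <folly/Synchronized.h>
#include <string>

folly::Synchronized<std::string> currentManifest{"{}"};

void maybeApply(const std::string& candidate) {
    // ulock(): shared with plain readers, but exclusive against other
    // upgraders, so two updates cannot race through the comparison below.
    auto current = currentManifest.ulock();
    if (*current == candidate) {
        return; // nothing to do; we never excluded readers
    }
    // Promote to exclusive access only for the actual switch-over.
    auto writer = current.moveFromUpgradeToWrite();
    *writer = candidate;
}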
Lines changed: 80 additions & 0 deletions

@@ -0,0 +1,80 @@
Aim: Better manifest comparison - full path checks on update

E.g. If I currently have /scope1/c1 and those have ids sid:8 and cid:8 I want
to detect:

/scope2/c1 (sid:9 and cid:8) // collection moved to a new scope
/scope2/c1 (sid:8 and cid:8) // scope changed name
/scope1/c2 (sid:8 and cid:8) // collection name changed

Problem(s)

1) vbucket_manifest only has IDs.
So if we try and do these detections during the vb.update code we have no name
for comparison and cannot detect any of the above cases.

2) Collections::Manifest stores the last received manifest from ns_server; we
can apply detections there as it has scope and collection names - but if we
warmup, this object is 'empty'.

3) Leading into how to solve the warmup issue: we don't persist the scope
name, only the id, in _local.

Per vbucket we do store collection names 1) in _local and 2) in system events.
Per vbucket we store scope names in 1) system events only (nothing in _local).

Ideas:

I think Collections::Manifest A compared against Collections::Manifest B is
something to consider, and means we need a way to warm-up something
comparable. Could also mean some weird stuff from a replica?

Weird stuff:

Node {
    Active VB1: Thinks /scope1/c1
    Replica VB2: Thinks /scope2/c1 (as it got that over replication) - is this
    possible in the quorum case?
}

or

Node {
    Active VB1: Thinks /scope1/c1 (sid:8, cid:8)
    Replica VB2: Thinks /scope2/c2 (sid:8, cid:8) (as it got that over
    replication) - is this possible in the quorum case?
}

promote VB2 to active

A new data structure that can be populated from warmup (_local) and allows
storage of {id}?? This is the manifest, right?

hashes?
Options:

A) Be defensive for *every* manifest update - this means KV

a) checks manifest.uid is incrementing
b) checks for any scope/collections it knows that the immutable properties
   are equal (i.e. id == id && name == name)

Reject the incoming manifest if it fails the checks, apply it otherwise (a
rough sketch of these checks follows).
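A sketch of checks (a) and (b), using a deliberately pared-down manifest type
(MiniManifest and its fields are hypothetical; in kv_engine the comparable
logic lives in Manifest::isSuccessor):

#include <cstdint>
#include <map>
#include <string>

// Hypothetical pared-down manifest: uid, scope names, collection entries.
struct CollectionEntry {
    uint32_t sid; // owning scope
    std::string name;
};

struct MiniManifest {
    uint64_t uid;
    std::map<uint32_t, std::string> scopes; // sid -> scope name
    std::map<uint32_t, CollectionEntry> collections; // cid -> entry
};

// Reject unless the uid moves forward (a) and every scope/collection we
// already know keeps its immutable properties (b). A known collection that
// changed name or moved to a different sid fails the check; an absent one
// is a legal drop.
bool isValidSuccessor(const MiniManifest& current, const MiniManifest& next) {
    if (next.uid < current.uid) {
        return false; // (a) uid must never go backwards
    }
    for (const auto& [sid, name] : current.scopes) {
        auto it = next.scopes.find(sid);
        if (it != next.scopes.end() && it->second != name) {
            return false; // scope renamed, e.g. /scope1 became /scope2
        }
    }
    for (const auto& [cid, entry] : current.collections) {
        auto it = next.collections.find(cid);
        if (it != next.collections.end() &&
            (it->second.sid != entry.sid || it->second.name != entry.name)) {
            return false; // collection renamed or moved between scopes
        }
    }
    return true; // (b) all known immutable properties kept
}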
This has challenges in terms of comparison (b) - only the 'bucket' Manifest
stores names in-memory, and that data is populated from ns_server. Following
warmup we cannot validate the input.

Solutions?
Store manifest:
Persist each manifest (we always trust the first Manifest when KV goes from
uid 0 to uid 0+). Means storing a new file and possibly changing the
collection update logic - i.e. the steps become newfile.onpersist(update
vbuckets to new manifest).
A new file, hmm: flatbuffers + crc32c, not a couchstore file. I guess we
already have extra files such as the access log - however, if this file gets
damaged, warmup fails. (A sketch of crc32c framing follows.)
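If the file did gain a crc32c frame as floated above, the framing can be as
simple as checksum-then-payload; the committed patch stores plain JSON with no
integrity check, so this is only the idea being weighed up. The bitwise
CRC-32C below trades speed for brevity:

#include <cstddef>
#include <cstdint>
#include <string>

// Bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78).
uint32_t crc32c(const uint8_t* data, size_t len) {
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int k = 0; k < 8; ++k) {
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

// Frame: 4-byte little-endian checksum followed by the payload bytes.
std::string frame(const std::string& payload) {
    auto crc = crc32c(reinterpret_cast<const uint8_t*>(payload.data()),
                      payload.size());
    std::string out;
    for (int i = 0; i < 4; ++i) {
        out.push_back(static_cast<char>((crc >> (8 * i)) & 0xFF));
    }
    return out + payload;
}

// On load: recompute and compare, so a damaged file is detected at warmup
// instead of being parsed as garbage.
bool verify(const std::string& framed, std::string& payload) {
    if (framed.size() < 4) {
        return false;
    }
    uint32_t stored = 0;
    for (int i = 0; i < 4; ++i) {
        stored |= uint32_t(uint8_t(framed[i])) << (8 * i);
    }
    payload = framed.substr(4);
    return stored == crc32c(reinterpret_cast<const uint8_t*>(payload.data()),
                            payload.size());
}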
Use vbuckets' persisted data:
Re-assemble a manifest from the active vbuckets at warmup. We could warmup
with vb 0 being ahead of vb 1 (if we crashed before vb 1 persisted a
collection change); not necessarily a problem - given that an update only ever
begins if the manifest is progressing, we use the greatest manifest found?
However, we support a push that can create/drop many collections, and these
get split over many flush batches; only the final flush updates the manifest
uid, so we could get vbuckets reporting equal manifest uids yet being
different from each other.
This is getting messy, and the partial-persistence issue is a problem.

Overall though, do we need option A)? Trust that ns_server updates
(non-forced) are compliant.

B) Only care for a 'force' update (quorum was forcefully removed and we may be
given a manifest which changes collection state in some of the unusual ways
identified earlier)

In the force-update case we can 'afford' to take our time? I.e. read system
events or _local data to get names - ephemeral will read system events,
persistent can just do one read if it wants - scope names are not on disk.
