Online Indexer: replace the synchronized runner with a heartbeat #3530
base: main
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,184 @@ | ||
/* | ||
* IndexingHeartbeat.java | ||
* | ||
* This source file is part of the FoundationDB open source project | ||
* | ||
* Copyright 2015-2025 Apple Inc. and the FoundationDB project authors | ||
* | ||
* Licensed under the Apache License, Version 2.0 (the "License"); | ||
* you may not use this file except in compliance with the License. | ||
* You may obtain a copy of the License at | ||
* | ||
* http://www.apache.org/licenses/LICENSE-2.0 | ||
* | ||
* Unless required by applicable law or agreed to in writing, software | ||
* distributed under the License is distributed on an "AS IS" BASIS, | ||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
* See the License for the specific language governing permissions and | ||
* limitations under the License. | ||
*/ | ||
|
||
package com.apple.foundationdb.record.provider.foundationdb; | ||
|
||
import com.apple.foundationdb.KeyValue; | ||
import com.apple.foundationdb.async.AsyncIterator; | ||
import com.apple.foundationdb.async.AsyncUtil; | ||
import com.apple.foundationdb.record.IndexBuildProto; | ||
import com.apple.foundationdb.record.logging.LogMessageKeys; | ||
import com.apple.foundationdb.record.metadata.Index; | ||
import com.apple.foundationdb.synchronizedsession.SynchronizedSessionLockedException; | ||
import com.google.protobuf.InvalidProtocolBufferException; | ||
|
||
import javax.annotation.Nonnull; | ||
import java.util.HashMap; | ||
import java.util.Map; | ||
import java.util.UUID; | ||
import java.util.concurrent.CompletableFuture; | ||
import java.util.concurrent.atomic.AtomicInteger; | ||
|
||
public class IndexingHeartbeat { | ||
// [prefix, indexerId] -> [indexing-type, genesis time, heartbeat time] | ||
final UUID indexerId; | ||
final String info; | ||
final long genesisTimeMilliseconds; | ||
final long leaseLength; | ||
final boolean allowMutual; | ||
|
||
public IndexingHeartbeat(final UUID indexerId, String info, long leaseLength, boolean allowMutual) { | ||
this.indexerId = indexerId; | ||
this.info = info; | ||
this.leaseLength = leaseLength; | ||
this.allowMutual = allowMutual; | ||
this.genesisTimeMilliseconds = nowMilliseconds(); | ||
} | ||
|
||
public void updateHeartbeat(@Nonnull FDBRecordStore store, @Nonnull Index index) { | ||
byte[] key = IndexingSubspaces.indexheartbeatSubspace(store, index, indexerId).pack(); | ||
byte[] value = IndexBuildProto.IndexBuildHeartbeat.newBuilder() | ||
.setInfo(info) | ||
.setGenesisTimeMilliseconds(genesisTimeMilliseconds) | ||
.setHeartbeatTimeMilliseconds(nowMilliseconds()) | ||
.build().toByteArray(); | ||
store.ensureContextActive().set(key, value); | ||
} | ||
|
||
public CompletableFuture<Void> checkAndUpdateHeartbeat(@Nonnull FDBRecordStore store, @Nonnull Index index) { | ||
// complete exceptionally if non-mutual, other exists | ||
if (allowMutual) { | ||
updateHeartbeat(store, index); | ||
return AsyncUtil.DONE; | ||
} | ||
|
||
final AsyncIterator<KeyValue> iterator = heartbeatsIterator(store, index); | ||
final long now = nowMilliseconds(); | ||
return AsyncUtil.whileTrue(() -> iterator.onHasNext() | ||
.thenApply(hasNext -> { | ||
if (!hasNext) { | ||
return false; | ||
} | ||
final KeyValue kv = iterator.next(); | ||
try { | ||
final UUID otherIndexerId = heartbeatKeyToIndexerId(store, index, kv.getKey()); | ||
if (!otherIndexerId.equals(this.indexerId)) { | ||
final IndexBuildProto.IndexBuildHeartbeat otherHeartbeat = IndexBuildProto.IndexBuildHeartbeat.parseFrom(kv.getValue()); | ||
final long age = now - otherHeartbeat.getHeartbeatTimeMilliseconds(); | ||
if (age > 0 && age < leaseLength) { | ||
Should this clear the heartbeat if it is expired? If the instance crashes does the heartbeat stick around until you complete an index build (successfully, or with an exception)?
At this point I believe that every process clears its own heartbeat. Instance crashes should be too rare to cause significant DB space leak (or so I hope). If needed, a cleanup can be triggered by the user with the Indexer's new Saying that, we can clear all heartbeats in
✅ Clearing all heartbeats with an index becomes readable. |
||
// For practical reasons, this exception is backward compatible to the Synchronized Lock one | ||
throw new SynchronizedSessionLockedException("Failed to initialize the session because of an existing session in progress") | ||
Probably reasonable to keep this exception as is, for backwards-compatibility, but worth a comment.
I had added a comment. I know about some users that have special handling on this exception, it might be worth keeping it backward compatible. |
||
.addLogInfo(LogMessageKeys.INDEXER_ID, indexerId) | ||
.addLogInfo(LogMessageKeys.EXISTING_INDEXER_ID, otherIndexerId) | ||
.addLogInfo(LogMessageKeys.AGE_MILLISECONDS, age) | ||
.addLogInfo(LogMessageKeys.TIME_LIMIT_MILLIS, leaseLength); | ||
} | ||
} | ||
} catch (InvalidProtocolBufferException e) { | ||
throw new RuntimeException(e); | ||
} | ||
return true; | ||
})) | ||
.thenApply(ignore -> { | ||
updateHeartbeat(store, index); | ||
return null; | ||
}); | ||
} | ||
|
||
public void clearHeartbeat(@Nonnull FDBRecordStore store, @Nonnull Index index) { | ||
store.ensureContextActive().clear(IndexingSubspaces.indexheartbeatSubspace(store, index, indexerId).pack()); | ||
} | ||
|
||
public static void clearAllHeartbeats(@Nonnull FDBRecordStore store, @Nonnull Index index) { | ||
store.ensureContextActive().clear(IndexingSubspaces.indexheartbeatSubspace(store, index).range()); | ||
} | ||
|
||
public static CompletableFuture<Map<UUID, IndexBuildProto.IndexBuildHeartbeat>> getIndexingHeartbeats(FDBRecordStore store, Index index, int maxCount) { | ||
final Map<UUID, IndexBuildProto.IndexBuildHeartbeat> ret = new HashMap<>(); | ||
final AsyncIterator<KeyValue> iterator = heartbeatsIterator(store, index); | ||
final AtomicInteger iterationCount = new AtomicInteger(0); | ||
return AsyncUtil.whileTrue(() -> iterator.onHasNext() | ||
.thenApply(hasNext -> { | ||
if (!hasNext) { | ||
return false; | ||
} | ||
if (maxCount > 0 && maxCount < iterationCount.incrementAndGet()) { | ||
return false; | ||
} | ||
final KeyValue kv = iterator.next(); | ||
final UUID otherIndexerId = heartbeatKeyToIndexerId(store, index, kv.getKey()); | ||
try { | ||
final IndexBuildProto.IndexBuildHeartbeat otherHeartbeat = IndexBuildProto.IndexBuildHeartbeat.parseFrom(kv.getValue()); | ||
ret.put(otherIndexerId, otherHeartbeat); | ||
} catch (InvalidProtocolBufferException e) { | ||
// Let the caller know about this invalid heartbeat. | ||
ret.put(otherIndexerId, IndexBuildProto.IndexBuildHeartbeat.newBuilder() | ||
.setInfo("<< Invalid Heartbeat >>") | ||
.build()); | ||
} | ||
return true; | ||
})) | ||
.thenApply(ignore -> ret); | ||
} | ||
|
||
public static CompletableFuture<Integer> clearIndexingHeartbeats(@Nonnull FDBRecordStore store, @Nonnull Index index, long minAgeMilliseconds, int maxIteration) { | ||
final AsyncIterator<KeyValue> iterator = heartbeatsIterator(store, index); | ||
final AtomicInteger deleteCount = new AtomicInteger(0); | ||
final AtomicInteger iterationCount = new AtomicInteger(0); | ||
final long now = nowMilliseconds(); | ||
return AsyncUtil.whileTrue(() -> iterator.onHasNext() | ||
.thenApply(hasNext -> { | ||
if (!hasNext) { | ||
return false; | ||
} | ||
if (maxIteration > 0 && maxIteration < iterationCount.incrementAndGet()) { | ||
return false; | ||
} | ||
final KeyValue kv = iterator.next(); | ||
boolean shouldRemove; | ||
try { | ||
final IndexBuildProto.IndexBuildHeartbeat otherHeartbeat = IndexBuildProto.IndexBuildHeartbeat.parseFrom(kv.getValue()); | ||
// remove the heartbeat if it is at least minAgeMilliseconds old | ||
shouldRemove = now - otherHeartbeat.getHeartbeatTimeMilliseconds() >= minAgeMilliseconds; | ||
} catch (InvalidProtocolBufferException e) { | ||
// remove heartbeat if invalid | ||
shouldRemove = true; | ||
} | ||
if (shouldRemove) { | ||
store.ensureContextActive().clear(kv.getKey()); | ||
deleteCount.incrementAndGet(); | ||
} | ||
return true; | ||
})) | ||
.thenApply(ignore -> deleteCount.get()); | ||
} | ||
|
||
private static AsyncIterator<KeyValue> heartbeatsIterator(FDBRecordStore store, Index index) { | ||
return store.getContext().ensureActive().snapshot().getRange(IndexingSubspaces.indexheartbeatSubspace(store, index).range()).iterator(); | ||
} | ||
|
||
private static UUID heartbeatKeyToIndexerId(FDBRecordStore store, Index index, byte[] key) { | ||
return IndexingSubspaces.indexheartbeatSubspace(store, index).unpack(key).getUUID(0); | ||
} | ||
|
||
private static long nowMilliseconds() { | ||
return System.currentTimeMillis(); | ||
} | ||
} |
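The two time comparisons in this file are easy to get backwards, so here is a minimal sketch of them as pure functions with no FoundationDB types. The class and method names are hypothetical and not part of the PR. The blocking rule mirrors the `age > 0 && age < leaseLength` condition in `checkAndUpdateHeartbeat`; the staleness rule assumes that "too old" means the heartbeat's age is at least the requested minimum age.

```java
import java.util.UUID;

/** Illustrative only: the lease arithmetic from the diff, isolated from the record store. */
final class HeartbeatLeaseRules {
    private HeartbeatLeaseRules() {
    }

    /** True when another indexer's heartbeat is recent enough to block a non-mutual build. */
    static boolean blocksNewIndexer(UUID self, UUID other, long nowMillis,
                                    long otherHeartbeatMillis, long leaseLengthMillis) {
        if (other.equals(self)) {
            return false; // an indexer never blocks itself
        }
        final long age = nowMillis - otherHeartbeatMillis;
        // Same condition as the diff: a non-positive age (a timestamp at or ahead
        // of this clock) does not block, and neither does an expired lease.
        return age > 0 && age < leaseLengthMillis;
    }

    /** Assumed cleanup rule: a heartbeat is stale once it is at least minAgeMillis old. */
    static boolean isStale(long nowMillis, long heartbeatMillis, long minAgeMillis) {
        return nowMillis - heartbeatMillis >= minAgeMillis;
    }
}
```

Keeping the arithmetic in one place like this would also let the boundary cases (equal timestamps, clock skew, exact lease expiry) be unit tested without a database.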
This seems like a bit of the code could be more generic. Particularly, I see that the `indexingMethod` in the heartbeat is not read by anything, indicating that is more for logging purposes, and could be replaced by a string.

If you pull out the logic for what the limit should be you could have this completely independent of indexing, and provide a raw subspace to store the information. Doing so would make some testing scenarios easier, and allow for re-use in different scenarios.

It would also allow for testing some of this logic with `ThrottledRetryingIterator`, and eventually could allow for a `LimitedConcurrencyThrottledIterator` which would be analogous to `SynchronizedSession`. That wouldn't be immediately relevant, but would allow for more controlled testing of some of the behaviors as you would be able to have better control over when individual futures complete, and run more interesting code in the transactions (such as fetching the list of existing heartbeats yourself).

Going in a completely different direction, would it make sense to combine the heartbeats more with the type stamp?

The `indexingMethod` is for:
a) Debugging (if we'll ever find a leftover heartbeat).
b) Providing extra information when heartbeats are queried. Hopefully that will enable better decision making in any automated system that monitors indexing processes (decisions such as: adding a mutual indexing thread, converting to mutual, stopping a slow by-source-index session and restart indexing, etc.)

I am not sure that I understand your `LimitedConcurrencyThrottledIterator` idea.

The type stamp is complementary to the heartbeat. Initially I thought about making a copy of the type stamp to each heartbeat, but it seems easier - for integrity reasons - to keep the type stamp as is.
idea.The type stamp is complementary to the heartbeat. Initially I thought about making a copy of the type stamp to each heartbeat, but it seems easier - for integrity reasons - to keep the type stamp as is.