-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Check if merge is aborted before executing integrity checks and access fields and documents #132450
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Relates elastic#107513 Relates ES-11749 Relates ES-11384
Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing) |
Hi @tlrx, I've created a changelog YAML for you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just a few minor comments.
|
||
@Override | ||
public void document(int docID, StoredFieldVisitor visitor) throws IOException { | ||
if (visitedDocs % 100L == 0L) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if either modulo 128 or using a bit and would be slightly faster? Or just reseting the counter when checking.
server/src/internalClusterTest/java/org/elasticsearch/index/engine/CheckAbortedMergesIT.java
Show resolved
Hide resolved
); | ||
|
||
assertBusy(() -> { | ||
if (checkAbortedMerges) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if/else can be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll keep them but I'll revert this:
final boolean checkAbortedMerges = false;
randomBoolean();
to
final boolean checkAbortedMerges = randomBoolean();
server/src/internalClusterTest/java/org/elasticsearch/index/engine/CheckAbortedMergesIT.java
Outdated
Show resolved
Hide resolved
|
||
void ensureNotAborted() throws MergeAbortedException; | ||
|
||
void ensureNotAborted(CheckedRunnable<IOException> runnable, String method) throws IOException; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this variant necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean, it looks to me that the above void ensureNotAborted() throws MergeAbortedException;
is sufficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The IT is a bit of a stretch, IMO. I'm not sure it really adds to coverage. A simple unit test is sufficient (assuming that throwing in those methods is OK from Lucene's POV).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I left a question and some minor suggestions 👍
// Number of time a field has been accessed during merges | ||
private final AtomicLong mergedFieldsCount = new AtomicLong(0L); | ||
|
||
// Number of time a doc has been accessed during merges |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: // Number of times
internalCluster().startMasterOnlyNode(); | ||
var nodeA = internalCluster().startDataOnlyNode(); | ||
|
||
var pluginA = internalCluster().getInstance(PluginsService.class, nodeA) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: maybe we can name this nodeAMergePolicy
instead?
ensureGreen(indexName); | ||
|
||
var indexServiceB = internalCluster().getInstance(IndicesService.class, nodeB).indexService(resolveIndex(indexName)); | ||
assertBusy(() -> assertThat(indexServiceB.hasShard(0), equalTo(true))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: isn't this implicitly asserted by ensureGreen
? it waits for relocations to complete, but maybe it's good to have the explicit assertion.
// Only the first integrity check is completed, the following ones should have been aborted | ||
assertThat(pluginA.checkIntegrityCount.get(), equalTo(1L)); | ||
} else { | ||
assertThat(pluginA.mergedDocsCount.get(), greaterThan(0L)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for my understanding, Lucene would check the aborted flag likely at the beginning of the merge and that's why we see the same numbers here? wouldn't it make sense to let the merge make some progress so we actually see/check the effects of this PR code?
Thanks for the feedback everyone. I'd like to adjust the PR and improve test before merging, mostly:
|
This change introduces a new merge policy wrapper that checks if the merge has been aborted before accessing fields/docs to merge and before executing data integrity checks. The goal here is to address #107513 by aborting the merge earlier than the Lucene
SegmentMerger
would do.Relates #107513
Relates ES-11749
Relates ES-11384