-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Refactor IndexRouting.ExtractFromSource to be an abstract class #135206
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor IndexRouting.ExtractFromSource to be an abstract class #135206
Conversation
With implementations IndexRouting.ExtractFromSource.ForRoutingPath and IndexRouting.ExtractFromSource.ForIndexDimensions. This addresses review comments from elastic#132566.
Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
server/src/main/java/org/elasticsearch/cluster/routing/IndexRouting.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/cluster/routing/IndexRouting.java
Outdated
Show resolved
Hide resolved
protected int hashSource(IndexRequest indexRequest) { | ||
BytesRef tsid = indexRequest.tsid(); | ||
if (tsid == null) { | ||
tsid = buildTsid(indexRequest.getContentType(), indexRequest.indexSource().bytes()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we do this (and the following line) unconditionally in preProcess
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tsid == null
condition is needed as the tsid
may already be provided. See also #134982. Also, I think it makes sense to execute this in the context of hashSource
, as that's what buildTsid
is doing. There are several places where we assume that source parsing happens within indexShard
and I think it makes sense that both strategies do that in the same method rather than different ones (preProcess
vs indexShard
).
The failing tests revealed a bug: when applying a translog operation, we don't store the Two approaches I see to fix this:
I think I'd prefer the first option but there may be dragons I'm not aware of. One thing that would make the second option very difficult is that in the @henningandersen WDYT? Click to expand stack trace...
|
In 9551184, I've tried another solution where the |
Pinging @elastic/es-storage-engine (Team:StorageEngine) |
In 9c680f0, I've reverted changes specific to the translog and create the |
test/framework/src/main/java/org/elasticsearch/test/InternalTestCluster.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/mapper/TimeSeriesIdFieldMapper.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a stray comment.
// the tsid is normally set on the coordinating node during shard routing and passed to the data node via the index request | ||
// but when applying a translog operation, shard routing is not happening, and we have to create the tsid from source | ||
tsid = forIndexDimensions.buildTsid(source.getXContentType(), source.source()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally we would have
else {
assert tsid == forIndexDimensions.buildTsid(source.getXContentType(), source.source());
}
(for when dimensions are in use), but I guess this is not always upheld.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When it comes to replaying translog operations (which include the id but not the tsid), the effect of this check is similar because the id is based on the tsid, and it also checks in the ForRoutingPath
case:
elasticsearch/server/src/main/java/org/elasticsearch/index/mapper/TsidExtractingIdFieldMapper.java
Lines 86 to 97 in 40f3a1c
if (context.sourceToParse().id() != null && false == context.sourceToParse().id().equals(id)) { | |
throw new IllegalArgumentException( | |
String.format( | |
Locale.ROOT, | |
"_id must be unset or set to [%s] but was [%s] because [%s] is in time_series mode", | |
id, | |
context.sourceToParse().id(), | |
context.indexSettings().getIndexMetadata().getIndex().getName() | |
) | |
); | |
} | |
context.id(id); |
Henning and I have discussed a potential issue where the _id stored in the translog may differ when replaying the operation if a new dimension field gets added to the mappings in the meantime. Note that this isn't a new issue related to this change or to Thinking about it some more, maybe this can’t actually happen. IINM, we write the translog entry after the index operation, and after dynamic mapping updates have been processed. If that’s true, we can be sure that any fields present in the source of a translog entry are in the mappings already. Since you can’t make an existing field a dimension after the fact and because we route time series documents to backing indices based on the If there were any differences, we'd fail ingestion due to this check: elasticsearch/server/src/main/java/org/elasticsearch/index/mapper/TsidExtractingIdFieldMapper.java Lines 86 to 97 in 40f3a1c
Unrelated to this PR or the new |
With implementations
IndexRouting.ExtractFromSource.ForRoutingPath
andIndexRouting.ExtractFromSource.ForIndexDimensions
.This addresses review comments from #132566 (comment).