Drafts a change that adds the tsid to the translog.
The primary motivation is to avoid a specific failure scenario: if mappings change between the original write and a translog replay, an index recovered from the translog can diverge from the primary. As a positive side effect, recovery for time_series indices should get faster, since we would no longer need to recalculate the _tsid during recovery.
I've added a test case (`IndexShardTests#testRecoverFromTranslogWhenDimensionsChange`) for this specific scenario, which fails on main and succeeds with the changes proposed in this PR.

This is similar to the "Scenario B" I've described here: #135402 (comment).
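
To make the failure mode concrete, here is a toy model of it, not the actual Elasticsearch code. Everything in it is hypothetical and for illustration only: the `Op` record, `computeTsid`, and `replayTsid` are made-up names, and the tsid is modeled as a plain hash of the dimension values. It assumes that the mapping change adds a dimension between the original write and the replay.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TsidReplaySketch {

    // Hypothetical stand-in for a translog index operation.
    // storedTsid is null when the translog does not carry the tsid.
    record Op(Map<String, String> source, Integer storedTsid) {}

    // Toy tsid: a hash over the values of the fields that are currently
    // mapped as dimensions. The real _tsid encoding is different.
    static int computeTsid(Map<String, String> source, List<String> dimensions) {
        Map<String, String> dims = new TreeMap<>();
        for (String dimension : dimensions) {
            if (source.containsKey(dimension)) {
                dims.put(dimension, source.get(dimension));
            }
        }
        return dims.hashCode();
    }

    // Replay prefers the tsid recorded with the operation and only falls
    // back to recomputing it from the source with the current mappings.
    static int replayTsid(Op op, List<String> currentDimensions) {
        return op.storedTsid() != null
            ? op.storedTsid()
            : computeTsid(op.source(), currentDimensions);
    }

    public static void main(String[] args) {
        Map<String, String> source = Map.of("host", "a", "pod", "p1");
        List<String> dimensionsAtWrite = List.of("host");
        List<String> dimensionsAtReplay = List.of("host", "pod"); // mapping changed

        int primaryTsid = computeTsid(source, dimensionsAtWrite);

        Op withoutTsid = new Op(source, null);
        Op withTsid = new Op(source, primaryTsid);

        System.out.println("recomputed matches primary: "
            + (replayTsid(withoutTsid, dimensionsAtReplay) == primaryTsid));
        System.out.println("stored matches primary: "
            + (replayTsid(withTsid, dimensionsAtReplay) == primaryTsid));
    }
}
```

In this model, the operation without a stored tsid is recomputed under the new dimension set and no longer matches what the primary indexed, while the operation that carries the tsid replays to the same value regardless of the mapping change.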
After spiking on the approach of adding the tsid to the translog, I'm no longer sure it's the right one. At the very least, it would require more changes than I had originally anticipated. Not only do we need to change the translog itself, but this also affects recovery from an index (`SearchBasedChangesSnapshot`), where the _source and the _id field are fetched from the Lucene index via a query. From a first look, the _id field is fetched from stored fields, which are not available for time_series indices. So I think we would need to synthesize the _id as well as add support for reading the doc_values for _tsid.

At this point, I think it's a good idea to pause and double-check that this is indeed the approach we want to pursue.
cc @henningandersen @kkrik-es