Add tsid to the translog #135596

felixbarny · 2025-09-29T07:11:24Z

Drafts a change that adds the tsid to the translog.

The primary motivation for this is to avoid a specific failure scenario where replaying the translog when mappings have changed in the meantime would lead to an index that's recovered from the translog to diverge from the primary. As a positive side-effect, recovery for time_series indices should see performance improvements as we wouldn't need to re-calculate the _tsid during recovery.

I've added a test case (IndexShardTests#testRecoverFromTranslogWhenDimensionsChange) for this specific scenario, which fails on main and succeeds with the changes proposed in this PR.

This is similar to the "Scenario B" I've described here: #135402 (comment).

After spiking on the approach to add the tsid to the translog, I'm not sure anymore if that's the right approach. At least, it would require more changes than I had originally anticipated. Not only do we need to change the translog itself, but this also affects recovery from an index (SearchBasedChangesSnapshot), where the _source and the _id field are fetched from the Lucene index via a query. From a first look, the _id field is fetched from stored field, which are not available for time_series indices. So I think we would need to synthesize the _id as well as add support for reading the doc_values for _tsid.

At this point, I think it's a good idea to pause and double-check that this is indeed the approach we want to pursue.

cc @henningandersen @kkrik-es

henningandersen · 2025-09-29T08:50:25Z

this is indeed the approach we want to pursue.

Yeah, I am not too sure of that either. At a high level, I think mappings need to be no more updateable than being able to replay translog, peer recovery, CCR, retry etc. And this seems to try to solve a mapping update problem in a way where we think it does not affect translog (and some of the others).

A stronger validation or other clarification that any mapping update will be safe towards any of those replays seems more directly addressing this to me.

felixbarny · 2025-09-29T09:07:54Z

A stronger validation or other clarification that any mapping update will be safe towards any of those replays seems more directly addressing this to me.

That makes sense to me. However, stronger guarantees on the mapping side may have an impact on backwards compatibility. But that's a separate discussion.

While this change may still be something to consider in the future, it should only be a performance optimization rather than essential for correctness. I'll close this for now but we can keep discussing and pick this back up at any point if needed.

Add tsid to the translog

6d5cf31

felixbarny added the Team:Distributed Indexing Meta label for Distributed Indexing team label Sep 29, 2025

elasticsearchmachine added external-contributor Pull request authored by a developer outside the Elasticsearch team v9.2.0 labels Sep 29, 2025

felixbarny mentioned this pull request Sep 29, 2025

Remove index.dimensions feature flag #135402

Merged

[CI] Auto commit changes from spotless

9f27ee2

felixbarny closed this Sep 29, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add tsid to the translog #135596

Add tsid to the translog #135596

Uh oh!

felixbarny commented Sep 29, 2025

Uh oh!

henningandersen commented Sep 29, 2025

Uh oh!

felixbarny commented Sep 29, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Add tsid to the translog #135596

Add tsid to the translog #135596

Uh oh!

Conversation

felixbarny commented Sep 29, 2025

Uh oh!

henningandersen commented Sep 29, 2025

Uh oh!

felixbarny commented Sep 29, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants