Add tryWithEngineOrNull #132000

Merged

kingherc merged 8 commits into elastic:main from kingherc:enhancement/ES-12215-with-engine-null-reset

Aug 1, 2025

Contributor

kingherc commented Jul 28, 2025

And deprecate old style getEngine/OrNull methods.

Apply new functionality to several methods related to non-accurate metrics that do not need to wait for the engine being reset and can do with a null engine. These pertain typically to periodic operations that can skip a shard being reset and revisit it next time.

Relates ES-11457

kingherc self-assigned this

kingherc added >non-issue :Distributed Indexing/Engine Team:Distributed Indexing v9.2.0 labels

elasticsearchmachine added the serverless-linked label

kingherc force-pushed the enhancement/ES-12215-with-engine-null-reset branch 2 times, most recently from 4593d64 to b5f0212 Compare

July 28, 2025 11:03


          Add withEngineOrNullIfBeingReset

ff5b13a

And deprecate old style getEngine/OrNull methods.

Apply new functionality to several methods that do not need to
wait for the engine being reset and can do with a null engine.
These pertain typically to periodic operations that can skip
a shard being reset and revisit it next time.

Also return empty stats for a few stats in case the engine is
being reset. These are stats that are already returned empty
from a hollow engine.

Relates ES-11457

kingherc force-pushed the enhancement/ES-12215-with-engine-null-reset branch from b5f0212 to ff5b13a Compare

July 28, 2025 11:07

kingherc marked this pull request as ready for review

July 28, 2025 13:05

kingherc requested review from fcofdez and tlrx

July 28, 2025 13:05

Collaborator

elasticsearchmachine commented Jul 28, 2025

Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing)

kingherc added 2 commits

July 29, 2025 16:55


          Merge remote-tracking branch 'kingherc/main' into enhancement/ES-1221…

bc4d6b8

…5-with-engine-null-reset


          Better comments

01e2e28

fcofdez reviewed

Contributor

fcofdez left a comment

I'm a bit concerned about the assumptions we're making in this patch, namely that the engine reset would always be InternalEngine -> Hollow, where returning empty stats or discarding a flush request is acceptable because they're essentially no-ops. This might no longer be true if we end up adopting resetEngine for other engine implementations.

Contributor Author

kingherc commented Jul 29, 2025

@fcofdez are you suggesting we should forgo these changes for now? That might mean some management threads or the Disk/Memory controller threads get temporarily stuck during massive resets, but that's not too bad either, I guess (until we ultimately improve the situation). cc @tlrx for your opinion as well.

tlrx reviewed

Member

tlrx left a comment

I have mixed feelings about this change; I have the impression that we're tackling different problems with the new withEngineOrNullIfBeingReset.

As far as I understand, we have 3 situations:

- stats or metrics that need to be reported while the engine is reset (e.g. getWritingBytes(), flushStats(), indexingStats()), which are mostly already best-effort in terms of accuracy
- actions that can be skipped during a reset (flushOnIdle), which can be discussed case by case
- actions that must be performed on a non-null engine (flush with waitIfOngoing=true, trimTranslog), which are the trickier ones to handle, especially when they are called on a transport thread

I think we can craft something for the stats/metrics case (possible solutions could be to keep a copy of the stats for the time of the reset, or keep a reference on the engine-to-be-reset). For the skippable actions, I think something like you did in withEngineOrNullIfBeingReset can work (though I would call this tryWithEngineOrNull). For the non-skippable actions I think we need to find a solution for each case.

Happy to discuss this more
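To make the skippable case concrete, here is a minimal Java sketch of what a tryWithEngineOrNull-style helper could look like. It assumes the engine reference is guarded by a read/write lock whose write side is held for the duration of a reset. SketchEngine, SketchShard, and the locking scheme are hypothetical names and choices made for illustration; they are not taken from IndexShard.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Function;

// Hypothetical stand-in for an engine; not Elasticsearch's Engine class.
interface SketchEngine {
    long writingBytes();
}

// Hypothetical shard holder. The lock-based reset protocol is an assumption.
final class SketchShard {
    private final ReentrantReadWriteLock engineLock = new ReentrantReadWriteLock();
    private SketchEngine engine; // guarded by engineLock

    SketchShard(SketchEngine engine) {
        this.engine = engine;
    }

    // A reset holds the write lock while the engine is being swapped.
    void resetEngine(Function<SketchEngine, SketchEngine> swap) {
        engineLock.writeLock().lock();
        try {
            engine = swap.apply(engine);
        } finally {
            engineLock.writeLock().unlock();
        }
    }

    // Blocking variant: waits until any ongoing reset has finished.
    <R> R withEngine(Function<SketchEngine, R> operation) {
        engineLock.readLock().lock();
        try {
            return operation.apply(engine);
        } finally {
            engineLock.readLock().unlock();
        }
    }

    // Non-blocking variant: if another thread is resetting the engine,
    // the operation is invoked with null instead of waiting.
    <R> R tryWithEngineOrNull(Function<SketchEngine, R> operation) {
        if (engineLock.readLock().tryLock()) {
            try {
                return operation.apply(engine);
            } finally {
                engineLock.readLock().unlock();
            }
        }
        return operation.apply(null);
    }

    // Best-effort metric: reports 0 while a reset is in progress.
    long getWritingBytes() {
        return tryWithEngineOrNull(e -> e == null ? 0L : e.writingBytes());
    }
}

final class TryWithEngineDemo {
    public static void main(String[] args) {
        SketchShard shard = new SketchShard(() -> 42L);
        System.out.println(shard.getWritingBytes());
    }
}
```

In this sketch a periodic caller never blocks on a reset: it either sees the current engine or handles null and revisits the shard on its next run. Operations that must run on a non-null engine would still need the blocking variant.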

kingherc added 2 commits

July 30, 2025 19:11


          Merge remote-tracking branch 'kingherc/main' into enhancement/ES-1221…

154efda

…5-with-engine-null-reset


          Revert changes not related to metrics

7f57c4a

kingherc changed the title ~~Add withEngineOrNullIfBeingReset~~ Add tryWithEngineOrNull

kingherc requested review from fcofdez and tlrx

July 30, 2025 16:32

Contributor Author

kingherc commented Jul 30, 2025

@tlrx and @fcofdez, I updated the PR to keep only the parts related to the non-accurate metrics. Feel free to review this PR and the serverless one again.

tlrx reviewed

Member

tlrx left a comment

Looks good, I left some comments.

server/src/main/java/org/elasticsearch/index/shard/IndexShard.java (outdated, resolved)

server/src/main/java/org/elasticsearch/index/shard/IndexShard.java (outdated, resolved)

server/src/main/java/org/elasticsearch/index/shard/IndexShard.java

    
    flushMetric.count(),
    periodicFlushMetric.count(),
    TimeUnit.NANOSECONDS.toMillis(flushMetric.sum()),
    engine != null ? engine.getTotalFlushTimeExcludingWaitingOnLockInMillis() : 0L

Member

tlrx Aug 1, 2025

Mostly a note for myself, we could extract some stats at the shard level and pass them down to the engine instances so that they "survive" resets

Contributor Author

kingherc Aug 1, 2025

True, I was also thinking of that, although there may be some concurrency challenges to handle. Also, since we will soon shorten the reset period drastically, I am wary of investing more effort on this front, including this PR (which could arguably be skipped if we didn't have long resets). However, such efforts might be useful in the future if resets become long again or if other engines end up being reset.
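One way to picture the idea of shard-level stats that survive resets is the rough Java sketch below. It assumes the shard owns an accumulator and hands the same instance to every engine it creates. ShardFlushTimes, CountingEngine, and StatsShard are hypothetical names, and the concurrency challenges mentioned above (for example, a flush racing with a reset) are deliberately not addressed.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical shard-owned accumulator that outlives any single engine.
final class ShardFlushTimes {
    private final LongAdder flushNanosExcludingLockWait = new LongAdder();

    void record(long nanos) {
        flushNanosExcludingLockWait.add(nanos);
    }

    long totalMillis() {
        return TimeUnit.NANOSECONDS.toMillis(flushNanosExcludingLockWait.sum());
    }
}

// Hypothetical engine that reports into the accumulator it was given.
final class CountingEngine {
    private final ShardFlushTimes times;

    CountingEngine(ShardFlushTimes times) {
        this.times = times;
    }

    // The flush duration is passed in so the sketch stays deterministic.
    void flush(long elapsedNanos) {
        times.record(elapsedNanos);
    }
}

// Hypothetical shard: stats are read from the shard, never from the engine.
final class StatsShard {
    private final ShardFlushTimes times = new ShardFlushTimes();
    private volatile CountingEngine engine = new CountingEngine(times);

    void flush(long elapsedNanos) {
        engine.flush(elapsedNanos);
    }

    // The new engine receives the same accumulator, so totals carry over.
    void resetEngine() {
        engine = new CountingEngine(times);
    }

    long totalFlushTimeMillis() {
        return times.totalMillis();
    }
}

final class ShardStatsDemo {
    public static void main(String[] args) {
        StatsShard shard = new StatsShard();
        shard.flush(2_000_000L);
        shard.resetEngine();
        shard.flush(3_000_000L);
        System.out.println(shard.totalFlushTimeMillis());
    }
}
```

With this shape, the stats path would not need a null check on the engine at all, at the cost of threading the accumulator through engine construction.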

server/src/main/java/org/elasticsearch/index/shard/IndexShard.java (outdated, resolved)

server/src/main/java/org/elasticsearch/index/shard/IndexShard.java (outdated, resolved)

server/src/test/java/org/elasticsearch/index/shard/IndexShardTests.java (resolved)

kingherc added 2 commits

August 1, 2025 14:11


          Merge remote-tracking branch 'kingherc/main' into enhancement/ES-1221…

cf1f92d

…5-with-engine-null-reset


          PR comments

6ceb986

kingherc commented

Contributor Author

kingherc left a comment

Thanks for the comments @tlrx ! Feel free to review again. Gentle reminder for @fcofdez to review as well.

kingherc requested a review from tlrx

August 1, 2025 12:09

tlrx approved these changes

Member

tlrx left a comment

LGTM


          PR comments

5e4a636

fcofdez approved these changes

Contributor

fcofdez left a comment

LGTM

kingherc merged commit feaa580 into elastic:main

33 checks passed

kingherc deleted the enhancement/ES-12215-with-engine-null-reset branch

August 1, 2025 14:34
