Conversation

@fcofdez (Contributor) commented Apr 24, 2025

Relates ES-10339

@fcofdez fcofdez added >enhancement :Distributed Indexing/Recovery Anything around constructing a new shard, either from a local or a remote source. Team:Distributed Indexing Meta label for Distributed Indexing team labels Apr 24, 2025
@elasticsearchmachine (Collaborator):

Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing)

@elasticsearchmachine (Collaborator):

Hi @fcofdez, I've created a changelog YAML for you.

@elasticsearchmachine elasticsearchmachine added the serverless-linked Added by automation, don't add manually label Apr 24, 2025
@fcofdez fcofdez requested a review from henningandersen April 24, 2025 14:18
@henningandersen (Contributor) left a comment:

Looks good though I have a comment on the cleanup.

onGoingRecoveries.markRecoveryAsDone(recoveryId);
return null;
}), indexShard::preRecovery);
try (onCompletion) {
Contributor:

I would think this releases the recovery monitor and the recovery-ref too soon? My intuition would be that it should only be done when the action completes?

Contributor Author (@fcofdez):

My understanding is that the RecoveryTarget would be retained until the recovery is marked as done (since the initial refCount=1 from AbstractRefCounted corresponds to that decRef). But just to be on the safe side, I've reverted to the previous behaviour of releasing the RecoveryRef once the action returns.
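To make the ref-counting argument above easier to follow, here is a minimal, self-contained sketch in plain Java (illustrative names only; this is not the actual RecoveryTarget or AbstractRefCounted code). The object starts with one reference that stands for the in-progress recovery; short-lived references taken around individual actions do not trigger cleanup on their own, and only releasing that initial reference (the markRecoveryAsDone step in the discussion above) runs the final cleanup.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch of the ref-counting lifecycle discussed above
// (hypothetical class; not the real RecoveryTarget / AbstractRefCounted).
class RecoveryTargetSketch {
    // Starts at 1: this initial reference stands for "recovery in progress"
    // and is only released when the recovery is marked as done.
    private final AtomicInteger refCount = new AtomicInteger(1);

    void incRef() {
        refCount.incrementAndGet();
    }

    void decRef() {
        if (refCount.decrementAndGet() == 0) {
            closeInternal(); // runs only after the initial ref AND all short-lived refs are released
        }
    }

    private void closeInternal() {
        System.out.println("recovery target cleaned up");
    }

    public static void main(String[] args) {
        RecoveryTargetSketch target = new RecoveryTargetSketch();

        // RecoveryRef-style usage: hold a reference for the duration of one action.
        target.incRef();
        try {
            // ... perform an action against the recovery target ...
        } finally {
            target.decRef(); // releasing this reference does not clean up yet
        }

        // markRecoveryAsDone-equivalent: drop the initial reference; closeInternal() fires here.
        target.decRef();
    }
}
```

In this simplified picture, releasing the per-action reference when the action returns (the reverted behaviour mentioned above) corresponds to the try/finally block, while the final cleanup still waits for the initial reference to be dropped.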

@henningandersen (Contributor) left a comment:

LGTM (though I'd like Iraklis to have a look at RecoveriesCollection if possible).

throw new IndexShardClosedException(shardId);
}
assert recoveryRef.target().shardId().equals(shardId);
assert recoveryRef.target().indexShard().routingEntry().isPromotableToPrimary();
Contributor:

Looks like this was added here. I'm also not sure I understand why; perhaps @kingherc remembers and can confirm that the assertion is not significant?

Contributor:

Not off the top of my head. But going back to the code, I see we made a special branch in PeerRecoveryTargetService#doRecovery() with if (indexShard.routingEntry().isPromotableToPrimary() == false) { for unpromotables that basically quick-skips all recovery stages and closes the RecoveryRef as well. So the point of the assertion at the time was that there should be no other coordination needed for unpromotables to justify getting the RecoveryRef.

Seeing, though, that this PR now introduces some coordination between unpromotables, it probably makes sense to remove the assertion.
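For readers unfamiliar with the branch described above, here is a simplified, self-contained sketch of its shape (hypothetical types and method names; not the actual PeerRecoveryTargetService code). Unpromotable shards take a fast path that skips the recovery stages and releases the RecoveryRef immediately, which is what the assertion relied on before this PR introduced coordination between unpromotables.

```java
// Simplified sketch of the fast path described above
// (hypothetical types; not the real PeerRecoveryTargetService#doRecovery).
class DoRecoverySketch {

    interface RecoveryRef extends AutoCloseable {
        long recoveryId();

        @Override
        void close(); // narrowed: no checked exception, so try-with-resources needs no catch
    }

    static void doRecovery(boolean promotableToPrimary, RecoveryRef recoveryRef) {
        if (promotableToPrimary == false) {
            // Fast path for unpromotable shards: skip all recovery stages
            // and release the RecoveryRef right away.
            try (recoveryRef) {
                markRecoveryAsDone(recoveryRef.recoveryId());
            }
            return;
        }
        // Regular peer-recovery path for promotable shards.
        runRecoveryStages(recoveryRef);
    }

    static void markRecoveryAsDone(long recoveryId) {
        System.out.println("recovery " + recoveryId + " marked as done");
    }

    static void runRecoveryStages(RecoveryRef recoveryRef) {
        try (recoveryRef) {
            System.out.println("running full recovery stages for recovery " + recoveryRef.recoveryId());
        }
    }

    public static void main(String[] args) {
        RecoveryRef ref = new RecoveryRef() {
            @Override
            public long recoveryId() {
                return 42L;
            }

            @Override
            public void close() {
                System.out.println("RecoveryRef released");
            }
        };
        doRecovery(false, ref); // unpromotable: fast path, no recovery stages
    }
}
```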

Contributor:

(I did not fully review this PR, but feel free to tell me if I should.)

@fcofdez fcofdez merged commit c5c3615 into elastic:main May 1, 2025
16 checks passed

Labels

:Distributed Indexing/Recovery (Anything around constructing a new shard, either from a local or a remote source)
>enhancement
serverless-linked (Added by automation, don't add manually)
Team:Distributed Indexing (Meta label for Distributed Indexing team)
v9.1.0
