@JeremyDahlgren
Contributor

I came across this while working on issue 123248: with caching disabled, I was seeing duplicate calls in the stats service.

The current SingleResultDeduplicator runs the computation action again, in the same thread, if other threads call execute() while the current computation is running. For a potentially expensive action this can delay the response to the original caller by an additional action execution time. Since we are caching in TransportGetAllocationStatsAction, that usage is also inconsistent with the strict requirements of the current SingleResultDeduplicator, because we can potentially return a response that was calculated before the call to execute().
Note also that, due to the recursive nature of the current SingleResultDeduplicator implementation, the original thread can be delayed repeatedly if additional threads keep calling execute() while these computations run.

This change refactors SingleResultDeduplicator into an interface with two implementations: a strict form, which is the same as the original SingleResultDeduplicator, and a relaxed version that completes all waiting listeners, along with the original call's listener, with the single computation result. This change will be used in the cancellation support that will be added for 123248.
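
To make the relaxed behaviour concrete, here is a minimal sketch, not the code in this PR. The class name `RelaxedDeduplicatorSketch` and the use of plain `Consumer` callbacks in place of `ActionListener` are assumptions made so the example stands alone; failure handling and thread-context preservation are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical sketch of a "relaxed" deduplicator: every listener that arrives
 * while a computation is in flight is completed with that computation's result,
 * and no follow-up computation is started on their behalf.
 */
final class RelaxedDeduplicatorSketch<T> {
    // Starts the computation and reports its result to the supplied callback.
    private final Consumer<Consumer<T>> action;
    // Listeners waiting on the in-flight computation; null when nothing is running.
    private List<Consumer<T>> waiting;

    RelaxedDeduplicatorSketch(Consumer<Consumer<T>> action) {
        this.action = action;
    }

    void execute(Consumer<T> listener) {
        synchronized (this) {
            if (waiting != null) {
                // A computation is already running: just wait for its result.
                waiting.add(listener);
                return;
            }
            waiting = new ArrayList<>();
            waiting.add(listener);
        }
        // Run the action outside the lock so other callers can still enqueue.
        action.accept(this::complete);
    }

    private void complete(T result) {
        List<Consumer<T>> toNotify;
        synchronized (this) {
            toNotify = waiting;
            waiting = null; // the next execute() call starts a fresh computation
        }
        if (toNotify == null) {
            return; // ignore a second completion of the same computation
        }
        for (Consumer<T> listener : toNotify) {
            listener.accept(result);
        }
    }
}
```

In this sketch a listener that joins late can receive a result whose computation started before its own call to execute(), which is exactly the guarantee the strict form does not give up.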

@JeremyDahlgren added the >non-issue, :Distributed Coordination/Allocation, and Team:Distributed Coordination labels on Apr 21, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

Contributor

@DaveCTurner left a comment


> For a potentially expensive action this can delay the response to the original caller by an additional action execution time.

In practice, all the usages today appear to fork the expensive work to a background thread, freeing up the original thread to complete its listeners straight away. We should probably document that this is the expected usage pattern.
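
As a rough illustration of that forking pattern (a sketch only; `ForkingActionSketch` and its `forking` helper are hypothetical names, and a plain `ExecutorService` stands in for whatever thread pool a real caller would use):

```java
import java.util.concurrent.ExecutorService;
import java.util.function.Consumer;
import java.util.function.Supplier;

final class ForkingActionSketch {
    /**
     * Wraps an expensive supplier as a deduplicator-style action that only
     * schedules the work. The thread calling the action returns immediately,
     * and the callback is completed later on an executor thread.
     */
    static <T> Consumer<Consumer<T>> forking(ExecutorService executor, Supplier<T> expensive) {
        return callback -> executor.execute(() -> callback.accept(expensive.get()));
    }
}
```

With an action shaped like this, a repeated computation runs on the executor, so the caller whose listener has just been completed is not held up by it.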

> we can potentially return a response that was calculated before the call to execute()

Only in the case of TransportGetAllocationStatsAction, and this is deliberate.

I'm not 100% convinced we need to do this, particularly since we're only using this abstraction in one place. If we do need TransportGetAllocationStatsAction to behave differently, maybe we should just do something with our bare hands there for now, until we see some other spot where this abstraction is useful.

Comment on lines +44 to +45
// The first thread will block until all the other callers have added a waiting listener.
safeAwait(countDownLatch);
Contributor


This is ok here, but we should also have tests that don't orchestrate the order of events quite so carefully, so that we can be more confident there are no races that might lead to lost listeners or duplicate concurrent executions. Admittedly the testing of SingleResultDeduplicator today is rather weak in this area; I'd be happy to see some more stringent testing of its invariants under concurrent use too.
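
One possible shape for such a less orchestrated test is sketched below. Everything in it is an assumption for illustration: the `DeduplicatorStressSketch` harness, its `Result` holder, and the convention that a deduplicator is modelled as a function from an action to an `execute` method, with `Consumer` callbacks in place of `ActionListener`. It also assumes the action completes its callback synchronously, so every listener must be done once all caller threads have returned.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;
import java.util.function.Function;

final class DeduplicatorStressSketch {
    /** Counters gathered while callers raced without any latches. */
    static final class Result {
        final int completedListeners;       // listeners completed at least once
        final int doubleCompletions;        // extra completions of the same listener
        final int maxConcurrentExecutions;  // should never exceed 1
        final int executions;               // how often the action actually ran

        Result(int completed, int doubles, int maxConcurrent, int executions) {
            this.completedListeners = completed;
            this.doubleCompletions = doubles;
            this.maxConcurrentExecutions = maxConcurrent;
            this.executions = executions;
        }
    }

    static Result run(
        Function<Consumer<Consumer<Integer>>, Consumer<Consumer<Integer>>> deduplicatorFactory,
        int threads,
        int callsPerThread
    ) {
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger maxInFlight = new AtomicInteger();
        AtomicInteger executions = new AtomicInteger();
        AtomicInteger completed = new AtomicInteger();
        AtomicInteger doubles = new AtomicInteger();

        // Instrumented action: tracks overlap, then completes the callback synchronously.
        Consumer<Consumer<Integer>> action = callback -> {
            maxInFlight.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
            Thread.yield(); // widen the window in which other callers can arrive
            int id = executions.incrementAndGet();
            inFlight.decrementAndGet(); // computation is over before listeners run
            callback.accept(id);
        };
        Consumer<Consumer<Integer>> execute = deduplicatorFactory.apply(action);

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<CompletableFuture<Void>> callers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            callers.add(CompletableFuture.runAsync(() -> {
                for (int i = 0; i < callsPerThread; i++) {
                    AtomicBoolean done = new AtomicBoolean();
                    execute.accept(result -> {
                        if (done.compareAndSet(false, true)) {
                            completed.incrementAndGet();
                        } else {
                            doubles.incrementAndGet();
                        }
                    });
                }
            }, pool));
        }
        CompletableFuture.allOf(callers.toArray(new CompletableFuture[0])).join();
        pool.shutdown();
        return new Result(completed.get(), doubles.get(), maxInFlight.get(), executions.get());
    }
}
```

A test built on this would assert that `completedListeners` equals `threads * callsPerThread`, that `doubleCompletions` is zero, and that `maxConcurrentExecutions` is 1, for whichever implementation the factory wraps.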

}));
}

private static class ActionListenerList<T> implements ActionListener<T> {
Contributor


WDYT about just using SubscribableListener here? It uses a linked list, so it's slightly more expensive in general, I guess, but it's also more efficient in the common one-subscriber case.

Contributor Author


Yes, makes sense. This is trimmed down quite a bit and as you mentioned this code could just be used directly as needed in TransportGetAllocationStatsAction. I'm fine with closing this and not touching SingleResultDeduplicator. I can create a separate PR to update the class javadoc for SingleResultDeduplicator with the expected usage pattern, or revert changes here and just make that javadoc update.


Labels

:Distributed Coordination/Allocation, >non-issue, Team:Distributed Coordination, v9.1.0


3 participants