Skip to content

Conversation

benchaplin
Copy link
Contributor

CI is borked in #127150 so creating a duplicate PR here.

original-brownbear and others added 6 commits April 22, 2025 15:45
Making this work with batched execution and fixing a memory leak:

* Fix memory leak by removing listener on first message. There really only is a single message
here per node anyway with batched execution in the mix. Either it's a single shard on the data node
and we get a single query message or it's multiple shards and we get a single batched message,
so fine to remove listener after the first message since all tests do a single request only anyway.
* Add a new hook that allows inspection of the actual response. This is needed for batched since batched
sends a non-error response even if the data node failed all searches. We had this before in the
`onResponseSent` hook but checking the instance after it's been sent over the wire causes needless
overhead in the production code so moving to a "before-style" hook here.
@benchaplin benchaplin added >test Issues or PRs that are addressing/adding tests Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch :Search Foundations/Search Catch all for Search Foundations v9.2.0 labels Jul 30, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search-foundations (Team:Search Foundations)

* @param action the request action
* @param response response instance
*/
default void onBeforeResponseSent(long requestId, String action, TransportResponse response) {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I follow the issue in the first place, as well as the proposed fix. The proposal is to add a new default method to TransportMessageListener, to then only ever use it in a test? Could you help me better understand what other options we have?

Copy link
Contributor Author

@benchaplin benchaplin Aug 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Async/SearchErrorTraceIT forces an error on all shards during a search. Before batched execution, this would trigger onResponseSent(long requestId, String action, Exception error) - called for every failed action response. We could then implement onResponseSent and inspect the exception to assert things (like whether the stack trace was present).

Batched execution sends a non-error response even if all searches fail. We still want to inspect this response to assert the same things that were once captured in the exception. But TransportMessageListener has no hook to inspect non-error payloads, so this PR introduces one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hey Ben, sorry for the delay. I understand the problem better. I think this is a bit of an intrusive fix though. Isn't there another way to intercept responses in this specific test as opposed to using transport message listener? You may want to reach out to the distrib team and ask for advice on this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@javanna I've reworked the tests using a nice suggestion from @DaveCTurner - thanks for pushing for a better solution!

Copy link
Member

@javanna javanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! good work!

@benchaplin benchaplin merged commit 9eadfac into elastic:main Sep 4, 2025
33 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Search Foundations/Search Catch all for Search Foundations Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch >test Issues or PRs that are addressing/adding tests v9.2.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants