Skip to content

Conversation

@ldematte
Copy link
Contributor

The RecordingApmServer needs to be started before the ElasticsearchCluster; however, it needs to be stopped (or, at least, it needs stopping processing messages) before the ElasticsearchCluster.
JUnit rules do process before/after in reverse order at startup/shutdown; using a RuleChain slightly improves the situation (as opposed to today's ClassRule + Rule), but does not solve the issue.
Here I am introducing an additional "stop" method; the method needs to be called explicitly at the end of the test, but this way we can make RecordingApmServer stop processing requests before the ElasticsearchCluster is gone and communication is interrupted exceptionally.

Fixes #129651

@ldematte ldematte requested a review from a team June 25, 2025 12:44
@ldematte ldematte added >test Issues or PRs that are addressing/adding tests :Core/Infra/Metrics Metrics and metering infrastructure v9.1.0 labels Jun 25, 2025
@elasticsearchmachine elasticsearchmachine added Team:Core/Infra Meta label for core/infra team v9.2.0 labels Jun 25, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

Copy link
Contributor

@mosche mosche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, though wondering if it wouldn't be easier to reason if instead handling IOExceptions

Copy link
Member

@rjernst rjernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused how this addresses the linked failure. I thought the problem was the apm agent continues to run in the background, so requests to the recording server begin failing. Can you explain how this change fixes the issue?

@ldematte
Copy link
Contributor Author

Turned out that the issue is that the APM agent is in the middle of writing, ES gets shut down, the connection breaks, the mock APM server throws.
Shutting down the APM server before ES brings things down cleanly.
But thinking again, this is still brittle and an implementation detail; probably catching all exceptions like we were discussing is still better (more robust and simpler)

@ldematte
Copy link
Contributor Author

@mosche I've decided to go with your suggestion; I tried to refactor the RecordingApmServer to better coordinate the APM mock server and the ES cluster, but it's error prone and I was not satisfied. So better to go with the minimalist and more robust solution, at least until we stay with the APM agent.

Copy link
Contributor

@mosche mosche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm 👍

@ldematte ldematte enabled auto-merge (squash) June 26, 2025 13:22
@ldematte ldematte merged commit d6e2b57 into elastic:main Jun 26, 2025
32 checks passed
smalyshev pushed a commit to smalyshev/elasticsearch that referenced this pull request Jun 27, 2025
mridula-s109 pushed a commit to mridula-s109/elasticsearch that referenced this pull request Jul 3, 2025
ldematte added a commit to ldematte/elasticsearch that referenced this pull request Jul 12, 2025
elasticsearchmachine pushed a commit that referenced this pull request Jul 14, 2025
ldematte added a commit to ldematte/elasticsearch that referenced this pull request Jul 14, 2025
ldematte added a commit to ldematte/elasticsearch that referenced this pull request Jul 14, 2025
@ldematte ldematte deleted the fix/testApmIntegration-2 branch July 16, 2025 12:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Core/Infra/Metrics Metrics and metering infrastructure Team:Core/Infra Meta label for core/infra team >test Issues or PRs that are addressing/adding tests v9.2.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[CI] TracesApmIT testApmIntegration failing

4 participants