brunobat
Contributor

@brunobat brunobat commented Sep 30, 2025

Test for #47409 (comment)

@mjurc @rsvoboda, I'm fairly confident that the metric in Quarkus works as expected on current main. I tried the 3.27 branch but it doesn't compile on my machine (didn't debug that).

Added refactoring because the client metric name for errors on open was not aligned with the server.

@brunobat
Contributor Author

@mkouba I was expecting that an exception onOpen would not open the connection, but it does.

@rsvoboda
Member

@brunobat What's wrong with the reproducer if there are no changes needed in the codebase?

I asked Michal to take a look, but we both are pretty busy with RHBQ releases (I'm on 3.15, Michal is on 3.27). So it may take time before we respond.

@brunobat brunobat marked this pull request as ready for review September 30, 2025 15:14
@brunobat
Contributor Author

@rsvoboda I didn't debug your test because I was not able to build your test suite on my machine (I didn't spend much time on it).
This might be a timing issue or the assertion is not finding the right thing. Please check the raw output on your side for the quarkus_websockets_server_connections_onopen_errors* metrics.
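One way to check the raw output is to filter the metrics endpoint's text for the metric name prefix. The following is a minimal sketch that assumes the output is in the Prometheus text format; the class name, method name, sample lines, and the `uri` label are hypothetical and only illustrate the idea.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: picks sample lines for one metric family out of raw
// Prometheus-style text output. It does not talk to a running application.
final class MetricsOutputFilter {

    // Returns the non-comment lines whose metric name starts with the given prefix.
    static List<String> samplesWithPrefix(String metricsText, String prefix) {
        return metricsText.lines()
                .map(String::strip)
                .filter(line -> !line.isEmpty() && !line.startsWith("#"))
                .filter(line -> line.startsWith(prefix))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up sample output; real label names and values may differ.
        String sample = String.join("\n",
                "# TYPE quarkus_websockets_server_connections_onopen_errors_total counter",
                "quarkus_websockets_server_connections_onopen_errors_total{uri=\"/failing\"} 1.0",
                "some_other_metric_total 3.0");

        samplesWithPrefix(sample, "quarkus_websockets_server_connections_onopen_errors")
                .forEach(System.out::println);
    }
}
```

If the filter returns nothing, the metric was never registered; if it returns a sample with an unexpected value, the assertion or its timing is the more likely suspect.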


@mjurc
Contributor

mjurc commented Sep 30, 2025

@brunobat I've tried this manually - the metric does NOT indeed appear on metrics endpoint after opening ws.next connection and getting an exception on open.

I'm quite convinced the compilation error is because the branch is based on SNAPSHOT version, one that can be built and installed into your local maven repo directly out of main or version branch of Quarkus.

More info at https://github.com/quarkusio/quarkus/blob/main/CONTRIBUTING.md#using-snapshots

I've updated the reproducer to use 999-SNAPSHOT of Quarkus, but you can easily change the version of Quarkus to a released one with -Dquarkus.platform.version=${YOUR_DESIRED_VERSION}.

Reproducer:

git clone git@github.com:mjurc/quarkus-test-suite.git --branch QUARKUS-5667 && cd quarkus-test-suite
./mvnw clean verify -pl websockets/websocket-next -Dit.test=WebSocketsNextMetricsIT#serverErrorMetricsTest -Dreruns=0

But you can also run the app in dev mode (and set the version of Quarkus with -Dquarkus.platform.version=${YOUR_DESIRED_VERSION}):

cd websockets/websockets-next
mvn quarkus:dev

The FailingWebSocket endpoint is the one producing exceptions on open.

@brunobat brunobat force-pushed the ws-next-onopen-test branch from b51e14c to 68edd23 Compare October 1, 2025 12:59
@brunobat
Contributor Author

brunobat commented Oct 1, 2025

@rsvoboda @mjurc I added a test that replicates the issue you are finding: testServerEndpoint_OnConnectionErrorHandler
Seems that because the server has a @OnError handler, the metric is not incremented.

@mkouba should we really assume that if an error is handled we shouldn't increment the metric?

On a side note, it is tough to reproduce this with the test suite because there is too much wiring and assertion magic going on.


@mkouba
Contributor

mkouba commented Oct 1, 2025

> @mkouba should we really assume that if an error is handled we shouldn't increment the metric?

I have no idea what this metric actually means 🤷. If an error handler exists then we don't apply the unhandled-failure-strategy (i.e. close the connection by default) because we expect that the error handler will react appropriately.
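The rule described here can be sketched as a small decision function. This is a toy model, not the Quarkus implementation: the class, method, and enum names are hypothetical, and the only behaviour taken from the comment is that the strategy is skipped when an error handler exists and that the default closes the connection.

```java
// Toy model of the rule above; names are hypothetical and no Quarkus API is used.
final class FailureHandlingModel {

    // Illustrative strategy values; only "close by default" comes from the discussion.
    enum Strategy { LOG_AND_CLOSE, CLOSE, LOG, NOOP }

    // Returns true if the framework itself would close the connection after a callback failure.
    static boolean frameworkClosesConnection(boolean hasErrorHandler, Strategy strategy) {
        if (hasErrorHandler) {
            // The strategy is not applied; the error handler is expected to react.
            return false;
        }
        return strategy == Strategy.LOG_AND_CLOSE || strategy == Strategy.CLOSE;
    }

    public static void main(String[] args) {
        System.out.println(frameworkClosesConnection(false, Strategy.LOG_AND_CLOSE));
        System.out.println(frameworkClosesConnection(true, Strategy.LOG_AND_CLOSE));
    }
}
```

In this model an endpoint with an error handler always keeps the connection open unless the handler closes it itself, which is the case the test suite reproducer runs into.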

CC @michalvavrik

@mjurc
Contributor

mjurc commented Oct 1, 2025

@brunobat the real issue is that we have metrics that don't have a documented behaviour, and that was the root cause of the original issue too.

Judging by the name, I'd say the counter should increment if @OnOpen produces an exception, no matter what the other behaviours are.

@michalvavrik
Member

> @brunobat I've tried this manually - the metric does NOT indeed appear on metrics endpoint after opening ws.next connection and getting an exception on open.

&

> I have no idea what this metric actually means 🤷. If an error handler exists then we don't apply the unhandled-failure-strategy (i.e. close the connection by default) because we expect that the error handler will react appropriately.

What I implemented based on @brunobat's feedback (I could have misunderstood it) was that quarkus_websockets_server_connections_opened_errors_total counts errors that happen while opening connections. We explicitly discussed the scenario of an error in the @OnOpen callback, but that happens after the connection has been opened. Later, I think @geoand made some changes or renamed that metric, and I think he changed what is expected (#47415).

> @brunobat the real issue is that we have metrics that don't have a documented behaviour, and that was the root cause of the original issue too.

I don't think this is completely fair. All the metrics are documented: when you look at them with Prometheus, for this metric you get "Number of failures occurred when opening server connection failed."

I hope I am wrong, but this feels like a heated discussion for such a minor thing. I think whatever you consider to be the issue right now can be easily fixed. The "docs" are just a matter of exporting what we have to adoc.

@mjurc
Contributor

mjurc commented Oct 1, 2025

@michalvavrik The issue is minor and I am sure it can be fixed. It's just that the metric description can be interpreted ambiguously; see the issue opened all the way back in April. Since then, the behaviour was changed (I think; the issue was closed by a PR), but the reproducer I wrote back in April still fails, so I think the description needs to be clarified a bit. I don't mind the description being added to the docs, but it would be nice to know what to expect in the test.

If the test is right, then we'll need a fix in Quarkus :)

@brunobat
Contributor Author

brunobat commented Oct 1, 2025

Will find a way to increment the metric even when the @OnError handler is present.

@brunobat brunobat marked this pull request as draft October 2, 2025 06:34
@brunobat
Contributor Author

brunobat commented Oct 2, 2025

After taking another look at the code and trying to implement what was discussed above, I still think the current code does the right thing.
Expanded the tests on this PR to explain what's going on. Please take a look.

  • The first test: The server doesn't implement an @OnError handler, so an exception on open is fatal and the onopen.errors counter is incremented. The count.errors (all errors) counter is also incremented. One connection was opened and, as a consequence, the disconnection metric was also incremented. All good.
  • The second test: The server implements an @OnError handler and the connection is salvaged. Opening the connection did NOT FAIL, so onopen.errors was not incremented. There was an error, as the incremented count.errors (all errors) counter shows. However, the connections.closed counter was not incremented because the connection remains active, meaning the handshake was successful, which confirms that onopen.errors should stay at 0.
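The two cases above can be summarised with a toy model of the counters. This is only a sketch of the described semantics, not the Quarkus code: the class and method names are hypothetical, the counter names are the shortened ones used in this comment, and incrementing connections.opened in both cases is an assumption based on the handshake succeeding.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the counter semantics described above; it does not use any Quarkus API.
final class OnOpenMetricsModel {

    // Shortened counter names as used in the comment, not the full metric names.
    private final Map<String, Integer> counters = new HashMap<>();

    int get(String name) {
        return counters.getOrDefault(name, 0);
    }

    private void increment(String name) {
        counters.merge(name, 1, Integer::sum);
    }

    // Simulates one connection whose @OnOpen callback throws an exception.
    void openWithFailingOnOpen(boolean hasErrorHandler) {
        increment("connections.opened"); // assumption: the handshake itself succeeded
        increment("count.errors");       // every error is counted here
        if (!hasErrorHandler) {
            // Unhandled failure: fatal for the connection, which is then closed.
            increment("onopen.errors");
            increment("connections.closed");
        }
        // With an @OnError handler the connection is salvaged and stays open.
    }

    public static void main(String[] args) {
        for (boolean hasHandler : new boolean[] {false, true}) {
            OnOpenMetricsModel model = new OnOpenMetricsModel();
            model.openWithFailingOnOpen(hasHandler);
            System.out.printf("@OnError present=%b: onopen.errors=%d, count.errors=%d, connections.closed=%d%n",
                    hasHandler, model.get("onopen.errors"), model.get("count.errors"),
                    model.get("connections.closed"));
        }
    }
}
```

Read this way, onopen.errors tracks connections lost because of a failure on open, while count.errors tracks every error regardless of whether it was handled.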

Added refactoring because the client metric name for errors on open was not aligned with the server.
Also added some comments to clarify behavior.

These are all new metrics in a new domain, so it's normal to have doubts about what things really mean. I'm open to documentation improvements, but I believe the metrics are correct.

@brunobat brunobat marked this pull request as ready for review October 2, 2025 09:04
@brunobat brunobat requested a review from geoand October 2, 2025 09:04

quarkus-bot bot commented Oct 2, 2025

Status for workflow Quarkus CI

This is the status report for running Quarkus CI on commit 424e667.

✅ The latest workflow run for the pull request has completed successfully.

It should be safe to merge provided you have a look at the other checks in the summary.

You can consult the Develocity build scans.

@rsvoboda rsvoboda changed the title Add test for onOpen exception on WS Next Refactor and add test for onOpen exception on WS Next Oct 2, 2025
@michalvavrik michalvavrik requested a review from mkouba October 9, 2025 15:07
@mjurc
Contributor

mjurc commented Oct 9, 2025

@brunobat So just to make sure, the aim for the metric is to be increased on unrecoverable connection failure, yes?

@brunobat
Contributor Author

brunobat commented Oct 9, 2025

> @brunobat So just to make sure, the aim for the metric is to be increased on unrecoverable connection failure, yes?

Correct

@brunobat brunobat merged commit 8ea2b28 into quarkusio:main Oct 9, 2025
51 checks passed
@quarkus-bot quarkus-bot bot added this to the 3.29 - main milestone Oct 9, 2025