Skip to content

Conversation

@dkropachev
Copy link

@dkropachev dkropachev commented Mar 13, 2025

DefaultConvictionPolicy.openConnections can become negative:

java.lang.AssertionError
at com.datastax.driver.core.ConvictionPolicy$DefaultConvictionPolicy.signalConnectionClosed(ConvictionPolicy.java:90)
at com.datastax.driver.core.Connection.closeAsync(Connection.java:953)
at com.datastax.driver.core.HostConnectionPool.discardAvailableConnections(HostConnectionPool.java:882)
at com.datastax.driver.core.HostConnectionPool.closeAsync(HostConnectionPool.java:843)

It happens because there's a loose connection between Connection and ConvictionPolicy (i.e., Host), which makes it impossible to guarantee that a single connection always address same openConnections counter.

This can occasionally lead to assertion failures in rare cases on given line:
assert remaining >= 0;

Fixing this would require refactoring Connection.java to bind connections to Host instead of Endpoint, which is not justified considering low impact of this issue.

Fixes: #82

@dkropachev dkropachev force-pushed the dk/fix-conviction-policy-counter-go-negative branch from 3c28cef to 5bc688d Compare March 13, 2025 04:00
@fruch
Copy link

fruch commented Mar 13, 2025

nice warpup, of you fixing the issue you raise almost 3 years ago :)

Copy link

@Bouncheck Bouncheck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

List<Connection> newConnections(HostConnectionPool pool, int count) calls List<Connection> connections = Lists.newArrayListWithCapacity(count); which should throw IllegalArgumentException when the count is negative. While it's better to guard signal against negatives which technically could still happen, was that exception ever seen? It would be weird if not.

@dkropachev
Copy link
Author

List<Connection> newConnections(HostConnectionPool pool, int count) calls List<Connection> connections = Lists.newArrayListWithCapacity(count); which should throw IllegalArgumentException when the count is negative. While it's better to guard signal against negatives which technically could still happen, was that exception ever seen? It would be weird if not.

It is 0 at the time, reduced by 1 afterwards.

@dkropachev dkropachev requested a review from Bouncheck March 13, 2025 11:04
@Bouncheck
Copy link

List<Connection> newConnections(HostConnectionPool pool, int count) calls List<Connection> connections = Lists.newArrayListWithCapacity(count); which should throw IllegalArgumentException when the count is negative. While it's better to guard signal against negatives which technically could still happen, was that exception ever seen? It would be weird if not.

It is 0 at the time, reduced by 1 afterwards.

I don't follow. The count argument in newConnections? openConnections after signalConnectionsOpening?
What about the exception?

Copy link

@Bouncheck Bouncheck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unblocking

@dkropachev
Copy link
Author

dkropachev commented Mar 13, 2025

List<Connection> newConnections(HostConnectionPool pool, int count) calls List<Connection> connections = Lists.newArrayListWithCapacity(count); which should throw IllegalArgumentException when the count is negative. While it's better to guard signal against negatives which technically could still happen, was that exception ever seen? It would be weird if not.

It is 0 at the time, reduced by 1 afterwards.

I don't follow. The count argument in newConnections? openConnections after signalConnectionsOpening? What about the exception?

Yes, you are correct, let me look into it furthere.

@dkropachev dkropachev force-pushed the dk/fix-conviction-policy-counter-go-negative branch 3 times, most recently from 44baa62 to b8fe3c6 Compare March 17, 2025 20:53
@dkropachev
Copy link
Author

@Bouncheck , please check again, I have decided to remove assert, reasoning on the PR description, commit and comments.

Copy link

@Bouncheck Bouncheck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC assert removal is necessary. I think removing assert line and adding a comment is enough.
I see no reason to increment the counter when going negative. It does not fix the behavior and with this increment the number loses its meaning. Even if it was negative it would at least always be equal to <opens performed - closes performed>.

Loose connection between Connection and ConvictionPolicy (i.e Host)
it is impossible to guarantee that single connection addresses same `openConnections` counter
Which opens up opportunity for this assertion to fail in rare cases.
Fixing this issue would involve major refactoring of `Connection.java` binding connection to `Host`, instead of `Endpoint`
Giving low impact of given issue doing refactoring makes no sense.
@dkropachev dkropachev force-pushed the dk/fix-conviction-policy-counter-go-negative branch from b8fe3c6 to fa4fbf9 Compare March 18, 2025 13:14
@dkropachev
Copy link
Author

IIUC assert removal is necessary. I think removing assert line and adding a comment is enough. I see no reason to increment the counter when going negative. It does not fix the behavior and with this increment the number loses its meaning. Even if it was negative it would at least always be equal to <opens performed - closes performed>.

You are correct, but we have to fix signalConnectionFailure then to return true when counter less then 0 too, please take look.

@dkropachev dkropachev merged commit 840b528 into scylladb:scylla-3.x Mar 18, 2025
11 checks passed
dkropachev added a commit that referenced this pull request Mar 21, 2025
Loose connection between Connection and ConvictionPolicy (i.e Host)
it is impossible to guarantee that single connection addresses same `openConnections` counter
Which opens up opportunity for this assertion to fail in rare cases.
Fixing this issue would involve major refactoring of `Connection.java` binding connection to `Host`, instead of `Endpoint`
Giving low impact of given issue doing refactoring makes no sense.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants