Create connections concurrently in separate threads#2317
Open
tbadalov wants to merge 6 commits intobrettwooldridge:devfrom
Open
Create connections concurrently in separate threads#2317tbadalov wants to merge 6 commits intobrettwooldridge:devfrom
tbadalov wants to merge 6 commits intobrettwooldridge:devfrom
Conversation
tbadalov
commented
May 28, 2025
| @SuppressWarnings("NullableProblems") | ||
| public Thread newThread(Runnable r) { | ||
| var thread = new Thread(r, threadName); | ||
| var thread = new Thread(r, threadName + "_" + counter.incrementAndGet()); |
Author
There was a problem hiding this comment.
P.S. If the overall idea is good, gotta change this part so that thread naming doesn't interfere with other existing executors.
tbadalov
commented
May 28, 2025
| var backoffMs = 10L; | ||
| long jitterMs = ThreadLocalRandom.current().nextLong(Math.max(10, config.getConnectionTimeout() / 10)); | ||
| if (connectionsInProgress.get() >= Runtime.getRuntime().availableProcessors()) { | ||
| quietlySleep(jitterMs); |
Author
There was a problem hiding this comment.
Added sleeping for some jittered time to avoid potential CPU throttling in CPU-limited environments like Kubernetes, when multiple connections are created at once.
tbadalov
commented
May 28, 2025
| final int maxPoolSize = config.getMaximumPoolSize(); | ||
| this.addConnectionExecutor = createThreadPoolExecutor(maxPoolSize, poolName + ":connection-adder", threadFactory, new CustomDiscardPolicy()); | ||
| this.addConnectionExecutor = createCreatorThreadPoolExecutor( | ||
| Math.min(Runtime.getRuntime().availableProcessors() * 2, config.getMinimumIdle()), |
Author
There was a problem hiding this comment.
Also, core pool size will have to be the same as max pool size (not Hikari's), so that executor doesn't hold the task in the queue and executes immediately. Something to fix
d7a9b2f to
e56fa46
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The problem
When the cost of creating a connection is high enough (e.g. longer than connection-timeout, like a few seconds or more), the pool fails to reach sufficient number of connections to satisfy demand (rate of requests). Even when
com.zaxxer.hikari.blockUntilFilledis set totrueto make the app wait at startup to fill the pool with minimum idle connections, after max-lifetime of connections reaches, the connections are closed faster than the refill rate can keep up. This leads to the high number of waiting threads and small number of active and idle connections (less than the minimum-idle) either periodically or, in the worst case, becomes a new norm.In addition, since
PoolEntryCreatorcreates and adds new connections sequentially,initialization-fail-timeoutshould be large enough to guarantee minimum idle connections are ready which can quickly turn into several minutes, though unnecessarily - had the connections been created concurrently. Moreover, when we prefer the smallest max-lifetime (which is 30s), closing of connections can start in the middle ofblockUntilFilledat the startup, hence by the end of the prefill, we will have less than the desired number of idle connections.Another point is that, while a creator task is already running,
refillPool()method simply submits a creator task which ends up in the queue, because executor'scorePoolsize = maxPoolSize = 1. As the creator task sequentially adds connections in the loop, by the time it finishes not much changes when the next creator task starts its loop again becauseshouldContinueCreating()has just returned false, and even if the condition satisfies again, it will barely help by adding a few connections. At this point, the queue becomes nothing more than just an indicator of demand, based on its size, rather than truly preserving refill requests and helping to meet the demand.Discussion
Some might ask why would establishing a connection take longer than a few seconds, and that this problem should be fixed in the first place. Although in most of the cases that would be fair enough but this is not always possible, and in fact, I believe that is the whole point of a connection pool to take the burden/cost of creating connections on itself and have warm connections available regardless of the time it takes to create a connection.
Another question some might ask, why would connection-timeout be less than the time it takes to establish a connection? That might be a legit argument if requests can afford running that long, which could be the case with apps with relatively low traffic. However, with comparatively low-latency systems and large number of requests, we have to make sure if DB is not available, we don't wait longer than a few hundreds of milliseconds not to saturate threads and avoid cascading failures. In which case we rely on the connection pool to maintain connections on the side and have ready connections whenever needed.
I think for this type of scenarios, the strategy of maintaining connections could be discussed further and taken to the next level perhaps. In the short term though, creating new connections concurrently should be good enough of a solution to meet the aforementioned requirements.
Overall, if there is a better way of tackling the problem, feel free to comment on how the code change should look like. Will appreciate if you review this PR and let me know what you think. Cheers!