Description
Hello @rbygrave,
as explained in our last meeting, we observed that the pool becomes very slow during a network outage (e.g. a pool.reset took nearly one hour).
We know that drivers lag when TCP/IP packets are dropped. Of course, every "active" command (such as restoring the schema) may lag or, in a complete network outage, throw an error.
So it is important that these commands run outside the queue lock, so that we do not prevent other threads from getting new connections. Only the threads that actually run such a command should lag.
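To illustrate the idea, here is a minimal sketch of a reset that only detaches connections while holding the lock and performs the potentially slow close calls after releasing it. The class `DrainOutsideLock` and its methods are hypothetical and are not the pool's actual code; `AutoCloseable` stands in for a pooled connection.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for the pool's free list; not the real implementation.
public class DrainOutsideLock {
    private final ReentrantLock lock = new ReentrantLock();
    private final List<AutoCloseable> free = new ArrayList<>();

    public void add(AutoCloseable connection) {
        lock.lock();
        try {
            free.add(connection);
        } finally {
            lock.unlock();
        }
    }

    public int size() {
        lock.lock();
        try {
            return free.size();
        } finally {
            lock.unlock();
        }
    }

    /** Detaches all connections under the lock, then closes them without it. */
    public int reset() {
        List<AutoCloseable> toClose;
        lock.lock();
        try {
            // Cheap, in-memory work only while the lock is held.
            toClose = new ArrayList<>(free);
            free.clear();
        } finally {
            lock.unlock();
        }
        int closed = 0;
        for (AutoCloseable connection : toClose) {
            try {
                // May block on the network; other threads can use the pool meanwhile.
                connection.close();
                closed++;
            } catch (Exception e) {
                // A failed close must not stop the remaining connections from closing.
            }
        }
        return closed;
    }
}
```

With this shape, a slow or failing close delays only the thread that called `reset()`, not the threads waiting for the lock.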
Unfortunately, the close method may also block. In particular, the close method of a DB2 prepared statement tries to communicate with the server when the result set has only been partially read (in our tests, close took 2 minutes).
We assume that we had either a real network outage or that the response time of the DB increased due to overload. In this case, the connection pool tries to perform a reset. Given the large number of connections in the pool and an average close time of 2 minutes, we assume this is why it took up to 1 hour for the pool to recover.
I have two PRs for discussion on what the best strategy to solve the problem could be:
- Option 1: Close connections asynchronously. See Close connections asyncronously #127 - This would be our preferred solution
- Option 2: Move the closeConnectionFully calls outside the lock. See closeConnectionFully is performed without lock #115 - some places are a bit cumbersome...
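
As a rough illustration of Option 1, the following sketch hands each close to a background executor so that the calling thread returns immediately. This is not the code from #127. The class `AsyncCloser`, the thread name and the single-thread executor are assumptions made for the example.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical helper; a sketch of the idea, not the implementation in the PR.
public class AsyncCloser {
    private final ExecutorService executor = Executors.newSingleThreadExecutor(runnable -> {
        Thread thread = new Thread(runnable, "pool-closer");
        thread.setDaemon(true); // do not keep the JVM alive for pending closes
        return thread;
    });

    /** Queues the close and returns immediately; the caller never blocks on the network. */
    public void closeAsync(AutoCloseable connection) {
        executor.execute(() -> {
            try {
                connection.close();
            } catch (Exception e) {
                // Errors during close are expected in an outage; ignore and continue.
            }
        });
    }

    /** Stops accepting work and waits for queued closes; returns false on timeout or interrupt. */
    public boolean shutdown(long timeout, TimeUnit unit) {
        executor.shutdown();
        try {
            return executor.awaitTermination(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

A single background thread still closes connections one after another, so many slow closes take just as long in total. The difference is that the pool itself stays usable while they run.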
@rbygrave This would be our next "high priority" change. It would be great if you could find time to review #127 and tell us whether this could be the way to go, or whether you think #115 is the better option.