Jenkins and plugins versions report
Noticed with the latest version of Jenkins (2.528.3) and this plugin (1308.vff6e33248305).
When connecting to the docker host VM through its public IP, if for some reason the connection becomes a zombie on the controller (i.e. the socket connection is still present on the controller but no longer on the docker host), triggered builds get stuck while contacting the docker host, at the point shown in the following thread dump:
thread dump
"jenkins.util.Timer [#5]" #68 [103] daemon prio=5 os_prio=0 cpu=1486.31ms elapsed=7407.21s tid=0x000078fc40004630 nid=103 runnable [0x000078fce68fb000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.Net.poll(java.base@21.0.9/Native Method)
at sun.nio.ch.NioSocketImpl.park(java.base@21.0.9/NioSocketImpl.java:191)
at sun.nio.ch.NioSocketImpl.timedFinishConnect(java.base@21.0.9/NioSocketImpl.java:548)
at sun.nio.ch.NioSocketImpl.connect(java.base@21.0.9/NioSocketImpl.java:592)
at java.net.SocksSocketImpl.connect(java.base@21.0.9/SocksSocketImpl.java:327)
at java.net.Socket.connect(java.base@21.0.9/Socket.java:751)
at org.apache.hc.client5.http.impl.io.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:205)
at org.apache.hc.client5.http.impl.io.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:490)
at org.apache.hc.client5.http.impl.classic.InternalExecRuntime.connectEndpoint(InternalExecRuntime.java:164)
at org.apache.hc.client5.http.impl.classic.InternalExecRuntime.connectEndpoint(InternalExecRuntime.java:174)
at org.apache.hc.client5.http.impl.classic.ConnectExec.execute(ConnectExec.java:144)
at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51)
at org.apache.hc.client5.http.impl.classic.ExecChainElement$$Lambda/0x000078fcee157b58.proceed(Unknown Source)
at org.apache.hc.client5.http.impl.classic.ProtocolExec.execute(ProtocolExec.java:195)
at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51)
at org.apache.hc.client5.http.impl.classic.ExecChainElement$$Lambda/0x000078fcee157b58.proceed(Unknown Source)
at org.apache.hc.client5.http.impl.classic.ContentCompressionExec.execute(ContentCompressionExec.java:150)
at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51)
at org.apache.hc.client5.http.impl.classic.ExecChainElement$$Lambda/0x000078fcee157b58.proceed(Unknown Source)
at org.apache.hc.client5.http.impl.classic.HttpRequestRetryExec.execute(HttpRequestRetryExec.java:113)
at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51)
at org.apache.hc.client5.http.impl.classic.ExecChainElement$$Lambda/0x000078fcee157b58.proceed(Unknown Source)
at org.apache.hc.client5.http.impl.classic.RedirectExec.execute(RedirectExec.java:110)
at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51)
at org.apache.hc.client5.http.impl.classic.InternalHttpClient.doExecute(InternalHttpClient.java:185)
at org.apache.hc.client5.http.impl.classic.CloseableHttpClient.execute(CloseableHttpClient.java:87)
at org.apache.hc.client5.http.impl.classic.CloseableHttpClient.execute(CloseableHttpClient.java:55)
at org.apache.hc.client5.http.classic.HttpClient.executeOpen(HttpClient.java:183)
at com.github.dockerjava.httpclient5.ApacheDockerHttpClientImpl.execute(ApacheDockerHttpClientImpl.java:189)
at com.github.dockerjava.httpclient5.ApacheDockerHttpClient.execute(ApacheDockerHttpClient.java:9)
at com.github.dockerjava.core.DefaultInvocationBuilder.execute(DefaultInvocationBuilder.java:228)
at com.github.dockerjava.core.DefaultInvocationBuilder.get(DefaultInvocationBuilder.java:202)
at com.github.dockerjava.core.DefaultInvocationBuilder.get(DefaultInvocationBuilder.java:74)
at com.github.dockerjava.core.exec.ListContainersCmdExec.execute(ListContainersCmdExec.java:44)
at com.github.dockerjava.core.exec.ListContainersCmdExec.execute(ListContainersCmdExec.java:15)
at com.github.dockerjava.core.exec.AbstrSyncDockerCmdExec.exec(AbstrSyncDockerCmdExec.java:21)
at com.github.dockerjava.core.command.AbstrDockerCmd.exec(AbstrDockerCmd.java:33)
at com.nirima.jenkins.plugins.docker.DockerCloud.countContainersInDocker(DockerCloud.java:638)
at com.nirima.jenkins.plugins.docker.DockerCloud.canAddProvisionedAgent(DockerCloud.java:656)
at com.nirima.jenkins.plugins.docker.DockerCloud.provision(DockerCloud.java:394)
- locked <0x000000069217bb88> (a com.nirima.jenkins.plugins.docker.DockerCloud)
at io.jenkins.docker.FastNodeProvisionerStrategy.applyToCloud(FastNodeProvisionerStrategy.java:71)
at io.jenkins.docker.FastNodeProvisionerStrategy.apply(FastNodeProvisionerStrategy.java:41)
at hudson.slaves.NodeProvisioner.update(NodeProvisioner.java:327)
at hudson.slaves.NodeProvisioner.lambda$suggestReviewNow$4(NodeProvisioner.java:199)
at hudson.slaves.NodeProvisioner$$Lambda/0x000078fcedd2ea28.run(Unknown Source)
at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:67)
at java.util.concurrent.Executors$RunnableAdapter.call(java.base@21.0.9/Executors.java:572)
at java.util.concurrent.FutureTask.run(java.base@21.0.9/FutureTask.java:317)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(java.base@21.0.9/ScheduledThreadPoolExecutor.java:304)
at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@21.0.9/ThreadPoolExecutor.java:1144)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@21.0.9/ThreadPoolExecutor.java:642)
at java.lang.Thread.runWith(java.base@21.0.9/Thread.java:1596)
at java.lang.Thread.run(java.base@21.0.9/Thread.java:1583)
Locked ownable synchronizers:
- <0x000000068c284050> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
- <0x000000068ea724c8> (a java.util.concurrent.ThreadPoolExecutor$Worker)
- <0x00000006ac631450> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
No connections were seen on the docker host VM when checked with netstat -natup.
In this case the connectionTimeout seemed to be ineffective.
What Operating System are you using (both controller, and any agents involved in the problem)?
The controller was CloudBees CI running in k8s on RHEL 9; the docker host was Debian 12.
Reproduction steps
It is hard to reproduce: the JVM has to keep waiting for the other side (i.e. the docker host) after the docker host has already dropped the connection.
Tried the following on the docker host:
sudo apt install iptables iptables-persistent
sudo iptables -A OUTPUT -p tcp -d <ip of the controller> --sport 2375 -j DROP
However, this does not reproduce the problem systematically.
Expected Results
Some kind of timeout should unblock the stuck provision method.
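As an illustration of what such a timeout could look like from the caller's side, here is a minimal sketch that bounds a blocking call with a deadline. It is not the plugin's code: `BoundedCallSketch` and `callWithDeadline` are hypothetical names, and the sleeping task only stands in for a blocked call such as `DockerCloud.countContainersInDocker` from the thread dump.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class BoundedCallSketch {

    /**
     * Runs the task on a helper daemon thread and waits at most timeoutMs.
     * Returns the task's result, or the fallback if the task fails or the
     * deadline passes first. (Hypothetical helper, for illustration only.)
     */
    static <T> T callWithDeadline(Callable<T> task, long timeoutMs, T fallback)
            throws InterruptedException {
        ExecutorService helper = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "bounded-call");
            t.setDaemon(true); // do not keep the JVM alive for a stuck call
            return t;
        });
        Future<T> future = helper.submit(task);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException | ExecutionException e) {
            future.cancel(true); // best effort: interrupts the helper thread
            return fallback;
        } finally {
            helper.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // The sleeping task stands in for a call blocked on a dead connection.
        int count = callWithDeadline(() -> { Thread.sleep(60_000); return 3; }, 200, -1);
        System.out.println("container count: " + count);
    }
}
```

One caveat with this approach: it unblocks the caller, but a helper thread that is blocked in classic `java.net.Socket` I/O generally does not react to the interrupt, so it may linger until the socket itself times out or is closed. A transport-level timeout is therefore still the cleaner fix.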
Actual Results
The call stays stuck waiting for the other side, on a zombie connection.
Anything else?
Currently there seems to be no way to set SO_TIMEOUT so that a dead connection can be detected.
Perhaps it could be set at https://github.com/docker-java/docker-java/blob/faa88e16460a8cb321c9695cdbc34cb7a662458e/docker-java-transport-httpclient5/src/main/java/com/github/dockerjava/httpclient5/ApacheDockerHttpClientImpl.java#L117-L122 ?
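To show the effect SO_TIMEOUT has on a silent peer, here is a minimal sketch using only plain `java.net` sockets. It is not docker-java or Apache HttpClient code: `ReadTimeoutSketch` and `readTimesOut` are hypothetical names, and a local server socket that never answers stands in for the unresponsive docker host.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

class ReadTimeoutSketch {

    /**
     * Connects to a local peer that completes the TCP handshake but never
     * writes anything, then attempts a read. Returns true if the read was
     * abandoned with a SocketTimeoutException, false if it returned normally.
     */
    static boolean readTimesOut(int connectTimeoutMs, int soTimeoutMs) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket silentPeer = new ServerSocket(0, 1, loopback);
             Socket client = new Socket()) {
            // The connect timeout only bounds the TCP handshake.
            client.connect(new InetSocketAddress(loopback, silentPeer.getLocalPort()),
                    connectTimeoutMs);
            // SO_TIMEOUT bounds each blocking read once the socket is connected.
            client.setSoTimeout(soTimeoutMs);
            InputStream in = client.getInputStream();
            try {
                in.read(); // the peer never sends data, so this would block
                return false;
            } catch (SocketTimeoutException expected) {
                return true;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("read timed out: " + readTimesOut(1000, 200));
    }
}
```

Without the `setSoTimeout` call the read would wait indefinitely, which is the kind of hang a read timeout on the HTTP client's sockets is meant to prevent.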
Are you interested in contributing a fix?
No response