Fix thread leakage in OLS backend #1161

@haideriqbal

Description

Alert from the K8s admins:

Image

While investigating the backend pods on the affected worker nodes, I found thousands of idle connection evictor threads from Apache HttpClient (17,876 in the thread dump below). It looks like somewhere in our code we are creating new HttpClient instances repeatedly instead of reusing a single shared instance. Each of these HttpClient instances starts its own IdleConnectionEvictor thread, which never gets cleaned up.
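To illustrate the suspected pattern without touching the real code, here is a minimal, stdlib-only sketch. `FakeClient`, `perCallClient`, `sharedClient`, and the `demo-evictor` thread name are all hypothetical stand-ins, not OLS or Apache HttpClient code. The assumption is only that a client starts one background thread when it is built and stops it when it is closed.

```java
/** Simulates "one evictor thread per client" to compare per-call and shared clients. */
final class EvictorLeakDemo {
    static final String EVICTOR_NAME = "demo-evictor"; // hypothetical thread name

    /** Hypothetical stand-in for a client that owns a background evictor thread. */
    static final class FakeClient implements AutoCloseable {
        private final Thread evictor;

        FakeClient() {
            evictor = new Thread(() -> {
                try {
                    // Sleep in a loop, like an idle-connection evictor between sweeps.
                    while (!Thread.currentThread().isInterrupted()) {
                        Thread.sleep(1_000);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // asked to stop
                }
            }, EVICTOR_NAME);
            evictor.setDaemon(true); // do not keep the JVM alive for the demo
            evictor.start();
        }

        /** Stops the background thread; without this call the thread lives on. */
        @Override
        public void close() {
            evictor.interrupt();
            try {
                evictor.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    /** Lazy holder: the shared client (and its single thread) is created on first use. */
    private static final class Holder {
        static final FakeClient SHARED = new FakeClient();
    }

    /** Leaky pattern: a new client per call that nobody closes. */
    static FakeClient perCallClient() {
        return new FakeClient();
    }

    /** Shared pattern: every caller gets the same instance. */
    static FakeClient sharedClient() {
        return Holder.SHARED;
    }

    /** Counts live threads that carry the demo evictor name. */
    static long liveEvictors() {
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.isAlive() && EVICTOR_NAME.equals(t.getName()))
                .count();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 100; i++) {
            perCallClient(); // never closed, so each call leaves a thread behind
        }
        System.out.println("after per-call clients: " + liveEvictors() + " evictor threads");
        for (int i = 0; i < 100; i++) {
            sharedClient(); // adds one thread in total, however often it is called
        }
        System.out.println("after shared client:    " + liveEvictors() + " evictor threads");
    }
}
```

If this is what is happening in the backend, the likely fix is to build one `CloseableHttpClient` and reuse it (for example as a singleton or an injected bean), and to close any client that really must be short-lived.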

[spotbot@ontotools-pipelines k8s]$ grep "java.lang.Thread.State" thread_dump.txt | sort | uniq -c | sort -rn
  17877    java.lang.Thread.State: TIMED_WAITING (sleeping)
     31    java.lang.Thread.State: TIMED_WAITING (parking)
     15    java.lang.Thread.State: RUNNABLE
      2    java.lang.Thread.State: TIMED_WAITING (on object monitor)
      1    java.lang.Thread.State: WAITING (parking)
      1    java.lang.Thread.State: WAITING (on object monitor)
[spotbot@ontotools-pipelines k8s]$ grep "at " thread_dump.txt | sort | uniq -c | sort -rn | head -20
  17916 	at java.lang.Thread.runWith(java.base@24.0.1/Thread.java:1460)
  17916 	at java.lang.Thread.run(java.base@24.0.1/Thread.java:1447)
  17877 	at java.lang.Thread.sleepNanos(java.base@24.0.1/Thread.java:482)
  17877 	at java.lang.Thread.sleepNanos0(java.base@24.0.1/Native Method)
  17877 	at java.lang.Thread.sleep(java.base@24.0.1/Thread.java:513)
  17876 	at org.apache.http.impl.client.IdleConnectionEvictor$1.run(IdleConnectionEvictor.java:66)
     32 	at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:63)
     32 	at jdk.internal.misc.Unsafe.park(java.base@24.0.1/Native Method)
     31 	at java.util.concurrent.locks.LockSupport.parkNanos(java.base@24.0.1/LockSupport.java:271)
     31 	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(java.base@24.0.1/AbstractQueuedSynchronizer.java:1802)
     30 	at org.apache.tomcat.util.threads.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:658)
     30 	at org.apache.tomcat.util.threads.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1174)
     30 	at org.apache.tomcat.util.threads.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1112)
     30 	at org.apache.tomcat.util.threads.TaskQueue.poll(TaskQueue.java:92)
     30 	at org.apache.tomcat.util.threads.TaskQueue.poll(TaskQueue.java:33)
     30 	at java.util.concurrent.LinkedBlockingQueue.poll(java.base@24.0.1/LinkedBlockingQueue.java:460)
      5 	at sun.nio.ch.SelectorImpl.lockAndDoSelect(java.base@24.0.1/SelectorImpl.java:130)
      5 	at sun.nio.ch.EPoll.wait(java.base@24.0.1/Native Method)
      5 	at sun.nio.ch.EPollSelectorImpl.doSelect(java.base@24.0.1/EPollSelectorImpl.java:117)
      4 	at sun.nio.ch.SelectorImpl.select(java.base@24.0.1/SelectorImpl.java:147)
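For repeat checks, the first grep pipeline above could also be done in Java. This is only a sketch: the class and method names (`ThreadDumpSummary`, `countStates`) are made up for illustration, and it assumes the dump is a plain-text file in the usual `jstack` layout, with one `java.lang.Thread.State:` line per thread.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Summarizes thread states in a text thread dump, most frequent first. */
final class ThreadDumpSummary {
    static final String STATE_MARKER = "java.lang.Thread.State:";

    /** Returns state -> count, ordered by descending count. */
    static Map<String, Long> countStates(List<String> lines) {
        // Group every "java.lang.Thread.State: X" line by the text after the marker.
        Map<String, Long> counts = lines.stream()
                .filter(line -> line.contains(STATE_MARKER))
                .map(line -> line.substring(
                        line.indexOf(STATE_MARKER) + STATE_MARKER.length()).trim())
                .collect(Collectors.groupingBy(state -> state, Collectors.counting()));

        // Sort by count, highest first, and keep that order in a LinkedHashMap.
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder()))
                .collect(Collectors.toMap(
                        Map.Entry::getKey, Map.Entry::getValue,
                        (a, b) -> a, LinkedHashMap::new));
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java ThreadDumpSummary.java <thread_dump.txt>");
            return;
        }
        List<String> lines = Files.readAllLines(Path.of(args[0]));
        countStates(lines).forEach(
                (state, count) -> System.out.printf("%7d %s%n", count, state));
    }
}
```

Running it against `thread_dump.txt` should list the same states as the grep output, with `TIMED_WAITING (sleeping)` dominating if the leak is still present.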

Labels

High priority (Widespread issue that reduces usability substantially or an ontology that is completely unusable), bug