Description
We are experiencing random `SessionNotCreatedException` errors in Selenium Grid 4.13 (hub). The problem mostly occurs under high concurrency and is timing-sensitive; it is not reliably reproducible in small-scale or synthetic tests.
Environment
- Hub: Selenium 4.13
- Nodes: 20–50, each with 19 Firefox + 19 Chrome slots (38 slots per node)
- OS: Linux, Java 11
- Browsers: Firefox and Chrome
Exception in client
The client call fails with a `TimeoutException` after three minutes:
SEVERE: Exception occurred while doing remote webdriver testing
org.openqa.selenium.SessionNotCreatedException: Could not start a new session. Possible causes are invalid address of the remote server or browser start-up failure.
Host info: host: 'test', ip: 'fe80:0:0:0:0:95e7:1b5d:ae0e%en0'
at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:563)
at org.openqa.selenium.remote.RemoteWebDriver.startSession(RemoteWebDriver.java:245)
at org.openqa.selenium.remote.RemoteWebDriver.<init>(RemoteWebDriver.java:174)
at org.openqa.selenium.remote.RemoteWebDriver.<init>(RemoteWebDriver.java:152)
at com.test.selenium.GridTest.getChromeDriver(GridTest.java:400)
at com.test.selenium.GridTest.doTest(GridTest.java:143)
at com.test.selenium.GridTest.main(GridTest.java:98)
Caused by: org.openqa.selenium.TimeoutException: java.util.concurrent.TimeoutException
Build info: version: '4.23.1', revision: '656257d8e9'
System info: os.name: 'Mac OS X', os.arch: 'aarch64', os.version: '15.6.1', java.version: '11.0.16.1'
Driver info: driver.version: RemoteWebDriver
at org.openqa.selenium.remote.http.jdk.JdkHttpClient.execute0(JdkHttpClient.java:418)
at org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)
at org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:55)
at org.openqa.selenium.remote.http.jdk.JdkHttpClient.execute(JdkHttpClient.java:374)
at org.openqa.selenium.remote.tracing.TracedHttpClient.execute(TracedHttpClient.java:54)
at org.openqa.selenium.remote.ProtocolHandshake.createSession(ProtocolHandshake.java:89)
at org.openqa.selenium.remote.ProtocolHandshake.createSession(ProtocolHandshake.java:75)
at org.openqa.selenium.remote.ProtocolHandshake.createSession(ProtocolHandshake.java:61)
at org.openqa.selenium.remote.HttpCommandExecutor.execute(HttpCommandExecutor.java:162)
at org.openqa.selenium.remote.TracedCommandExecutor.execute(TracedCommandExecutor.java:53)
at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:545)
... 6 more
Caused by: java.util.concurrent.TimeoutException
at java.base/java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1886)
at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2021)
at org.openqa.selenium.remote.http.jdk.JdkHttpClient.execute0(JdkHttpClient.java:401)
... 16 more
Problem details
- Thread dumps show “Local Distributor - Session Creation” threads blocked in `DefaultSlotSelector.selectSlot()` / `NodeStatus.getLoad()`:
java.lang.Thread.State: BLOCKED (on object monitor)
at java.lang.Object.wait([email protected]/Native Method)
- waiting on <no object reference available>
at java.util.concurrent.ForkJoinTask.externalAwaitDone([email protected]/Unknown Source)
- waiting to re-lock in wait() <0x00000007e5193e50> (a java.util.stream.ReduceOps$ReduceTask)
at java.util.concurrent.ForkJoinTask.doInvoke([email protected]/Unknown Source)
at java.util.concurrent.ForkJoinTask.invoke([email protected]/Unknown Source)
at java.util.stream.ReduceOps$ReduceOp.evaluateParallel([email protected]/Unknown Source)
at java.util.stream.ReduceOps$5.evaluateParallel([email protected]/Unknown Source)
at java.util.stream.ReduceOps$5.evaluateParallel([email protected]/Unknown Source)
at java.util.stream.AbstractPipeline.evaluate([email protected]/Unknown Source)
at java.util.stream.ReferencePipeline.count([email protected]/Unknown Source)
at org.openqa.selenium.grid.data.NodeStatus.getLoad(NodeStatus.java:174)
at org.openqa.selenium.grid.distributor.selector.DefaultSlotSelector.lambda$selectSlot$1(DefaultSlotSelector.java:75)
at org.openqa.selenium.grid.distributor.selector.DefaultSlotSelector$$Lambda$1141/0x0000000800562040.applyAsDouble(Unknown Source)
at java.util.Comparator.lambda$comparingDouble$8dcf42ea$1([email protected]/Unknown Source)
at java.util.Comparator$$Lambda$1142/0x0000000800562440.compare([email protected]/Unknown Source)
at java.util.Comparator.lambda$thenComparing$36697e65$1([email protected]/Unknown Source)
at java.util.Comparator$$Lambda$1143/0x0000000800562840.compare([email protected]/Unknown Source)
at java.util.Comparator.lambda$thenComparing$36697e65$1([email protected]/Unknown Source)
at java.util.Comparator$$Lambda$1143/0x0000000800562840.compare([email protected]/Unknown Source)
at java.util.Comparator.lambda$thenComparing$36697e65$1([email protected]/Unknown Source)
at java.util.Comparator$$Lambda$1143/0x0000000800562840.compare([email protected]/Unknown Source)
at java.util.TimSort.binarySort([email protected]/Unknown Source)
at java.util.TimSort.sort([email protected]/Unknown Source)
at java.util.Arrays.sort([email protected]/Unknown Source)
at java.util.ArrayList.sort([email protected]/Unknown Source)
at java.util.stream.SortedOps$RefSortingSink.end([email protected]/Unknown Source)
at java.util.stream.Sink$ChainedReference.end([email protected]/Unknown Source)
at java.util.stream.AbstractPipeline.copyInto([email protected]/Unknown Source)
at java.util.stream.AbstractPipeline.wrapAndCopyInto([email protected]/Unknown Source)
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential([email protected]/Unknown Source)
at java.util.stream.AbstractPipeline.evaluate([email protected]/Unknown Source)
at java.util.stream.ReferencePipeline.collect([email protected]/Unknown Source)
at org.openqa.selenium.grid.distributor.selector.DefaultSlotSelector.selectSlot(DefaultSlotSelector.java:86)
at org.openqa.selenium.grid.distributor.local.LocalDistributor.reserveSlot(LocalDistributor.java:669)
at org.openqa.selenium.grid.distributor.local.LocalDistributor.newSession(LocalDistributor.java:551)
at org.openqa.selenium.grid.distributor.local.LocalDistributor$NewSessionRunnable.handleNewSessionRequest(LocalDistributor.java:829)
at org.openqa.selenium.grid.distributor.local.LocalDistributor$NewSessionRunnable.lambda$run$1(LocalDistributor.java:787)
at org.openqa.selenium.grid.distributor.local.LocalDistributor$NewSessionRunnable$$Lambda$1125/0x0000000800552840.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker([email protected]/Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run([email protected]/Unknown Source)
at java.lang.Thread.run([email protected]/Unknown Source)
- Observations:
  - The write lock on `LocalDistributor` is held during `reserveSlot()`, which invokes `selectSlot()`.
  - `NodeStatus.getLoad()` uses a `parallelStream()`, which creates many ForkJoinPool tasks.
  - This does not appear to be a deadlock or hard blocking, but rather latency/slow execution under load.
  - Other session requests queue up behind the write lock, leading to random `SessionNotCreatedException` when delays exceed timeouts.
- Thread dump behavior:
  - Different dumps show different threads inside `selectSlot()` or ForkJoinPool tasks.
  - This confirms the issue is contention/latency, not a deadlock.
- Probable cause:
  - `selectSlot()` performs O(N log N) comparator calls during node sorting.
  - Each comparison calls `getLoad()`, which scans all slots via `parallelStream()`, creating thousands of ForkJoinPool tasks per session request.
  - This repeated scanning is likely the main source of latency.
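To illustrate the probable cause, here is a minimal, self-contained sketch that counts how often a per-node load scan runs when it is evaluated inside a `Comparator.comparingDouble` key extractor, compared with computing it once per node before sorting. `LoadSortSketch`, `FakeNode`, and the method names are hypothetical stand-ins, not Selenium classes, and the scan is a plain loop rather than the `parallelStream()` seen in the thread dump. It only shows the call-count effect and is not a proposed patch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

public class LoadSortSketch {

  /** Hypothetical stand-in for a grid node; NOT the real NodeStatus class. */
  static final class FakeNode {
    final int id;
    final boolean[] busySlots;
    final AtomicLong loadCalls; // shared counter of getLoad() invocations

    FakeNode(int id, boolean[] busySlots, AtomicLong loadCalls) {
      this.id = id;
      this.busySlots = busySlots;
      this.loadCalls = loadCalls;
    }

    /** Scans every slot on each call and returns the busy percentage. */
    double getLoad() {
      loadCalls.incrementAndGet();
      int busy = 0;
      for (boolean slotBusy : busySlots) {
        if (slotBusy) {
          busy++;
        }
      }
      return busySlots.length == 0 ? 0.0 : 100.0 * busy / busySlots.length;
    }
  }

  /** Builds nodes with a deterministic, made-up busy/free slot pattern. */
  static List<FakeNode> makeNodes(int nodeCount, int slotsPerNode, AtomicLong counter) {
    List<FakeNode> nodes = new ArrayList<>();
    for (int n = 0; n < nodeCount; n++) {
      boolean[] slots = new boolean[slotsPerNode];
      for (int s = 0; s < slotsPerNode; s++) {
        slots[s] = ((n * 31 + s * 17) % 5) < 2;
      }
      nodes.add(new FakeNode(n, slots, counter));
    }
    return nodes;
  }

  /** Sorts with getLoad() evaluated inside the comparator; returns the scan count. */
  public static long sortCallingLoadInComparator(int nodeCount, int slotsPerNode) {
    AtomicLong counter = new AtomicLong();
    List<FakeNode> nodes = makeNodes(nodeCount, slotsPerNode, counter);
    // Every comparison extracts the key from both operands, so each one scans twice.
    nodes.sort(Comparator.comparingDouble(FakeNode::getLoad));
    return counter.get();
  }

  /** Scans each node once, then sorts on the cached values; returns the scan count. */
  public static long sortWithPrecomputedLoad(int nodeCount, int slotsPerNode) {
    AtomicLong counter = new AtomicLong();
    List<FakeNode> nodes = makeNodes(nodeCount, slotsPerNode, counter);
    double[] loadById = new double[nodeCount];
    for (FakeNode node : nodes) {
      loadById[node.id] = node.getLoad(); // exactly one scan per node
    }
    nodes.sort(Comparator.comparingDouble((FakeNode node) -> loadById[node.id]));
    return counter.get();
  }

  public static void main(String[] args) {
    int nodeCount = 50;    // upper end of the node range in the report
    int slotsPerNode = 38; // slots per node in the report
    System.out.println("scans, load in comparator: "
        + sortCallingLoadInComparator(nodeCount, slotsPerNode));
    System.out.println("scans, load precomputed:   "
        + sortWithPrecomputedLoad(nodeCount, slotsPerNode));
  }
}
```

In this sketch the precomputed variant scans each node exactly once, while the in-comparator variant scans at least twice per comparison. Whether a similar change is appropriate for `DefaultSlotSelector` is for the maintainers to judge.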
Impact
- Random `SessionNotCreatedException` under load.
- Not reproducible under light or synthetic load.
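- [tdump1.txt](https://github.com/user-attachments/files/22044863/tdump1.txt)
- [tdump2.txt](https://github.com/user-attachments/files/22044864/tdump2.txt)
- [tdump3.txt](https://github.com/user-attachments/files/22044865/tdump3.txt)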
Reproducible Code
Not easily reproducible
Debugging Logs
See the attached thread dumps (tdump1.txt, tdump2.txt, tdump3.txt).