- 
          
- 
                Notifications
    You must be signed in to change notification settings 
- Fork 8.6k
Description
What happened?
Hello,
on our setup (a hub with 15 nodes), under high load, we see that the hub tries to create sessions that are already timed out.
I can reproduce this with a rather smaller setup:
- 1 hub, 5 nodes allowing each at most 2 sessions
- Hub is configured with a --session-request-timeout 30 --session-retry-interval 1000 --reject-unsupported-caps true
 Hub is running on a small server (1 CPU)
- 1 TestNG test that runs 60 tests with a parallelism of 15 (so more than the grid can handle)
 This test is very simple: it starts a browser, fill 2 text fields and click on 1 button. If necessary I will provide a pure Selenium script (for now, it's using our framework, but I'm pretty sure the problem is not there
- Wait a bit (may take 15 to 30 mins to happen) and at a certain point, no more sessions are created
Expected results
Grid continues to create session according to its capacity
Current result:
No more session created: this is due to #12848 (which is a good point) which kills the browser immediately if the session is timed out
At this point, client (see the logs) receives the message "New session request timed out " as expected
Reading the code in LocalNewSessionQueue, I can't see why this can happen because there are 2 guards
- timeoutSessions () which should remove such orphan / timed out sessions every 10 secsselenium/java/src/org/openqa/selenium/grid/sessionqueue/local/LocalNewSessionQueue.java Line 160 in 49214cd private void timeoutSessions() { 
- addToQueue mechanism
 https://github.com/SeleniumHQ/selenium/blob/49214cd40467c2926d1b450d86dde9363bd0acd3/java/src/org/openqa/selenium/grid/sessionqueue/local/LocalNewSessionQueue.java#L235C1-L240C8
 which removes session request from queue in any case
I've added some logs to grid and to capabilities to follow the session request flow
Looking at the logs, I can see 2 problems:
- timeoutSessions method does not seem to run regularly. There are some gaps between executions. As it's a standard JVM feature, this gaps may be due to the method taking more time to execute
- Probable source of problem: looking at the logs, we can see that following the "(17:02:39,917) addToQueue", a "(17:02:40,783) getNextAvailable" is called, bringing the session request to the LocalDistributor.
 Here, the session request is not in the queue anymore so it cannot be timed out
 But, (17:04:18,884) handleNewSessionRequest is done very late, I suppose because there is queueing in sessionCreatorExecutor
Do you confirm my analysis ?
A workaround would be to increase the CPU number / or thread pool of the executor, but this would only be a workaround
I'll try to imagine a correction, but don't hesitate to suggest ideas
Bertrand
How can we reproduce the issue?
See description aboveRelevant log output
############## HUB ############################
Logs are filtered for readability. I've kept only logs relative to a specific session request and the "timeoutSessions" logs
 INFO  2024-03-22 17:02:21,772 [Local New Session Queue] LocalNewSessionQueue: timeout sessions
...
INFO  2024-03-22 17:02:39,917 [pool-2-thread-64] LocalNewSessionQueue: addToQueue 7fe2a2fd-0d41-446e-ad3b-1c992677012d date: [Capabilities {acceptInsecureCerts: true, browserName: chrome, goog:chromeOptions: {args: [--disable-translate, --disable-web-security, --no-sandbox, --disable-site-isolation-tr..., --disable-features=IsolateO..., --remote-allow-origins=*], extensions: [], prefs: {profile.exit_type: Normal}}, pageLoadStrategy: normal, platformName: any, proxy: {proxyType: autodetect}, sr:beta: false, sr:creationRequestTime: 2024-03-22T17:02:39.8291502, sr:testName: loginInvalidPerf15, sr:try: 15}]
 INFO  2024-03-22 17:02:40,783 [Local Distributor - New Session Queue] LocalNewSessionQueue: getNextAvailable
 INFO  2024-03-22 17:02:40,783 [Local Distributor - New Session Queue] LocalNewSessionQueue: remove 7fe2a2fd-0d41-446e-ad3b-1c992677012d
 INFO  2024-03-22 17:02:40,783 [Local Distributor - New Session Queue] LocalNewSessionQueue: end remove 7fe2a2fd-0d41-446e-ad3b-1c992677012d
...
 INFO  2024-03-22 17:02:31,732 [Local New Session Queue] LocalNewSessionQueue: timeout sessions
 INFO  2024-03-22 17:02:41,735 [Local New Session Queue] LocalNewSessionQueue: timeout sessions
 INFO  2024-03-22 17:02:51,742 [Local New Session Queue] LocalNewSessionQueue: timeout sessions
 INFO  2024-03-22 17:03:01,750 [Local New Session Queue] LocalNewSessionQueue: timeout sessions
 INFO  2024-03-22 17:03:09,919 [pool-2-thread-64] LocalNewSessionQueue: end addToQueue 7fe2a2fd-0d41-446e-ad3b-1c992677012d
 INFO  2024-03-22 17:03:11,756 [Local New Session Queue] LocalNewSessionQueue: timeout sessions
 INFO  2024-03-22 17:03:21,772 [Local New Session Queue] LocalNewSessionQueue: timeout sessions
 INFO  2024-03-22 17:03:31,773 [Local New Session Queue] LocalNewSessionQueue: timeout sessions
...
... (at this point, client has received "New session request timed out" error, see below)
...
INFO  2024-03-22 17:04:18,884 [Local Distributor - Session Creation] LocalDistributor$NewSessionRunnable: handleNewSessionRequest 7fe2a2fd-0d41-446e-ad3b-1c992677012d
 INFO  2024-03-22 17:04:18,884 [Local Distributor - Session Creation] LocalDistributor: newSession 7fe2a2fd-0d41-446e-ad3b-1c992677012d
 INFO  2024-03-22 17:04:18,884 [Local Distributor - Session Creation] Sys$Out: 17:04:18.884 INFO [LocalDistributor.newSession_aroundBody0] - Session request received by the Distributor: 
 [Capabilities {acceptInsecureCerts: true, browserName: chrome, goog:chromeOptions: {args: [--disable-translate, --disable-web-security, --no-sandbox, --disable-site-isolation-tr..., --disable-features=IsolateO..., --remote-allow-origins=*], extensions: [], prefs: {profile.exit_type: Normal}}, pageLoadStrategy: normal, platformName: any, proxy: {proxyType: autodetect}, sr:beta: false, sr:creationRequestTime: 2024-03-22T17:02:39.8291502, sr:testName: loginInvalidPerf15, sr:try: 15}]
...
INFO  2024-03-22 17:04:22,887 [Local Distributor - Session Creation] Sys$Out: 17:04:22.887 INFO [LocalDistributor.newSession_aroundBody0] - Session created by the Distributor. Id: 6bb90d1c0268b036743cc04aea1680a4 
 Caps: Capabilities {acceptInsecureCerts: true, browserName: chrome, browserVersion: 120.0.6099.130, chrome: {chromedriverVersion: 120.0.6099.109 (3419140ab66..., userDataDir: C:\Users\ZSJP_C~1\AppData\L...}, fedcm:accounts: true, goog:chromeOptions: {debuggerAddress: localhost:53256}, networkConnectionEnabled: false, pageLoadStrategy: normal, platformName: any, proxy: {proxyType: autodetect}, se:bidiEnabled: false, se:cdp: ws://selenium-grid-hub2.com..., se:cdpVersion: 120.0.6099.130, setWindowRect: true, sr:beta: false, sr:creationRequestTime: 2024-03-22T17:02:39.8291502, sr:testName: loginInvalidPerf15, sr:try: 15, strictFileInteractability: false, timeouts: {implicit: 0, pageLoad: 300000, script: 30000}, unhandledPromptBehavior: dismiss and notify, webauthn:extension:credBlob: true, webauthn:extension:largeBlob: true, webauthn:extension:minPinLength: true, webauthn:extension:prf: true, webauthn:virtualAuthenticators: true}
 INFO  2024-03-22 17:04:23,069 [Event Bus Listener Notifier] Sys$Out: 17:04:23.068 INFO [GridModel.release] - Releasing slot for session id 6bb90d1c0268b036743cc04aea1680a4
 INFO  2024-03-22 17:04:23,069 [Event Bus Listener Notifier] Sys$Out: 17:04:23.068 INFO [LocalSessionMap.lambda$new$0] - Deleted session from local Session Map, Id: 6bb90d1c0268b036743cc04aea1680a4
############## Client #########################
WARN  2024-03-22 17:03:09,931 [TestNG-methods-15] SeleniumGridDriverFactory: Error creating driver on hub http://selenium-grid-hub.company:4444/wd/hub: Could not start a new session. Response code 500. Message: Could not start a new session. New session request timed out 
Host info: host: 'xxx', ip: 'a.b.c.d'
Build info: version: '4.16.1', revision: '9b4c83354e'
System info: os.name: 'Linux', os.arch: 'amd64', os.version: '3.10.0-1160.105.1.el7.x86_64', java.version: '21.0.1'
Driver info: driver.version: unknown
Build info: version: '4.11.0', revision: '040bc5406b'
System info: os.name: 'Windows 10', os.arch: 'amd64', os.version: '10.0', java.version: '17.0.6'
Driver info: org.openqa.selenium.remote.RemoteWebDriver
Command: [null, newSession {capabilities=[Capabilities {acceptInsecureCerts: true, browserName: chrome, goog:chromeOptions: {args: [--disable-translate, --disable-web-security, --no-sandbox, --disable-site-isolation-tr..., --disable-features=IsolateO..., --remote-allow-origins=*], extensions: [], prefs: {profile.exit_type: Normal}}, pageLoadStrategy: normal, platformName: any, proxy: Proxy(autodetect), sr:beta: false, sr:creationRequestTime: 2024-03-22T17:02:39.8291502, sr:testName: loginInvalidPerf15, sr:try: 15}]}]Operating System
Linux
Selenium version
4.11.0
What are the browser(s) and version(s) where you see this issue?
Chrome / not related to browser
What are the browser driver(s) and version(s) where you see this issue?
not related to browser
Are you using Selenium Grid?
4.16.1