Description
Feature and motivation
Overview
In Selenium 4 Standalone mode, there is a scheduled service (purgeDeadNodesService) that runs every 30 seconds to remove “dead” nodes. This functionality makes sense when operating in a Hub+Node or fully distributed setup, where multiple nodes might come and go. However, in Standalone mode there is effectively only one node, and thus the service can prematurely mark that single node as “dead,” causing brief outages or session failures.
Some environments (for example, those with unexpected clock adjustments or ephemeral deployments) may encounter time discrepancies that can make the node’s health check timestamps appear stale. Since the node is the only Selenium instance in Standalone mode, losing it even briefly causes user-facing failures. An option to disable or configure the purge service would prevent these unnecessary outages.
Steps to Reproduce
- Run Selenium 4 in Standalone mode in an environment where system time may shift unexpectedly (e.g., ephemeral VMs, containers that quickly rebuild, or other situations causing time drifts).
- Observe that if the time drift is significant, purgeDeadNodesService considers the node’s last health check as outdated.
- The node is marked as “dead,” leading to “session creation failure” or “node not available” errors until the next heartbeat or restart.
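The failure mode in the steps above can be illustrated with a small model. This is only a sketch, not Selenium's actual implementation: it assumes the staleness check compares wall-clock timestamps, which this issue suggests but does not confirm. The names Heartbeat, is_stale_wall_clock, is_stale_monotonic, and the 120-second threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Heartbeat:
    """Timestamps recorded when the node last reported in (hypothetical model)."""
    wall_clock: float  # seconds from an adjustable clock, like time.time()
    monotonic: float   # seconds from a clock that never jumps, like time.monotonic()


# Hypothetical threshold; the value the Grid really uses is not stated in this issue.
STALE_AFTER_SECONDS = 120.0


def is_stale_wall_clock(last: Heartbeat, now_wall: float) -> bool:
    """Staleness judged by wall-clock time, which is sensitive to clock adjustments."""
    return now_wall - last.wall_clock > STALE_AFTER_SECONDS


def is_stale_monotonic(last: Heartbeat, now_mono: float) -> bool:
    """Staleness judged by monotonic time, which ignores clock adjustments."""
    return now_mono - last.monotonic > STALE_AFTER_SECONDS


# The node reported in 5 real seconds ago...
last = Heartbeat(wall_clock=1_000.0, monotonic=50.0)
real_elapsed = 5.0
# ...but the system clock was stepped forward by 10 minutes in between.
clock_jump = 600.0

now_wall = last.wall_clock + real_elapsed + clock_jump
now_mono = last.monotonic + real_elapsed

wall_verdict = is_stale_wall_clock(last, now_wall)  # 605 s apparent age: stale
mono_verdict = is_stale_monotonic(last, now_mono)   # 5 s real age: healthy
print(wall_verdict, mono_verdict)
```

In this model the healthy node looks stale only because the clock moved, which matches the symptom described here. Whether the real service behaves this way would need to be confirmed against the Grid source.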
Expected Behavior
- In Standalone mode, the node should not be marked as dead if there is only one node/instance running.
- Users would like a straightforward way to disable or configure the purge service to avoid accidental downtime in time-shifted or short-lived environments.
Actual Behavior
- purgeDeadNodesService runs on a fixed schedule (every 30 seconds) and can mark the node as dead due to stale timestamps in Standalone mode.
- This results in transient failures for tests until the node’s heartbeat is refreshed.
Proposed Solution
- Add a Command-Line Flag or Configuration: e.g. --disable-purge-dead-nodes or a similar parameter that deactivates this service in Standalone mode.
- Make Purge Frequency Configurable: e.g. --purge-dead-nodes-interval=0 (disables), or allow the interval to be increased in Standalone mode to reduce false positives.
- Standalone-Specific Behavior: Automatically disable or significantly delay purge checks when running in Standalone mode, since there is only a single node.
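One way the first two options could combine is sketched below. The flags --disable-purge-dead-nodes and --purge-dead-nodes-interval are the ones proposed in this request and do not exist in Selenium today as far as this issue states. The helper effective_purge_interval and the parser are hypothetical, written in Python purely to pin down the intended semantics (0 or the disable flag turns the purge off; otherwise the interval is used as given).

```python
import argparse
from typing import List, Optional

# From this issue: the purge currently runs every 30 seconds.
DEFAULT_INTERVAL_SECONDS = 30


def build_parser() -> argparse.ArgumentParser:
    """Parser for the two proposed (not yet existing) flags."""
    parser = argparse.ArgumentParser(prog="grid-sketch")
    parser.add_argument("--disable-purge-dead-nodes", action="store_true")
    parser.add_argument(
        "--purge-dead-nodes-interval",
        type=int,
        default=DEFAULT_INTERVAL_SECONDS,
        help="seconds between purge runs; 0 disables the purge",
    )
    return parser


def effective_purge_interval(argv: List[str]) -> Optional[int]:
    """Return the purge interval in seconds, or None if the purge is disabled."""
    args = build_parser().parse_args(argv)
    if args.disable_purge_dead_nodes or args.purge_dead_nodes_interval == 0:
        return None
    if args.purge_dead_nodes_interval < 0:
        raise ValueError("purge interval must be zero or positive")
    return args.purge_dead_nodes_interval


print(effective_purge_interval([]))
print(effective_purge_interval(["--purge-dead-nodes-interval=0"]))
print(effective_purge_interval(["--purge-dead-nodes-interval=300"]))
print(effective_purge_interval(["--disable-purge-dead-nodes"]))
```

Returning None for "disabled" keeps the scheduling code simple: the caller can skip registering the periodic task entirely instead of scheduling it with a special interval.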
Environment
- Selenium version: 4.x (issue observed in multiple 4.x releases)
- Java version: 17 (but likely applies to other versions)
- OS: Any environment subject to ephemeral or time-drift scenarios
Additional Context
- This issue is especially relevant in CI/CD, cloud-based ephemeral environments, or any system where clock adjustments might cause false health check results.
- Disabling or adjusting the purge service in Standalone mode would reduce random downtime caused by inaccurate timestamps.
Thank you for considering this feature request. Disabling or making the purge service optional in Standalone mode would improve stability for those running Selenium in simple, single-instance setups. If any additional details or logs would be helpful, please let me know and I will provide them.
Usage example
Imagine a small team running automated tests in a simple environment: just Selenium in Standalone mode on a single VM or container. If the clock in that environment shifts unexpectedly (e.g., due to the ephemeral nature of the setup), Selenium’s purgeDeadNodesService may mark the node as “dead” when it sees an outdated last heartbeat. All tests then fail until the next heartbeat arrives or the node is restarted.
By having a flag or configuration option to disable or adjust the purge interval, you eliminate these spurious outages. This benefits not only teams running very basic, single-machine Selenium setups but also larger organizations that create rapid, short-lived test instances. Everyone gains increased stability and faster feedback from their tests without worrying that the node might be killed due to a brief time synchronization issue.