[🚀 Feature]: Provide an Option to Disable purgeDeadNodesService in Selenium 4 Standalone Mode #15168

@RutvikChandla

Feature and motivation

Overview
In Selenium 4 Standalone mode, a scheduled service (purgeDeadNodesService) runs every 30 seconds to remove “dead” nodes. This makes sense in a Hub+Node or fully distributed setup, where multiple nodes come and go. In Standalone mode, however, there is effectively only one node, so the service can prematurely mark that single node as “dead,” causing brief outages or session failures.

Some environments (for example, those with unexpected clock adjustments or ephemeral deployments) may encounter time discrepancies that can make the node’s health check timestamps appear stale. Since the node is the only Selenium instance in Standalone mode, losing it even briefly causes user-facing failures. An option to disable or configure the purge service would prevent these unnecessary outages.


Steps to Reproduce

  1. Run Selenium 4 in Standalone mode in an environment where system time may shift unexpectedly (e.g., ephemeral VMs, containers that are rebuilt frequently, or other situations that cause clock drift).
  2. Observe that if the time drift is significant, purgeDeadNodesService considers the node’s last health check as outdated.
  3. The node is marked as “dead,” leading to “session creation failure” or “node not available” errors until the next heartbeat or restart.
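
The failure mode in these steps can be illustrated with a small, self-contained sketch. It is not Selenium's actual implementation: the `looksDead` helper, the 120-second limit, and the five-minute clock jump are all assumptions chosen for illustration. It only shows how a staleness check based on wall-clock time can flag a healthy node once the clock jumps forward.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneOffset;

public class StaleHeartbeatSketch {

  // Hypothetical staleness rule (an assumption, not Selenium's code):
  // a node looks "dead" when its last heartbeat is older than the limit.
  static boolean looksDead(Instant lastHeartbeat, Clock clock, Duration limit) {
    Duration age = Duration.between(lastHeartbeat, clock.instant());
    return age.compareTo(limit) > 0;
  }

  public static void main(String[] args) {
    Instant lastHeartbeat = Instant.parse("2025-01-01T00:00:00Z");
    Duration limit = Duration.ofSeconds(120); // assumed threshold

    // Ten real seconds after the heartbeat, with a well-behaved clock.
    Clock steady = Clock.fixed(lastHeartbeat.plusSeconds(10), ZoneOffset.UTC);
    // Same moment, but the system clock has jumped five minutes ahead.
    Clock jumped = Clock.offset(steady, Duration.ofMinutes(5));

    System.out.println("steady clock, dead? " + looksDead(lastHeartbeat, steady, limit));
    System.out.println("jumped clock, dead? " + looksDead(lastHeartbeat, jumped, limit));
  }
}
```

With the steady clock the heartbeat is 10 seconds old and passes the check; with the jumped clock the same heartbeat appears 310 seconds old and fails it, even though the node never stopped responding.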

Expected Behavior

  • In Standalone mode, the node should not be marked as dead if there is only one node/instance running.
  • Users should have a straightforward way to disable or configure the purge service, to avoid accidental downtime in time-shifted or short-lived environments.

Actual Behavior

  • purgeDeadNodesService runs on a fixed schedule (every 30 seconds) and can mark the node as dead due to stale timestamps in Standalone mode.
  • This results in transient failures for tests until the node’s heartbeat is refreshed.

Proposed Solution

  • Add a Command-Line Flag or Configuration: e.g. --disable-purge-dead-nodes or a similar parameter that deactivates this service in Standalone mode.
  • Make Purge Frequency Configurable: e.g. --purge-dead-nodes-interval=0 (disables), or allow the interval to be increased in Standalone mode to reduce false positives.
  • Standalone-Specific Behavior: Automatically disable or significantly delay purge checks when running in Standalone mode, since there is only a single node.
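
As a rough sketch of the second option, the snippet below shows one way a configurable interval could gate the scheduled task, with zero meaning "disabled". The class name, the `schedulePurge` helper, and the placeholder purge task are hypothetical; no such option exists in Selenium today, and the real wiring inside the Grid would differ.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class PurgeScheduleSketch {

  // Hypothetical helper: schedules the purge task only when the interval is positive.
  // A zero (or negative) interval is treated as "purge disabled".
  static Optional<ScheduledFuture<?>> schedulePurge(
      ScheduledExecutorService executor, Runnable purgeTask, Duration interval) {
    if (interval.isZero() || interval.isNegative()) {
      return Optional.empty();
    }
    // scheduleAtFixedRate rejects a period of 0, so clamp tiny intervals to 1 ms.
    long millis = Math.max(1L, interval.toMillis());
    return Optional.of(
        executor.scheduleAtFixedRate(purgeTask, millis, millis, TimeUnit.MILLISECONDS));
  }

  public static void main(String[] args) {
    ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
    Runnable purgeTask = () -> System.out.println("purging dead nodes (placeholder)");
    try {
      // Interval of 0: nothing is scheduled.
      System.out.println(schedulePurge(executor, purgeTask, Duration.ZERO).isPresent());
      // Interval of 30 seconds: the task is scheduled.
      System.out.println(
          schedulePurge(executor, purgeTask, Duration.ofSeconds(30)).isPresent());
    } finally {
      executor.shutdownNow(); // cancel the pending task so the JVM can exit
    }
  }
}
```

Treating zero as "off" keeps a single setting for both disabling and tuning the service, which matches the --purge-dead-nodes-interval=0 idea proposed above.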

Environment

  • Selenium version: 4.x (issue observed in multiple 4.x releases)
  • Java version: 17 (but likely applies to other versions)
  • OS: Any environment subject to ephemeral or time-drift scenarios

Additional Context

  • This issue is especially relevant in CI/CD pipelines, cloud-based ephemeral environments, or any system where clock adjustments might produce false health check results.
  • Disabling or adjusting the purge service in Standalone mode would reduce random downtime caused by inaccurate timestamps.

Thank you for considering this feature request. Disabling or making the purge service optional in Standalone mode would improve stability for those running Selenium in simple, single-instance setups. If any additional details or logs would be helpful, please let me know and I will provide them.

Usage example

Imagine you have a small team running automated tests in a simple environment: just Selenium Standalone mode on a single VM or container. If the clock in that environment shifts unexpectedly (e.g., due to the ephemeral nature of the setup), Selenium’s purgeDeadNodesService may mark the node as “dead” when it sees an outdated last heartbeat. That causes all tests to fail until the next heartbeat arrives or the node is restarted.

By having a flag or configuration option to disable or adjust the purge interval, you eliminate these spurious outages. This benefits not only teams running very basic, single-machine Selenium setups but also larger organizations that create rapid, short-lived test instances. Everyone gains increased stability and faster feedback from their tests without worrying that the node might be killed due to a brief time synchronization issue.

Labels

  • B-grid: Everything grid and server related
  • I-enhancement: Something could be better
  • R-help wanted: Issues looking for contributions