[🐛 Bug]: Distributor Requires Restart to Register New Nodes After Hours of Inactivity #2991

@giuliohome

What happened?

After hours of inactivity, a new session that triggers a node scale-up in a new node pool is ignored until the distributor is restarted.

Command used to start Selenium Grid with Docker (or Kubernetes)

Verified on DigitalOcean, which natively uses Cilium (this could be relevant).
Installation was done via Helm Chart with separated components, specifically a Chrome node assigned to a dedicated node pool selector (with autoscaling enabled).

helm install selenium-grid-release ./selenium-grid  --set chromeNode.nodeEnableManagedDownloads=true --set chromeNode.replicas=1 --set isolateComponents=true --set chromeNode.nodeSelector."doks\.digitalocean\.com/node-pool"=pool-green --set firefoxNode.enabled=false --set edgeNode.enabled=false

When scaling down the Chrome deployment, the distributor immediately drains the node.

kubectl scale deployment selenium-grid-release-selenium-node-chrome --replicas=0

After waiting a few hours (e.g., 8 hours), delete and recreate the aforementioned node pool, then scale the Chrome deployment back up. This triggers a node scale-up in the new pool. The new Chrome node starts sending registration requests, but the distributor does not recognize it.
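The steps above can be sketched as a small script. This is only an illustration under stated assumptions: the distributor deployment name is guessed from the chart's naming of the Chrome deployment, the `run` and `reproduce` helpers are hypothetical, and `kubectl rollout restart` is just one possible way to restart the distributor. Recreating the node pool is left as a manual step because it depends on the DigitalOcean tooling in use. By default the script only prints the commands.

```shell
#!/usr/bin/env bash
# Reproduction sketch. Set DRY_RUN=0 to actually run the commands.

CHROME_DEPLOY="selenium-grid-release-selenium-node-chrome"       # name from the report
DISTRIBUTOR_DEPLOY="selenium-grid-release-selenium-distributor"  # ASSUMED name, verify with `kubectl get deploy`
IDLE_SECONDS="${IDLE_SECONDS:-28800}"                            # 8 hours, as in the report
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

reproduce() {
  # 1. Scale the Chrome node down; the distributor drains it immediately.
  run kubectl scale deployment "$CHROME_DEPLOY" --replicas=0
  # 2. Leave the grid idle for several hours.
  run sleep "$IDLE_SECONDS"
  # 3. Manual step, not automated here.
  echo "# manual: delete and recreate the node pool (pool-green)"
  # 4. Scale back up; the new node tries to register and times out.
  run kubectl scale deployment "$CHROME_DEPLOY" --replicas=1
  # 5. Workaround from the report: restart the distributor.
  run kubectl rollout restart deployment "$DISTRIBUTOR_DEPLOY"
}
```

Calling `reproduce` with the default `DRY_RUN=1` prints the sequence for review; with `DRY_RUN=0` the script really sleeps for `IDLE_SECONDS` before continuing.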

At the network level, connections from the pod appear fine when checked via bash. The issue is likely related to the ZMQ cache becoming stale. Enabling the ZMQ heartbeat might help (see zeromq/jeromq#364).
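For reference, a network-level check of this kind could look like the sketch below, run from inside the Chrome pod. It assumes `bash` and `timeout` are available in the image and that the event bus service is named `selenium-grid-release-selenium-event-bus` (an assumption; check `kubectl get svc`). Ports 4442 and 4443 are the Selenium Grid event bus publish and subscribe ports. The `check_port` and `check_event_bus` helpers are hypothetical names.

```shell
# check_port HOST PORT: succeed if a TCP connection opens within 3 seconds.
check_port() {
  timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# ASSUMED service name; override via the EVENT_BUS_HOST environment variable.
EVENT_BUS_HOST="${EVENT_BUS_HOST:-selenium-grid-release-selenium-event-bus}"

# Probe the event bus publish (4442) and subscribe (4443) ports.
check_event_bus() {
  for port in 4442 4443; do
    if check_port "$EVENT_BUS_HOST" "$port"; then
      echo "$EVENT_BUS_HOST:$port reachable"
    else
      echo "$EVENT_BUS_HOST:$port NOT reachable"
    fi
  done
}
```

A successful TCP connect only shows that the port accepts connections. It does not show that an existing long-lived ZMQ connection is still healthy, which is consistent with the symptom described here.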

Relevant log output

The distributor log shows no new entries while the new Chrome pod is running. The Chrome pod simply times out after 120 seconds, as described above.

Operating System

DigitalOcean Kubernetes

Docker Selenium version (image tag)

latest

Selenium Grid chart version (chart version)

latest
