Description
What happened?
After hours of inactivity, a new session that triggers a node scale-up in a new node pool is ignored until the distributor is restarted.
Command used to start Selenium Grid with Docker (or Kubernetes)
Verified on DigitalOcean Kubernetes, which uses Cilium as its native CNI (this could be relevant).
Installation was done via the Helm chart with separated components, with the Chrome node pinned to a dedicated node pool through a nodeSelector (autoscaling enabled on that pool):
helm install selenium-grid-release ./selenium-grid --set chromeNode.nodeEnableManagedDownloads=true --set chromeNode.replicas=1 --set isolateComponents=true --set chromeNode.nodeSelector."doks\.digitalocean\.com/node-pool"=pool-green --set firefoxNode.enabled=false --set edgeNode.enabled=false
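For readability, the same installation can also be expressed with a values file; the sketch below is a hypothetical equivalent of the command above (the keys simply mirror the --set flags used, and pool-green is the dedicated node pool):

# Hypothetical values file mirroring the --set flags above (sketch only).
cat > grid-values.yaml <<'EOF'
isolateComponents: true
chromeNode:
  replicas: 1
  nodeEnableManagedDownloads: true
  nodeSelector:
    doks.digitalocean.com/node-pool: pool-green
firefoxNode:
  enabled: false
edgeNode:
  enabled: false
EOF
helm install selenium-grid-release ./selenium-grid -f grid-values.yaml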
When scaling down the Chrome deployment, the distributor immediately drains the node.
kubectl scale deployment selenium-grid-release-selenium-node-chrome --replicas=0
After waiting a few hours (e.g., 8 hours), delete and recreate the aforementioned node pool and then scale the Chrome deployment back up: a node scale-up is triggered in the new pool, and the new Chrome node starts sending its registration request, but the distributor does not recognize it.
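Roughly, the full reproduction sequence looks like this (a sketch; the cluster ID, node size and the distributor deployment name are placeholders/assumptions based on the default chart naming):

# 1. Scale the Chrome nodes down and leave the grid idle for several hours.
kubectl scale deployment selenium-grid-release-selenium-node-chrome --replicas=0

# 2. Delete and recreate the dedicated node pool (ID and size are placeholders).
doctl kubernetes cluster node-pool delete <cluster-id> pool-green
doctl kubernetes cluster node-pool create <cluster-id> --name pool-green \
  --size s-2vcpu-4gb --count 1 --auto-scale --min-nodes 1 --max-nodes 3

# 3. Scale the Chrome deployment back up; this triggers a node scale-up in the new pool.
kubectl scale deployment selenium-grid-release-selenium-node-chrome --replicas=1

# 4. Follow the distributor log: it stops updating and never registers the new node.
kubectl logs -f deployment/selenium-grid-release-selenium-distributor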
At the network level, connections from the pod appear fine when checked from a bash shell inside it (see the sketch below). The issue is likely related to the distributor's ZMQ cache becoming stale; using the ZMQ heartbeat might help (see zeromq/jeromq#364).
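The bash check from inside the new Chrome pod was along these lines (a sketch; the pod name is a placeholder, the event-bus and distributor service names are assumed from the default chart naming, and 4442/4443/5553 are the default event-bus and distributor ports):

# Shell into the newly started Chrome node pod (pod name is a placeholder).
kubectl exec -it <chrome-node-pod> -- bash

# Inside the pod: raw TCP checks against the event bus and distributor all succeed,
# so plain connectivity does not appear to be the problem.
for target in \
    selenium-grid-release-selenium-event-bus:4442 \
    selenium-grid-release-selenium-event-bus:4443 \
    selenium-grid-release-selenium-distributor:5553; do
  host=${target%%:*}; port=${target##*:}
  (exec 3<>"/dev/tcp/${host}/${port}") && echo "${target} reachable"
done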
Relevant log output
The distributor log is no longer updated once the new Chrome pod is running; the Chrome pod simply times out after 120 seconds, as described above.
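The only workaround found so far is restarting the distributor, after which the new node is no longer ignored (deployment name assumed from the default chart naming):

# Workaround: restart the distributor; the waiting Chrome node then registers.
kubectl rollout restart deployment selenium-grid-release-selenium-distributor
kubectl logs -f deployment/selenium-grid-release-selenium-distributor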
Operating System
DigitalOcean Kubernetes
Docker Selenium version (image tag)
latest
Selenium Grid chart version (chart version)
latest