Can't register Selenium Grid node in distributed scheme over docker overlay network, running on remote swarm-node. Event failed after timeout, will not attempt to register again #15904

@dubhokku

Description

Hello,
I'm trying to set up a distributed Selenium Grid across several virtual machines on a VMware ESXi 6.7.0 Update 3 (Build 15160138) hypervisor.

Two of the VMs run Ubuntu 24.04.1 LTS and one runs CentOS Stream release 9. All three have Docker version 27.5.1, build 9f9e405.

docker swarm init
docker swarm join
docker network create -d overlay se_net --attachable --gateway 10.2.29.250 --subnet 10.2.29.0/24

I'm also using selenium-server-4.32.0.jar, which is run from the Dockerfile shown below.

I start the components as containers on the Docker swarm overlay network, following the distributed setup in the Selenium documentation:

https://www.selenium.dev/images/documentation/grid/components.png
https://www.selenium.dev/documentation/grid/getting_started/#distributed

After that, I can see that the Docker service really did launch all instances of the 5-nodes image on the machines that are part of the swarm.

However, the router web interface only shows the 5-nodes instances that run locally on the machine hosting the 4-router and 3-distribution containers. The 5-nodes containers launched on the remote swarm nodes keep running but never appear there.
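
The same information can be read without the web interface from the router's /status endpoint. The following is only a minimal sketch: the helper names registered_node_uris and fetch_status are made up for illustration, and it assumes the status document lists nodes under value.nodes with a uri field, as Grid 4 does in my understanding.

```python
import json
from urllib.request import urlopen


def registered_node_uris(status: dict) -> list[str]:
    """Return the sorted URIs of the nodes listed in a Grid /status payload."""
    # Assumption: nodes are reported under value.nodes, each with a "uri" key.
    nodes = status.get("value", {}).get("nodes", [])
    return sorted(node.get("uri", "") for node in nodes)


def fetch_status(router_url: str, timeout: float = 5.0) -> dict:
    """Fetch and decode the /status document from the router."""
    with urlopen(f"{router_url.rstrip('/')}/status", timeout=timeout) as resp:
        return json.load(resp)
```

For example, registered_node_uris(fetch_status("http://10.2.29.109:4444")) would print the registered nodes, assuming the router is reachable at the overlay address used in the commands below. Comparing that list with the output of docker service ps se-node shows which replicas never registered.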

If I start the 5-nodes image on one of the remote swarm nodes with docker run -it, the Selenium node component logs several "Sending registration event" attempts and, after a timeout, reports "Registration event failed after period of 120 seconds. Node will not attempt to register again".

Could you please help me understand why the Selenium node component running on the remote swarm nodes cannot register with the 3-distribution component (Docker image) running on the swarm leader machine?

I checked the firewall and added the necessary port rules on the CentOS Stream machine. On the Ubuntu 24.04 machines no firewall runs by default, and Docker automatically adds allow rules for the ports published when a container starts (-p 4444:4444).
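
One way to double-check reachability from a remote node is a plain TCP probe of the event bus ports, run from inside a container attached to se_net. The following is only a minimal sketch: the helper names tcp_port_open and check_event_bus are made up for illustration, and the default ports are the ones from the commands in this report. A successful connect only shows that TCP reaches the port, not that ZeroMQ publish/subscribe works end to end.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False


def check_event_bus(host: str, ports=(4442, 4443)) -> dict[int, bool]:
    """Probe each event bus port and map port -> reachable."""
    return {port: tcp_port_open(host, port) for port in ports}
```

Calling check_event_bus("10.2.29.101") on the swarm leader and again on a remote node would show whether the two machines see the event bus differently.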

Below are the startup commands, the Dockerfile, the added firewall rules, and the log from launching the selenium-node container and its registration attempts.

Reproducible Code

Starting the distributed Selenium Grid on the Docker overlay network:

  # docker run --ip 10.2.29.101 -p 4442:4442 -p 4443:4443 -p 5557:5557 --net se_net -e SE_EVENT_BUS_HOST=10.2.29.101  -e SE_EVENT_BUS_PUBLISH_PORT=4442 -e SE_EVENT_BUS_SUBSCRIBE_PORT=4443 0-event-bus
  # docker run --ip 10.2.29.103 -p 5559:5559 --net se_net 1-new-session
  # docker run --ip 10.2.29.104 -p 5556:5556 --net se_net 2-map-session
  # docker run --ip 10.2.29.108 -p 5553:5553 --net se_net 3-distribution
  # docker run --ip 10.2.29.109 -p 4444:4444 --net se_net 4-router

  # docker service create   --name se-node  -p 5555:5555 --replicas=7 --network se_net -e SE_EVENT_BUS_HOST=10.2.29.101 -e SE_EVENT_BUS_PUBLISH_PORT=4442    -e SE_EVENT_BUS_SUBSCRIBE_PORT=4443 5-nodes

Dockerfiles for all Selenium Grid components:

 $ cat Dockerfile 
FROM ubuntu:24.04

RUN apt-get update && apt-get install -y iproute2 util-linux iputils-ping openjdk-11-jdk python3-pip
RUN pip install selenium --break-system-packages
RUN apt-get install -y python3-numpy python3-argparse-addons python3-urllib3 python3-random2 python3-pytest

RUN /bin/mkdir -p /opt/selenium
COPY selenium-server-4.32.0.jar /opt/selenium/selenium-server-4.32.0.jar
WORKDIR /opt/selenium

ENTRYPOINT ["/bin/java"]

...

# -- 0-event-bus
CMD ["-jar", "selenium-server-4.32.0.jar", "event-bus", "--publish-events", "tcp://10.2.29.101:4442", "--subscribe-events", "tcp://10.2.29.101:4443", "--port", "5557"]

# -- 1-new-session
CMD ["-jar", "selenium-server-4.32.0.jar", "sessionqueue","--port", "5559"]

# -- 2-map-session
CMD ["-jar", "selenium-server-4.32.0.jar", "sessions","--publish-events", "tcp://10.2.29.101:4442", "--subscribe-events", "tcp://10.2.29.101:4443", "--port", "5556"]

# -- 3-distribution
CMD ["-jar", "selenium-server-4.32.0.jar", "distributor", "--publish-events", "tcp://10.2.29.101:4442", "--subscribe-events", "tcp://10.2.29.101:4443","--sessions", "http://10.2.29.104:5556","--sessionqueue", "http://10.2.29.103:5559","--port", "5553","--bind-bus", "false"]

# -- 4-router
CMD ["-jar", "selenium-server-4.32.0.jar", "router", "--sessions", "http://10.2.29.104:5556","--distributor", "http://10.2.29.108:5553","--sessionqueue", "http://10.2.29.103:5559","--port", "4444"]

# -- 5-nodes
CMD ["-jar", "selenium-server-4.32.0.jar", "node", "--publish-events", "tcp://10.2.29.101:4442", "--subscribe-events", "tcp://10.2.29.101:4443"]

Firewall rules I added, and a link about Docker's firewall behaviour:

Ubuntu 24.04 has no firewall running by default.

From the Docker documentation, "Restrict external connections to containers":
"By default, all external source IPs are allowed to connect to ports that have been published to the Docker host's addresses."
https://docs.docker.com/engine/network/packet-filtering-firewalls/

On CentOS Stream release 9 I added the following rules:

swarm-leader
firewall-cmd --permanent --zone=public --add-port=2377/tcp

event-bus
firewall-cmd --permanent --zone=public --add-port=4442/tcp
firewall-cmd --permanent --zone=public --add-port=4443/tcp
firewall-cmd --permanent --zone=public --add-port=5557/tcp

session-queue
firewall-cmd --permanent --zone=public --add-port=5559/tcp

session-map
firewall-cmd --permanent --zone=public --add-port=5556/tcp

distribution
firewall-cmd --permanent --zone=public --add-port=5553/tcp

router
firewall-cmd --permanent --zone=public --add-port=4444/tcp

nodes
firewall-cmd --permanent --zone=public --add-port=5555/tcp

sudo firewall-cmd --reload

Debugging Logs

# docker node ls
ID                            HOSTNAME   STATUS    AVAILABILITY   MANAGER STATUS   ENGINE VERSION
tg3u5dyminayzzjx5krq6todq *   cthree     Ready     Active         Leader           27.5.1
t4mcup18rd437upot4e40kjwq     u24        Ready     Active                          27.5.1
na8w9qhx301j6f9313b1jflx3     utwo       Ready     Active                          27.5.1
# docker service ps se-node
ID             NAME        IMAGE            NODE      DESIRED STATE   CURRENT STATE            ERROR     PORTS
j76zxcqumefy   se-node.1   5_nodes:latest   u24       Running         Running 30 seconds ago             
grjhr4iboo2r   se-node.2   5_nodes:latest   cthree    Running         Running 31 seconds ago             
yxgt0m4kk67s   se-node.3   5_nodes:latest   utwo      Running         Running 30 seconds ago             
bf3tqbgfp12i   se-node.4   5_nodes:latest   utwo      Running         Running 30 seconds ago             
j3b4tsmxhqqs   se-node.5   5_nodes:latest   u24       Running         Running 30 seconds ago             
y65bcp03uked   se-node.6   5_nodes:latest   cthree    Running         Running 31 seconds ago             
gvcniu2levfz   se-node.7   5_nodes:latest   utwo      Running         Running 30 seconds ago             
# docker run -it -p 5555:5555 --net se_net  -e SE_EVENT_BUS_HOST=10.2.29.101 -e SE_EVENT_BUS_PUBLISH_PORT=4442 -e SE_EVENT_BUS_SUBSCRIBE_PORT=4443 5_nodes
13:39:23.705 INFO [LoggingOptions.configureLogEncoding] - Using the system default encoding
13:39:23.710 INFO [OpenTelemetryTracer.createTracer] - Using OpenTelemetry for tracing
13:39:23.821 INFO [UnboundZmqEventBus.<init>] - Connecting to tcp://10.2.29.101:4442 and tcp://10.2.29.101:4443
13:39:23.870 INFO [UnboundZmqEventBus.<init>] - Sockets created
13:39:24.873 INFO [UnboundZmqEventBus.<init>] - Event bus ready
13:39:25.017 INFO [NodeServer.createHandlers] - Reporting self as: http://10.2.29.63:5555
13:39:25.048 INFO [NodeOptions.getSessionFactories] - Detected 4 available processors
13:39:25.049 INFO [NodeOptions.discoverDrivers] - Looking for existing drivers on the PATH.
13:39:25.049 INFO [NodeOptions.discoverDrivers] - Add '--selenium-manager true' to the startup command to setup drivers automatically.
13:39:25.171 WARN [SeleniumManager.lambda$runCommand$1] - Unable to discover proper chromedriver version in offline mode
13:39:25.193 WARN [SeleniumManager.lambda$runCommand$1] - Unable to discover proper msedgedriver version in offline mode
13:39:25.213 WARN [SeleniumManager.lambda$runCommand$1] - Unable to discover proper geckodriver version in offline mode
13:39:25.252 INFO [NodeOptions.report] - Adding Chrome for {"browserName": "chrome","platformName": "linux"} 4 times
13:39:25.254 INFO [NodeOptions.report] - Adding Firefox for {"browserName": "firefox","platformName": "linux"} 4 times
13:39:25.255 INFO [NodeOptions.report] - Adding Edge for {"browserName": "MicrosoftEdge","platformName": "linux"} 4 times
13:39:25.351 INFO [Node.<init>] - Binding additional locator mechanisms: relative
13:39:25.540 INFO [NodeServer$2.start] - Starting registration process for Node http://10.2.29.63:5555
13:39:25.542 INFO [NodeServer.execute] - Started Selenium node 4.32.0 (revision d17c8aa950): http://10.2.29.63:5555
13:39:25.566 INFO [NodeServer$2.lambda$start$2] - Sending registration event...
13:39:35.578 INFO [NodeServer$2.lambda$start$2] - Sending registration event...
13:39:45.591 INFO [NodeServer$2.lambda$start$2] - Sending registration event...
13:39:55.605 INFO [NodeServer$2.lambda$start$2] - Sending registration event...
13:40:05.614 INFO [NodeServer$2.lambda$start$2] - Sending registration event...
13:40:15.625 INFO [NodeServer$2.lambda$start$2] - Sending registration event...
13:40:25.633 INFO [NodeServer$2.lambda$start$2] - Sending registration event...
13:40:35.642 INFO [NodeServer$2.lambda$start$2] - Sending registration event...
13:40:45.649 INFO [NodeServer$2.lambda$start$2] - Sending registration event...
13:40:55.656 INFO [NodeServer$2.lambda$start$2] - Sending registration event...
13:41:05.664 INFO [NodeServer$2.lambda$start$2] - Sending registration event...
13:41:15.673 INFO [NodeServer$2.lambda$start$2] - Sending registration event...
13:41:25.551 INFO [NodeServer$2.lambda$start$2] - Sending registration event...
13:41:25.552 ERROR [NodeServer$2.lambda$start$1] - Registration event failed after period of 120 seconds. Node will not attempt to register again
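
To summarize the retry pattern in a log like this one, the attempts can be counted and timed with a few lines of Python. This is only a minimal sketch: the names ATTEMPT, registration_attempts and summarize are made up for illustration, it assumes the HH:MM:SS.mmm timestamp prefix shown above, and it does not handle a log that crosses midnight.

```python
import re
from datetime import datetime

# Matches lines such as "13:39:25.566 INFO [...] - Sending registration event..."
ATTEMPT = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{3}) INFO .*Sending registration event")


def registration_attempts(log_text: str) -> list[datetime]:
    """Return the timestamp of every 'Sending registration event' line."""
    stamps = []
    for line in log_text.splitlines():
        match = ATTEMPT.match(line.strip())
        if match:
            stamps.append(datetime.strptime(match.group(1), "%H:%M:%S.%f"))
    return stamps


def summarize(log_text: str) -> tuple[int, float]:
    """Return (number of attempts, seconds between first and last attempt)."""
    stamps = registration_attempts(log_text)
    if not stamps:
        return 0, 0.0
    return len(stamps), (stamps[-1] - stamps[0]).total_seconds()
```

Applied to the log above, this counts 13 attempts, roughly 10 seconds apart, spanning just under 120 seconds before the node gives up.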

Labels: A-needs-triaging, B-grid, C-py, I-defect, I-regression, OS-linux
