Status: Closed
Labels: I-autoscaling-k8s (Issue relates to autoscaling in Kubernetes, or the scaler in KEDA)
Description
What happened?
We consistently see Chrome node pods get stuck in the Terminating state.
I'm not sure I can provide exact steps to reproduce, but I'm happy to share logs from a system where this is happening.
Command used to start Selenium Grid with Docker (or Kubernetes)
global:
  seleniumGrid:
    imageRegistry: {{ fvt_image_registry }}/selenium
    imagePullSecret: xxx
hub:
  imageTag: 4.18.1-20240224
chromeNode:
  imageTag: 122.0-20240224
  resources:
    requests:
      cpu: "0.1"
firefoxNode:
  enabled: false
edgeNode:
  enabled: false
autoscaling:
  enabled: true
  scalingType: job
  scaledOptions:
    maxReplicaCount: 999
  scaledJobOptions:
    scalingStrategy:
      strategy: default
ingress:
  hostname: selenium-grid.local
  path: /selenium
Relevant log output
NAME                                              READY   STATUS        RESTARTS      AGE     IP               NODE           NOMINATED NODE   READINESS GATES
keda-operator-d44bc8ffc-f7rzk                     1/1     Running       0             161m    172.30.107.176   10.74.145.3    <none>           <none>
keda-operator-metrics-apiserver-b994566dc-8b59f   1/1     Running       1 (14h ago)   14h     172.30.180.80    10.74.145.8    <none>           <none>
selenium-grid-selenium-chrome-node-25f9x-5b6dz    1/1     Terminating   0             3h24m   172.30.93.77     10.48.76.225   <none>           <none>
selenium-grid-selenium-chrome-node-5cx4x-j9clf    1/1     Terminating   0             174m    172.30.180.158   10.74.145.8    <none>           <none>
selenium-grid-selenium-chrome-node-5hpg8-m2xdz    1/1     Running       0             160m    172.30.202.92    10.48.76.223   <none>           <none>
selenium-grid-selenium-chrome-node-7mt5f-gwgm6    1/1     Running       0             160m    172.30.139.72    10.74.145.20   <none>           <none>
selenium-grid-selenium-chrome-node-cmggr-vvl2l    1/1     Running       0             160m    172.30.180.178   10.74.145.8    <none>           <none>
selenium-grid-selenium-chrome-node-fmq9j-qlvmj    1/1     Terminating   0             174m    172.30.107.131   10.74.145.3    <none>           <none>
selenium-grid-selenium-chrome-node-fxgnj-hb8qf    1/1     Terminating   0             174m    172.30.93.97     10.48.76.225   <none>           <none>
selenium-grid-selenium-chrome-node-gsrsp-h9tzz    1/1     Terminating   0             3h16m   172.30.93.83     10.48.76.225   <none>           <none>
selenium-grid-selenium-chrome-node-xd72s-vws74    1/1     Terminating   0             3h24m   172.30.202.89    10.48.76.223   <none>           <none>
selenium-grid-selenium-chrome-node-xm8h6-69wl8    1/1     Terminating   0             3h24m   172.30.139.32    10.74.145.20   <none>           <none>
selenium-grid-selenium-chrome-node-xqt4h-hkt67    1/1     Terminating   0             3h16m   172.30.139.102   10.74.145.20   <none>           <none>
selenium-grid-selenium-hub-5f49c8fc47-vzmt6       1/1     Running       0             14h     172.30.202.81    10.48.76.223   <none>           <none>
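The listing is easier to read when the stuck pods are grouped by worker node. Below is a minimal sketch (Python 3.9+, standard library only). The names stuck_pods and LISTING are hypothetical, and the rows are pasted verbatim from the listing. It collects the pods whose STATUS is Terminating and groups them by the NODE column. Because RESTARTS can contain spaces (for example "1 (14h ago)"), NODE is read from the end of each row rather than by position from the start.

```python
from collections import defaultdict

# Rows pasted verbatim from the pod listing above (header omitted).
LISTING = """\
keda-operator-d44bc8ffc-f7rzk 1/1 Running 0 161m 172.30.107.176 10.74.145.3 <none> <none>
keda-operator-metrics-apiserver-b994566dc-8b59f 1/1 Running 1 (14h ago) 14h 172.30.180.80 10.74.145.8 <none> <none>
selenium-grid-selenium-chrome-node-25f9x-5b6dz 1/1 Terminating 0 3h24m 172.30.93.77 10.48.76.225 <none> <none>
selenium-grid-selenium-chrome-node-5cx4x-j9clf 1/1 Terminating 0 174m 172.30.180.158 10.74.145.8 <none> <none>
selenium-grid-selenium-chrome-node-5hpg8-m2xdz 1/1 Running 0 160m 172.30.202.92 10.48.76.223 <none> <none>
selenium-grid-selenium-chrome-node-7mt5f-gwgm6 1/1 Running 0 160m 172.30.139.72 10.74.145.20 <none> <none>
selenium-grid-selenium-chrome-node-cmggr-vvl2l 1/1 Running 0 160m 172.30.180.178 10.74.145.8 <none> <none>
selenium-grid-selenium-chrome-node-fmq9j-qlvmj 1/1 Terminating 0 174m 172.30.107.131 10.74.145.3 <none> <none>
selenium-grid-selenium-chrome-node-fxgnj-hb8qf 1/1 Terminating 0 174m 172.30.93.97 10.48.76.225 <none> <none>
selenium-grid-selenium-chrome-node-gsrsp-h9tzz 1/1 Terminating 0 3h16m 172.30.93.83 10.48.76.225 <none> <none>
selenium-grid-selenium-chrome-node-xd72s-vws74 1/1 Terminating 0 3h24m 172.30.202.89 10.48.76.223 <none> <none>
selenium-grid-selenium-chrome-node-xm8h6-69wl8 1/1 Terminating 0 3h24m 172.30.139.32 10.74.145.20 <none> <none>
selenium-grid-selenium-chrome-node-xqt4h-hkt67 1/1 Terminating 0 3h16m 172.30.139.102 10.74.145.20 <none> <none>
selenium-grid-selenium-hub-5f49c8fc47-vzmt6 1/1 Running 0 14h 172.30.202.81 10.48.76.223 <none> <none>
"""


def stuck_pods(listing: str) -> dict[str, list[str]]:
    """Group pods whose STATUS is Terminating by the node they run on.

    Each row ends with: IP NODE NOMINATED-NODE READINESS-GATES, so NODE is
    always the third token from the end, whatever RESTARTS contains.
    """
    by_node: dict[str, list[str]] = defaultdict(list)
    for line in listing.splitlines():
        parts = line.split()
        if len(parts) < 9:
            continue  # skip blank or truncated rows
        name, status, node = parts[0], parts[2], parts[-3]
        if status == "Terminating":
            by_node[node].append(name)
    return dict(by_node)


if __name__ == "__main__":
    for node, pods in sorted(stuck_pods(LISTING).items()):
        print(f"{node}: {len(pods)} stuck pod(s)")
```

Applied to the 14 rows above, it reports 8 Terminating Chrome node pods spread over 5 different worker nodes, 3 of them on 10.48.76.225.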
Chrome node log (pod selenium-grid-selenium-chrome-node-25f9x-5b6dz, IP 172.30.93.77):
2024-03-11 18:58:48,902 INFO Included extra file "/etc/supervisor/conf.d/selenium.conf" during parsing
2024-03-11 18:58:48,905 INFO RPC interface 'supervisor' initialized
2024-03-11 18:58:48,905 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2024-03-11 18:58:48,906 INFO supervisord started with pid 8
2024-03-11 18:58:49,909 INFO spawned: 'xvfb' with pid 9
2024-03-11 18:58:49,912 INFO spawned: 'vnc' with pid 10
2024-03-11 18:58:49,915 INFO spawned: 'novnc' with pid 11
2024-03-11 18:58:49,917 INFO spawned: 'selenium-node' with pid 12
2024-03-11 18:58:49,938 INFO success: selenium-node entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)
E: [pulseaudio] main.c: Daemon startup failed.
Home directory not accessible: Permission denied
No PulseAudio daemon running, or not running as session daemon.
Home directory not accessible: Permission denied
No PulseAudio daemon running, or not running as session daemon.
Home directory not accessible: Permission denied
No PulseAudio daemon running, or not running as session daemon.
Appending Selenium options: --session-timeout 300
Appending Selenium options: --register-period 60
Appending Selenium options: --register-cycle 5
Appending Selenium options: --heartbeat-period 30
Appending Selenium options: --log-level INFO
Generating Selenium Config
Setting up SE_NODE_HOST...
Tracing is disabled
Selenium Grid Node configuration:
[events]
publish = "tcp://selenium-grid-selenium-hub.selenium:4442"
subscribe = "tcp://selenium-grid-selenium-hub.selenium:4443"
[server]
port = "5555"
[node]
grid-url = "http://admin:[email protected]:4444"
session-timeout = "300"
override-max-sessions = false
detect-drivers = false
drain-after-session-count = 1
max-sessions = 1
[[node.driver-configuration]]
display-name = "chrome"
stereotype = '{"browserName": "chrome", "browserVersion": "122.0", "platformName": "Linux", "goog:chromeOptions": {"binary": "/usr/bin/google-chrome"}}'
max-sessions = 1
Starting Selenium Grid Node...
2024-03-11 18:58:51,777 INFO success: xvfb entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-11 18:58:51,777 INFO success: vnc entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-03-11 18:58:51,777 INFO success: novnc entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
18:58:51.977 INFO [LoggingOptions.configureLogEncoding] - Using the system default encoding
18:58:51.985 INFO [OpenTelemetryTracer.createTracer] - Using OpenTelemetry for tracing
18:58:52.554 INFO [UnboundZmqEventBus.<init>] - Connecting to tcp://selenium-grid-selenium-hub.selenium:4442 and tcp://selenium-grid-selenium-hub.selenium:4443
18:58:52.767 INFO [UnboundZmqEventBus.<init>] - Sockets created
18:58:53.781 INFO [UnboundZmqEventBus.<init>] - Event bus ready
18:58:54.166 INFO [NodeServer.createHandlers] - Reporting self as: http://172.30.93.77:5555
18:58:54.252 INFO [NodeOptions.getSessionFactories] - Detected 1 available processors
18:58:54.763 INFO [NodeOptions.report] - Adding chrome for {"browserName": "chrome","browserVersion": "122.0","goog:chromeOptions": {"binary": "\u002fusr\u002fbin\u002fgoogle-chrome"},"platformName": "linux","se:noVncPort": 7900,"se:vncEnabled": true} 1 times
2024-03-11T18:58:54UTC [Probe.Startup] - Wait for the Node to report its status
18:58:54.884 INFO [Node.<init>] - Binding additional locator mechanisms: relative
18:58:55.373 INFO [NodeServer$1.start] - Starting registration process for Node http://172.30.93.77:5555
18:58:55.375 INFO [NodeServer.execute] - Started Selenium node 4.18.1 (revision b1d3319b48): http://172.30.93.77:5555
18:58:55.464 INFO [NodeServer$1.lambda$start$1] - Sending registration event...
18:58:55.984 INFO [NodeServer.lambda$createHandlers$2] - Node has been added
18:58:57.964 INFO [LocalNode.checkSessionCount] - Draining Node, configured sessions value (1) has been reached.
18:58:57.972 INFO [LocalNode.newSession] - Session created by the Node. Id: b03b291c5d5108416cf0ac1327aeeda8, Caps: Capabilities {acceptInsecureCerts: true, browserName: chrome-headless-shell, browserVersion: 122.0.6261.69, chrome: {chromedriverVersion: 122.0.6261.69 (81bc525b6a36..., userDataDir: /tmp/.org.chromium.Chromium...}, fedcm:accounts: true, goog:chromeOptions: {debuggerAddress: localhost:43491}, networkConnectionEnabled: false, pageLoadStrategy: normal, platformName: linux, proxy: Proxy(), se:bidiEnabled: false, se:cdp: ws://admin:admin@selenium-g..., se:cdpVersion: 122.0.6261.69, se:vnc: ws://admin:admin@selenium-g..., se:vncEnabled: true, se:vncLocalAddress: ws://172.30.93.77:7900, setWindowRect: true, strictFileInteractability: false, timeouts: {implicit: 0, pageLoad: 300000, script: 30000}, unhandledPromptBehavior: dismiss and notify, webauthn:extension:credBlob: true, webauthn:extension:largeBlob: true, webauthn:extension:minPinLength: true, webauthn:extension:prf: true, webauthn:virtualAuthenticators: true}
2024-03-11T18:58:58UTC [Probe.Startup] - Node responds the ID: 81a24be5-1a59-402b-9407-62e5be087a72 with status: UP
2024-03-11T18:58:58UTC [Probe.Startup] - Grid responds a matched Node ID: 81a24be5-1a59-402b-9407-62e5be087a72
2024-03-11T18:58:58UTC [Probe.Startup] - Node ID: 81a24be5-1a59-402b-9407-62e5be087a72 is found in the Grid. Node is ready.
19:03:34.310 INFO [SessionSlot.stop] - Stopping session b03b291c5d5108416cf0ac1327aeeda8
19:03:34.350 INFO [LocalNode.stopTimedOutSession] - Node draining complete!
19:03:35.357 INFO [NodeServer.lambda$createHandlers$3] - Shutting down
2024-03-11 19:03:35,722 INFO exited: selenium-node (exit status 0; expected)
2024-03-11 19:03:35,722 WARN received SIGINT indicating exit request
2024-03-11 19:03:35,723 INFO waiting for xvfb, vnc, novnc to die
2024-03-11 19:03:37,727 INFO stopped: novnc (terminated by SIGTERM)
2024-03-11 19:03:38,730 INFO stopped: vnc (terminated by SIGTERM)
2024-03-11 19:03:38,731 INFO waiting for xvfb to die
2024-03-11 19:03:39,732 INFO stopped: xvfb (terminated by SIGTERM)
Chrome node pod description (kubectl describe pod output):
Name: selenium-grid-selenium-chrome-node-25f9x-5b6dz
Namespace: selenium
Priority: 0
Service Account: selenium-grid-selenium-serviceaccount
Node: 10.48.76.225/10.48.76.225
Start Time: Mon, 11 Mar 2024 18:58:22 +0000
Labels: app=selenium-grid-selenium-chrome-node
app.kubernetes.io/component=selenium-grid-4.18.1-20240224
app.kubernetes.io/instance=selenium-grid
app.kubernetes.io/managed-by=helm
app.kubernetes.io/name=selenium-grid-selenium-chrome-node
app.kubernetes.io/version=4.18.1-20240224
controller-uid=449ef28c-fc3f-4da4-8b3f-31469fb86d9d
helm.sh/chart=selenium-grid-0.28.4
job-name=selenium-grid-selenium-chrome-node-25f9x
scaledjob.keda.sh/name=selenium-grid-selenium-chrome-node
Annotations: checksum/event-bus-configmap: 2698802d0bbf358d1634b47dff1ef36c5fc2501a27a9d2eef02c7874eb9496f8
checksum/logging-configmap: 7f721b250f90c8a5877dc9217b97f0a14392b420edf0e5af105a60944d2b9dc3
checksum/node-configmap: 3b6c0fffa6e6a10d57e5455ce21e1e7ee55e0638f15ff521b8c96fe8c10d8e91
checksum/server-configmap: ac6520a86bfffa04b4946bbce02ac8f1be341d800f4ab09f4e7cf274f74d3770
cni.projectcalico.org/containerID: a880de0585776e16db52c8eb9c290406e1b69dfd510cd15fc0c421cd6a9ed1dc
cni.projectcalico.org/podIP:
cni.projectcalico.org/podIPs:
k8s.v1.cni.cncf.io/network-status:
[{
"name": "k8s-pod-network",
"ips": [
"172.30.93.77"
],
"default": true,
"dns": {}
}]
k8s.v1.cni.cncf.io/networks-status:
[{
"name": "k8s-pod-network",
"ips": [
"172.30.93.77"
],
"default": true,
"dns": {}
}]
openshift.io/scc: restricted-v2
seccomp.security.alpha.kubernetes.io/pod: runtime/default
Status: Terminating (lasts 3h15m)
Termination Grace Period: 30s
SeccompProfile: RuntimeDefault
IP: 172.30.93.77
IPs:
IP: 172.30.93.77
Controlled By: Job/selenium-grid-selenium-chrome-node-25f9x
Containers:
selenium-grid-selenium-chrome-node:
Container ID: cri-o://f177006210af63a402465588e62b42ca96387427102f59824d4a6b29b197ab21
Image: docker-na-private.artifactory.swg-devops.com/wiotp-docker-local/selenium/node-chrome:122.0-20240224
Image ID: docker-na-private.artifactory.swg-devops.com/wiotp-docker-local/selenium/node-chrome@sha256:3b50643ff9885215c9142fefecace7b17efdde64a8c863ef702bd4a6c3e6a378
Port: 5555/TCP
Host Port: 0/TCP
State: Running
Started: Mon, 11 Mar 2024 18:58:48 +0000
Ready: True
Restart Count: 0
Limits:
cpu: 1
memory: 1Gi
Requests:
cpu: 100m
memory: 1Gi
Startup: exec [bash -c /opt/selenium/nodeProbe.sh Startup >> /proc/1/fd/1] delay=0s timeout=60s period=5s #success=1 #failure=12
Environment Variables from:
selenium-grid-selenium-event-bus ConfigMap Optional: false
selenium-grid-selenium-node-config ConfigMap Optional: false
selenium-grid-selenium-logging-config ConfigMap Optional: false
selenium-grid-selenium-server-config ConfigMap Optional: false
selenium-grid-selenium-secrets Secret Optional: false
Environment:
SE_OTEL_SERVICE_NAME: selenium-grid-selenium-chrome-node
SE_NODE_PORT: 5555
SE_NODE_REGISTER_PERIOD: 60
SE_NODE_REGISTER_CYCLE: 5
Mounts:
/dev/shm from dshm (rw)
/opt/selenium/nodePreStop.sh from selenium-grid-selenium-node-config (rw,path="nodePreStop.sh")
/opt/selenium/nodeProbe.sh from selenium-grid-selenium-node-config (rw,path="nodeProbe.sh")
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-n6b64 (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
selenium-grid-selenium-node-config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: selenium-grid-selenium-node-config
Optional: false
dshm:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium: Memory
SizeLimit: 1Gi
kube-api-access-n6b64:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
ConfigMapName: openshift-service-ca.crt
ConfigMapOptional: <nil>
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/memory-pressure:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedPreStopHook 165m kubelet Exec lifecycle hook ([bash -c /opt/selenium/nodePreStop.sh >> /proc/1/fd/1]) for Container "selenium-grid-selenium-chrome-node" in Pod "selenium-grid-selenium-chrome-node-25f9x-5b6dz_selenium(8ff39bf5-08ec-4467-b609-0db98b02ff8c)" failed - error: rpc error: code = Unknown desc = command error: time="2024-03-11T14:37:35-05:00" level=fatal msg="nsexec-1[153181]: failed to open /proc/187153/ns/ipc: No such file or directory"
time="2024-03-11T14:37:35-05:00" level=fatal msg="nsexec-0[153169]: failed to sync with stage-1: next state: Invalid argument"
time="2024-03-11T14:37:35-05:00" level=error msg="exec failed: unable to start container process: error executing setns process: exit status 1"
, stdout: , stderr: , exit code -1, message: ""
Warning FailedPreStopHook 135m kubelet Exec lifecycle hook ([bash -c /opt/selenium/nodePreStop.sh >> /proc/1/fd/1]) for Container "selenium-grid-selenium-chrome-node" in Pod "selenium-grid-selenium-chrome-node-25f9x-5b6dz_selenium(8ff39bf5-08ec-4467-b609-0db98b02ff8c)" failed - error: rpc error: code = Unknown desc = command error: time="2024-03-11T15:08:06-05:00" level=fatal msg="nsexec-1[358906]: failed to open /proc/187153/ns/ipc: No such file or directory"
time="2024-03-11T15:08:06-05:00" level=fatal msg="nsexec-0[358893]: failed to sync with stage-1: next state: Invalid argument"
time="2024-03-11T15:08:06-05:00" level=error msg="exec failed: unable to start container process: error executing setns process: exit status 1"
, stdout: , stderr: , exit code -1, message: ""
Warning FailedPreStopHook 104m kubelet Exec lifecycle hook ([bash -c /opt/selenium/nodePreStop.sh >> /proc/1/fd/1]) for Container "selenium-grid-selenium-chrome-node" in Pod "selenium-grid-selenium-chrome-node-25f9x-5b6dz_selenium(8ff39bf5-08ec-4467-b609-0db98b02ff8c)" failed - error: rpc error: code = Unknown desc = command error: time="2024-03-11T15:38:36-05:00" level=fatal msg="nsexec-1[149417]: failed to open /proc/187153/ns/ipc: No such file or directory"
time="2024-03-11T15:38:36-05:00" level=fatal msg="nsexec-0[149388]: failed to sync with stage-1: next state: Invalid argument"
time="2024-03-11T15:38:36-05:00" level=error msg="exec failed: unable to start container process: error executing setns process: exit status 1"
, stdout: , stderr: , exit code -1, message: ""
Warning FailedPreStopHook 74m kubelet Exec lifecycle hook ([bash -c /opt/selenium/nodePreStop.sh >> /proc/1/fd/1]) for Container "selenium-grid-selenium-chrome-node" in Pod "selenium-grid-selenium-chrome-node-25f9x-5b6dz_selenium(8ff39bf5-08ec-4467-b609-0db98b02ff8c)" failed - error: rpc error: code = Unknown desc = command error: time="2024-03-11T16:09:06-05:00" level=fatal msg="nsexec-1[454447]: failed to open /proc/187153/ns/ipc: No such file or directory"
time="2024-03-11T16:09:06-05:00" level=fatal msg="nsexec-0[454427]: failed to sync with stage-1: next state: Invalid argument"
time="2024-03-11T16:09:06-05:00" level=error msg="exec failed: unable to start container process: error executing setns process: exit status 1"
, stdout: , stderr: , exit code -1, message: ""
Warning FailedPreStopHook 43m kubelet Exec lifecycle hook ([bash -c /opt/selenium/nodePreStop.sh >> /proc/1/fd/1]) for Container "selenium-grid-selenium-chrome-node" in Pod "selenium-grid-selenium-chrome-node-25f9x-5b6dz_selenium(8ff39bf5-08ec-4467-b609-0db98b02ff8c)" failed - error: rpc error: code = Unknown desc = command error: time="2024-03-11T16:39:37-05:00" level=fatal msg="nsexec-1[259528]: failed to open /proc/187153/ns/ipc: No such file or directory"
time="2024-03-11T16:39:37-05:00" level=fatal msg="nsexec-0[259505]: failed to sync with stage-1: next state: Invalid argument"
time="2024-03-11T16:39:37-05:00" level=error msg="exec failed: unable to start container process: error executing setns process: exit status 1"
, stdout: , stderr: , exit code -1, message: ""
Normal Killing 13m (x7 over 3h16m) kubelet Stopping container selenium-grid-selenium-chrome-node
Warning FailedKillPod 13m (x6 over 165m) kubelet error killing pod: [failed to "KillContainer" for "selenium-grid-selenium-chrome-node" with KillContainerError: "rpc error: code = DeadlineExceeded desc = context deadline exceeded", failed to "KillPodSandbox" for "8ff39bf5-08ec-4467-b609-0db98b02ff8c" with KillPodSandboxError: "rpc error: code = DeadlineExceeded desc = context deadline exceeded"]
Warning FailedPreStopHook 13m kubelet Exec lifecycle hook ([bash -c /opt/selenium/nodePreStop.sh >> /proc/1/fd/1]) for Container "selenium-grid-selenium-chrome-node" in Pod "selenium-grid-selenium-chrome-node-25f9x-5b6dz_selenium(8ff39bf5-08ec-4467-b609-0db98b02ff8c)" failed - error: rpc error: code = Unknown desc = command error: time="2024-03-11T17:10:07-05:00" level=fatal msg="nsexec-1[51553]: failed to open /proc/187153/ns/ipc: No such file or directory"
time="2024-03-11T17:10:07-05:00" level=fatal msg="nsexec-0[51532]: failed to sync with stage-1: next state: Invalid argument"
time="2024-03-11T17:10:07-05:00" level=error msg="exec failed: unable to start container process: error executing setns process: exit status 1"
, stdout: , stderr: , exit code -1, message: ""
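The FailedPreStopHook timestamps are printed with a -05:00 offset, while the container log above is in UTC (the Probe.Startup lines carry an explicit UTC suffix and match the supervisord times). Below is a minimal sketch using only the Python standard library. The names HOOK_FAILURES, NODE_STOPPED, to_utc and gaps_seconds are illustrative. It normalises the hook failures to UTC, compares the first one with the last supervisord line ("stopped: xvfb"), and measures the spacing between retries.

```python
from datetime import datetime, timezone

# Timestamps of the FailedPreStopHook events, copied from the Events section.
HOOK_FAILURES = [
    "2024-03-11T14:37:35-05:00",
    "2024-03-11T15:08:06-05:00",
    "2024-03-11T15:38:36-05:00",
    "2024-03-11T16:09:06-05:00",
    "2024-03-11T16:39:37-05:00",
    "2024-03-11T17:10:07-05:00",
]

# Last supervisord line in the container log ("stopped: xvfb"), assumed UTC.
NODE_STOPPED = datetime(2024, 3, 11, 19, 3, 39, tzinfo=timezone.utc)


def to_utc(stamps: list[str]) -> list[datetime]:
    """Parse ISO-8601 timestamps that carry an offset and convert them to UTC."""
    return [datetime.fromisoformat(s).astimezone(timezone.utc) for s in stamps]


def gaps_seconds(times: list[datetime]) -> list[int]:
    """Whole seconds between consecutive timestamps."""
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


if __name__ == "__main__":
    utc = to_utc(HOOK_FAILURES)
    print("first hook failure (UTC):", utc[0].isoformat())
    print("minutes after xvfb stopped:", (utc[0] - NODE_STOPPED).total_seconds() / 60)
    print("gaps between retries (s):", gaps_seconds(utc))
```

With the values from this pod, the first hook failure lands at 19:37:35 UTC, roughly 34 minutes after supervisord logged that xvfb had stopped (19:03:39 UTC). The retries are about 30.5 minutes apart (1830 to 1831 s).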
Operating System
OpenShift 4.12.49
Docker Selenium version (image tag)
4.18.1
Selenium Grid chart version (chart version)
0.28.4