Closed as not planned
Description
What happened?
We run Selenium Grid with KEDA autoscaling enabled on an AKS cluster, installed using Terraform.
The problem we have detected is that when the number of queued session requests exceeds the maximum that KEDA can scale to, KEDA stops detecting that requests are queued, although the hub pod still reports them.
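To compare what the hub reports with what KEDA reports, one option is to read the queue size directly from the Grid's GraphQL endpoint. The following is a minimal sketch, not part of the original report. It assumes the Grid exposes the `grid { sessionQueueSize sessionCount maxSession }` fields (verify against your Grid version), and the URL in the usage note is a hypothetical in-cluster address.

```python
import json
import urllib.request

# GraphQL query for the Grid summary. Field names are an assumption based on
# the Selenium Grid GraphQL schema; check them against your Grid version.
GRID_QUERY = "{ grid { sessionQueueSize sessionCount maxSession } }"


def build_request(graphql_url: str) -> urllib.request.Request:
    """Build the POST request carrying the GraphQL query as JSON."""
    body = json.dumps({"query": GRID_QUERY}).encode("utf-8")
    return urllib.request.Request(
        graphql_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_grid_summary(response_text: str) -> dict:
    """Extract queue and session counts from the GraphQL JSON response."""
    grid = json.loads(response_text)["data"]["grid"]
    return {
        "queued": grid["sessionQueueSize"],
        "running": grid["sessionCount"],
        "max_sessions": grid["maxSession"],
    }


def fetch_grid_summary(graphql_url: str, timeout: float = 10.0) -> dict:
    """Query the Grid and return the parsed summary (performs a network call)."""
    with urllib.request.urlopen(build_request(graphql_url), timeout=timeout) as resp:
        return parse_grid_summary(resp.read().decode("utf-8"))
```

For example, `fetch_grid_summary("http://selenium-router:4444/graphql")` (hypothetical service name and port) would return the hub's view of the queue, which can then be compared with the pending job count in the KEDA operator log.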
Command used to start Selenium Grid with Docker (or Kubernetes)
# -- Repository: https://github.com/SeleniumHQ/docker-selenium/blob/trunk/charts/selenium-grid/values.yaml
hub:
  nameOverride: selenium-router
  imagePullPolicy: IfNotPresent
  nodeSelector:
    agentpool: npappspot
  tolerations:
    - key: app
      operator: Equal
      value: app
      effect: NoSchedule
    - key: kubernetes.azure.com/scalesetpriority
      operator: Equal
      value: spot
      effect: NoSchedule
  resources:
    requests:
      cpu: ${hub_cpu_requests}
      memory: ${hub_memory_requests}
    limits:
      cpu: ${hub_cpu_limits}
      memory: ${hub_memory_limits}
  extraEnvironmentVariables:
    - name: SE_SESSION_REQUEST_TIMEOUT
      value: ${SE_SESSION_REQUEST_TIMEOUT}
autoscaling:
  enabled: true
  scaleOptions:
    minReplicaCount: 0
    maxReplicaCount: 40
tls:
  create: false
serviceAccount:
  create: true
ingress:
  enabled: true
chromeNode:
  enabled: true
  replicas: 0
  resources:
    requests:
      cpu: ${chrome_node_cpu_requests}
      memory: ${chrome_node_memory_requests}
    limits:
      cpu: ${chrome_node_cpu_limits}
      memory: ${chrome_node_memory_limits}
  nodeSelector:
    agentpool: npappspot
  tolerations:
    - key: app
      operator: Equal
      value: app
      effect: NoSchedule
    - key: kubernetes.azure.com/scalesetpriority
      operator: Equal
      value: spot
      effect: NoSchedule
  extraEnvironmentVariables:
    - name: SE_NODE_SESSION_TIMEOUT
      value: ${SE_NODE_SESSION_TIMEOUT}
firefoxNode:
  enabled: true
  replicas: 0
  resources:
    requests:
      cpu: ${firefox_node_cpu_requests}
      memory: ${firefox_node_memory_requests}
    limits:
      cpu: ${firefox_node_cpu_limits}
      memory: ${firefox_node_memory_limits}
  nodeSelector:
    agentpool: npappspot
  tolerations:
    - key: app
      operator: Equal
      value: app
      effect: NoSchedule
    - key: kubernetes.azure.com/scalesetpriority
      operator: Equal
      value: spot
      effect: NoSchedule
  extraEnvironmentVariables:
    - name: SE_NODE_SESSION_TIMEOUT
      value: ${SE_NODE_SESSION_TIMEOUT}
edgeNode:
  enabled: true
  replicas: 0
  resources:
    requests:
      cpu: ${edge_node_cpu_requests}
      memory: ${edge_node_memory_requests}
    limits:
      cpu: ${edge_node_cpu_limits}
      memory: ${edge_node_memory_limits}
  nodeSelector:
    agentpool: npappspot
  tolerations:
    - key: app
      operator: Equal
      value: app
      effect: NoSchedule
    - key: kubernetes.azure.com/scalesetpriority
      operator: Equal
      value: spot
      effect: NoSchedule
  extraEnvironmentVariables:
    - name: SE_NODE_SESSION_TIMEOUT
      value: ${SE_NODE_SESSION_TIMEOUT}
Relevant log output
No errors appear in the log. KEDA indicates that there are 10 queued jobs, but Selenium shows 20, and when Selenium finishes those 10 (the ones KEDA reports), it does not start any more browser pods.
INFO scaleexecutor Remove a job by reaching the historyLimit {"scaledJob.Name": "selenium-grid-selenium-chrome-node", "scaledJob.Namespace": "selenium", "job.Name": "selenium-grid-selenium-chrome-node-vrjcm", "historyLimit": 0}
2024-10-23T11:40:39Z INFO scaleexecutor Scaling Jobs {"scaledJob.Name": "selenium-grid-selenium-chrome-node", "scaledJob.Namespace": "selenium", "Number of running Jobs": 40}
2024-10-23T11:40:39Z INFO scaleexecutor Scaling Jobs {"scaledJob.Name": "selenium-grid-selenium-chrome-node", "scaledJob.Namespace": "selenium", "Number of pending Jobs": 10}
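The log lines above show 40 running jobs against `maxReplicaCount: 40`. One possible reading is that no headroom remains for new jobs until running ones finish. The sketch below is a deliberately simplified, hypothetical model of that cap. It is not KEDA's actual scaling code, and `jobs_to_create` is an illustrative name, not a KEDA function.

```python
def jobs_to_create(queued: int, running: int, max_replicas: int) -> int:
    """Hypothetical model: new jobs are capped by the remaining headroom.

    Assumption: headroom = max_replicas - running, never below zero.
    """
    headroom = max(max_replicas - running, 0)
    return min(queued, headroom)


# Values taken from the report: 20 queued in Selenium, 40 running, cap of 40.
print(jobs_to_create(queued=20, running=40, max_replicas=40))  # prints 0

# Once 10 jobs finish, the same model leaves room for 10 new jobs.
print(jobs_to_create(queued=20, running=30, max_replicas=40))  # prints 10
```

Under this model the scaler should resume creating jobs as soon as running jobs drop below the cap, so a queue that stays unserved after jobs finish would point to the queue count itself not being refreshed.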
Operating System
AKS, Kubernetes version 1.28.9
Docker Selenium version (image tag)
4.25.0-20240922
Selenium Grid chart version (chart version)
0.36.1