[πŸ› Bug]: Keda and Selenium Grid mismatch queuesΒ #2442

@jorgegb95

What happened?

We run Selenium Grid with KEDA autoscaling enabled on an AKS cluster, installed using Terraform.
The problem we have detected is that when the number of queued session requests exceeds the maximum number of replicas KEDA can scale to, KEDA stops detecting that requests are still queued, even though the hub pod still sees them in its queue.
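One way to check what the hub itself sees is to ask it for its queue size directly and compare that with what KEDA logs. The following is a minimal sketch, not part of the original setup. It assumes the Grid exposes its GraphQL endpoint at `/graphql` with the `grid` fields `sessionQueueSize`, `sessionCount`, and `maxSession`. The URL in the usage note and all function names are hypothetical.

```python
import json
import urllib.request

# Assumption: Selenium Grid 4 serves GraphQL at <hub-url>/graphql and the
# `grid` object exposes these three fields.
QUEUE_QUERY = "{ grid { sessionQueueSize sessionCount maxSession } }"


def build_request(graphql_url: str) -> urllib.request.Request:
    """Build the POST request that carries the GraphQL query."""
    body = json.dumps({"query": QUEUE_QUERY}).encode("utf-8")
    return urllib.request.Request(
        graphql_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_grid_status(payload: dict) -> dict:
    """Extract the queue and session counters from a GraphQL response."""
    grid = payload["data"]["grid"]
    return {
        "queued": grid["sessionQueueSize"],
        "running": grid["sessionCount"],
        "max_sessions": grid["maxSession"],
    }


def fetch_grid_status(graphql_url: str, timeout: float = 10.0) -> dict:
    """Query the hub and return its current counters (needs network access)."""
    with urllib.request.urlopen(build_request(graphql_url), timeout=timeout) as resp:
        return parse_grid_status(json.load(resp))
```

For example, `fetch_grid_status("http://selenium-router.selenium:4444/graphql")` (a hypothetical in-cluster address) would return the hub's view of the queue, which can then be compared with the pending job count that KEDA reports.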

Command used to start Selenium Grid with Docker (or Kubernetes)

# -- Repository: https://github.com/SeleniumHQ/docker-selenium/blob/trunk/charts/selenium-grid/values.yaml

hub:
  nameOverride: selenium-router
  imagePullPolicy: IfNotPresent
  nodeSelector:
    agentpool: npappspot
  tolerations:
    - key: app
      operator: Equal
      value: app
      effect: NoSchedule
    - key: kubernetes.azure.com/scalesetpriority
      operator: Equal
      value: spot
      effect: NoSchedule
  resources:
    requests:
      cpu: ${hub_cpu_requests}
      memory: ${hub_memory_requests}
    limits:
      cpu: ${hub_cpu_limits}
      memory: ${hub_memory_limits}
  extraEnvironmentVariables:
    - name: SE_SESSION_REQUEST_TIMEOUT
      value: ${SE_SESSION_REQUEST_TIMEOUT}

autoscaling:
  enabled: true
  scaleOptions:
    minReplicaCount: 0
    maxReplicaCount: 40

tls:
  create: false

serviceAccount:
  create: true

ingress:
  enabled: true

chromeNode:
  enabled: true
  replicas: 0
  resources:
    requests:
      cpu: ${chrome_node_cpu_requests}
      memory: ${chrome_node_memory_requests}
    limits:
      cpu: ${chrome_node_cpu_limits}
      memory: ${chrome_node_memory_limits}
  nodeSelector:
    agentpool: npappspot
  tolerations:
    - key: app
      operator: Equal
      value: app
      effect: NoSchedule
    - key: kubernetes.azure.com/scalesetpriority
      operator: Equal
      value: spot
      effect: NoSchedule
  extraEnvironmentVariables:
    - name: SE_NODE_SESSION_TIMEOUT
      value: ${SE_NODE_SESSION_TIMEOUT}

firefoxNode:
  enabled: true
  replicas: 0
  resources:
    requests:
      cpu: ${firefox_node_cpu_requests}
      memory: ${firefox_node_memory_requests}
    limits:
      cpu: ${firefox_node_cpu_limits}
      memory: ${firefox_node_memory_limits}
  nodeSelector:
    agentpool: npappspot
  tolerations:
    - key: app
      operator: Equal
      value: app
      effect: NoSchedule
    - key: kubernetes.azure.com/scalesetpriority
      operator: Equal
      value: spot
      effect: NoSchedule
  extraEnvironmentVariables:
    - name: SE_NODE_SESSION_TIMEOUT
      value: ${SE_NODE_SESSION_TIMEOUT}


edgeNode:
  enabled: true
  replicas: 0
  resources:
    requests:
      cpu: ${edge_node_cpu_requests}
      memory: ${edge_node_memory_requests}
    limits:
      cpu: ${edge_node_cpu_limits}
      memory: ${edge_node_memory_limits}
  nodeSelector:
    agentpool: npappspot
  tolerations:
    - key: app
      operator: Equal
      value: app
      effect: NoSchedule
    - key: kubernetes.azure.com/scalesetpriority
      operator: Equal
      value: spot
      effect: NoSchedule
  extraEnvironmentVariables:
    - name: SE_NODE_SESSION_TIMEOUT
      value: ${SE_NODE_SESSION_TIMEOUT}

Relevant log output

No errors appear in the log. KEDA reports 10 queued jobs, but Selenium shows 20 queued requests. When Selenium finishes the 10 that KEDA knows about, KEDA does not start any more browser pods.

INFO	scaleexecutor	Remove a job by reaching the historyLimit	{"scaledJob.Name": "selenium-grid-selenium-chrome-node", "scaledJob.Namespace": "selenium", "job.Name": "selenium-grid-selenium-chrome-node-vrjcm", "historyLimit": 0}

2024-10-23T11:40:39Z	INFO	scaleexecutor	Scaling Jobs	{"scaledJob.Name": "selenium-grid-selenium-chrome-node", "scaledJob.Namespace": "selenium", "Number of running Jobs": 40}

2024-10-23T11:40:39Z	INFO	scaleexecutor	Scaling Jobs	{"scaledJob.Name": "selenium-grid-selenium-chrome-node", "scaledJob.Namespace": "selenium", "Number of pending Jobs": 10}
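To track the mismatch over time, the running and pending job counts can be pulled out of the `scaleexecutor` lines above. This is a rough sketch that assumes each line ends with a JSON object, as in the excerpt. The function names are hypothetical, and the sample lines are copied from the log shown here.

```python
import json


def parse_scaleexecutor_line(line: str) -> dict:
    """Return the JSON fields at the end of a KEDA scaleexecutor log line."""
    start = line.find("{")
    if start == -1:
        return {}  # no structured payload on this line
    return json.loads(line[start:])


def summarize(lines) -> dict:
    """Collect the running/pending job counts per ScaledJob name."""
    summary = {}
    for line in lines:
        fields = parse_scaleexecutor_line(line)
        name = fields.get("scaledJob.Name")
        if name is None:
            continue
        entry = summary.setdefault(name, {})
        if "Number of running Jobs" in fields:
            entry["running"] = fields["Number of running Jobs"]
        if "Number of pending Jobs" in fields:
            entry["pending"] = fields["Number of pending Jobs"]
    return summary


# The two "Scaling Jobs" lines from the log output above.
SAMPLE = [
    '2024-10-23T11:40:39Z\tINFO\tscaleexecutor\tScaling Jobs\t'
    '{"scaledJob.Name": "selenium-grid-selenium-chrome-node", '
    '"scaledJob.Namespace": "selenium", "Number of running Jobs": 40}',
    '2024-10-23T11:40:39Z\tINFO\tscaleexecutor\tScaling Jobs\t'
    '{"scaledJob.Name": "selenium-grid-selenium-chrome-node", '
    '"scaledJob.Namespace": "selenium", "Number of pending Jobs": 10}',
]

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

On the sample, this yields 40 running and 10 pending jobs for `selenium-grid-selenium-chrome-node`, where 40 equals the configured `maxReplicaCount`.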

Operating System

AKS, Kubernetes version 1.28.9

Docker Selenium version (image tag)

4.25.0-20240922

Selenium Grid chart version (chart version)

0.36.1
