StatefulSet stuck in Pending state due to unavailable Akri instance #750

@ar3s3ru

Describe the bug
The StatefulSet pod is stuck in the Pending state with the following scheduling error:

  Type     Reason            Age                   From               Message
  ----     ------            ----                  ----               -------
  Warning  FailedScheduling  27m (x61 over 5h15m)  default-scheduler  0/2 nodes are available: 2 Insufficient akri.sh/sonoff-zigbee-antenna. preemption: 0/2 nodes are available: 2 No preemption victims found for incoming pod.

The akri.sh/sonoff-zigbee-antenna resource comes from the following Configuration:

apiVersion: akri.sh/v0
kind: Configuration
spec:
  capacity: 1
  discoveryHandler:
    discoveryDetails: |
      groupRecursive: true
      udevRules:
      - ATTRS{idVendor}=="10c4", ATTRS{idProduct}=="ea60"
    name: udev

This results in the following Instance:

apiVersion: akri.sh/v0
kind: Instance
metadata:
  creationTimestamp: "2025-03-19T22:39:52Z"
  finalizers:
  - momonoke
  generation: 70225
  name: sonoff-zigbee-antenna-663571
  namespace: home-automation
  ownerReferences:
  - apiVersion: akri.sh/v0
    controller: true
    kind: Configuration
    name: sonoff-zigbee-antenna
    uid: bff9ebf4-49fb-44b4-8e49-ea94cb52082f
  resourceVersion: "7903817"
  uid: e5772e6a-9d1b-4590-98b3-c62065649264
spec:
  brokerProperties:
    UDEV_DEVNODE_1: /dev/ttyUSB0
    UDEV_DEVNODE_3: /dev/gpiochip0
    UDEV_DEVNODE_4: /dev/bus/usb/001/003
    UDEV_DEVPATH: /devices/pci0000:00/0000:00:14.0/usb1/1-6
  capacity: 1
  cdiName: akri.sh/sonoff-zigbee-antenna=663571
  configurationName: sonoff-zigbee-antenna
  deviceUsage: {}
  nodes:
  - momonoke
  shared: false

Output of kubectl get pods,akrii,akric -o wide:

NAME                             READY   STATUS    RESTARTS        AGE     IP              NODE       NOMINATED NODE   READINESS GATES
pod/emqx-0                       1/1     Running   1 (6d15h ago)   7d5h    10.42.1.24      eq14-001   <none>           <none>
pod/emqx-1                       1/1     Running   0               7d5h    10.42.0.249     momonoke   <none>           <none>
pod/emqx-2                       1/1     Running   1 (6d15h ago)   7d5h    10.42.1.31      eq14-001   <none>           <none>
pod/esphome-0                    1/1     Running   0               29h     10.42.1.123     eq14-001   <none>           <none>
pod/govee2mqtt-cfdf7b6bc-gs2vt   1/1     Running   1 (6d15h ago)   15d     192.168.2.38    eq14-001   <none>           <none>
pod/home-assistant-0             1/1     Running   0               9d      192.168.2.109   momonoke   <none>           <none>
pod/music-assistant-0            1/1     Running   0               5h24m   192.168.2.109   momonoke   <none>           <none>
pod/zigbee2mqtt-0                0/1     Pending   0               5h24m   <none>          <none>     <none>           <none>

NAME                                            CONFIG                  SHARED   NODES          AGE
instance.akri.sh/sonoff-zigbee-antenna-663571   sonoff-zigbee-antenna   false    ["momonoke"]   13d

NAME                                          CAPACITY   AGE
configuration.akri.sh/sonoff-zigbee-antenna   1          13d
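
The scheduler reports the resource as insufficient on both nodes, while the Instance lists momonoke. It may therefore help to check what each node actually advertises to the scheduler. The helper below is only a sketch: `akri_allocatable` is a hypothetical name, and it assumes the resource appears under the node's standard `.status.allocatable` field.

```shell
# Print the allocatable count a node advertises for an extended resource.
# akri_allocatable is a hypothetical helper name, not part of Akri or kubectl.
akri_allocatable() {
  node="$1"
  resource="$2"
  # Dots inside the resource name must be escaped in a kubectl jsonpath key.
  escaped=$(printf '%s' "$resource" | sed 's/\./\\./g')
  kubectl get node "$node" -o "jsonpath={.status.allocatable.${escaped}}"
}
```

For example, `akri_allocatable momonoke akri.sh/sonoff-zigbee-antenna` can be compared with the same call for eq14-001. Empty output or 0 on both nodes would be consistent with the FailedScheduling message.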

Kubernetes Version

k3s version v1.32.2+k3s1

To Reproduce
I cannot find reliable steps to reproduce. It happens seemingly at random, without any external input, and I only notice it when deploying changes (e.g. version updates) to the StatefulSet that uses the Akri resource.

Expected behavior
The zigbee2mqtt-0 pod is scheduled onto the node that exposes the Akri Instance (momonoke) instead of staying Pending.

Logs (please share snips of applicable logs)
See the scheduler events shared under "Describe the bug" above.

Additional context
The situation seems to get fixed when I:

  1. Manually delete both Configuration and Instance
  2. Create Configuration and Instance once again
  3. Manually delete StatefulSet
  4. Re-deploy StatefulSet manifest
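
The four steps above can be sketched as a shell function. This is an illustration under assumptions, not a verified fix: `akri_reset` and both manifest paths are hypothetical, and the StatefulSet name `zigbee2mqtt` is inferred from the pod name zigbee2mqtt-0. The object names and the namespace come from the Instance shown earlier.

```shell
# Namespace taken from the Instance metadata in this issue.
NS=home-automation

# akri_reset AKRI_MANIFESTS STATEFULSET_MANIFEST
# Both arguments are hypothetical paths to your own manifests.
akri_reset() {
  akri_manifests="$1"
  sts_manifest="$2"
  # 1. Delete the Configuration and the Instance. The Instance is owned by the
  #    Configuration, so it may already be gone; --ignore-not-found covers that.
  kubectl -n "$NS" delete akric sonoff-zigbee-antenna &&
  kubectl -n "$NS" delete akrii sonoff-zigbee-antenna-663571 --ignore-not-found &&
  # 2. Re-create the Configuration (and the Instance, if it is kept in the same manifests).
  kubectl -n "$NS" apply -f "$akri_manifests" &&
  # 3. Delete the StatefulSet.
  kubectl -n "$NS" delete statefulset zigbee2mqtt &&
  # 4. Re-deploy the StatefulSet manifest.
  kubectl -n "$NS" apply -f "$sts_manifest"
}
```

A call might look like `akri_reset akri.yaml zigbee2mqtt.yaml`. The commands are chained with `&&`, so the sequence stops at the first failing step.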

Metadata

Labels: bug
Status: Backlog