pendingChanges results in restart loops #716

@Reamer

Dear PlanetScale Vitess Team,
I tried to run your operator on OpenShift and deployed it with the following YAML.

Operator-YAML
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vitess-operator
  namespace: vitess-operator
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vitess-operator
  template:
    metadata:
      labels:
        app: vitess-operator
    spec:
      priorityClassName: vitess-operator-control-plane
      serviceAccountName: vitess-operator
      containers:
      - name: vitess-operator
        image: planetscale/vitess-operator:latest
        command:
        - vitess-operator
        args:
        - --logtostderr
        - -v=4

        # SRC: https://github.com/planetscale/vitess-operator/pull/135/files#diff-fb546cb0aca88a6cfcc3327d36ad94358b1eb39eec38170f656fc27db21c1586R135-R145
        - --default_vitess_run_as_user=-1
        - --default_vitess_fs_group=-1
        env:
        - name: WATCH_NAMESPACE
          value: "vitess-db" # Comma-separated list
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: PS_OPERATOR_POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: PS_OPERATOR_POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: OPERATOR_NAME
          value: "vitess-operator"
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            memory: 128Mi

Please note that I changed the two default values below, because OpenShift assigns the user ID dynamically for security reasons.

--default_vitess_run_as_user=-1
--default_vitess_fs_group=-1

Furthermore, OpenShift links each service account to its internal registry by adding an image pull secret to it.

default ServiceAccount
apiVersion: v1
imagePullSecrets:
- name: default-dockercfg-p95fq
kind: ServiceAccount
metadata:
  annotations:
    openshift.io/internal-registry-pull-secret-ref: default-dockercfg-p95fq
  creationTimestamp: "2025-08-12T14:29:31Z"
  name: default
  namespace: vitess-db
  resourceVersion: "803949635"
  uid: 0b3ce561-81e0-4c3f-921b-130636a858d2
secrets:
- name: default-dockercfg-p95fq

In addition, OpenShift sets runAsNonRoot to false in the SecurityContext by default for security reasons.

Taken together, these otherwise sensible settings produce the following pendingChanges list, which causes the vttablet component to restart every minute.

pendingChanges: |
        spec:
          containers:
          - name: vttablet
            securityContext:
              runAsNonRoot: null
              runAsUser: null
          - name: mysqld
            securityContext:
              runAsNonRoot: null
              runAsUser: null
          - name: mysqld-exporter
            securityContext:
              runAsNonRoot: null
              runAsUser: null
          imagePullSecrets: null
          initContainers:
          - name: init-vt-root
            securityContext:
              runAsNonRoot: null
              runAsUser: null
          - name: init-mysql-socket
            securityContext:
              runAsNonRoot: null
              runAsUser: null

In my opinion, the operator should ignore null values when it computes pending changes, i.e. fields that it does not set itself. OpenShift is not the only case: other tools also use mutating webhooks to modify resources under the hood.
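
To illustrate the behaviour I have in mind, here is a minimal sketch in Python. It is not the operator's actual code (the operator is written in Go). The function name pending_changes, the image name, and the runAsUser value are hypothetical and only there for illustration. For brevity, containers are modelled as a mapping keyed by container name rather than as a list.

```python
from typing import Any


def pending_changes(desired: Any, actual: Any) -> Any:
    """Return the parts of `desired` that differ from `actual`, or None.

    Fields that `desired` leaves unset (None) are skipped, so values injected
    by the platform or by a mutating webhook never count as drift.
    """
    if isinstance(desired, dict):
        if not isinstance(actual, dict):
            actual = {}  # nothing to compare against yet
        diff = {}
        for key, want in desired.items():
            if want is None:
                continue  # no opinion on this field: ignore it
            sub = pending_changes(want, actual.get(key))
            if sub is not None:
                diff[key] = sub
        return diff or None
    # Leaf values (scalars, lists) are compared as a whole.
    return None if desired == actual else desired


# Desired state: the operator sets no security context and no pull secrets.
desired = {
    "imagePullSecrets": None,
    "containers": {
        "vttablet": {
            "image": "example/vttablet:1",  # hypothetical image
            "securityContext": {"runAsNonRoot": None, "runAsUser": None},
        },
    },
}

# Actual state: the same Pod after the platform filled in its defaults.
actual = {
    "imagePullSecrets": [{"name": "default-dockercfg-p95fq"}],
    "containers": {
        "vttablet": {
            "image": "example/vttablet:1",
            "securityContext": {"runAsNonRoot": False, "runAsUser": 1000660000},
        },
    },
}

# Only fields the operator actually sets can produce a pending change.
print(pending_changes(desired, actual))
```

With this rule, the injected securityContext fields and imagePullSecrets no longer show up as pending changes, while a real difference (for example a new image) would still be reported.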

Relates to #514, #277
