MAS and Synapse in crashloop due to initContainer permission issue #925

@oazabir

Issue Summary

The render-config init containers in ESS (Element Server
Suite) fail with a permission denied error when writing
configuration files to an emptyDir volume, causing the
pods to enter a crash loop.

Environment Details

  • Kubernetes Distribution: MicroK8s v1.33.5
  • OS: Ubuntu 24.04.3 LTS
  • Kernel: 6.8.0-87-generic
  • Container Runtime: containerd://1.7.27
  • Architecture: amd64
  • Helm Chart: matrix-stack-25.12.0
  • Matrix Tools Image:
    ghcr.io/element-hq/ess-helm/matrix-tools:0.5.6

Affected Components

  1. Synapse: ess-synapse-main-0 StatefulSet
  2. Matrix Authentication Service:
    ess-matrix-authentication-service-6644fd87cd-xxxx
    Deployment

Error Details

Error writing to file: open /conf/homeserver.yaml: permission denied

Root Cause Analysis

  1. Security Context Configuration

Both containers have a restrictive security context:
securityContext:
  allowPrivilegeEscalation: false
  capabilities:
    drop:
      - ALL
  readOnlyRootFilesystem: true

  2. Volume Configuration

The affected volume is an emptyDir with medium: Memory:
volumes:
  - name: rendered-config
    emptyDir:
      medium: Memory

  3. Volume Mounts

  • render-config init container: Mounts /conf (writable)
  • synapse container: Mounts /conf/homeserver.yaml
    (readOnly, subPath)

  4. Issue Pattern

The permission issue appears to be related to:

  1. The combination of readOnlyRootFilesystem: true with
    in-memory emptyDir
  2. Potential timing issue where the volume is not
    properly initialized with the correct permissions
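  3. The containers possibly running as a non-root user
    (dropping ALL capabilities does not itself change the
    UID, but it removes CAP_DAC_OVERRIDE, so even root
    cannot bypass the directory's permission bits)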
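
To narrow down which of the factors above matters, the same combination can be tried in a throwaway pod. The manifest below is only a sketch: the pod name, the busybox image tag, and UID 10001 are assumptions and are not taken from the ESS chart. It mirrors the security context and the in-memory emptyDir described above, prints the effective user and the directory's numeric owner and mode, and then attempts the write.

```yaml
# Hypothetical reproduction pod; name, image, and UID are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: emptydir-perm-test
  namespace: ess
spec:
  restartPolicy: Never
  securityContext:
    runAsNonRoot: true
    runAsUser: 10001          # assumed non-root UID
  initContainers:
    - name: render-config
      image: busybox:1.36
      # Show who we are, how /conf is owned, then try to write.
      command: ['sh', '-c', 'id; ls -ldn /conf; touch /conf/homeserver.yaml && echo write-ok']
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop:
            - ALL
        readOnlyRootFilesystem: true
      volumeMounts:
        - name: rendered-config
          mountPath: /conf
  containers:
    - name: main
      image: busybox:1.36
      # Only starts if the init container succeeded.
      command: ['sh', '-c', 'ls -ln /conf/homeserver.yaml']
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop:
            - ALL
        readOnlyRootFilesystem: true
      volumeMounts:
        - name: rendered-config
          mountPath: /conf/homeserver.yaml
          subPath: homeserver.yaml
          readOnly: true
  volumes:
    - name: rendered-config
      emptyDir:
        medium: Memory
```

If the init container logs write-ok on the affected node, the security context and the memory medium alone are probably not sufficient to cause the failure, which would point toward the state of the volume on the node after the upgrade.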

Timeline

  1. Initial deployment: Dec 6, 23:30:48 UTC (Revision 1)
  2. Helm upgrade: Dec 7, 00:39:10 UTC (Revision 2) - This
    might have triggered the issue
  3. Issue duration: Approximately 15 hours (until manual
    intervention)

Workaround Applied

Force-deleting and recreating the affected pods resolved
the issue:
kubectl delete pod -n ess ess-synapse-main-0 --force --grace-period=0
kubectl delete pod -n ess ess-matrix-authentication-service-xxx --force --grace-period=0

Potential Solutions to Consider

  1. Add explicit user/group to securityContext:
    securityContext:
      runAsUser: 1000
      runAsGroup: 1000
      fsGroup: 1000
  2. Add volume permission initialization:
    securityContext:
      fsGroup: 1000
  3. Consider using a regular emptyDir instead of memory
    medium for debugging
  4. Add init container to set permissions:
    - name: set-permissions
      image: busybox
      command: ['sh', '-c', 'chmod 755 /conf']
      volumeMounts:
        - name: rendered-config
          mountPath: /conf
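
One detail that affects options 1 and 2: in the Kubernetes API, fsGroup exists only in the pod-level securityContext, while allowPrivilegeEscalation, capabilities, and readOnlyRootFilesystem are container-level fields. The fragment below sketches where each would sit in a pod spec. It uses the UID/GID 1000 proposed above and is not taken from the chart's templates, so how these fields are exposed through the Helm values is left open.

```yaml
# Sketch of field placement only; not the chart's actual template.
spec:
  securityContext:            # pod level
    runAsUser: 1000
    runAsGroup: 1000
    fsGroup: 1000             # applied to volumes that support ownership management, such as emptyDir
  initContainers:
    - name: render-config
      securityContext:        # container level
        allowPrivilegeEscalation: false
        capabilities:
          drop:
            - ALL
        readOnlyRootFilesystem: true
      volumeMounts:
        - name: rendered-config
          mountPath: /conf
  volumes:
    - name: rendered-config
      emptyDir:
        medium: Memory
```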
