feat: Publish debug container image and account-sync sidecar (#251)

DiamondJoseph · web-flow · commit 76c187cdfa1f · 2025-07-01T10:08:34.000+01:00
Closes #234. Adds the option to publish a `-debug` image alongside the container of a project managed by the copier template, plus documentation on using the debug image to debug inside the DLS cluster infrastructure.
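
As a rough illustration of the chart-side switch described in the added how-to, a consuming chart's `values.yaml` might carry the `debug.enabled` flag as sketched below. The image repository is a hypothetical placeholder, and the `image` keys are only assumed to match what the documented template reads (`.Values.image.repository`, `.Values.image.tag`); none of this file is part of the change itself.

```yaml
# values.yaml (illustrative sketch only, not a file added by this commit)
image:
  repository: ghcr.io/example-org/my-service  # hypothetical image name
  tag: ""  # empty value falls back to .Chart.AppVersion in the template
debug:
  enabled: true  # template appends "-debug" to the tag and omits args and probes
```

Setting `debug.enabled` back to `false` would leave the rendered manifest pointing at the regular image.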
diff --git a/copier.yml b/copier.yml
@@ -103,6 +103,15 @@ docker:
         Would you like to publish your project in a Docker container?
         You should select this if you are making a service.
 
+docker_debug:
+    type: bool
+    when: "{{ docker }}"
+    help: |
+        Would you like to publish a debug image of your service?
+        This will increase the number of published images, but may
+        be useful if debugging the service inside of the cluster
+        infrastructure is required.
+
 docs_type:
     type: str
     help: |
diff --git a/docs/how-to/debug-in-cluster.md b/docs/how-to/debug-in-cluster.md
@@ -0,0 +1,99 @@
+# Debugging containers
+
+If the `docker_debug` option is chosen, the container build also publishes a debug container for each tagged release, with the image tag suffixed with `-debug`. This container contains an editable install of the workspace and debugpy, and has an alternate entrypoint which allows the devcontainer to attach.
+
+## Using the debug image in a Helm chart
+
+⚠️ If running with the Diamond filesystem mounted or as a specific user, further adjustments are required, as described in the next section.
+
+Using the debug image in a Helm chart can be as simple as changing the `image.tag` value in `values.yaml` to the tag with the `-debug` suffix, but this may run into issues if you have defined liveness or readiness probes, a custom command or args, or if the container is running as non-root. To make handling these edge cases easier, it is recommended to define a single flag `debug.enabled` in your `values.yaml` and make the following modifications to the `Deployment|ReplicaSet|StatefulSet`:
+
+```yaml
+spec:
+  template:
+    spec:
+      containers:
+        - image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}{{ ternary "-debug" "" .Values.debug.enabled }}"
+          {{- if not .Values.debug.enabled }}  # If your Helm chart overrides the `CMD` Containerfile instruction, it should not when in debug mode
+          args: ["some", "example", "args"]
+          {{- end }}
+          {{- if not .Values.debug.enabled }}  # prevent probes causing issues before attaching and starting the service
+          {{- with .Values.livenessProbe }}
+          livenessProbe:
+            {{- toYaml . | nindent 12 }}
+          {{- end }}
+          {{- with .Values.readinessProbe }}
+          readinessProbe:
+            {{- toYaml . | nindent 12 }}
+          {{- end }}
+          {{- end }}
+          volumeMounts:
+          {{- if .Values.debug.enabled }}
+          - mountPath: /home  # required for VSCode to install extensions if running as non-root
+            name: home
+          {{- end }}
+          {{- with .Values.volumeMounts }}
+            {{- toYaml . | nindent 10 }}
+          {{- end }}
+      volumes:
+        {{- if .Values.debug.enabled }}
+        - name: home  # mount /home as an editable volume to prevent permission issues
+          emptyDir:
+            sizeLimit: 500Mi
+        {{- end }}
+        {{- with .Values.volumes }}
+          {{- toYaml . | nindent 8 }}
+        {{- end }}
+```
+
+## Using the debug image in a Helm chart that mounts the filesystem
+
+Containers running in the Diamond Kubernetes infrastructure as a specific uid (e.g. when mounting the filesystem) must be able to resolve names from Diamond's LDAP infrastructure: inside the cluster the VSCode server runs as that user, and it requires that the user's name and home directory can be found. The debug image configures the name lookup service to try finding the user internally (i.e. from `/etc/passwd`), then fall back to calling LDAP through a service called `libnss-ldapd`. As containers are designed to run a single process, this service runs in a sidecar container, and the `/var/run/nslcd` socket must be mounted into both the sidecar and the primary container.
+
+This requires the following further additions to the template modified above:
+
+```yaml
+spec:
+  template:
+    spec:
+      containers:
+        - volumeMounts:
+          {{- if .Values.debug.enabled }}
+          - mountPath: /var/run/nslcd  # socket on which to place queries for user information
+            name: nslcd
+          [...]
+        {{- if .Values.debug.enabled }}
+        - name: debug-account-sync
+          image: ghcr.io/diamondlightsource/account-sync-sidecar:3.0.0
+          volumeMounts:
+          - mountPath: /var/run/nslcd  # socket from which to pick up queries for user information
+            name: nslcd
+        {{- end }}
+      volumes:
+        {{- if .Values.debug.enabled }}
+        - name: nslcd  # mutually mounted filesystem to both containers
+          emptyDir:
+            sizeLimit: 5Mi
+          [...]
+```
+
+## Debugging in the cluster
+
+With the [Kubernetes plugin for VSCode](https://marketplace.visualstudio.com/items?itemName=ms-kubernetes-tools.vscode-kubernetes-tools) it is then possible to attach to the container inside the cluster. From the VSCode Command Palette (Ctrl+Shift+P), use the `Kubernetes: Set Kubeconfig` command to configure VSCode with the server to use, then the `Kubernetes: Use Namespace` command to select the namespace.
+
+```sh
+# To find the KUBECONFIG to use from a Diamond machine
+$ module load pollux
+...
+$ echo $KUBECONFIG
+~/.kube/config_pollux
+```
+
+![Location of the Kubernetes plugin in the plugin bar (screen left), with the Clusters>cluster>Workloads>Pods views expanded out to show a pod named "my-service", overlaid with a dropdown box, with "Attach Visual Studio Code" highlighted](../images/debugging-kubernetes.jpg)
+The Kubernetes plugin can be found in the plugin bar. After expanding the Clusters>`cluster`>Workloads>Pods views, your service should be visible. Right Click>Attach Visual Studio Code initiates the connection to the workspace in the cluster. Select your service container from the top menu when prompted.
+
+After the connection to the cluster has been established, open the workspace folder by clicking the Explorer option in the plugin bar. The repository is mounted at `/workspaces/<service name>`, as it is when working with a local devcontainer.
+
+Starting your service with the command in the container definition starts it on the node, with access to Kubernetes resources. It is now also possible to run it with a debugger or attach one, potentially configured to autoReload code, or to start and stop the service rapidly to try out prospective changes.
+
+Once you are happy with the changes, commit them and release a new version of your container; otherwise the changes will not persist across container restarts. Your git and ssh config will be mounted inside the devcontainer while connected and, for containers on GitHub, the remote `origin` will be configured to use ssh.
diff --git a/docs/images/debugging-kubernetes.jpg b/docs/images/debugging-kubernetes.jpg
diff --git a/example-answers.yml b/example-answers.yml
@@ -6,6 +6,7 @@ component_lifecycle: experimental
 description: An expanded https://github.com/DiamondLightSource/python-copier-template to illustrate how it looks with all the options enabled.
 distribution_name: dls-python-copier-template-example
 docker: true
+docker_debug: true
 docs_type: sphinx
 git_platform: github.com
 github_org: DiamondLightSource
diff --git a/template/Dockerfile.jinja b/template/Dockerfile.jinja
@@ -14,10 +14,33 @@ ENV PATH=/venv/bin:$PATH{% if docker %}
 
 # The build stage installs the context into the venv
 FROM developer AS build
-COPY . /context
-WORKDIR /context
+# Requires buildkit 0.17.0
+COPY --chmod=o+wrX . /workspaces/{{ repo_name }}
+WORKDIR /workspaces/{{ repo_name }}
 RUN touch dev-requirements.txt && pip install -c dev-requirements.txt .
 
+{% if docker_debug %}
+FROM build AS debug
+
+{% if git_platform=="github.com" %}
+# Set origin to use ssh
+RUN git remote set-url origin git@github.com:{{github_org}}/{{repo_name}}.git
+{% endif %}
+
+# Allow this container to look up user information from LDAP
+RUN apt-get update && \
+    DEBIAN_FRONTEND=noninteractive apt-get install -y libnss-ldapd
+RUN sed -i 's/files/ldap files/g' /etc/nsswitch.conf
+
+# Make editable and debuggable
+RUN pip install debugpy
+RUN pip install -e .
+
+# Alternate entrypoint to allow devcontainer to attach
+ENTRYPOINT [ "/bin/bash", "-c", "--" ]
+CMD [ "while true; do sleep 30; done;" ]
+
+{% endif %}
 # The runtime stage copies the built venv into a slim runtime container
 FROM python:${PYTHON_VERSION}-slim AS runtime
 # Add apt-get system dependecies for runtime here if needed
diff --git a/template/{% if git_platform=="github.com" %}.github{% endif %}/workflows/ci.yml.jinja b/template/{% if git_platform=="github.com" %}.github{% endif %}/workflows/ci.yml.jinja
@@ -41,7 +41,16 @@ jobs:
     permissions:
       contents: read
       packages: write
-{% endraw %}{% endif %}{% if sphinx %}
+{% endraw %}{% if docker_debug %}{% raw %}
+  debug_container:
+    needs: [container, test]
+    uses: ./.github/workflows/_debug_container.yml
+    with:
+      publish: ${{ needs.test.result == 'success' }}
+    permissions:
+      contents: read
+      packages: write
+{% endraw %}{% endif %}{% endif %}{% if sphinx %}
   docs:
     uses: ./.github/workflows/_docs.yml
 
diff --git a/template/{% if git_platform=="github.com" %}.github{% endif %}/workflows/{% if docker_debug %}_debug_container.yml{% endif %} b/template/{% if git_platform=="github.com" %}.github{% endif %}/workflows/{% if docker_debug %}_debug_container.yml{% endif %}
@@ -0,0 +1,49 @@
+on:
+  workflow_call:
+    inputs:
+      publish:
+        type: boolean
+        description: If true, pushes image to container registry
+
+jobs:
+  build:
+    runs-on: ubuntu-latest
+
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v4
+        with:
+          # Need this to get version number from last tag
+          fetch-depth: 0
+
+      - name: Set up Docker Buildx
+        id: buildx
+        uses: docker/setup-buildx-action@v3
+
+      - name: Log in to GitHub Docker Registry
+        if: github.event_name != 'pull_request'
+        uses: docker/login-action@v3
+        with:
+          registry: ghcr.io
+          username: ${{ github.actor }}
+          password: ${{ secrets.GITHUB_TOKEN }}
+
+      - name: Create tags for publishing debug image
+        id: debug-meta
+        uses: docker/metadata-action@v5
+        with:
+          images: ghcr.io/${{ github.repository }}
+          tags: |
+            type=ref,event=tag,suffix=-debug
+            type=raw,value=latest-debug
+
+      - name: Build and publish debug image to container registry
+        if: github.ref_type == 'tag'
+        uses: docker/build-push-action@v6
+        env:
+          DOCKER_BUILD_RECORD_UPLOAD: false
+        with:
+          context: .
+          push: true
+          target: debug
+          tags: ${{ steps.debug-meta.outputs.tags }}