Singularity support by t0mdavid-m · Pull Request #18 · OpenMS/OpenDIAKiosk

t0mdavid-m · 2026-05-15T20:55:03Z

Summary by CodeRabbit

New Features
- Added support for running the application on Apptainer/Singularity (HPC environments).
- Enabled multi-instance Streamlit deployments with automatic load balancing.
- Improved container startup with better Redis and worker process management.
Documentation
- Added Apptainer/Singularity usage guide with bind mount configuration.
Chores
- Enhanced CI/CD pipeline with Docker image caching and automated Apptainer image building and publishing.
- Added automatic cleanup of old container images from the registry.

Apptainer mounts the container filesystem read-only by default, so the existing entrypoint failed in two places:

- cron could not write /var/run/crond.pid
- redis-server could not access /var/lib/redis (and on Ubuntu it loads the system /etc/redis/redis.conf, which points there)

Refactor the inline-heredoc entrypoint into entrypoint.sh and redirect all runtime state (Redis data + pidfile, generated nginx config, nginx pidfile + temp dirs) to RUNTIME_DIR (default /tmp/opendiakiosk), which is writable as tmpfs under Apptainer and as overlay under Docker. cron failure is now non-fatal, so the rest of the app still starts when scheduled cleanup is unavailable.

Docker behavior is unchanged: the queue is already configured with appendonly=no, so moving the Redis dir to a tmpfs path costs nothing.

Apptainer (the HPC container runtime) mounts the root filesystem read-only by default and runs the container as the host user's UID. The existing entrypoint failed under both conditions:

    cron: can't open or create /var/run/crond.pid: Read-only file system
    FATAL CONFIG FILE ERROR (Redis 7.0.15) ... 'dir "/var/lib/redis"'

Extract the inline heredoc from the Dockerfile into a standalone docker/entrypoint.sh that auto-detects the read-only root (APPTAINER_NAME env var or /var/run write probe) and falls back to /tmp/openms-runtime-$$ for the Redis data dir, Redis/nginx PID files, and the generated nginx.conf. Skip cron entirely when the root FS is read-only: workspace cleanup is a nice-to-have, not a hard requirement.

The same script powers both Dockerfile and Dockerfile_simple: the Redis/RQ section is gated on `command -v redis-server`, so the simple image (no redis installed) treats that block as a no-op.

Drop `chown redis:redis` on /var/lib/redis: under apptainer the in-image redis UID is unreachable.

Add a test-apptainer CI job that reuses the build artifact, installs apptainer, converts to SIF, starts an instance, and waits for /_stcore/health. This reproduces the bug on the pre-fix entrypoint and gates future regressions for both image variants.
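The detection described above lives in the shell entrypoint; the following is only a minimal Python sketch of the same decision, to make the two signals explicit. The function names `root_is_read_only` and `pick_runtime_dir` are hypothetical, and returning `None` for "keep the image's default paths" is an assumption made for illustration.

```python
import os
import tempfile


def root_is_read_only(env=None, probe_dir="/var/run"):
    """Return True when the container root looks read-only.

    Uses the two signals named in the commit message: the APPTAINER_NAME
    environment variable, or a failed write probe in probe_dir.
    """
    env = os.environ if env is None else env
    if env.get("APPTAINER_NAME"):
        return True
    try:
        # Creating (and auto-deleting) a temp file is the write probe.
        with tempfile.NamedTemporaryFile(dir=probe_dir):
            pass
    except OSError:
        # Read-only FS, missing directory and permission errors all
        # mean "cannot keep runtime state here".
        return True
    return False


def pick_runtime_dir(read_only, pid=None):
    """Choose where Redis data, PID files and nginx.conf should live."""
    if read_only:
        pid = os.getpid() if pid is None else pid
        # Python equivalent of the shell's /tmp/openms-runtime-$$
        return f"/tmp/openms-runtime-{pid}"
    # Assumption: None stands for "use the image's default locations".
    return None
```

Treating any `OSError` from the probe as "read-only" keeps the fallback conservative: a false positive only moves runtime state to /tmp.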

CI exposed three follow-up issues after the initial apptainer port:

1. /root is mode 0700 in the stock ubuntu base image. Docker runs the entrypoint as root so this is invisible, but apptainer maps the host user UID into the container. That user can't traverse /root, so `source /root/miniforge3/bin/activate ...` (the first executable line of the entrypoint) fails with EACCES, set -e exits, and the apptainer instance dies before streamlit binds 8501. Add `chmod o+x /root` in both Dockerfiles so the path is traversable by anyone, keeping the directory listing private.
2. Bound the `until redis-cli ping` loop (CodeRabbit OpenMS#387 review). If redis-server fails to bind 6379 (e.g. apptainer's shared host net namespace has the port taken), the loop spun forever and the health check timed out with no actionable error. It now retries REDIS_STARTUP_RETRIES times (default 30s) and exits 1 with a clear message on timeout.
3. Drop `chmod 0777 /var/lib/redis` (CodeRabbit OpenMS#387 review). Docker mode writes here as root regardless of mode bits, and apptainer mode never uses this path (the entrypoint relocates to /tmp/openms-runtime-*), so 0755 root-owned is correct and matches least-privilege.
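The bounded loop from item 2 can be sketched in Python as follows. This is an illustration rather than the entrypoint's actual shell code: `wait_for` and `redis_ping` are hypothetical names, and the default of 30 attempts at one-second intervals is an assumption derived from the "default 30s" wording.

```python
import subprocess
import time


def wait_for(probe, retries=30, delay=1.0, sleep=time.sleep):
    """Call probe() up to `retries` times; return True on first success."""
    for attempt in range(1, retries + 1):
        if probe():
            return True
        if attempt < retries:
            sleep(delay)  # no pointless sleep after the final attempt
    return False


def redis_ping():
    """Return True when `redis-cli ping` answers PONG."""
    try:
        result = subprocess.run(
            ["redis-cli", "ping"],
            capture_output=True, text=True, timeout=5,
        )
    except (OSError, subprocess.TimeoutExpired):
        # redis-cli missing or hung: treat as "not ready yet".
        return False
    return result.returncode == 0 and result.stdout.strip() == "PONG"
```

A caller would read the retry count from `REDIS_STARTUP_RETRIES`, call `wait_for(redis_ping, retries=...)`, and exit with status 1 and an explicit message when it returns `False`, instead of spinning until the health check times out.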

`apptainer instance start` does not consistently honor a Docker image's WORKDIR: the container's CWD ends up being the host CWD at invocation (e.g. the GH Actions checkout root), so `streamlit run app.py` resolves against the wrong directory, exits with "Error: file not found", and the apptainer instance dies before binding 8501. The health check then times out at "Wait for streamlit /_stcore/health" with no obvious trace, which is exactly the failure seen on test-apptainer (full) and (simple).

Anchor the entrypoint at /app explicitly. In docker mode WORKDIR /app is already set so this is a no-op; in apptainer mode it's the actual fix.

Also stamp pwd+uid into the "Starting Streamlit app" log line so future breakage shows the resolved CWD/user in the apptainer logs without needing to re-instrument.

A user setting STREAMLIT_SERVER_COUNT > 1 on the simple image variant (no nginx installed) currently gets a single Streamlit instance with no log indication, making the misconfiguration silent and hard to diagnose. Emit a clear WARN before the load-balancer branch falls through. Addresses CodeRabbit review on OpenMS#387.
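As a rough illustration of that guard, here is a Python sketch (the real check is in the shell entrypoint). The function name `effective_server_count` and the exact warning text are assumptions.

```python
import sys


def effective_server_count(requested, nginx_path):
    """Return how many Streamlit instances will actually be started.

    nginx_path is the resolved nginx binary, or None when nginx is not
    installed (for example the result of shutil.which("nginx")).
    """
    if requested > 1 and nginx_path is None:
        # Make the fallback visible instead of silently ignoring it.
        print(
            f"WARN: STREAMLIT_SERVER_COUNT={requested} requested but nginx "
            "is not installed; starting a single Streamlit instance.",
            file=sys.stderr,
        )
        return 1
    return max(requested, 1)
```

Called as `effective_server_count(int(os.environ.get("STREAMLIT_SERVER_COUNT", "1")), shutil.which("nginx"))`, it yields 1 on the simple image and the requested count where nginx is available.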

The test-apptainer job's "Dump entrypoint logs on failure" step is post-mortem and easy to miss in the GH Actions UI; combined with the auth-walled API on the logs endpoint, every prior failure left us guessing at what the entrypoint actually printed. Two changes, both no-op when the test passes:

1. The wait loop now tails the apptainer instance .out / .err every five attempts (and dumps them in full on timeout), so failures surface inline. The "Start apptainer instance" step exports the discovered log dir into $GITHUB_ENV so the next step can read it without re-deriving from hostname + whoami.
2. The entrypoint logs uid/gid/cwd, the relevant APPTAINER_* env vars, and whether `streamlit` resolves after conda activation. That is two echo lines: harmless in docker mode where the logs aren't read, and the missing data on the apptainer side has been the whole bottleneck.

The test-apptainer job's instance came up cleanly (apptainer instance list reported PID 2586) but the entrypoint's first echo never landed in .out/.err: both files were empty when dumped on timeout. The Docker ENTRYPOINT was translated into the SIF's %runscript only; %startscript on a docker-archive build defaults to a no-op `exec "$@"`. So `apptainer instance start` was launching an empty daemon and streamlit never bound 8501.

`apptainer instance run` (added in apptainer 1.1) starts a persistent named instance AND executes %runscript inside it, which is the verb actually intended for OCI-derived SIFs. With this change the entrypoint runs, the breadcrumbs added in 189a94b will appear in the instance log, and the health endpoint should come up for both image variants.

…ds attach

A user binding host storage onto /workspaces-streamlit-template via singularity hit "[Errno 30] Read-only file system" the moment the app tried to mkdir a workspace, even though they passed `:rw` on the bind.

Root cause: neither Dockerfile creates /workspaces-streamlit-template or /mounted-data. Docker auto-creates missing `-v` mount targets, but singularity uses a read-only underlay when the destination isn't a real directory in the SIF and silently degrades the bind. Writes then go to the read-only squashfs and fail with EROFS regardless of the `:rw` flag.

`mkdir -p` both paths in Dockerfile and Dockerfile_simple. Cost: one inode each. Behavior in docker mode is unchanged (a `-v` mount, k8s volumeMount, or compose volume shadows the empty dir). Behavior in singularity-without-bind is unchanged: writes still fail with EROFS, just one frame later in the path (parent in squashfs vs. parent missing entirely); persistent storage still requires a bind.

CI guard: the test-apptainer job now starts the instance with explicit `--bind /tmp/host-workspaces:/workspaces-streamlit-template:rw` and `--bind /tmp/host-mounted-data:/mounted-data:ro`, then exec's into the running instance to write a probe file and asserts it appears on the host with the expected contents (plus a read-side check on the :ro mount). Without the mkdir, the probe write would fail with EROFS and the test would fail closed.

…ence

The previous detection in StreamlitUI._mounted_data_root() returned the path as soon as it resolved to an existing directory, treating "directory exists" as proof that an operator bound something there. That worked only because docker auto-creates `-v` targets and the image never pre-created /mounted-data, so existence was a reliable proxy for "mount happened."

The companion fix (pre-creating /workspaces-streamlit-template and /mounted-data in the Dockerfile so singularity binds attach read-write) breaks that assumption: the path now always exists. Without this change, the upload widget would render an empty mounted-drive browser to every docker user without a `-v` flag and every apptainer user without a `--bind`.

Switch to os.path.ismount(): a true mount point (docker -v, k8s volumeMount, singularity --bind) crosses filesystems and trips the kernel's mount detection; an empty image-baked dir doesn't. The detection now asks the question we actually meant to ask.

CI guard: the test-apptainer bind step now asserts os.path.ismount() returns True for both /mounted-data and /workspaces-streamlit-template under `apptainer instance run --bind`, so the gating logic stays consistent with the kernel's view if either side drifts.
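A minimal sketch of the mount-based check, assuming a free function named `mounted_data_root` (hypothetical; the real logic is the method StreamlitUI._mounted_data_root(), whose exact body is not shown here):

```python
import os


def mounted_data_root(path="/mounted-data"):
    """Return the resolved path only if something is mounted there.

    An image-baked empty directory exists but is not a mount point,
    so it yields None and the mounted-drive browser stays hidden.
    """
    resolved = os.path.realpath(path)
    if os.path.ismount(resolved):
        return resolved
    return None
```

One caveat from the Python documentation: on POSIX, `os.path.ismount()` compares device and inode numbers with the parent directory, so it cannot reliably detect a bind mount that stays on the same filesystem. The CI assertion described above is what confirms the check holds for the `--bind` setup used here.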

Diagnostic from a user reproducing the workflow EROFS revealed the real chain of failure under singularity:

    Starting Redis server (data=/tmp/openms-runtime-452993/redis)...
    Redis is ready
    Starting 1 RQ worker(s)...
    Starting Streamlit app (cwd=/app, uid=1000)...
    ERROR:root:There exists an active worker named 'worker-1' already

Apptainer/singularity share the host's network namespace by default. When the host has anything listening on 6379 (a system redis-server, a docker container, a previous singularity instance that didn't clean up), our `redis-server --daemonize yes` silently fails to bind with EADDRINUSE, but because daemonize forks before the listen-error surfaces, the entrypoint's parent shell returns 0 and the subsequent `redis-cli ping` happily connects to the *host's* redis instead. From there:

- RQ tries to register `worker-1` against the host's redis → conflicts with stale state from a previous run, the worker dies.
- Streamlit enqueues to the host's redis; the workflow job is consumed by whatever stale worker is still alive on the host, which runs the mkdir outside our mount namespace (no /workspaces-streamlit-template bind there) and hits EROFS at the squashfs root.

Unix-socket sidesteps the entire problem class: when the entrypoint detects read-only-root (apptainer mode), it now starts redis with `--unixsocket $RUNTIME_DIR/redis.sock --port 0` (no TCP listener at all) and exports `REDIS_URL=unix://<socket>` so streamlit's QueueManager and the RQ worker can only connect to *our* redis. docker mode is unchanged (TCP 6379 on localhost as before, no socket).

Also: write the resolved URL to /tmp/openms-redis-url so `apptainer exec` can discover it for diagnostics (env doesn't propagate across exec invocations). The test-apptainer CI step now reads that marker and pings with `redis-cli -s <sock>` accordingly.
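The mode switch above can be summarised in a short Python sketch. It is illustrative only: the entrypoint is a shell script, the helper names `redis_launch_plan` and `write_url_marker` are hypothetical, and the `redis://localhost:6379` URL for docker mode is an assumed spelling of "TCP 6379 on localhost".

```python
import os


def redis_launch_plan(read_only, runtime_dir):
    """Return (argv, url) for starting Redis in the given mode."""
    argv = ["redis-server", "--daemonize", "yes", "--appendonly", "no"]
    if read_only:
        sock = os.path.join(runtime_dir, "redis.sock")
        argv += [
            "--dir", os.path.join(runtime_dir, "redis"),
            "--unixsocket", sock,
            "--port", "0",  # port 0 disables the TCP listener entirely
        ]
        return argv, f"unix://{sock}"
    # Docker mode: private network namespace, so TCP on localhost is safe.
    argv += ["--port", "6379"]
    return argv, "redis://localhost:6379"


def write_url_marker(url, marker="/tmp/openms-redis-url"):
    """Persist the resolved URL so a later `apptainer exec` can find it."""
    with open(marker, "w", encoding="utf-8") as handle:
        handle.write(url + "\n")
```

Because the socket path sits inside the per-instance runtime directory, a Redis already listening on the host's port 6379 cannot be reached by accident through the exported `REDIS_URL`.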

coderabbitai · 2026-05-15T20:55:11Z

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

📥 Commits

Reviewing files that changed from the base of the PR and between d58a94d and f01c16b.

📒 Files selected for processing (7)

.github/workflows/build-and-test.yml
.github/workflows/ghcr-cleanup.yml
Dockerfile
README.md
docker/entrypoint.sh
entrypoint.sh
src/workflow/StreamlitUI.py

📝 Walkthrough

This PR introduces Apptainer/Singularity support for HPC deployments by adding a complete container build and test pipeline, modifying container startup to handle read-only filesystems with Unix sockets, publishing validated SIF images to GHCR with retention, and updating integration tests to consume image artifacts.

Changes

Apptainer Container Support

| Layer / File(s) | Summary |
|---|---|
| Container Runtime & Entrypoint: `Dockerfile`, `docker/entrypoint.sh`, `entrypoint.sh` | Dockerfile enables non-root execution (`chmod o+x /root`), installs Redis/nginx, and copies entrypoint script; entrypoint detects read-only mode (Apptainer/Singularity), selects runtime directories, manages Redis startup with unix socket support, launches RQ workers, and orchestrates single or multi-instance Streamlit behind nginx with cookie-based routing. |
| Apptainer SIF Build & Publish Pipeline: `.github/workflows/build-and-test.yml`, `.github/workflows/ghcr-cleanup.yml` | Build job saves Docker image as tarball artifact; test-apptainer job loads tarball, builds SIF, verifies Streamlit health and Redis/mount-point behavior, and uploads validated artifact; publish-apptainer job pushes SIF to GHCR with tag variants using ORAS; cleanup job enforces retention policy for old commits and untagged manifests. |
| Integration Test Updates: `.github/workflows/build-and-test.yml` | NGINX and Traefik test jobs download and load image artifact from build job, dynamically compute overlay SLUG from production kustomization, verify Redis readiness with discovered selector, wait for deployments, and validate health endpoints with extended readiness retry loops. |
| Documentation & Mount Validation: `README.md`, `src/workflow/StreamlitUI.py` | README adds Apptainer/HPC section with pull/run commands, supported tags, and conversion fallback; StreamlitUI restricts data directory rendering to actual mount points using `os.path.ismount()` instead of directory existence alone. |

A rabbit hops through clouds of containers bright,
From Docker's warmth to HPC's read-only night,
With Apptainer's speed and SLUG-derived grace,
Mount points are blessed, a validated place! 🐰📦

Reuse the SIF that test-apptainer already builds and validates: upload it as a workflow artifact when validation passes, then push it to ghcr.io/<owner>/<repo>/sif:<tag> from a new publish-apptainer job.

Tag scheme mirrors the docker image (branch/sha/version-<variant> plus bare `latest` for full+main). Sibling /sif package keeps tag lists clean and cleanup policies independent.

README now points HPC users at the prebuilt ORAS path instead of the slow on-the-fly OCI->SIF conversion.

https://claude.ai/code/session_01NumLyfkQ3w3JF3TU8jM1iX

t0mdavid-m and others added 16 commits May 13, 2026 11:14

Merge pull request OpenMS#387 from OpenMS/claude/fix-opendiakiosk-app…

b36fb90

…tainer-Sdrl8 Support Apptainer/Singularity with read-only root filesystem

Merge pull request OpenMS#388 from OpenMS/claude/singularity-bind-mou…

2765119

…ntpoints fix(singularity): pre-create /workspaces and /mounted-data so :rw bin…

Merge branch 'main' into claude/fix-opendiakiosk-apptainer-dHZXJ

d85c6d8

Merge pull request OpenMS#386 from OpenMS/claude/fix-opendiakiosk-app…

9128698

…tainer-dHZXJ Extract entrypoint script to file for Apptainer compatibility

Merge pull request OpenMS#389 from OpenMS/claude/singularity-bind-mou…

bce2e27

…ntpoints fix(apptainer): use unix socket for Redis so host:6379 can't shadow us

Merge remote-tracking branch 'template/main' into singularity_support

bb86534

claude and others added 3 commits May 15, 2026 21:12

Merge pull request OpenMS#390 from OpenMS/claude/singularity-github-h…

6ca8e97

…osting-eH5Gh Publish prebuilt Apptainer SIFs to GHCR via ORAS

Merge remote-tracking branch 'template/main' into singularity_support

f01c16b

t0mdavid-m merged commit 0e2df38 into main May 15, 2026
6 of 7 checks passed

t0mdavid-m merged 19 commits into main from singularity_support
