jac-scale: code-sync pod fails with Multi-Attach error when code-server holds PVC on a different node #5239
Description
Summary
When jac start --scale is run to update a deployed app, jac-scale creates a code-sync pod to push new code to the PVC. If the existing code-server pod is scheduled on a different node, the EBS volume (ReadWriteOnce) cannot be mounted by both pods simultaneously, causing a Multi-Attach error. The code-sync pod stays stuck in ContainerCreating indefinitely, so the app code is never updated even though CI reports a successful deployment.
Observed Behaviour
Warning FailedAttachVolume kubelet Multi-Attach error for volume "pvc-...":
Volume is already used by pod(s) jac-builder-dev-code-server-<hash>
- code-sync pod stays in ContainerCreating indefinitely
- jac start --scale times out or exits 124 (expected timeout)
- kubectl rollout status passes (the Deployment rolls out, just with old code)
- App appears healthy but is running stale code from before the CI run
Root Cause
jac-scale's code distribution architecture:
- code-server: a permanent Deployment (busybox httpd) that mounts the code PVC and serves it as a tar.gz over HTTP
- code-sync: a standalone Pod created at deploy time to push updated code onto the same PVC
- Main app init container: downloads the tar.gz from code-server on pod startup
The problem: the EBS-backed PVC is ReadWriteOnce, so the volume can be attached to only one node at a time, and pods can share it only when they run on that node. jac-scale creates code-sync without ensuring it lands on the same node as code-server, and without scaling code-server down first to release the volume. When Kubernetes schedules the two pods on different nodes (which is common in multi-node clusters), the Multi-Attach error occurs.
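The node mismatch described above can be confirmed from the command line. The following is a minimal sketch, not part of jac-scale: the function names are hypothetical, and the pod names passed in are whatever `kubectl get pods` reports for the code-server and code-sync pods. It relies only on the standard `.spec.nodeName` field, which is set even while a pod is stuck in ContainerCreating.

```shell
# Print the node a pod is scheduled on (empty if not yet scheduled).
# $1 = namespace, $2 = pod name
pod_node() {
    kubectl get pod "$2" -n "$1" -o jsonpath='{.spec.nodeName}'
}

# Succeed only when both node names are non-empty and equal.
same_node() {
    [ -n "$1" ] && [ -n "$2" ] && [ "$1" = "$2" ]
}

# Report whether two pods can share a ReadWriteOnce volume.
# $1 = namespace, $2 = code-server pod name, $3 = code-sync pod name
check_colocation() {
    server_node=$(pod_node "$1" "$2")
    sync_node=$(pod_node "$1" "$3")
    if same_node "$server_node" "$sync_node"; then
        echo "ok: both pods on $server_node"
    else
        echo "conflict: code-server on '$server_node', code-sync on '$sync_node'"
        return 1
    fi
}
```

Calling `check_colocation <namespace> <code-server-pod> <code-sync-pod>` returns non-zero when the pods sit on different nodes, which is the condition that triggers the Multi-Attach error.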
Steps to Reproduce
- Deploy an app with jac start --scale on a multi-node EKS cluster
- Push a code change and trigger jac start --scale again
- Observe the code-sync pod stuck in ContainerCreating with a Multi-Attach error
- The app deployment rolls out (readiness probe passes on old code), so CI shows green
- Exec into the new pod: the code is unchanged from the first deployment
Impact
- Code updates silently fail — CI is green but the app runs stale code
- Developers have no indication the code update didn't apply
- The only workaround is a manual kubectl cp directly into the code-sync pod, after engineering a brief window in which code-server releases the volume
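Until the scheduling issue is fixed, a CI step could at least make the failure visible. The sketch below is an assumption about how such a guard might look; the function names are hypothetical. It uses the standard `reason` field selector on events and matches the "Multi-Attach error" text from the kubelet warning shown under Observed Behaviour.

```shell
# Succeed if the given event text contains a Multi-Attach failure.
has_multi_attach() {
    printf '%s\n' "$1" | grep -q 'Multi-Attach error'
}

# Fail when the namespace has FailedAttachVolume events that mention
# Multi-Attach. $1 = namespace
fail_on_multi_attach() {
    events=$(kubectl get events -n "$1" --field-selector reason=FailedAttachVolume 2>/dev/null)
    if has_multi_attach "$events"; then
        echo "code-sync could not attach the PVC; code was not updated" >&2
        return 1
    fi
}
```

Running `fail_on_multi_attach <namespace>` after the deploy step turns the silent failure into a red build. Events are retained for a while after they occur, so a warning from an earlier run may still match; treat this as a coarse check, not a precise one.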
Proposed Fix
Option 1 (recommended): Scale down code-server before code-sync runs
In jac-scale's deploy/update logic, before creating the code-sync pod:
# Scale code-server to 0 to release the PVC
kubectl scale deployment {app}-code-server -n {namespace} --replicas=0
kubectl wait --for=delete pod -l app={app}-code-server -n {namespace} --timeout=60s
# Now code-sync can mount the PVC exclusively
# ... push code to PVC via code-sync ...
# Scale code-server back up (it repacks the tar.gz)
kubectl scale deployment {app}-code-server -n {namespace} --replicas=1
Option 2: Node affinity
Add a podAffinity rule (with topologyKey kubernetes.io/hostname) to the code-sync pod spec so it is always scheduled on the same node as the code-server pod. A ReadWriteOnce EBS volume can be mounted by multiple pods as long as they all run on the same node.
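A rough idea of what that affinity rule might look like, printed by a small helper so the app name can be substituted. The helper name is hypothetical, and the label `app: {app}-code-server` is an assumption taken from the `-l app={app}-code-server` selector used in Option 1; the actual labels jac-scale sets on the code-server pod should be checked first.

```shell
# Print a podAffinity fragment for the code-sync pod spec.
# $1 = app name (the label value is assumed to be <app>-code-server)
render_code_sync_affinity() {
    cat <<EOF
affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: $1-code-server
        topologyKey: kubernetes.io/hostname
EOF
}
```

With a required rule like this, code-sync would stay Pending if no code-server pod is running, so this option depends on code-server being up at deploy time.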
Option 3: Switch PVC to ReadWriteMany (EFS)
Replace the EBS-backed PVC with an EFS-backed one (storageClassName: efs-sc, accessModes: [ReadWriteMany]). This allows simultaneous mounts from any number of nodes and eliminates the problem entirely, but requires EFS to be provisioned in the cluster.
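For illustration, a claim along those lines could look like the manifest printed below. The helper name and the PVC name passed to it are hypothetical, and the `efs-sc` storage class is assumed to already exist in the cluster. The `storage` request is there because the PVC API requires one; the 1Gi value is a placeholder.

```shell
# Print a ReadWriteMany PVC manifest backed by the efs-sc storage class.
# $1 = PVC name (hypothetical), $2 = namespace
render_efs_pvc() {
    cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: $1
  namespace: $2
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: efs-sc
  resources:
    requests:
      storage: 1Gi
EOF
}
```

The output can be piped to `kubectl apply -f -` once the name and namespace match what the code-server and code-sync pods reference.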
Environment
- EKS (us-east-2), multi-node cluster
- EBS gp2/gp3 PVC (ReadWriteOnce)
- jac-scale with --scale --experimental flags
- Confirmed on jaseci-cluster, namespace jac-builder-dev
Workaround (for deploy workflows)
Add a scale-down/scale-up step around the jac-scale deploy in CI:
- name: Release PVC before deploy
  run: |
    kubectl scale deployment {app}-code-server -n {namespace} --replicas=0
    kubectl wait --for=delete pod -l app={app}-code-server -n {namespace} --timeout=60s || true
- name: Deploy with jac-scale
  run: timeout 600 jac start main.jac --scale --experimental || ...
- name: Restore code-server
  run: kubectl scale deployment {app}-code-server -n {namespace} --replicas=1
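The same sequence can also be sketched as a single shell function, which makes it easier to guarantee that code-server is scaled back up even when the deploy fails or times out. This is one possible arrangement, not the jac-scale implementation: the function name is hypothetical, and the commands are the ones from the workflow steps above with the deploy exit status preserved.

```shell
# Scale code-server down, run the deploy, and always scale it back up.
# $1 = app name, $2 = namespace
deploy_with_released_pvc() {
    app=$1
    ns=$2
    # Release the PVC so code-sync can mount it from any node.
    kubectl scale deployment "${app}-code-server" -n "$ns" --replicas=0 || return 1
    # Wait for the old pod to terminate; tolerate "nothing to wait for".
    kubectl wait --for=delete pod -l "app=${app}-code-server" -n "$ns" --timeout=60s || true
    # Run the deploy and remember its exit status instead of aborting.
    status=0
    timeout 600 jac start main.jac --scale --experimental || status=$?
    # Restore code-server regardless of the deploy outcome.
    kubectl scale deployment "${app}-code-server" -n "$ns" --replicas=1 || return 1
    return "$status"
}
```

Calling `deploy_with_released_pvc <app> <namespace>` returns the exit status of `jac start` (124 on timeout), so CI can still fail on a bad deploy after code-server has been restored.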