Late last week, Project Pythia encountered failing BinderHub builds on their JS2 BinderHub (see FreshDesk ticket). @GeorgianaElena and I both picked this up today.
@GeorgianaElena noticed that the ephemeral storage on the node was close to full (and warnings to that effect in the debug logs). The ephemeral storage is used by Docker via hostPath (see also https://binderhub-service.readthedocs.io/en/latest/explanation/architecture.html) to store the build cache and other dockerd state. We acknowledged that we've not seen this particular error across our other BinderHub deployments, and @GeorgianaElena's hypothesis was that this cluster is special: as a JS2 cluster, the user node pool never scales down to zero. This means that the build cache held by Docker grows over time and is never cleared, until eventually we run out of space. This contrasts with other clusters, which periodically scale down and so start again with an empty cache.
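One quick way to sanity-check this is to look at how long the user node has been up, and whether the kubelet reports disk pressure on it. A minimal sketch, assuming kubectl access to the JS2 cluster (the node name is a placeholder):

```bash
# A node pool that never scales to zero will show nodes with a very large AGE
kubectl get nodes -o wide

# Check the node's ephemeral-storage capacity/allocatable and its DiskPressure condition
kubectl describe node <node-name> | grep -iE "ephemeral-storage|DiskPressure"
```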
@GeorgianaElena took a look at the filesystem to support this suggestion:
$ df -hT
Filesystem Type Size Used Available Use% Mounted on
overlay overlay 28.9G 20.8G 8.1G 72% /
tmpfs tmpfs 64.0M 0 64.0M 0% /dev
/dev/vda1 ext4 28.9G 20.8G 8.1G 72% /etc/hosts
/dev/vda1 ext4 28.9G 20.8G 8.1G 72% /dev/termination-log
/dev/vda1 ext4 28.9G 20.8G 8.1G 72% /etc/hostname
/dev/vda1 ext4 28.9G 20.8G 8.1G 72% /etc/resolv.conf
shm tmpfs 64.0M 0 64.0M 0% /dev/shm
/dev/vda1 ext4 28.9G 20.8G 8.1G 72% /var/lib/docker
/dev/vda1 ext4 28.9G 20.8G 8.1G 72% /var/lib/binderhub-binderhub/docker-api
tmpfs tmpfs 5.9G 3.0M 5.9G 0% /run/binderhub-binderhub/docker-api
tmpfs tmpfs 58.8G 12.0K 58.8G 0% /run/secrets/kubernetes.io/serviceaccount
none tmpfs 29.4G 0 29.4G 0% /tmp
It can be seen that the filesystem backing /var/lib/docker (/dev/vda1) is already at 72% usage, with only ~8 GB free.
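To pin the usage on Docker's build cache specifically (rather than, say, pulled images), we could also ask dockerd for a breakdown. A minimal sketch, assuming we can exec into the pod running dockerd and that the docker CLI (plus the buildx plugin, for the second command) is available there; the namespace and pod name are placeholders:

```bash
# Summarise Docker's disk usage: images, containers, local volumes, build cache
kubectl exec -n <namespace> <docker-api-pod> -- docker system df

# Per-entry breakdown of the build cache (needs the buildx plugin)
kubectl exec -n <namespace> <docker-api-pod> -- docker buildx du
```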
We thought about different ways of resolving this. In normal usage, a low-occupancy BinderHub on another provider is unlikely to hit this, because its nodes scale down and the cache is discarded. On high-occupancy clusters this failure mode may well reappear. In any case, for this investigative work we were thinking only of JS2.
There are several possible approaches to a solution:
- Always clear the build cache after each build
- Disable layer caching at the repo2docker level (easier)
- Encourage k8s to provide a new node for building (which has a fresh cache) when the storage is near full
- Run a job to clear up the storage when it gets too full (a rough sketch follows this list)
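For the last option, the job could be as simple as a script that prunes the build cache whenever the filesystem backing /var/lib/docker crosses a usage threshold, run periodically next to dockerd (e.g. as a sidecar or CronJob). A rough sketch, assuming GNU df and access to the Docker socket; the 80% threshold is illustrative, not a decision:

```bash
#!/bin/sh
# Prune Docker's build cache if the filesystem holding /var/lib/docker
# is more than THRESHOLD percent full.
THRESHOLD=80
USED=$(df --output=pcent /var/lib/docker | tail -n 1 | tr -dc '0-9')

if [ "$USED" -ge "$THRESHOLD" ]; then
  # Drop all build cache entries; a --filter 'until=...' could keep recent layers
  docker builder prune --all --force
fi
```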
The easiest solution that doesn't entirely disable layer caching is to try to signal to k8s that a new node is required for builds once the cache approaches full, while still allowing k8s to schedule user pods on the nearly-full node. This could worsen startup times for builds if the node pool keeps resizing back down to 1, and there may be other behaviours depending upon the pod packing strategy.
I implemented a patch to the cluster that lets us specify the ephemeral-storage resource request per build. This is a bit of a sledgehammer -- it's not clear whether we can make an informed guess about the cache requirements of a random build in the same way that we do about memory requirements for singleuser.
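For reference, the effect is roughly that of a build pod carrying an ephemeral-storage request, so that a pod which no longer fits on the nearly-full node stays Pending and (with a cluster autoscaler) triggers a fresh node. A minimal sketch of such a pod, purely illustrative -- the name, image and the 10Gi figure are placeholders, not what the patch sets:

```bash
# Illustrative only: a pod whose ephemeral-storage request forces the scheduler
# to find (or the autoscaler to create) a node with enough unreserved disk.
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: example-build
spec:
  restartPolicy: Never
  containers:
    - name: build
      image: busybox
      command: ["sleep", "3600"]
      resources:
        requests:
          ephemeral-storage: 10Gi
EOF
```

One caveat: the scheduler compares requests against the node's allocatable ephemeral storage minus what other pods have requested, not against the disk's actual free space, which is part of why this feels like a blunt instrument.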
Given that we have restarted the problematic node, we probably don't need to identify the fix today. So, there's now a chance to step back and think about what a proper solution looks like.
Note: There are a few assumptions here, so it's worth a second pair of eyes to validate the approach.