
PostMortem storage discussion #114

@consideRatio


Mounting of storage on user pods was slow

It seems like it takes a while to mount volumes to the pods, impacting the spawn time significantly. I'm not sure yet which mounting step takes the time, though. There were several mounts happening:

  1. A 10GB GCE PD through a PVC / PV.
  2. A NFS server mount for the /home/curriculum folder that we did a gitpuller pull from to avoid relying on GitHub being up.
  3. A set of k8s ConfigMaps.

If it's the mounting that takes time, how much time does it take? If mounting a NFS PVC is slow, but it's fast to mount a hostPath volume, one could mount the NFS storage on each node and then use a hostPath volume to access that mount indirectly. This is what @yuvipanda's https://github.com/yuvipanda/k8s-nfs-mounter is doing, but it's also something Yuvi is transitioning away from.
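If the node-level mount plus hostPath indirection turns out to be faster, the spawner side could look roughly like this. This is a minimal sketch assuming KubeSpawner-style volume dicts; the `/mnt/nfs-home` path, the `home` volume name, and the `/home/jovyan` mount path are all assumptions, not the actual config:

```python
# Hypothetical sketch: bypass per-pod NFS PVC mounts by pointing a
# hostPath volume at an NFS share already mounted on every node
# (assumed here to live at /mnt/nfs-home).

def hostpath_volumes(username):
    """Build KubeSpawner-style volume/volume_mount lists for one user."""
    volumes = [
        {
            "name": "home",
            "hostPath": {
                "path": f"/mnt/nfs-home/{username}",
                "type": "Directory",
            },
        }
    ]
    volume_mounts = [{"name": "home", "mountPath": "/home/jovyan"}]
    return volumes, volume_mounts

vols, mounts = hostpath_volumes("user1")
```

The trade-off is that the pod spec no longer declares its storage dependency; the node-level mount has to be managed separately (e.g. by a DaemonSet, as k8s-nfs-mounter does).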

NFS read/write throughput and the rsync cache workaround

Google's managed NFS service, Filestore, didn't promise more than a sustained throughput of 100 MB/s, which is a bit low if we want hundreds of users to have access to 1 GB datasets. Due to this, I ended up running a DaemonSet to create a pod on each node, where I used rsync to stash away a local replica. rsync was used instead of cp or similar in order to ensure we could stay up to date with changes.
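The core of what each DaemonSet pod does can be sketched as a periodic rsync from the NFS mount to local disk. The source and destination paths and the sync interval here are illustrative assumptions:

```python
import subprocess
import time

# Hypothetical paths: the node-level NFS mount and the local replica
# directory used by each DaemonSet pod.
NFS_SRC = "/mnt/nfs/curriculum/"       # trailing slash: sync directory contents
LOCAL_DST = "/var/lib/curriculum-cache/"

def build_rsync_cmd(src=NFS_SRC, dst=LOCAL_DST):
    # --archive preserves permissions and timestamps; --delete removes
    # files from the replica that were deleted upstream, which a plain
    # cp would not do.
    return ["rsync", "--archive", "--delete", src, dst]

def sync_forever(interval_seconds=300):
    # What each DaemonSet pod would run: re-sync every few minutes so
    # the local replica tracks changes on the NFS server.
    while True:
        subprocess.run(build_rsync_cmd(), check=True)
        time.sleep(interval_seconds)
```

The `--delete` flag is what makes rsync preferable to cp here: the replica stays a faithful mirror rather than accumulating stale files.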

Some related PRs for this were #60, #63, #66, #100.

NFS quotas

While we didn't use NFS storage for the users, we could have, and then it would be relevant to solve the storage quota issue: with NFS you typically can't easily set quotas for individual users.

@yuvipanda has demonstrated one solution using a self-hosted NFS server backed by an XFS filesystem, and one can also use the nfs-provisioner Helm chart to deploy an NFS server.

pangeo-data/pangeo-cloud-federation#654
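For reference, the XFS-backed approach works because XFS can enforce per-user block limits with a single `xfs_quota` invocation. A sketch that builds that command (the mount point and 10g limit are placeholder assumptions, and the filesystem must be mounted with the `usrquota` option for limits to take effect):

```python
def xfs_quota_cmd(username, limit="10g", mount="/export/home"):
    # xfs_quota in expert mode (-x) applies a hard block limit (bhard)
    # to one user on the given XFS mount point. The NFS server exporting
    # this filesystem then enforces the quota for its clients implicitly.
    return ["xfs_quota", "-x", "-c", f"limit bhard={limit} {username}", mount]
```

A provisioning script would run this once per user, e.g. `subprocess.run(xfs_quota_cmd("user1"), check=True)` on the NFS server host.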

NFS archiving

A challenge with a bootcamp like this is that we intend to tear it down after a while, but it's not great to cut off users' access to their storage. With that in mind, an option could be to archive it in some object storage and provide a way for users to access it later without an NFS server running.
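The archiving step itself could be as simple as a tarball per user uploaded to a bucket. A sketch that builds the two commands (the bucket name, home root, and /tmp staging path are placeholder assumptions):

```python
def archive_cmds(username, home_root="/export/home", bucket="gs://bootcamp-archive"):
    # Hypothetical two-step archive per user: tar up the home directory,
    # then upload the tarball to an object storage bucket with gsutil.
    tarball = f"/tmp/{username}.tar.gz"
    tar_cmd = ["tar", "-czf", tarball, "-C", home_root, username]
    upload_cmd = ["gsutil", "cp", tarball, f"{bucket}/{username}.tar.gz"]
    return tar_cmd, upload_cmd
```

Run both with `subprocess.run(..., check=True)` for each user before tearing down the NFS server; the bucket then becomes the long-term source of truth.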

Access to the archived storage should not be public, so a simple solution would be to generate a password for each user, which could be emailed or accessed through JupyterHub somehow, since the hub knows about the user. This could perhaps be developed as an external JupyterHub service, which would be aware of the JupyterHub identity.
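Generating the per-user passwords is straightforward with Python's stdlib; a sketch of that piece (where and how the mapping is stored and delivered is left open, as above):

```python
import secrets

def generate_archive_passwords(usernames):
    # One random, URL-safe password per user. The mapping would be stored
    # somewhere only the hub (or an external JupyterHub service) can read,
    # then delivered to each user via email or through the hub UI.
    return {user: secrets.token_urlsafe(16) for user in usernames}
```

`secrets` is preferable to `random` here since the passwords are access credentials and need cryptographic-quality randomness.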

@yuvipanda is exploring this, but no GitHub repo is up yet to reference.
