Faster reporting of user home directory sizes #556

@yuvipanda

What happened?

  • Based on our communities' needs over the last few years, we have created multiple projects to manage user home directories.
  • Two critical ones are prometheus-dirsize-exporter, which reports information about each user's home directory (total size, number of files, the last time they touched it, etc.), and jupyterhub-home-nfs, which limits the size of each user's home directory.
  • Since prometheus-dirsize-exporter is designed to run on all sorts of home directories across cloud providers, it's very generic. It also intentionally runs slowly, since most disks that home directories live on (like Amazon EFS, EBS, GCP Filestore, Azure File, or just a plain old hard disk) have a limited number of IO operations they can do per second (IOPS), and most of those should be reserved for actual users rather than reporting.
  • So on some large communities' disks, the information from prometheus-dirsize-exporter can sometimes be hours out of date! This makes life difficult for admins, as they aren't able to fully trust what they see in Grafana about users' home directories.
  • There was also no information about what each user's limit is, making it difficult to set up alerts.
  • With "Add prometheus metrics for dirsize and limits" (jupyterhub-home-nfs#76), we now export two new metrics from jupyterhub-home-nfs: total_size_bytes (total size of each user's home directory) and hard_limit_bytes (max allowed size of each user's home directory). Since these rely on the reporting features of the underlying XFS filesystem, they are practically instant to collect. So admins will get up-to-date user home directory sizes within minutes, no matter how large the home directories are!
  • We also added "Allow disabling total size metric" (prometheus-dirsize-exporter#29) to prometheus-dirsize-exporter, so it can stop collecting duplicate total_size_bytes metrics while continuing to collect its other metrics. This means that users of the upstream JupyterHub Grafana dashboards will get the same useful view of home directory usage, regardless of whether the metric comes from prometheus-dirsize-exporter or jupyterhub-home-nfs.
  • This has been rolled out to all our communities with "Collect home directory metrics from jupyterhub-home-nfs" (infrastructure#7261).
  • This will also help us provide community-specific alerts to hub admins when a user is near their quota (work tracked in "Setup minimal round of *community facing alerts* for user home directory usage", infrastructure#7166).
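
To make the quota-alert idea above concrete, here is a minimal sketch of how the two new metrics could be combined to find users who are close to their limit. The metric names total_size_bytes and hard_limit_bytes come from the description above; the `directory` label, the 90% threshold, and the sample values are assumptions made up for illustration and may not match what jupyterhub-home-nfs actually exposes.

```python
import re

# Matches lines like: total_size_bytes{directory="alice"} 9500000000
# The single "directory" label is an assumption for this sketch.
SAMPLE = re.compile(r'^(\w+)\{directory="([^"]*)"\}\s+(\S+)$')


def parse_samples(text):
    """Parse simplified Prometheus text-format lines into {metric: {directory: value}}."""
    samples = {}
    for line in text.splitlines():
        match = SAMPLE.match(line.strip())
        if match:  # comment lines and blank lines simply do not match
            name, directory, value = match.groups()
            samples.setdefault(name, {})[directory] = float(value)
    return samples


def near_quota(samples, threshold=0.9):
    """Return {directory: usage_ratio} for directories at or above the threshold."""
    sizes = samples.get("total_size_bytes", {})
    limits = samples.get("hard_limit_bytes", {})
    flagged = {}
    for directory, size in sizes.items():
        limit = limits.get(directory)
        # Skip directories with no reported limit (or a limit of 0).
        if limit and size / limit >= threshold:
            flagged[directory] = size / limit
    return flagged


# Hypothetical scrape output, invented for this example.
EXAMPLE = """
# HELP total_size_bytes Total size of each home directory
total_size_bytes{directory="alice"} 9500000000
hard_limit_bytes{directory="alice"} 10000000000
total_size_bytes{directory="bob"} 1000000000
hard_limit_bytes{directory="bob"} 10000000000
"""

if __name__ == "__main__":
    print(near_quota(parse_samples(EXAMPLE)))
```

In a real deployment this comparison would more likely live in a Prometheus alerting rule or a Grafana panel than in a script; the sketch only shows the arithmetic that having both metrics makes possible.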

Why should we be excited about it?

  • Because A
  • Because B

Where can we learn more?

  • Link A
  • Link B

Media and images

Image

Home Directory Usage dashboard, with total size coming from jupyterhub-home-nfs and all other columns coming from prometheus-dirsize-exporter

Acknowledgements

To Jenny and Angus, who suggested finding ways to roll some parts of prometheus-dirsize-exporter into jupyterhub-home-nfs, based on their experiences with various communities.


  • Post published.
  • Shared on socials.
  • Shared in the team Slack.
  • (If applicable) Emailed to the partner/community member who was featured.
