[P&S Initiative] Collect user session length and count metrics #6507

@yuvipanda

Note: This initiative is not completely defined yet. The only issue defined so far should give us more information on what else needs to be done once that issue is complete.

For https://github.com/2i2c-org/meta/issues/1814, we need to provide an easily explainable set of numbers that BD can use to talk to communities. We currently provide 'active users' only. We would like to provide:

  1. Total number of user sessions
  2. Median length of each user session
  3. 99th percentile length of user sessions

Fundamentally, these can be expressed as a Prometheus histogram, although we don't have a way to provide that yet.
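As a rough illustration of why a histogram covers all three numbers, here is a minimal pure-Python sketch. The bucket boundaries and sample durations are made up for the example, and the quantile estimate only approximates what PromQL's `histogram_quantile` does (linear interpolation inside the bucket that contains the requested rank).

```python
from bisect import bisect_left
import math

# Hypothetical bucket upper bounds in seconds (5 min to 8 h); not from the issue.
BUCKETS = [300, 900, 1800, 3600, 7200, 14400, 28800, math.inf]


def cumulative_buckets(durations, bounds=BUCKETS):
    """Return cumulative counts per bucket, like Prometheus `le` buckets."""
    counts = [0] * len(bounds)
    for d in durations:
        # Index of the first bucket whose upper bound is >= d.
        counts[bisect_left(bounds, d)] += 1
    cumulative, total = [], 0
    for c in counts:
        total += c
        cumulative.append(total)
    return cumulative


def estimate_quantile(q, cumulative, bounds=BUCKETS):
    """Estimate the q-quantile (0 < q <= 1) by interpolating within a bucket."""
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = cumulative[-1]
    if total == 0:
        return math.nan
    rank = q * total
    idx = next(i for i, c in enumerate(cumulative) if c >= rank)
    if math.isinf(bounds[idx]):
        # The rank falls in the +Inf bucket: report the highest finite bound.
        return bounds[idx - 1]
    lower = bounds[idx - 1] if idx > 0 else 0.0
    prev = cumulative[idx - 1] if idx > 0 else 0
    return lower + (bounds[idx] - lower) * (rank - prev) / (cumulative[idx] - prev)


durations = [600, 1200, 2400, 4000, 30000]  # made-up session lengths, seconds
cum = cumulative_buckets(durations)
session_count = cum[-1]               # total number of sessions
median = estimate_quantile(0.5, cum)  # median session length
p99 = estimate_quantile(0.99, cum)    # 99th percentile session length
```

The estimates are only as precise as the bucket boundaries, so the boundaries would need to be chosen to match realistic session lengths.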

Problem

This is difficult to do with the information we currently have in Prometheus, because we don't have any raw data on sessions. Whether a session exists is inferred from the presence or absence of metrics (like kube_pod_status_phase), which makes this hard. While we can easily compute things like 'number of running servers at any given instant', it's more difficult to compute 'number of running servers today', and more difficult still to compute things like 'median session length of users today'. A useful way to think of this is that PromQL excels at 'map' and 'filter' operations, but struggles with 'reduce' operations.
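To make the 'reduce' problem concrete, here is a small sketch of what reconstructing sessions from sample presence looks like outside PromQL. It assumes we already have the timestamps at which a given user's pod reported a sample; the scrape interval and the gap threshold are assumptions for illustration, not values from our setup.

```python
from statistics import median


def sessions_from_samples(timestamps, scrape_interval=60, max_missed=2):
    """Group sample timestamps (seconds) into (start, end) sessions.

    A gap longer than `scrape_interval * max_missed` is treated as the
    server having stopped and a new session having started later.
    """
    sessions = []
    start = prev = None
    for ts in sorted(timestamps):
        if start is None:
            start = prev = ts
        elif ts - prev > scrape_interval * max_missed:
            # Gap too large: close the current session, open a new one.
            sessions.append((start, prev))
            start = prev = ts
        else:
            prev = ts
    if start is not None:
        sessions.append((start, prev))
    return sessions


# Made-up samples for one user's pod: three separate runs of presence.
samples = [0, 60, 120, 600, 660, 2000]
sessions = sessions_from_samples(samples)
lengths = [end - start for start, end in sessions]
median_length = median(lengths)
```

Each step depends on the previous sample, so the computation has to carry state along the whole series. That is the part that is awkward to express as instant-vector operations.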

It's possible that someone with more PromQL knowledge than me could do this, but I've asked around and haven't necessarily found solutions we can actually trust.

The last time we tried to do this, we ended up adding these metrics to JupyterHub directly so we could trust the active user counts: jupyterhub/jupyterhub#4214

Possible solutions

  1. PromQL + Grafana, based on our existing metrics. This may include using a Prometheus recording rule to create a new time series.
  2. Add this to JupyterHub as a metric, same as we did last time. However, that was possible because JupyterHub already tracked the one thing that was needed: users and when they were last active. While Spawner (the ORM object) in JupyterHub does have started and last_activity timestamps, these are reset whenever a user's server stops, so we can't really use them here. It also doesn't contain historical information: we can only know about the last server start / stop, nothing before.
  3. Collect new metrics from the user pods. In particular, collect the metrics from jupyter-server/jupyter_server#1471 ("prometheus: Expose 3 activity metrics") and then do solution 1.
  4. Write our own Prometheus exporter in Python that treats Prometheus as a source of data, computes the additional info we want, and exports it for collection again by Prometheus. We know this will work, so if the other options don't pan out, we do this.
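For option 4, a minimal sketch of the two ends of such an exporter might look like the following, using only the standard library. The metric name, bucket boundaries, and base URL handling are hypothetical. The only external interface assumed is the Prometheus HTTP API's `/api/v1/query_range` endpoint and the text exposition format. Turning the fetched series into session durations is left out.

```python
import json
import math
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical names and buckets, for illustration only.
METRIC = "user_session_duration_seconds"
BOUNDS = [300, 1800, 3600, 14400, math.inf]


def fetch_range(base_url, query, start, end, step=60):
    """Query the Prometheus HTTP API and return the list of result series."""
    params = urlencode({"query": query, "start": start, "end": end, "step": step})
    with urlopen(f"{base_url}/api/v1/query_range?{params}") as resp:
        return json.load(resp)["data"]["result"]


def render_histogram(durations, name=METRIC, bounds=BOUNDS):
    """Render session durations in the Prometheus text exposition format."""
    lines = [
        f"# HELP {name} Length of user sessions in seconds.",
        f"# TYPE {name} histogram",
    ]
    for bound in bounds:
        le = "+Inf" if math.isinf(bound) else str(bound)
        # Buckets are cumulative: count every duration <= this bound.
        count = sum(1 for d in durations if d <= bound)
        lines.append(f'{name}_bucket{{le="{le}"}} {count}')
    lines.append(f"{name}_sum {sum(durations)}")
    lines.append(f"{name}_count {len(durations)}")
    return "\n".join(lines) + "\n"


# Made-up durations; a real exporter would derive these from fetch_range().
text = render_histogram([120, 2000, 5000])
```

A real exporter would also need to decide whether the counts accumulate over the exporter's lifetime (as Prometheus histograms normally do) or describe a fixed window such as 'today'. That choice changes how the result should be queried.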

Path forward

We'll experiment with 2i2c-org/meta#2544 to figure out if we can get at least active user session counts set up, and if not, what additional work we would need to do.
