Discussion for Prometheus plugin (example use case autoscaling) #5214

@vsoch

We want to develop a Flux plugin that can deliver metrics to a server (likely Prometheus, using https://github.com/jupp0r/prometheus-cpp), where they can then be consumed by Prometheus and the horizontal pod autoscaler adapter. In layman's terms, when the Flux queue gets too big and needs more resources, it can tell the autoscaling plugin and get them, and shrink back down the same way. Likely we'd want the plugin built (outside of or alongside Flux?) and then loaded in an rc file, like modload 0 prometheus. Also note that Prometheus is interesting for other use cases outside of autoscaling. The original discussion started here: #5184 (comment)
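To make the "grow when the queue gets too big, shrink back down" idea concrete, here is a rough sketch of the scaling rule described in the Kubernetes horizontal pod autoscaler docs: desired replicas = ceil(current replicas × current metric / target metric). The function name and the example numbers are made up for illustration, and the sketch ignores the autoscaler's tolerance, min/max replica bounds, and stabilization window.

```python
import math


def desired_replicas(current_replicas: int, current_value: float, target_value: float) -> int:
    """Core HPA scaling rule: ceil(current_replicas * current_value / target_value).

    Hypothetical helper for illustration only; a real autoscaler also applies
    a tolerance band, min/max replica limits, and stabilization.
    """
    if target_value <= 0:
        raise ValueError("target_value must be positive")
    return math.ceil(current_replicas * (current_value / target_value))


# Illustrative numbers (assumptions): the metric is "queued jobs per pod"
# and the target is 10 queued jobs per pod.
scale_up = desired_replicas(current_replicas=2, current_value=30, target_value=10)
scale_down = desired_replicas(current_replicas=4, current_value=5, target_value=10)
print(scale_up, scale_down)
```

So a queue metric at three times the target would triple the replica count, and one at half the target would halve it.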

Questions I have:

  • Is there documentation for writing a plugin?
  • Where does the plugin live: internal to flux-core, or can it be external? Which would be better?

From @garlick

It depends on what kind of plugin is needed. I would think resource utilization or queue length or something like that would be the sort of metric you'd want for autoscaling? (Could we move this to the autoscaling discussion?)

I think for a first try we would just want the current queue stats: jobs that are running (and what resources they use) and jobs in the queue (and what resources they need). I'd probably start with the most basic of things, the number of jobs in the queue, maybe broken down by state, and then add details about resources needed vs. being used.
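As a sketch of what that most basic metric could look like on the wire, here is a small Python example that renders per-state job counts in the Prometheus text exposition format (a gauge with a `state` label). The metric name `flux_jobs`, the helper name, and the sample counts are assumptions for illustration; nothing here talks to Flux or uses a Prometheus client library.

```python
from typing import Dict


def render_job_metrics(counts: Dict[str, int], name: str = "flux_jobs") -> str:
    """Render per-state job counts as Prometheus text exposition format.

    `name` and the label key `state` are hypothetical choices, not an
    established Flux metric naming scheme.
    """
    lines = [
        f"# HELP {name} Number of Flux jobs by state.",
        f"# TYPE {name} gauge",
    ]
    # Sort states so the output is stable between scrapes.
    for state in sorted(counts):
        lines.append(f'{name}{{state="{state}"}} {counts[state]}')
    return "\n".join(lines) + "\n"


# Sample counts are invented; a real plugin would query the job manager.
sample = {"sched": 12, "run": 4}
print(render_job_metrics(sample), end="")
```

A gauge seems like the natural type here, since the number of jobs in each state goes both up and down; resource details could later be added as further labels or separate metrics.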

If I can get enough inkling of how to start, this would be fun for me to try.

Update: I'm now using https://github.com/digitalocean/prometheus-client-c/ instead. Doh, I can't use a C++ library!
