Description
We want to develop a Flux plugin that exposes metrics to a metrics server (likely Prometheus, using https://github.com/jupp0r/prometheus-cpp), so they can then be consumed by Prometheus and the horizontal pod autoscaler adapter. In layman's terms: when the Flux queue gets too big and needs more resources, it can tell the autoscaling plugin and get them, then shrink back down the same way. We'd likely want the plugin built (outside of or alongside Flux?) and then loaded in an rc file, like modload 0 prometheus. Also note that Prometheus is interesting for other use cases beyond autoscaling. The original discussion started here: #5184 (comment)
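To make the grow/shrink idea concrete, here is a rough sketch of the scaling rule the Kubernetes horizontal pod autoscaler documents: desired replicas = ceil(current replicas × current metric / target metric), clamped to the configured minimum and maximum. The function name, the integer types, and the choice of "queued jobs per pod" as the metric are assumptions for illustration only. The real autoscaler also applies a tolerance and stabilization window, which this omits.

```c
#include <assert.h>

/* Hypothetical helper: mirrors the documented HPA rule
 *   desired = ceil(current * metric / target)
 * using integer arithmetic, then clamps to [min, max].
 * "metric" is assumed to be the average per-pod value (e.g. queued jobs). */
unsigned long desired_replicas(unsigned long current, unsigned long metric,
                               unsigned long target,
                               unsigned long min, unsigned long max)
{
    assert(min <= max);
    if (target == 0)
        return current; /* no target set: leave the size unchanged */

    /* Integer ceiling of (current * metric) / target. */
    unsigned long desired = (current * metric + target - 1) / target;

    if (desired < min)
        desired = min;
    if (desired > max)
        desired = max;
    return desired;
}
```

For example, with 4 pods, an average of 30 queued jobs per pod, and a target of 10, the rule asks for 12 pods (or fewer if the maximum is lower). When the queue drains, the same rule shrinks the cluster back toward the minimum.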
Questions I have:
- Is there documentation for writing a plugin?
- Where should the plugin live: internal to flux-core, or external? Which would be better?
From @garlick
It depends on what kind of plugin is needed. I would think resource utilization or queue length or something like that would be the sort of metric you'd want for autoscaling? (Could we move this to the autoscaling discussion?)
I think for a first attempt we'd just want the current queue stats: jobs that are running (and the resources they use) and jobs in the queue (and the resources they need). I'd probably start with the most basic things, such as the number of jobs in the queue, maybe broken down by state, and then add details about resources needed vs. in use.
If I can get enough inkling of how to start, this would be fun for me to try.
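As a starting point for the "number of jobs in the queue, by state" metric, here is a minimal sketch that renders such counts in the Prometheus text exposition format. It uses only the C standard library. The metric name `flux_queue_jobs`, the struct, the function name, and the example state names are all hypothetical, not part of flux-core or any Prometheus client library. How the counts are obtained from Flux is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-state job count; state names are illustrative only. */
struct job_state_count {
    const char *state;
    unsigned long count;
};

/* Render the counts as one gauge with a "state" label, in the Prometheus
 * text exposition format. Returns the number of bytes written (excluding
 * the terminating NUL), or -1 if the buffer is too small or a state name
 * contains a character that would need escaping in a label value. */
int render_queue_metrics(char *buf, size_t size,
                         const struct job_state_count *counts, size_t n)
{
    assert(buf != NULL && counts != NULL);

    int used = snprintf(buf, size,
                        "# HELP flux_queue_jobs Number of jobs by state.\n"
                        "# TYPE flux_queue_jobs gauge\n");
    if (used < 0 || (size_t)used >= size)
        return -1;

    for (size_t i = 0; i < n; i++) {
        /* Quotes, backslashes and newlines must be escaped in label
         * values; this sketch simply rejects them. */
        if (strpbrk(counts[i].state, "\"\\\n") != NULL)
            return -1;

        size_t left = size - (size_t)used;
        int rc = snprintf(buf + used, left,
                          "flux_queue_jobs{state=\"%s\"} %lu\n",
                          counts[i].state, counts[i].count);
        if (rc < 0 || (size_t)rc >= left)
            return -1;
        used += rc;
    }
    return used;
}
```

A plugin could serve the resulting buffer over HTTP for Prometheus to scrape, or hand the same values to a client library instead of formatting them by hand. Resource details (needed vs. in use) could later be added as further gauges or labels.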
Update: I'm using https://github.com/digitalocean/prometheus-client-c/ instead. Doh, I can't use a C++ library!