flux-top: command that gives insight into the current utilization of job resources
#3791
Unanswered
SteVwonder asked this question in Ideas
Replies: 1 comment
-
I know initial brainstorming idea for the short term. Could Then on the client side look at info via |
-
As the title says, it would be cool to be able to do something like
flux top $JOBID
and see the aggregate CPU/GPU/memory usage of all the nodes in a job. Maybe flux-exec could be leveraged here to spawn a "daemon" on each node in a user's job. Those daemons could then collect utilization statistics and forward them up the overlay, where they are reduced and fed to the flux-top front-end. Alternatively, maybe we have an always-loaded module that collects these stats, and the flux-top front-end would filter down to just the stats that apply to the user's particular job.

Opening because I just had a workflow user request this while debugging node OOMs. The traditional method of debugging an OOM (attach a parallel debugger) is hard since there are tons of apps/components running in this composite workflow. So just getting a feel for which of the components is using the most memory would be helpful.
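To make the "collect per node, reduce, feed to the front-end" idea a bit more concrete, here is a rough sketch of what the reduction step might look like. It is plain Python with no Flux calls at all: the NodeSample and JobSummary records and the reduce_samples function are hypothetical names made up for illustration, and how the samples would actually travel up the overlay is left open.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class NodeSample:
    """One utilization sample from one node (hypothetical record layout)."""
    hostname: str
    cpu_percent: float    # 0-100, averaged over the node's cores
    mem_used_bytes: int
    mem_total_bytes: int  # assumed to be > 0


@dataclass(frozen=True)
class JobSummary:
    """Job-wide aggregate that a front-end could display."""
    nodes: int
    mean_cpu_percent: float
    mem_used_bytes: int
    mem_total_bytes: int
    peak_mem_host: str    # node with the highest fraction of memory in use


def reduce_samples(samples: Iterable[NodeSample]) -> JobSummary:
    """Combine per-node samples into a single job-level summary."""
    samples = list(samples)
    if not samples:
        raise ValueError("no samples to reduce")
    # The node closest to exhausting its memory is the likeliest OOM candidate.
    peak = max(samples, key=lambda s: s.mem_used_bytes / s.mem_total_bytes)
    return JobSummary(
        nodes=len(samples),
        mean_cpu_percent=sum(s.cpu_percent for s in samples) / len(samples),
        mem_used_bytes=sum(s.mem_used_bytes for s in samples),
        mem_total_bytes=sum(s.mem_total_bytes for s in samples),
        peak_mem_host=peak.hostname,
    )
```

For the OOM case, the peak_mem_host field is probably the most useful part: it points at the node to look at first. The same reduction could run at each level of a tree if the summary also carried the peak node's usage fraction, but that is left out here to keep the sketch short.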