Open
Description
From an interactive question: how is priority calculated? We shouldn't go into depth, but it could be covered in one or two more sentences.
This bit of history is an old write-up about it, which was deprecated some time ago because the page was redundant (a new description shouldn't be this long, but it could be a FAQ?):
scicomp-docs/triton/usage/jobs.rst.old
Lines 189 to 273 in af73a37
| Job priority | |
| ============ | |
| Triton queues are not first-in first-out, but "fairshare". This means | |
| that every person has a priority. The more you run the lower your | |
| user priority. As time passes, your user priority increases again. | |
| The longer a job waits in the queue, the higher its job priority goes. | |
| So, in the long run (if everyone is submitting a never-ending stream | |
| of jobs), everyone will get exactly their share. | |
| Once there are priorities, then: jobs are scheduled in order of | |
| priority, then any gaps are backfilled with any smaller jobs that can | |
| fit in. So small jobs usually get scheduled fast regardless. | |
| *Warning: from this point on, we get more and more technical, if you | |
| really want to know the details. Summary at the end.* | |
| What's a share? Currently shares are based on department and their | |
| respective funding of Triton (``sshare``). Shares are shared among | |
| everyone in the department, but each person has their own priority. | |
| Thus, for medium users, the 2-week usage of the rest of your | |
| department can affect how fast your jobs run. However, again, things | |
| are balanced per-user within departments. (One heavy user in a | |
| department can still affect all others in that department a bit too | |
| much; we are working on this.) | |
| Your priority goes down via the "job billing": roughly time×power. | |
| CPUs are billed at 1/s (but older, less powerful CPUs cost less!). | |
| Memory costs 0.2/GB/s. But you are only billed for the max of memory | |
| or CPU. So if you use one CPU and all the memory (so that no one else | |
| can run on that node), you get billed for all the memory but no CPU. | |
| The same goes for all CPUs and little memory. This encourages | |
| balanced use. (This also applies to GPUs.) | |
| GPUs also have a billing weight, currently tens of times higher than a | |
| CPU billing weight for the newest GPUs. (In general all of these can | |
| change; for the latest info, search for ``BillingWeights`` in | |
| ``/etc/slurm/slurm.conf``.) | |
| If you submit a long job but it ends early, you are only billed for | |
| the actual time you use (but the longer job might take longer to start | |
| at the beginning). Memory is always billed for the full reservation | |
| even if you use less, since it isn't shared. | |
| The "user priority" is actually just a record of how much you have | |
| consumed lately (the billing numbers above). This number goes down | |
| with a half-life decay of 2 weeks. Your personal priority is your share | |
| compared to that, so we get the effect described above: the more you | |
| (or your department) have run lately, the lower your priority. | |
| If you want your stuff to run faster, the best way is to more | |
| accurately specify your time (so the job may find a slot sooner) | |
| and memory (avoids needlessly wasting your priority). | |
| While your job is pending in the queue, Slurm checks those metrics | |
| regularly and recalculates job priority constantly. If you are | |
| interested in details, take a look at the `multifactor priority plugin | |
| <https://slurm.schedmd.com/priority_multifactor.html>`__ page (general | |
| info) and the `depth-oblivious fair-share factor | |
| <https://slurm.schedmd.com/priority_multifactor3.html>`__ page for what we | |
| use specifically (warning: very in-depth page). On Triton, you can | |
| always see the latest billing weights in ``/etc/slurm/slurm.conf``. | |
| Numerically, job priorities range from 0 to 2^32-1. Higher is | |
| sooner to run, but really the number doesn't mean much itself. | |
| These commands can show you information about your user and job | |
| priorities: | |
| .. csv-table:: | |
| :delim: | | |
| ``slurm s`` | list of jobs per user with their current priorities | |
| ``slurm full`` | as above but almost all of the job parameters are listed | |
| ``slurm shares`` | displays usage (RawUsage) and current fair-share values (FairShare, higher is better) for all users | |
| ``slurm j <jobid>`` | shows ``<jobid>`` detailed info including priority, requested nodes etc. | |
| .. | |
| ``slurm p gpu`` | # shows partition parameters incl. Priority= | |
| tl;dr: Just select the resources you think you need, and Slurm | |
| tries to balance things out so everyone gets their share. The best | |
| way to maintain high priority is to use resources efficiently so you | |
| don't need to over-request. |
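For a shorter FAQ entry, the billing and decay rules from the old write-up could be illustrated with a small example. Below is a rough sketch only, assuming the weights quoted above (1 per CPU-second, 0.2 per GB-second, two-week half-life) and ignoring GPUs and the cheaper older CPUs. The function names are made up for illustration; the real numbers are whatever ``BillingWeights`` in ``/etc/slurm/slurm.conf`` currently says, and the actual fair-share calculation in Slurm is more involved.

```python
# Illustrative only: weights are taken from the old write-up and may be outdated.
CPU_WEIGHT = 1.0         # billing units per CPU-second
MEM_WEIGHT_PER_GB = 0.2  # billing units per GB-second
HALF_LIFE_DAYS = 14.0    # recorded usage halves every two weeks


def job_billing(cpus, mem_gb, seconds):
    """Bill the larger of the CPU charge and the memory charge, times runtime."""
    rate = max(cpus * CPU_WEIGHT, mem_gb * MEM_WEIGHT_PER_GB)
    return rate * seconds


def decayed_usage(usage, days_elapsed):
    """Recorded usage remaining after exponential decay with a 2-week half-life."""
    return usage * 0.5 ** (days_elapsed / HALF_LIFE_DAYS)


if __name__ == "__main__":
    hour = 3600
    # One CPU but 100 GB of memory: the memory charge dominates.
    print(job_billing(cpus=1, mem_gb=100, seconds=hour))
    # Twenty CPUs and 10 GB of memory: the CPU charge dominates.
    print(job_billing(cpus=20, mem_gb=10, seconds=hour))
    # Usage recorded two weeks ago counts for half as much today.
    print(decayed_usage(1000.0, days_elapsed=14))
```

With these assumed weights, a job holding 1 CPU and 100 GB is billed about the same as a job holding 20 CPUs and 10 GB, which is the "max of memory or CPU" rule in action.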