Generalise submission and cancellation arguments #641

AlecThomson · 2024-05-24T09:59:51Z

Closes #640

jacobtomlinson

This looks fine to me! I do wonder if we should expose cancel_command as a keyword argument on the SLURMCluster and pass that down to SLURMJob, in case users want to override it with something else.
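
A minimal sketch of what that suggestion could look like, assuming the cluster simply forwards a cancel_command keyword to the job class. SketchJob and SketchCluster are hypothetical stand-ins written for illustration, not the dask-jobqueue classes, and the override value shown is only an example:

```python
import shlex


class SketchJob:
    """Hypothetical stand-in for a job class (not the dask-jobqueue API)."""

    # Class-level default, used when the caller does not override it.
    cancel_command = "scancel"

    def __init__(self, job_id, cancel_command=None):
        self.job_id = job_id
        if cancel_command is not None:
            # A per-instance override shadows the class-level default.
            self.cancel_command = cancel_command

    def cancel_argv(self):
        # Split the command string so flags become separate argv entries.
        return shlex.split(self.cancel_command) + [str(self.job_id)]


class SketchCluster:
    """Hypothetical stand-in for a cluster that passes keywords to its jobs."""

    def __init__(self, cancel_command=None):
        self._job_kwargs = {"cancel_command": cancel_command}

    def new_job(self, job_id):
        return SketchJob(job_id, **self._job_kwargs)


default_job = SketchCluster().new_job(42)
custom_job = SketchCluster(cancel_command="scancel --signal=SIGTERM").new_job(7)
```

Leaving the keyword as None keeps the class default, so existing users would see no change unless they opt in.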

AlecThomson · 2024-05-25T01:39:53Z

I found that the HTCondor class already had something similar. I've added this to the base Job class in core. The downside is that this adds some extra boilerplate.

I've reworked the HTCondorJob and SLURMJob to make use of the new functionality.
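
As a rough sketch of the idea (not the code in this PR), the base class can resolve the extra arguments once, falling back to configuration, so that subclasses only declare their commands. BaseJob, SlurmLikeJob, the CONFIG dictionary, and the config key names are all assumptions made for illustration; CONFIG stands in for dask.config, and the default values are made up:

```python
# Hypothetical configuration store standing in for dask.config.
# The keys and default values here are illustrative assumptions.
CONFIG = {
    "slurm": {
        "submit-command-extra": [],
        "cancel-command-extra": ["--signal=SIGTERM"],
    }
}


class BaseJob:
    """Hypothetical base class that owns the extra-argument handling."""

    submit_command = None
    cancel_command = None
    config_name = None

    def __init__(self, submit_command_extra=None, cancel_command_extra=None):
        section = CONFIG[self.config_name]
        # Fall back to configuration only when the caller passed nothing.
        if submit_command_extra is None:
            submit_command_extra = section.get("submit-command-extra", [])
        if cancel_command_extra is None:
            cancel_command_extra = section.get("cancel-command-extra", [])
        # Copy so later mutation does not leak back into CONFIG.
        self.submit_command_extra = list(submit_command_extra)
        self.cancel_command_extra = list(cancel_command_extra)

    def submit_argv(self, script_path):
        return [self.submit_command, *self.submit_command_extra, script_path]

    def cancel_argv(self, job_id):
        return [self.cancel_command, *self.cancel_command_extra, str(job_id)]


class SlurmLikeJob(BaseJob):
    """Subclass declares commands only; no duplicated fallback logic."""

    submit_command = "sbatch"
    cancel_command = "scancel"
    config_name = "slurm"


job = SlurmLikeJob(submit_command_extra=["--parsable"])
```

With this layout the None checks live in one place, and a subclass that needs nothing special does not have to repeat them.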

guillaumeeb · 2024-05-31T15:42:03Z

dask_jobqueue/lsf.py

            use_stdin = dask.config.get("jobqueue.%s.use-stdin" % self.config_name)
        self.use_stdin = use_stdin

+        if self.submit_command_extra is None:


Why is all this code both in the Core class and the class inheriting it?

guillaumeeb · 2024-05-31T15:43:14Z

Also, the test failures in slurm (and maybe others) look related to this change.

jacobtomlinson · 2024-08-06T13:10:07Z

CI is now mostly happy on main, so I've just merged main in so that we can see up-to-date CI failures.

AlecThomson · 2024-08-07T05:12:17Z

Sorry for the long lead time on this, everyone. I got myself tripped up between the methods on the base and inheriting classes - as noted by @guillaumeeb's comment. I believe I've got this all sorted now. Hopefully the CI will catch any lingering issues.

jacobtomlinson · 2024-08-07T10:04:48Z

Thanks @AlecThomson! Looks like there are some linting issues (make sure you run pre-commit install) and some slurm issues.

AlecThomson · 2024-08-07T10:16:49Z

Hmm - looks like some kind of timing error on the test. I don't quite understand why it's failing... 🤔

>                   assert time() < start + QUEUE_WAIT
E                   assert 1723021595.3378592 < (1723021535.2718754 + 60)
E                    +  where 1723021595.3378592 = time()

https://github.com/dask/dask-jobqueue/actions/runs/10278490598/job/28450471395#step:7:425

jacobtomlinson · 2024-08-07T10:20:29Z

It is calling cluster.scale(n) and then waiting for the cluster to scale. The time assertion is just a timeout, so it's not scaling to the correct number in the time allowed.

Note: We don't use client.wait_for_workers(n) because that checks for "at least n workers" so doesn't wait when scaling down (xref dask/distributed#6374).
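
For reference, the wait pattern described above can be sketched roughly as below. This is a simplified illustration, not the test's actual code: wait_for_exact_count and get_count are hypothetical names, and the fake worker counts stand in for a real cluster. The check is for equality rather than "at least", so it also waits when scaling down:

```python
from time import sleep, time


def wait_for_exact_count(get_count, target, timeout=60.0, interval=0.1):
    """Poll get_count() until it equals target, or raise after timeout seconds."""
    start = time()
    polls = 0
    while get_count() != target:
        polls += 1
        # The time check is only a timeout guard, like QUEUE_WAIT in the test.
        if time() >= start + timeout:
            raise TimeoutError(
                f"worker count did not reach {target} within {timeout} seconds"
            )
        sleep(interval)
    return polls


# Simulate scaling down from 3 workers to 1 with a fake counter.
fake_counts = iter([3, 2, 1])
polls_needed = wait_for_exact_count(
    lambda: next(fake_counts), target=1, timeout=5.0, interval=0.0
)
```

If the count never reaches the target, the loop raises instead of hanging, which is the failure mode seen in the CI log.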

Use sigterm

5b885e3

jacobtomlinson reviewed May 24, 2024


AlecThomson added 4 commits May 25, 2024 09:35

Add extra submit/cancel args to core

a6cb957

Rework condor to match core

320dfc0

Add slurm default

8752fb4

Add boilerplate to other clusters

012f121

AlecThomson changed the title ~~Graceful Slurm job cancellation~~ Generalise submission and cancellation arguments May 25, 2024

guillaumeeb reviewed May 31, 2024


AlecThomson and others added 4 commits June 17, 2024 16:36

Not self yet

6407ea6

Fix args

38ed4a9

Fix core

7adc55e

Merge branch 'main' of github.com:dask/dask-jobqueue into shutdown

7af5720

AlecThomson added 7 commits August 7, 2024 12:03

Type fixes

5d4fdc5

Initialise variables

7ad8ebc

typo

6074a3c

Use base class

a7cdfb5

Merge remote-tracking branch 'origin/main' into shutdown

82b4eeb

Cleanup

5c99b7e

Cleanup

4f03167

lint

64f47af

Merge remote-tracking branch 'origin/main' into shutdown

ba45c40
