Skip to content

Prevent flow-on after triggering pre-startcp tasks after a warm start#7148

Merged
oliver-sanders merged 9 commits intocylc:8.6.xfrom
MetRonnie:trigger-warm-flow-on
Dec 18, 2025
Merged

Prevent flow-on after triggering pre-startcp tasks after a warm start#7148
oliver-sanders merged 9 commits intocylc:8.6.xfrom
MetRonnie:trigger-warm-flow-on

Conversation

@MetRonnie
Copy link
Member

@MetRonnie MetRonnie commented Dec 15, 2025

Follow-up to/built on:

Fixes a bug where triggering a task before the start cycle point, on a warm started workflow, will cause downstream tasks to flow on. It fixes this by treating pre-startcp tasks as having run already in flow 1, unless explicitly labelled by the task pool as having been manually triggered.

A limitation of this fix is that if you stop the workflow while the group is in progress, the group will stop flowing.

Repro

[scheduler]
    allow implicit tasks = True

[scheduling]
    cycling mode = integer
    initial cycle point = 1
    [[graph]]
        P1 = foo[-P1] => foo
$ cylc play wflow --startcp 10 --holdcp 9
$ cylc trigger wflow//2/foo

Actual behaviour: foo run at cycles 2-9.

Expected behaviour: Only 2/foo runs.

Check List

  • I have read CONTRIBUTING.md and added my name as a Code Contributor.
  • Contains logically grouped changes (else tidy your branch by rebase).
  • Does not contain off-topic changes (use other PRs for other changes).
  • No dependency changes
  • Tests are included
  • Changelog entry included if this is a change that can affect users
  • Docs PR: Add intervention example for triggering pre-startcp tasks after a warm start cylc-doc#899
  • If this is a bug fix, PR should be raised against the relevant ?.?.x branch.

@MetRonnie MetRonnie added this to the 8.6.2 milestone Dec 15, 2025
@MetRonnie MetRonnie self-assigned this Dec 15, 2025
@MetRonnie MetRonnie added the bug Something is wrong :( label Dec 15, 2025
@MetRonnie MetRonnie force-pushed the trigger-warm-flow-on branch from 2839f97 to e2bf93f Compare December 15, 2025 12:37
@MetRonnie MetRonnie marked this pull request as ready for review December 15, 2025 12:37
@MetRonnie MetRonnie force-pushed the trigger-warm-flow-on branch from e2bf93f to 83b2f4d Compare December 15, 2025 13:02
@wxtim wxtim self-requested a review December 15, 2025 15:31
Copy link
Member

@wxtim wxtim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! :)

@oliver-sanders
Copy link
Member

oliver-sanders commented Dec 15, 2025

This isn't the approach I was expecting. I think it answers to our use case well enough (triggering R1 tasks at ICP with no flow-on), but because it doesn't address the internal inconsistency, it produces some strange caveat behaviours. Examples so far:

  1. Flow-on is broken by restart (mentioned in OP).
    • The new field is not backed up in the DB so is not restart-safe.
  2. Flow-on is broken for --flow=new.
    • Trigger the R1 tasks with --flow=new.
    • The flow-on is suppressed, even though we explicitly requested it.
  3. Parentless task spawning is suppressed.
    • cylc set --pre=all does nothing for ICP tasks.

@MetRonnie
Copy link
Member Author

Flow-on is broken for --flow=new.

  • Trigger the R1 tasks with --flow=new.
  • The flow-on is suppressed, even though we explicitly requested it.

See my latest commit c9b8b06

@MetRonnie MetRonnie requested a review from wxtim December 15, 2025 17:59
@MetRonnie
Copy link
Member Author

MetRonnie commented Dec 15, 2025

Parentless task spawning is suppressed.

  • cylc set --pre=all does nothing for ICP tasks.

Sorry, what is expected to happen here?

On 8.6.1, if a parentless ICP task has already run (after a normal cold start), doing cylc set --pre=all doesn't do anything

@hjoliver
Copy link
Member

A limitation of this fix is that if you stop the workflow while the group is in progress, the group will stop flowing.

Hmm, that's actually a pretty bad limitation, isn't it?

@wxtim
Copy link
Member

wxtim commented Dec 16, 2025

Hmm, that's actually a pretty bad limitation, isn't it?

Yes, but as far as we can tell the use cases don't involve these tasks being re-run

  • Very often
  • For very long
    We should document this limitation tho.

@MetRonnie

This comment was marked as resolved.

@wxtim

This comment was marked as resolved.

@hjoliver
Copy link
Member

Hmm, that's actually a pretty bad limitation, isn't it?

Yes, but as far as we can tell the use cases don't involve these tasks being re-run

* Very often

* For very long
  We should document this limitation tho.

What's your opinion on this @oliver-sanders , given that you raised it above as a problem you weren't expecting.

I don't like it much - we generally put a lot of effort into making sure that things continue as planned after a restart - which after all can be forced upon users (scheduler killed, host goes down...).

@MetRonnie
Copy link
Member Author

We can look at fixing this limitation in the longer term (8.7.0) but I think it would require a db change Change to the workflow database structure . Unfortunately my attempts at hacking this with flow=None or negative flow numbers have not born fruit. I think this task pool flag is the quickest and least compromised solution for 8.6.2 which we need to get out ASAP

@oliver-sanders
Copy link
Member

oliver-sanders commented Dec 17, 2025

@hjoliver

I think this solution is a bit nasty, Cylc's behaviours should be consistent without the need for such hacks.

Though I'm less worried about the limitations and more about the potential for bugs. My real concern here is what I have described as an "inconsistency" between the database and the warm-start logic. At Cylc 7, all of the dependency logic was held in the task pool, so the warm start logic propagated forwards, no issue. With Cylc 8, we also go to the database which produces an inconsistency between the warm-start point / pre-initial condition and the workflow's run history. We will be going to the DB more in the future creating more potential inconsistency issues.

When these methods give different results, as in this case, Cylc's behaviour becomes undefined and it may start doing strange things (what I'm worried about).

I think I came up with a workable solution which patches over the difference (without actually populating DB entries for the missing tasks), but Ronnie says there were issues implementing it. I don't understand what these issues were, but also haven't managed to find the time to try this out myself (things a bit crazy right now).


So, the solution implemented in this PR protects a use case that we desperately need fixed on very short timescales. If it weren't for that context, I would look to reject this. But if the alternative is not workable on the required timeframes, it's a different story, perhaps it could be a short-term workaround?

@hjoliver
Copy link
Member

hjoliver commented Dec 18, 2025

OK I guess I'm happy enough with this, if we post a follow-up issue to detail the problem.


It sounds like you have no time to consider other options now, but as I noted on #7101 I still think it is worth considering normal flow behaviour (no history to stop it) but deliberately restricted to not go beyond the triggered group (whose members we know).

@oliver-sanders
Copy link
Member

oliver-sanders commented Dec 18, 2025

I still think it is worth considering normal flow behaviour (no history to stop it) but deliberately restricted to not go beyond the triggered group

That sounds like what's been implemented here? Might need more details to understand what's being suggested here.

Warm start is a special case as the pre-initial condition means that the flow history is assumed, so the solution to this particular case, is to resolve or workaround the internal consistency issue. We shouldn't need anything further as we just want to restore the normal behaviours of Cylc within this graph region.

But there may be things beyond this...

@oliver-sanders
Copy link
Member

#7153

"""
>>> IntegerSequence('R6/3/P1', '2')
<IntegerSequence start=3, stop=8, step=P1, self.p_context_start=2,
i_offset=P0>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Silly little diff to get the coverage to 100%:

Suggested change
i_offset=P0>
i_offset=P0>
>>> IntegerSequence('R6/P1/3', '2', '4')
<IntegerSequence start=2, stop=3, step=P1, self.p_context_stop=4,
i_offset=P0>

Copy link
Member

@oliver-sanders oliver-sanders left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use case working as intended 👍

@oliver-sanders oliver-sanders merged commit a89fba8 into cylc:8.6.x Dec 18, 2025
24 of 25 checks passed
@MetRonnie MetRonnie deleted the trigger-warm-flow-on branch December 18, 2025 11:53
@hjoliver
Copy link
Member

That sounds like what's been implemented here? Might need more details to understand what's being suggested here.

(OK you can view this PR that way; I was more thinking of possible explicit flow-stop solutions leveraged to automatically stop after the triggered group hence the link to previous discussion).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something is wrong :(

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants