Prevent flow-on after triggering pre-startcp tasks after a warm start #7148
oliver-sanders merged 9 commits into cylc:8.6.x
They were not obeying prerequisites within the group
Partially reverts e6c4adf
2839f97 to e2bf93f
e2bf93f to 83b2f4d
This isn't the approach I was expecting. I think it answers to our use case well enough (triggering R1 tasks at ICP with no flow-on), but because it doesn't address the internal inconsistency, it produces some strange caveat behaviours. Examples so far:
See my latest commit c9b8b06
Sorry, what is expected to happen here? On 8.6.1, if a parentless ICP task has already run (after a normal cold start), doing
Hmm, that's actually a pretty bad limitation, isn't it?
Yes, but as far as we can tell the use cases don't involve these tasks being re-run
What's your opinion on this, @oliver-sanders, given that you raised it above as a problem you weren't expecting? I don't like it much. We generally put a lot of effort into making sure that things continue as planned after a restart, which after all can be forced upon users (scheduler killed, host goes down...).
We can look at fixing this limitation in the longer term (8.7.0), but I think it would require a DB change.
I think this solution is a bit nasty; Cylc's behaviours should be consistent without the need for such hacks. That said, I'm less worried about the limitations and more about the potential for bugs.

My real concern here is what I have described as an "inconsistency" between the database and the warm-start logic. At Cylc 7, all of the dependency logic was held in the task pool, so the warm-start logic propagated forwards without issue. With Cylc 8, we also go to the database, which produces an inconsistency between the warm-start point / pre-initial condition and the workflow's run history. We will be going to the DB more in the future, creating more potential inconsistency issues. When these methods give different results, as in this case, Cylc's behaviour becomes undefined and it may start doing strange things (which is what I'm worried about).

I think I came up with a workable solution which patches over the difference (without actually populating DB entries for the missing tasks), but Ronnie says there were issues implementing it. I don't understand what these issues were, but I also haven't managed to find the time to try this out myself (things are a bit crazy right now).

So, the solution implemented in this PR protects a use case that we desperately need fixed on very short timescales. If it weren't for that context, I would look to reject this. But if the alternative is not workable on the required timeframes, it's a different story; perhaps it could be a short-term workaround?
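To make the inconsistency described above concrete, here is a minimal sketch in plain Python. It is a toy model, not Cylc code: the function names, the integer cycle points, and the `history` set standing in for the database are all assumptions made for illustration.

```python
def assumed_by_warm_start(point, start_point):
    """Warm-start logic: anything before the start point is assumed done."""
    return point < start_point


def recorded_in_history(point, name, history):
    """Run-history lookup: done only if there is a record of it."""
    return (point, name) in history


def disagreements(tasks, start_point, history):
    """Return the tasks for which the two methods give different answers."""
    return [
        (point, name)
        for point, name in tasks
        if assumed_by_warm_start(point, start_point)
        != recorded_in_history(point, name, history)
    ]


# Warm start at cycle 3 with an empty run history (illustrative values).
tasks = [(1, "foo"), (2, "foo"), (3, "foo")]
conflicts = disagreements(tasks, start_point=3, history=set())
print(conflicts)  # [(1, 'foo'), (2, 'foo')]
```

In this toy model every pre-start task is a point of disagreement: the warm-start rule says it has run, while the history has no record of it.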
OK, I guess I'm happy enough with this, if we post a follow-up issue to detail the problem. It sounds like you have no time to consider other options now, but as I noted on #7101, I still think it is worth considering normal flow behaviour (no history to stop it) that is deliberately restricted so it does not go beyond the triggered group (whose members we know).
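One way to read the "restricted to the triggered group" idea is sketched below. This is a hypothetical illustration only: `spawn_within_group`, the `children` mapping, and the task IDs are invented here and do not correspond to anything in Cylc.

```python
def spawn_within_group(finished, children, group):
    """Return the children of `finished` that may spawn.

    Flow proceeds normally (no history check), but any child outside
    the triggered group is dropped.
    """
    return [child for child in children.get(finished, []) if child in group]


# Illustrative graph: 2/a => 2/b => 2/c, and 2/a => 3/a.
children = {"2/a": ["2/b", "3/a"], "2/b": ["2/c"]}
group = {"2/a", "2/b"}  # the tasks the user triggered together

spawned_after_a = spawn_within_group("2/a", children, group)
spawned_after_b = spawn_within_group("2/b", children, group)
print(spawned_after_a, spawned_after_b)  # ['2/b'] []
```

Under this reading, the stopping condition is group membership rather than assumed flow history.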
That sounds like what's been implemented here? I might need more details to understand what's being suggested. Warm start is a special case, as the pre-initial condition means that the flow history is assumed, so the solution to this particular case is to resolve or work around the internal consistency issue. We shouldn't need anything further, as we just want to restore the normal behaviours of Cylc within this graph region. But there may be things beyond this...
    """
    >>> IntegerSequence('R6/3/P1', '2')
    <IntegerSequence start=3, stop=8, step=P1, self.p_context_start=2,
    i_offset=P0>
Silly little diff to get the coverage to 100%:
    - i_offset=P0>
    + i_offset=P0>
    + >>> IntegerSequence('R6/P1/3', '2', '4')
    + <IntegerSequence start=2, stop=3, step=P1, self.p_context_stop=4,
    + i_offset=P0>
oliver-sanders left a comment
Use case working as intended 👍
(OK, you can view this PR that way; I was thinking more of possible explicit flow-stop solutions, leveraged to stop automatically after the triggered group, hence the link to the previous discussion.)
Follow-up to/built on:
Fixes a bug where triggering a task before the start cycle point on a warm-started workflow causes downstream tasks to flow on. It fixes this by treating pre-startcp tasks as having already run in flow 1, unless they are explicitly labelled by the task pool as having been manually triggered.
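The rule described above can be sketched as a toy model in plain Python. This is not Cylc's implementation: `ToyPool`, `manually_triggered`, `has_run_in_flow_1`, and `should_spawn` are hypothetical names, integer cycle points are assumed, and the warm start at cycle 10 is an illustrative value.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPool:
    """Toy stand-in for a task pool (hypothetical, not Cylc's TaskPool)."""

    start_point: int  # warm-start cycle point
    # (point, name) pairs the user has explicitly triggered
    manually_triggered: set = field(default_factory=set)

    def trigger(self, point, name):
        """Label a task as manually triggered."""
        self.manually_triggered.add((point, name))

    def has_run_in_flow_1(self, point, name):
        """Pre-start tasks count as already run unless manually triggered."""
        if point < self.start_point:
            return (point, name) not in self.manually_triggered
        return False

    def should_spawn(self, point, name):
        """A task is allowed to run only if it is not treated as already run."""
        return not self.has_run_in_flow_1(point, name)


# Assumed scenario: warm start at cycle 10, then trigger foo at cycle 2.
pool = ToyPool(start_point=10)
pool.trigger(2, "foo")
runnable = [p for p in range(2, 10) if pool.should_spawn(p, "foo")]
print(runnable)  # [2]
```

In this sketch only the triggered task is runnable; the untriggered pre-start cycles are treated as already run, so nothing flows on from cycle 2.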
A limitation of this fix is that if you stop the workflow while the group is in progress, the group will stop flowing.
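A possible reading of this limitation, sketched as a toy model: if the "manually triggered" label lives only in memory, it is lost when the scheduler stops, so the remaining group members fall back to being treated as already run. The `ToyScheduler` class and all of its attributes are hypothetical, and where the label is actually stored is an assumption here.

```python
class ToyScheduler:
    """Hypothetical scheduler that keeps trigger labels in memory only."""

    def __init__(self, start_point, db):
        self.start_point = start_point
        self.db = db  # persisted: completed (point, name) pairs
        self.manually_triggered = set()  # in memory only, lost on restart

    def trigger_group(self, members):
        self.manually_triggered.update(members)

    def treated_as_run(self, point, name):
        key = (point, name)
        if key in self.db:
            return True
        return point < self.start_point and key not in self.manually_triggered

    def restart(self):
        """Simulate stop and restart: only the persisted state survives."""
        return ToyScheduler(self.start_point, self.db)


sched = ToyScheduler(start_point=10, db=set())
sched.trigger_group({(2, "foo"), (2, "bar")})
sched.db.add((2, "foo"))  # foo finishes; bar has not run yet

before = sched.treated_as_run(2, "bar")  # False: bar can still run
sched = sched.restart()
after = sched.treated_as_run(2, "bar")  # True: bar is now assumed to have run
print(before, after)  # False True
```

Persisting the label would avoid this in the toy model, which is in line with the remark in the conversation that a longer-term fix would need a DB change.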
Repro
Actual behaviour: foo run at cycles 2-9.
Expected behaviour: Only 2/foo runs.
Check List
CONTRIBUTING.md and added my name as a Code Contributor.
?.?.x branch.