Manage concurrency and dependency of executable content #2413
@agahkarakuzu it's funny that you looked at this; I was just thinking about this over the JupyterCon flights. I like the concurrency limit, and the idea of a hard-coded ordering. Here are some of my unstructured thoughts:

Relative vs Absolute Ordering

When I was thinking about this, I considered the idea of relative ordering using There are pros/cons to both approaches, and I'm easily convinced either way.

Render vs Execution Ordering

As it stands, this PR changes the rendering order. This is a problem if people use continuous numbering, which will break with this PR if the ordering disagrees. I think we need to introduce a higher level abstraction for execution (an

async function executionTransform(mdast, vfile, frontmatter, session) {
  // Wait for our turn
  await session.executionOrchestrator.wantsExecution(frontmatter.path);
}

where this promise is resolved by the orchestrator once the previous notebooks have executed. I'd love to work with you on this, and welcome any other thoughts about things like ordering from the team!
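The orchestrator idea in the comment above can be sketched as follows. This is a minimal illustration, not MyST's implementation: only the `wantsExecution` name comes from the discussion, while the class shape, the `finished` method, the constructor argument, and the simplified `executionTransform` signature are assumptions made for the example. It also assumes a strictly linear order, so each page waits for every page listed before it.

```typescript
// Hypothetical sketch: only `wantsExecution` appears in the discussion;
// everything else here is an illustrative assumption.
class ExecutionOrchestrator {
  // One "done" promise per page, resolved when that page reports completion.
  private readonly done = new Map<string, Promise<void>>();
  private readonly resolvers = new Map<string, () => void>();
  private readonly order: string[];

  constructor(order: string[]) {
    this.order = order;
    for (const path of order) {
      this.done.set(
        path,
        new Promise<void>((resolve) => this.resolvers.set(path, resolve)),
      );
    }
  }

  // Resolves once every page listed before `path` has finished executing.
  async wantsExecution(path: string): Promise<void> {
    const index = this.order.indexOf(path);
    if (index < 0) return; // Pages without a declared position never wait.
    const earlier = this.order.slice(0, index);
    await Promise.all(earlier.map((p) => this.done.get(p)));
  }

  // Called after a page has been executed, releasing the pages behind it.
  finished(path: string): void {
    this.resolvers.get(path)?.();
  }
}

// Stand-in for the real transform: it only records the order pages ran in.
async function executionTransform(
  path: string,
  orchestrator: ExecutionOrchestrator,
  log: string[],
): Promise<void> {
  await orchestrator.wantsExecution(path); // Wait for our turn
  log.push(path); // The actual notebook execution would happen here.
  orchestrator.finished(path);
}
```

Because each transform awaits its own turn, the transforms can be started in any order (for example with `Promise.all`) and the pages still execute in the declared sequence. A real orchestrator would also need to decide what happens to waiting pages when an earlier page fails.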
@agoose77 - regarding your second point, I'm less worried about the ordering - this concurrency management only applies to the first

Now, I haven't actually tested this - we should probably have some end-to-end tests that combine (1) batched execution and (2) page enumeration.

(@agoose77 - I think your code snippet is how we would need to do this once we streamline into a single "better" processing pipeline, as we prototyped in #1699)
Regarding how we define this ordering - that question feels fuzzier and more flexible. Where should it go (toc vs. page vs. elsewhere in the project frontmatter) and how should it be defined (number vs. before/after vs. something else...)?

Currently it's in the toc, which is nice and centralized. But not all projects necessarily have an explicit toc. We could also define it on the individual pages - this makes sense for named

We probably don't want to allow multiple ways to do this - then we get into resolving conflicts... 😕 My vague preference is individual pages keep track of their
Ah, of course. I didn't test that yesterday when I was musing about this!
@fwkoch @agoose77 thank you so much for looking into this! Sorry for the late response, I've been navigating the insane air travel conditions in the States.
Indeed, it does not take effect when the TOC is implicit. Switching to a DAG-based model with explicit dependencies (instead of numerical expressions, dependencies can be defined as notebook or page names) would bring more sophistication. Maybe using Then again, I was not really sure about defining these at the page level from the get-go. If you see a better middle ground between impeccable orchestration and this simple implementation, I'm happy to take a stab at it.
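A DAG-based model along the lines suggested above could be resolved into execution levels with a level-by-level topological ordering. The sketch below is only an illustration: the `dependsOn` map (page name to the names it depends on) and the `executionLevels` function are hypothetical and do not correspond to any existing MyST field or API.

```typescript
// Hypothetical sketch: `dependsOn` maps a page name to the pages it must wait for.
function executionLevels(dependsOn: Record<string, string[]>): string[][] {
  // Collect every page, including ones that only appear as a dependency.
  const pending = new Map<string, string[]>();
  Object.keys(dependsOn).forEach((page) => {
    pending.set(page, dependsOn[page].slice());
    dependsOn[page].forEach((dep) => {
      if (!pending.has(dep)) pending.set(dep, []);
    });
  });

  const levels: string[][] = [];
  const finished = new Set<string>();
  while (pending.size > 0) {
    // A page is ready once all of its dependencies have finished.
    const ready: string[] = [];
    pending.forEach((deps, page) => {
      if (deps.every((dep) => finished.has(dep))) ready.push(page);
    });
    if (ready.length === 0) {
      throw new Error("Circular dependency between executable documents");
    }
    ready.sort(); // Deterministic order within a level.
    ready.forEach((page) => {
      pending.delete(page);
      finished.add(page);
    });
    levels.push(ready);
  }
  return levels;
}

// figure_2 and figure_3 both depend on figure_1, as in the PR description.
const levels = executionLevels({
  "figure_2.ipynb": ["figure_1.ipynb"],
  "figure_3.ipynb": ["figure_1.ipynb"],
});
// levels is [["figure_1.ipynb"], ["figure_2.ipynb", "figure_3.ipynb"]]
```

Pages in the same level have no dependencies on each other, so each level could be executed concurrently (subject to a concurrency limit) before the next level starts.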
Could we separate out the (Having control over the parallelism would solve #1831 and potential resource limit hits, etc.)
@bsipocz by separating out, do you mean addressing it in another PR?
@agahkarakuzu yes, I believe so. Let's do that, if it's not too much work. It's a bit more effort overall, but it makes the change easier to review and avoids blocking the useful fix!
I'm sorry for getting back to this just now, but Angus has answered with exactly what I meant. Overall, it's very helpful to separate new features, fixes, etc. into separate PRs. You can expect a much quicker turnaround, because such contributions are cleaner and easier to review, and they don't block each other from getting in when only parts are under discussion.
Need
Proposed solution
This PR introduces batching and sequencing logic to manage execution order based on the toc definition in myst.yml. The implementation allows users to control both concurrency and dependency ordering of executable documents.

Adding the documentation snippet for clarity:
How to manage the order of execution?
Implicit TOC
If no table of contents (toc) is defined in your myst.yml, all executable sources are run in parallel by default.

Explicit TOC
Managing concurrency without dependency order
By default, executable files are processed concurrently in batches of 5.
You can modify this behavior by passing the --execute-concurrency <n> option to your build command, where <n> specifies how many executable documents should run simultaneously.

Defining a specific execution order
To define a sequential execution order, use the execution_order field within the toc element. For example:

In this example, figure_2.ipynb and figure_3.ipynb will both wait for figure_1.ipynb to finish before being executed concurrently.

Warning
If a notebook that other notebooks depend on fails during execution, the build process will continue by default. To stop the build whenever an error occurs (including for notebooks without dependencies), pass the --strict flag to your build command.

Tests
I tested this locally across several builds and it appears to function correctly. The current implementation provides a straightforward mechanism for managing execution order. It is not a full-fledged workflow-like approach, but I believe it is still a meaningful improvement for build control.
@fwkoch and I are not sure if toc is the right place to define this logic, but it serves as a reasonable starting point for now.

Relates to: #1794, #2055
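The batching and ordering rules in the documentation snippet above can be illustrated with a small planning function. This is a sketch under stated assumptions, not the code in this PR: the `ExecutableEntry` shape and `planBatches` name are hypothetical, and treating entries without an `executionOrder` as order 0 is an assumption made only for the example. The default batch size of 5 follows the documentation snippet.

```typescript
// Hypothetical shape: mirrors the idea of an execution_order value per toc
// entry, but the exact field layout is an assumption.
interface ExecutableEntry {
  file: string;
  executionOrder?: number;
}

// Returns batches to run one after another; files within a batch run together.
function planBatches(entries: ExecutableEntry[], concurrency = 5): string[][] {
  if (!Number.isInteger(concurrency) || concurrency < 1) {
    throw new Error("concurrency must be a positive integer");
  }

  // Group files by their order value (assumption: a missing value means 0).
  const groups = new Map<number, string[]>();
  entries.forEach((entry) => {
    const order = entry.executionOrder ?? 0;
    const group = groups.get(order) ?? [];
    group.push(entry.file);
    groups.set(order, group);
  });

  // Lower order values run first.
  const orders: number[] = [];
  groups.forEach((_files, order) => orders.push(order));
  orders.sort((a, b) => a - b);

  // Split each group into batches no larger than the concurrency limit.
  const batches: string[][] = [];
  orders.forEach((order) => {
    const files = groups.get(order) ?? [];
    for (let i = 0; i < files.length; i += concurrency) {
      batches.push(files.slice(i, i + concurrency));
    }
  });
  return batches;
}

const plan = planBatches([
  { file: "figure_1.ipynb", executionOrder: 1 },
  { file: "figure_2.ipynb", executionOrder: 2 },
  { file: "figure_3.ipynb", executionOrder: 2 },
]);
// plan is [["figure_1.ipynb"], ["figure_2.ipynb", "figure_3.ipynb"]]
```

With a concurrency of 1, the same input yields three single-file batches, so the order values then act as a strict sequence.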