I think this example is a use case for the solution described here: #452 (comment). Basically, you need a mechanism that deletes intermediate files once they are no longer needed, but does not re-compute them if their downstream outputs already exist.
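To make that rule concrete, here is a minimal Python sketch of the decision logic only. It is not Nextflow code and not the implementation from #452; the file names (`<sample>.raw`, `<sample>.summary.txt`), the directory layout, and the function names are all hypothetical, chosen just for illustration. A stands for the download step and B for the summary step from the question.

```python
from pathlib import Path


def steps_to_run(sample: str, store_dir: Path, work_dir: Path) -> list[str]:
    """Return the steps that still need to run for one sample.

    Assumed (hypothetical) layout: A writes work_dir/<sample>.raw and
    B writes store_dir/<sample>.summary.txt.
    """
    summary = store_dir / f"{sample}.summary.txt"
    download = work_dir / f"{sample}.raw"
    if summary.exists():
        # B's stored result exists, so neither A nor B has to run,
        # even if A's large download was deleted long ago.
        return []
    if download.exists():
        # The download is still on disk; only B is missing.
        return ["B"]
    return ["A", "B"]


def cleanup_download(sample: str, store_dir: Path, work_dir: Path) -> bool:
    """Delete A's output, but only once B's result is safely stored.

    Returns True if a file was deleted.
    """
    summary = store_dir / f"{sample}.summary.txt"
    download = work_dir / f"{sample}.raw"
    if summary.exists() and download.exists():
        download.unlink()
        return True
    return False
```

Under the same assumptions, a later step C would take B's stored summary as its input, so adding C means re-running only C for past samples and never A.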
-
I've been reading through the docs and past discussions, but I cannot understand how to make my use case work.
To give an example, say I have two processes, A and B, where the output of A feeds into B.
Process A downloads a giant file, and B computes some summary statistics on it and stores the result in a storeDir. The files downloaded in A are so large that I need to delete them once the workflow is done. The issue is that over time I need to run new samples through the workflow, and I don't want A to re-download the files I deleted, because if it does I'll run out of disk space.
Is there a way to have Nextflow not re-run A if the output files for B exist? This assumes the outputs of past runs of A have been deleted. Furthermore, I want to be able to extend the workflow by adding a process C after B, so I'll need to run the workflow again for all past samples as well (but again, without running A again).