Description
Origin: DataLad matrix chat; Nov 2, 2023
OP asks for suggestions on how to handle marking results for retention with tags:
How do you generally handle marking results for retention in DataLad? My analysis produces results that are larger than the input data, so I would like to find a compromise: keep intermediate results for some versions, but not for all of them. For a single dataset, one can solve this by using git tag to keep the annexed content of a certain commit from being listed by git annex unused.
However, how do you do it for subdatasets? My analysis has another analysis as a submodule and relies on its data as input to the computations. So for each tag on the superdataset, I would need to create a tag in the subdataset and push these tags to their respective remotes on the archive disk.
So far this can be solved as follows: if I create a tag project-meeting21 in the superdataset, I could automatically create a tag in the subdataset called needed/(datalad-id-superdataset)/project-meeting21. Now I also want to delete needed/ tags in the subdataset when the corresponding tag in the superdataset is gone. This can lead to problems if I have the datasets in multiple places and delete tags in one of them. How do I decide when to delete the needed/ tags, and how do I make sure that a deleted tag is not added back from another instance of the dataset?
Is there any partial or complete solution to this yet or should I make up a solution on my own?
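One way to make the tag-mirroring scheme described in the question concrete is to separate the decision "which needed/ tags to create and which to delete" from the git commands that apply it. The following is a minimal sketch under stated assumptions: it is Python, it runs no git commands, and `gitlink_commit`, `plan_needed_tags`, `TagPlan` and the `SUPER-ID` placeholder are all hypothetical names. `SUPER-ID` stands in for the superdataset's DataLad ID (for example the `datalad.dataset.id` value in its `.datalad/config`). The sketch reads the subdataset commit that a superdataset tag pins from one line of `git ls-tree <tag> -- <subdataset-path>` output, so each needed/ tag can point at the exact input version that the tagged result depended on.

```python
"""Sketch: keep 'needed/<super-id>/<tag>' retention tags in a subdataset in sync
with the tags of its superdataset. Pure logic only; no git commands are run.
All names here are hypothetical."""
from dataclasses import dataclass
from typing import Dict, Optional, Set


def gitlink_commit(ls_tree_line: str) -> Optional[str]:
    """Return the subdataset commit recorded in one line of
    `git ls-tree <tag> -- <subdataset-path>` output, or None if the
    entry is not a submodule (gitlink entries have mode 160000, type commit)."""
    meta, _, _path = ls_tree_line.partition("\t")
    fields = meta.split()
    if len(fields) == 3 and fields[0] == "160000" and fields[1] == "commit":
        return fields[2]
    return None


@dataclass
class TagPlan:
    create: Dict[str, str]  # subdataset tag name -> commit it should point at
    delete: Set[str]        # retention tags whose superdataset tag is gone


def plan_needed_tags(super_id: str,
                     super_tags: Dict[str, str],
                     sub_tags: Set[str]) -> TagPlan:
    """super_tags maps each superdataset tag to the subdataset commit it pins;
    sub_tags is the set of tag names currently present in the subdataset."""
    prefix = f"needed/{super_id}/"
    wanted = {prefix + name: sha for name, sha in super_tags.items()}
    # Only consider tags owned by this superdataset; other superdatasets'
    # retention tags and ordinary tags (e.g. releases) are never touched.
    existing = {t for t in sub_tags if t.startswith(prefix)}
    # Tags that already exist are left as they are, even if they point elsewhere.
    create = {t: sha for t, sha in wanted.items() if t not in existing}
    delete = existing - set(wanted)
    return TagPlan(create=create, delete=delete)


if __name__ == "__main__":
    # Placeholder values for illustration only.
    line = "160000 commit 0123456789abcdef0123456789abcdef01234567\tinputs/analysis"
    pinned = gitlink_commit(line)
    plan = plan_needed_tags(
        "SUPER-ID",
        {"project-meeting21": pinned},
        {"needed/SUPER-ID/project-meeting20", "needed/OTHER-ID/draft", "v1.0"},
    )
    print(plan.create)
    print(plan.delete)
```

Scoping the tags by superdataset ID means that syncing one superdataset can never delete another superdataset's retention tags. The sketch does not answer the multi-clone question: a tag deleted in one clone can reappear if another clone still has it and pushes its tags, or if it is fetched back from a remote that still carries it. Deletions would therefore also have to be pushed (e.g. `git push <remote> --delete <tag>`), and other clones keep their copy unless they prune it, e.g. with `git fetch --prune --prune-tags`.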
TODO (not necessarily to be performed in this order)
- Inform OP/Add reference to this issue at origin
- Clarifying Qs asked or not needed
- Nature of the issue is understood
- Inform OP about resolution