Description
Is your feature request related to a problem? Please describe.
Now that quality_control.json files are being created I'm seeing two failure modes where the state of the QC gets out of sync across assets. This occurs when raw QC is created after derived assets exist.
- First failure mode happens because we allow users to append QC data to assets after they already exist. If the order of events is: (1) raw asset created, (2) derived asset created, (3) user appends QC data to raw asset, then the derived asset is now out of sync (it should have a copy of the raw QC).
- Second failure mode happens if a user creates a derived asset and forgets to copy the original quality_control.json file. Their derived asset will then be out of sync (again, it should have inherited any raw QC).
Describe the solution you'd like
Using the data_description, the indexer finds chains of assets that were derived from each other and checks that each subsequent asset has all of the QCEvaluation objects in its parent. It's critical that this goes from raw->derived->derived and that the checks don't go backwards: you have to build the chains in a first pass and then walk each chain from the raw asset forward. You can't start from each asset and go back to its raw asset, or you get ordering issues.
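To make the two-pass idea concrete, here is a rough sketch of how the check might look. It is not the indexer's actual code. The flat record layout is an assumption for illustration: `parent` stands in for whatever field in data_description identifies the source asset, and `evaluations` is a list of names standing in for the QCEvaluation objects in quality_control.json. The function names are hypothetical too.

```python
from collections import defaultdict


def build_chains(assets):
    """First pass: link every asset to its children and collect the raw roots."""
    children = defaultdict(list)
    roots = []
    for name, asset in assets.items():
        parent = asset.get("parent")  # hypothetical field, see note above
        if parent is None:
            roots.append(name)
        else:
            children[parent].append(name)
    return roots, children


def find_missing_evaluations(assets):
    """Second pass: walk each chain raw -> derived and report missing evaluations."""
    roots, children = build_chains(assets)
    missing = {}
    # Each stack entry is (asset name, evaluations it should have inherited).
    stack = [(root, set()) for root in roots]
    while stack:
        name, inherited = stack.pop()
        own = set(assets[name]["evaluations"])
        gap = inherited - own
        if gap:
            missing[name] = sorted(gap)
        # A child must carry what this asset has plus what this asset
        # should have had, so a gap is reported all the way down the chain.
        for child in children[name]:
            stack.append((child, own | inherited))
    return missing


# Made-up example: QC ("noise") was appended to the raw asset after d1 existed.
assets = {
    "raw": {"parent": None, "evaluations": ["drift", "noise"]},
    "d1": {"parent": "raw", "evaluations": ["drift"]},
    "d2": {"parent": "d1", "evaluations": ["drift", "sorting"]},
}
print(find_missing_evaluations(assets))
```

Because the walk only ever moves from parent to child, the expected set for a derived asset is always complete before that asset is checked. Derived assets whose parent is not in the index are never reached in this sketch and would need separate handling.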
Describe alternatives you've considered
The QC portal could be responsible for this, but I think that having invisible functionality happening in the background of the portal is a mistake.
Additional context
Link to the QC schema: https://github.com/AllenNeuralDynamics/aind-data-schema/blob/dev/src/aind_data_schema/core/quality_control.py