@dbirman @arjunsridhar12345 @dyf this is to summarize and track progress on upgrading the ephys pipeline to aind-data-schema 2.0.
Required changes
- Update DataProcess instantiation in the Preprocessing, Spike Sorting, Postprocessing, Curation, and Visualization steps (e.g. here) (see preprocessing, spike sorting, postprocessing, curation, visualization)
- Update the Collect Results capsule to upgrade to/generate 2.0-compliant JSON files (here)
- Update parsing of session/rig to instantiate NWB Devices in NWB Ecephys and NWB Units
- Update QC and QC collector capsules to generate 2.0-compliant quality_control.json files
Issues
Collect Results
The Collect Results capsule uses the existing processing.json and data_description.json from the input asset. If these are not 2.0, we need a way to convert them to 2.0 first, so that the pipeline remains compatible with existing data assets. We could use the aind-metadata-upgrader once it can upgrade to 2.0.
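Concretely, the capsule could gate on the schema_version field before touching either file. A minimal sketch, assuming we key off schema_version in the raw JSON; the upgrader hand-off is left as a placeholder since aind-metadata-upgrader does not yet target 2.0, and the helper name/paths are illustrative:

```python
import json
from pathlib import Path

from packaging.version import Version


def load_metadata_for_v2(json_path: Path) -> dict:
    """Load a processing/data_description JSON, flagging files that still need a 2.0 upgrade."""
    with open(json_path, "r") as f:
        metadata = json.load(f)

    schema_version = Version(metadata.get("schema_version", "0.0.0"))
    if schema_version < Version("2.0.0"):
        # Placeholder: hand off to aind-metadata-upgrader once a 2.0 target exists,
        # e.g. metadata = upgrade(metadata)  # hypothetical entry point
        raise NotImplementedError(
            f"{json_path.name} is schema {schema_version}; no upgrade path to 2.0 yet"
        )
    return metadata
```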
To give more context, the processing.json is created by aind-data-transfer and logs the compression data process. The current behavior of the pipeline is to extend that Processing object and append the ephys-generated processes. The data_description.json, instead, is used to instantiate the DerivedDataDescription (which will be replaced by DataDescription.from_raw). In both cases, 2.0 processing/data description files are needed.
For the processing.json, we could actually create a new Processing object that just logs the processing steps of the ephys pipeline (since it ends up in the result asset anyway).
For the data_description.json, I think it would be best to use DataDescription.from_raw, so that all metadata are propagated (e.g., funder, investigators, etc.).
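A rough sketch of that route, assuming the input data_description.json has already been converted to 2.0; the import path follows the current 1.x layout, and the from_raw signature shown (raw description plus a process_name) is an assumption to be checked against the released 2.0 API:

```python
from pathlib import Path

from aind_data_schema.core.data_description import DataDescription

# Load the (already 2.0) raw data description from the input asset
raw_dd = DataDescription.model_validate_json(
    Path("/data/ecephys_session/data_description.json").read_text()  # hypothetical path
)

# Build the derived data description so funder, investigators, etc. are carried over;
# the classmethod comes from the 2.0 plan above, but its arguments are assumed here
derived_dd = DataDescription.from_raw(raw_dd, process_name="sorted")

Path("/results/data_description.json").write_text(derived_dd.model_dump_json(indent=2))
```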
NWB
Currently, there is a function that parses the session+rig info to instantiate NWB Devices (this function should be moved to aind-nwb-utils @arjunsridhar12345, since both the Ecephys and Units capsules currently have their own copy). It doesn't use aind-data-schema, but parses the JSON files directly. We'll definitely need to update this function; it could support both 1.0 and 2.0 session/rig schemas to stay compatible with all data assets.
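As a starting point, a minimal sketch of a version-aware parser that could live in aind-nwb-utils; it keeps the current approach of reading the JSON directly and dispatches on schema_version. The 1.x field paths (ephys_assemblies/probes) reflect my reading of the current rig schema, and the 2.0 branch is left as a stub until the upgraded schema layout is settled:

```python
import json
from pathlib import Path

from packaging.version import Version
from pynwb.device import Device


def get_ephys_devices(rig_json_path: Path) -> list[Device]:
    """Build NWB Devices from a rig/instrument JSON, for both 1.x and 2.0 schemas."""
    with open(rig_json_path, "r") as f:
        rig = json.load(f)

    version = Version(rig.get("schema_version", "0.0.0"))
    devices = []
    if version < Version("2.0.0"):
        # 1.x rig.json: probes are nested under ephys assemblies (assumed field paths)
        for assembly in rig.get("ephys_assemblies", []):
            for probe in assembly.get("probes", []):
                manufacturer = (probe.get("manufacturer") or {}).get("name", "unknown")
                devices.append(
                    Device(
                        name=probe["name"],
                        description=str(probe.get("probe_model", "")),
                        manufacturer=manufacturer,
                    )
                )
    else:
        # 2.0: fill in once the upgraded rig/instrument field paths are known
        raise NotImplementedError("2.0 rig/instrument parsing not implemented yet")
    return devices
```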