Description
We've had MANIFEST files nearly since the inception of the Data Store. (The content was originally included in the README file in each collection, but we decided early on to move the per-file metadata into two MANIFEST files.) However, these files aren't validated and, to my knowledge, aren't being used programmatically.
There are potential uses for per-file metadata, though. For example, the diversity collections often contain multiple VCFs. Which of these should be displayed in tools such as JBrowse and GCViT? We could use a file-naming convention, as we've done with e.g. "genome_main.gff3" in the annotations collections, but if multiple VCFs in a collection should be displayed, a label such as "main" isn't appropriate.
So, the proposal:
- Formalize the MANIFEST file as a (validated) yml document
- Merge the current "descriptions" and "correspondence" files into a single file MANIFEST.metadata_file_prefix.yml
- Allow additional fields with programmatic use, e.g. "display: true"
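To make "validated" concrete, here is a minimal sketch of what a check could look like. The schema is an assumption inferred from the example in this issue (required `name` and `description`; optional `display` and `prior_names`); the function name `validate_manifest` is hypothetical, and the input is assumed to be the already-parsed YAML (for instance via PyYAML's `yaml.safe_load`). Unknown keys are deliberately accepted, since the proposal allows additional fields.

```python
from typing import Any, List

# Assumed schema, inferred from the example manifest in this issue.
REQUIRED = {"name": str, "description": str}
OPTIONAL = {"display": bool, "prior_names": list}


def validate_manifest(entries: Any) -> List[str]:
    """Return a list of problems found; an empty list means the manifest passed."""
    if not isinstance(entries, list):
        return ["top level must be a list of file entries"]

    errors: List[str] = []
    seen_names = set()
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict):
            errors.append(f"entry {i}: must be a mapping")
            continue

        # Required fields must be present and of the expected type.
        for key, typ in REQUIRED.items():
            if key not in entry:
                errors.append(f"entry {i}: missing required field '{key}'")
            elif not isinstance(entry[key], typ):
                errors.append(f"entry {i}: '{key}' must be {typ.__name__}")

        # Optional fields are only type-checked when present.
        for key, typ in OPTIONAL.items():
            if key in entry and not isinstance(entry[key], typ):
                errors.append(f"entry {i}: '{key}' must be {typ.__name__}")

        prior = entry.get("prior_names")
        if isinstance(prior, list) and not all(isinstance(p, str) for p in prior):
            errors.append(f"entry {i}: 'prior_names' must contain only strings")

        # Each file should be described exactly once.
        name = entry.get("name")
        if isinstance(name, str):
            if name in seen_names:
                errors.append(f"entry {i}: duplicate name '{name}'")
            seen_names.add(name)

    return errors
```

A check like this could run when a collection is added or updated, so that a manifest with a missing description or a non-boolean `display` value is caught before any tool relies on it.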
An example of the proposed merged, restructured file, from collection Glycine/max/diversity/Wm82.gnm2.div.Wickland_Battu_2017
cat MANIFEST.Wm82.gnm2.div.Wickland_Battu_2017.yml
---
- name: glyma.Wm82.gnm2.div.Wickland_Battu_2017.SNPdata1.vcf.gz
  description: genotype information from Population 1; 378 F2 lines resulting from
    a cross between Prize and an NMU-mutagenized individual of Williams 82.
  display: true
  prior_names:
  - glyma.Wm82.gnm2.div.RW0X.SNPdata1.vcf.gz
  - Pop1_SNPs_minDP2.vcf.gz
- name: glyma.Wm82.gnm2.div.Wickland_Battu_2017.SNPdata2.vcf.gz
  description: genotype information from Population 2; 391 F2 individuals from a -
    cross between two breeding lines.
  display: true
  prior_names:
  - glyma.Wm82.gnm2.div.RW0X.SNPdata2.vcf.gz
  - Pop2_SNPs_minDP2.vcf.gz
- name: glyma.Wm82.gnm2.div.Wickland_Battu_2017.SNPdata3.vcf.gz
  description: genotype information from Population 3; 81 unrelated accessions -
    that form an association panel.
  display: true
  prior_names:
  - glyma.Wm82.gnm2.div.RW0X.SNPdata3.vcf.gz
  - Pop3_SNPs_minDP2.vcf.gz
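As a rough illustration of the programmatic use this would enable, the sketch below picks out the files flagged for display and resolves an older filename to its current one. The helper names `files_to_display` and `current_name` are hypothetical, and the `MANIFEST` list stands in for the result of parsing the file above (for example with PyYAML's `yaml.safe_load`); only the first two entries are reproduced.

```python
from typing import List, Optional

# Stand-in for the parsed manifest: the first two entries of the example above.
MANIFEST = [
    {
        "name": "glyma.Wm82.gnm2.div.Wickland_Battu_2017.SNPdata1.vcf.gz",
        "description": "genotype information from Population 1; 378 F2 lines "
        "resulting from a cross between Prize and an NMU-mutagenized "
        "individual of Williams 82.",
        "display": True,
        "prior_names": [
            "glyma.Wm82.gnm2.div.RW0X.SNPdata1.vcf.gz",
            "Pop1_SNPs_minDP2.vcf.gz",
        ],
    },
    {
        "name": "glyma.Wm82.gnm2.div.Wickland_Battu_2017.SNPdata2.vcf.gz",
        "description": "genotype information from Population 2; 391 F2 "
        "individuals from a - cross between two breeding lines.",
        "display": True,
        "prior_names": [
            "glyma.Wm82.gnm2.div.RW0X.SNPdata2.vcf.gz",
            "Pop2_SNPs_minDP2.vcf.gz",
        ],
    },
]


def files_to_display(entries: List[dict]) -> List[str]:
    """Names of files a browser should show; a missing 'display' counts as false."""
    return [e["name"] for e in entries if e.get("display") is True]


def current_name(entries: List[dict], filename: str) -> Optional[str]:
    """Map a current or prior filename to the current name, or None if unknown."""
    for e in entries:
        if filename == e["name"] or filename in e.get("prior_names", []):
            return e["name"]
    return None


if __name__ == "__main__":
    for name in files_to_display(MANIFEST):
        print(name)
    print(current_name(MANIFEST, "Pop2_SNPs_minDP2.vcf.gz"))
```

Treating an absent `display` key as false is one possible default; the proposal does not specify it, so that choice would need to be agreed on as part of formalizing the file.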