RFO: Restructure, formalize, and validate MANIFEST files #50

@StevenCannon-USDA

We've had MANIFEST files nearly since the inception of the Data Store (actually, the content was originally included in the README file in each collection, but we decided early on to move the per-file metadata into two MANIFEST files). However, these files aren't validated and aren't (to my knowledge) being used programmatically.

There are potential uses for per-file metadata, though. For example, the diversity collections often contain multiple VCFs. Which of these should be displayed in tools such as JBrowse and GCViT? We could use a file-naming convention, as we've done with e.g. "genome_main.gff3" in the annotations collections, but if multiple VCFs in a collection should be displayed, a label such as "main" isn't appropriate.
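To make the contrast concrete, here is a minimal Python sketch. It assumes each file's metadata has been loaded as a dict with a `name` and an optional boolean `display` key (the field suggested in the proposal below); the helper names are hypothetical, and the `_main.` match only approximates the existing naming convention.

```python
# Illustrative sketch only: picking display files by a filename label
# (approximating the "genome_main.gff3" convention) versus by a per-file
# metadata flag. The "display" key and both function names are hypothetical.


def select_by_label(filenames, label="main"):
    """Return files whose name contains '_<label>.', e.g. 'genome_main.gff3'.

    A single label marks one preferred file; it cannot express
    "show these three VCFs but not that fourth one"."""
    marker = f"_{label}."
    return [f for f in filenames if marker in f]


def select_by_flag(entries, suffix=".vcf.gz"):
    """Return names of files with the given suffix flagged display: true.

    A missing flag is treated as false, so only explicit opt-ins are shown."""
    return [
        e["name"]
        for e in entries
        if e["name"].endswith(suffix) and e.get("display") is True
    ]
```

With the flag, any number of VCFs in a collection can be marked for display, and the choice no longer depends on how the files are named.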

So, the proposal:

  1. Formalize the MANIFEST file as a validated YAML document
  2. Merge the current "descriptions" and "correspondence" files into a single file, MANIFEST.metadata_file_prefix.yml
  3. Allow additional fields intended for programmatic use, e.g. "display: true"
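A minimal validator sketch for the merged format follows. It assumes rules the proposal has not yet fixed: the top level is a list; `name` and `description` are required strings; `display` is an optional boolean; `prior_names` is an optional list of strings; names must be unique; and unknown keys are reported so typos surface early (which means each new programmatic field would have to be added to the allowed set). It checks the Python object a YAML loader returns rather than parsing YAML itself, so it needs only the standard library; `validate_manifest` is a hypothetical name.

```python
# Sketch of a validator for the proposed MANIFEST.<prefix>.yml structure.
# Assumed rules (not settled in the proposal): top level is a list of
# mappings; "name"/"description" required strings; "display" optional bool;
# "prior_names" optional list of strings; names unique; unknown keys flagged.

REQUIRED = {"name": str, "description": str}
OPTIONAL = {"display": bool, "prior_names": list}


def validate_manifest(doc):
    """Return a list of human-readable problems; an empty list means valid."""
    if not isinstance(doc, list):
        return ["top level must be a list of file entries"]
    errors = []
    seen = set()
    for i, entry in enumerate(doc):
        where = f"entry {i}"
        if not isinstance(entry, dict):
            errors.append(f"{where}: must be a mapping")
            continue
        # Required keys must be present and of the right type.
        for key, typ in REQUIRED.items():
            if key not in entry:
                errors.append(f"{where}: missing required key '{key}'")
            elif not isinstance(entry[key], typ):
                errors.append(f"{where}: '{key}' must be {typ.__name__}")
        # Optional keys are type-checked only when present.
        for key, typ in OPTIONAL.items():
            if key in entry and not isinstance(entry[key], typ):
                errors.append(f"{where}: '{key}' must be {typ.__name__}")
        # Anything else is reported, to catch misspelled field names.
        for key in entry:
            if key not in REQUIRED and key not in OPTIONAL:
                errors.append(f"{where}: unknown key '{key}'")
        prior = entry.get("prior_names")
        if isinstance(prior, list) and not all(isinstance(p, str) for p in prior):
            errors.append(f"{where}: 'prior_names' must contain only strings")
        # File names must be unique within one MANIFEST.
        name = entry.get("name")
        if isinstance(name, str):
            if name in seen:
                errors.append(f"{where}: duplicate name '{name}'")
            seen.add(name)
    return errors
```

In practice the input would come from a loader such as PyYAML's `yaml.safe_load`. Expressing the same rules as a JSON Schema and checking them with a library such as `jsonschema` would be an alternative to hand-written checks.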

An example of the proposed merged, restructured file, from the collection Glycine/max/diversity/Wm82.gnm2.div.Wickland_Battu_2017:

cat MANIFEST.Wm82.gnm2.div.Wickland_Battu_2017.yml
---
- name: glyma.Wm82.gnm2.div.Wickland_Battu_2017.SNPdata1.vcf.gz
  description: genotype information from Population 1; 378 F2 lines resulting from
    a cross between Prize and an NMU-mutagenized individual of Williams 82.
  display: true
  prior_names:
    - glyma.Wm82.gnm2.div.RW0X.SNPdata1.vcf.gz
    - Pop1_SNPs_minDP2.vcf.gz
- name: glyma.Wm82.gnm2.div.Wickland_Battu_2017.SNPdata2.vcf.gz
  description: genotype information from Population 2; 391 F2 individuals from a
    cross between two breeding lines.
  display: true
  prior_names:
    - glyma.Wm82.gnm2.div.RW0X.SNPdata2.vcf.gz
    - Pop2_SNPs_minDP2.vcf.gz
- name: glyma.Wm82.gnm2.div.Wickland_Battu_2017.SNPdata3.vcf.gz
  description: genotype information from Population 3; 81 unrelated accessions
    that form an association panel.
  display: true
  prior_names:
    - glyma.Wm82.gnm2.div.RW0X.SNPdata3.vcf.gz
    - Pop3_SNPs_minDP2.vcf.gz
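Because `prior_names` folds the old correspondence file into each entry, a script could rebuild the old-name lookup directly from the example above. The following is a sketch with descriptions omitted; the function names are hypothetical, and treating two files that claim the same prior name as an error is an assumed rule, not part of the proposal.

```python
# Sketch: rebuild an old-name -> current-name lookup from a merged MANIFEST.
# Entries copy the Wickland_Battu_2017 example (descriptions omitted).
# Function names and the duplicate-prior-name rule are assumptions.

manifest = [
    {
        "name": "glyma.Wm82.gnm2.div.Wickland_Battu_2017.SNPdata1.vcf.gz",
        "display": True,
        "prior_names": [
            "glyma.Wm82.gnm2.div.RW0X.SNPdata1.vcf.gz",
            "Pop1_SNPs_minDP2.vcf.gz",
        ],
    },
    {
        "name": "glyma.Wm82.gnm2.div.Wickland_Battu_2017.SNPdata2.vcf.gz",
        "display": True,
        "prior_names": [
            "glyma.Wm82.gnm2.div.RW0X.SNPdata2.vcf.gz",
            "Pop2_SNPs_minDP2.vcf.gz",
        ],
    },
    {
        "name": "glyma.Wm82.gnm2.div.Wickland_Battu_2017.SNPdata3.vcf.gz",
        "display": True,
        "prior_names": [
            "glyma.Wm82.gnm2.div.RW0X.SNPdata3.vcf.gz",
            "Pop3_SNPs_minDP2.vcf.gz",
        ],
    },
]


def prior_name_index(entries):
    """Map each prior name to the file's current name."""
    index = {}
    for entry in entries:
        for old in entry.get("prior_names", []):
            # Two files claiming one prior name would make the lookup ambiguous.
            if old in index and index[old] != entry["name"]:
                raise ValueError(
                    f"prior name {old!r} claimed by {index[old]!r} and {entry['name']!r}"
                )
            index[old] = entry["name"]
    return index


def resolve(entries, filename):
    """Return the current name for a current or prior filename, or None."""
    if any(entry["name"] == filename for entry in entries):
        return filename
    return prior_name_index(entries).get(filename)
```

For example, `resolve(manifest, "Pop2_SNPs_minDP2.vcf.gz")` yields the SNPdata2 file's current name, which is the lookup the separate correspondence file used to provide.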
