Description
Two API endpoints that return variant data in CSV format (`/score-sets/{urn}/scores` and `/score-sets/{urn}/counts`) use column names specified by the original data uploader. These include:
- Some nonempty set of HGVS columns (`hgvs_nt`, `hgvs_pro`, and `hgvs_splice`)
- `score`, for the scores endpoint only
- Score-set-specific custom column names, for both endpoints. There are separate sets of custom columns for counts and scores, and their names may overlap.
- And an `accession` column that gives each variant's MaveDB URN.
Ignoring column order, the download's content is identical to the raw CSV data that was originally uploaded, except for the MaveDB-supplied accession column.
It may be useful to provide a namespaced version of the CSV export, which would have

- `accession`: The variant's MaveDB URN
- `hgvs_nt`, `hgvs_pro`, and/or `hgvs_splice`
- `scores.score`: The main score column
- `scores.<custom column>` for each additional column originally uploaded in the "scores" CSV file
- `counts.<custom column>` for each column originally uploaded in the "counts" CSV file
In other words, we would namespace all columns except for accession, hgvs_nt, hgvs_pro, and hgvs_splice.
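As a rough illustration of the renaming rule, here is a minimal Python sketch. The function name `namespace_columns` and the constant `SHARED_COLUMNS` are hypothetical and not part of the MaveDB codebase; the sketch only assumes the four un-namespaced column names listed above.

```python
# Columns that keep their plain names under the proposal
# (hypothetical constant, for illustration only).
SHARED_COLUMNS = ("accession", "hgvs_nt", "hgvs_pro", "hgvs_splice")


def namespace_columns(columns, namespace):
    """Prefix every column with `<namespace>.` unless it is a shared column."""
    return [
        name if name in SHARED_COLUMNS else f"{namespace}.{name}"
        for name in columns
    ]


# Example: header of a scores file with one custom column named "sd".
scores_header = namespace_columns(["accession", "hgvs_nt", "score", "sd"], "scores")
```

With this rule, a custom column that appears in both the scores and counts uploads ends up under two distinct names, for example `scores.rep1` and `counts.rep1`.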
This would allow us to add columns computed by MaveDB or obtained from other data sources, such as
- The ClinGen allele ID;
- Mapped HGVS strings, such as `mavedb.mapped_hgvs_nt_g` and `mavedb.mapped_hgvs_nt_c`;
- And information from ClinVar, gnomAD, or other data sources, suitably namespaced.
It would also allow score, count, and other data to be obtained in a single CSV file without concern for name collisions between score and count data, or between these and MaveDB-provided columns.
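To make the single-file case concrete, the following sketch joins a scores CSV and a counts CSV on `accession` using only the Python standard library. It is not the MaveDB implementation: the function `merge_namespaced_csv`, the sample accession values, and the choice to leave missing cells empty are all assumptions made for illustration.

```python
import csv
import io

# Columns that are shared across namespaces (assumed from the proposal).
SHARED = ("accession", "hgvs_nt", "hgvs_pro", "hgvs_splice")


def merge_namespaced_csv(scores_csv: str, counts_csv: str) -> str:
    """Join scores and counts CSV text on `accession`, namespacing other columns."""
    merged = {}  # accession -> combined output row
    header = []  # output column order, in first-seen order
    for namespace, text in (("scores", scores_csv), ("counts", counts_csv)):
        for row in csv.DictReader(io.StringIO(text)):
            out = merged.setdefault(row["accession"], {})
            for name, value in row.items():
                key = name if name in SHARED else f"{namespace}.{name}"
                out[key] = value
                if key not in header:
                    header.append(key)
    buf = io.StringIO()
    # Cells missing for a variant are written as empty strings (an assumption).
    writer = csv.DictWriter(buf, fieldnames=header, lineterminator="\n")
    writer.writeheader()
    writer.writerows(merged.values())
    return buf.getvalue()


# Both inputs have a custom column named "rep1"; the accession is made up.
scores = "accession,hgvs_nt,score,rep1\nexample-urn-1,c.1A>G,0.5,0.4\n"
counts = "accession,hgvs_nt,rep1\nexample-urn-1,c.1A>G,12\n"
combined = merge_namespaced_csv(scores, counts)
```

Because the two `rep1` columns become `scores.rep1` and `counts.rep1`, neither overwrites the other in the combined file.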
While we do not intend MaveDB as a repository for variant data from other sources, the MaveDB UI will increasingly rely on having efficient access to variant data from ClinVar, gnomAD, etc.