Summary
The OpenCRAVAT store is an important part of the project. To improve its usability, this report lists the tools and columns that lack descriptions and suggests how the store's categories could be better organized.
Widgets
The "IGV" widget is the only widget with a description, and that description is too brief. The other widgets have no description at all.
Packages
The packages generally follow a clear and consistent structure with three standard sections: "Source Annotators," "Filter," and "Reports." The "Calibrated Classification Package" is the exception. I suggest it follow the same structure.
Items requiring additional or clearer information
- BioGRID: the current description ("comprehensive interaction repository") is too brief.
- Cancer Gene Landscape
- ClinGen Gene: The current description is clear but potentially misleading. Phrases like "Our curators review genetic and experimental data..." could be misinterpreted to mean that OpenCRAVAT curators perform this review, rather than ClinGen's.
- EVE
- Geuvadis eQTLs
- LoFtool
- NCBI Gene (some text embedded)
- ncRNA (some text embedded)
- NDEx
- Pangolin Splicing (no description here)
- SpliceAI: The column descriptions currently repeat information. It would be clearer if they were revised to focus on the differences between acceptor and donor gain/loss.
- UCSC Genome Browser
Suggestion: It is very helpful to include OpenCRAVAT-specific information in the description cards, such as installation notes, widget details, or system warnings. In my opinion, general information is not needed in these cards, as it is available in other, more appropriate documentation. The DANN item serves as an excellent model to follow, while the ClinGen Gene card currently represents what to avoid.
No description of columns at all
- AllofUs 250k
- CHASMplus
- CHASMplus MSK-IMPACT
- CIViC Gene
- ClinGen Allele Registry
- ClinGen Gene
- dbSNP
- dbSNP Common
- ENCODE TFBS
- ESP6500
- Flanking Sequence
- Geuvadis eQTLs
- gnomAD4
- Grantham Scores
- hg19 coordinates (although self-explanatory)
- HGDP
- LitVar
- NCBI Gene
- ncRNA
- NDEx
- OMIM
- PharmGKB
- PubMed
- Regeneron
- Repeat Sequences
- TARGET
- UCSC Genome Browser
- Uniprot Domain
- VISTA Enhancer Browser
Tools with missing descriptions for some columns (some might be okay)
- ALoFT: "All annotations".
- BRCA1 Saturation Genome Editing Scores: "Function class" column has no description.
- CADD Exome: the descriptions of the "Phred" and "Scores" columns must be taken from CADD.
- Cancer Gene Census: descriptions are missing for several columns.
- Cancer Genome Interpreter: "All annotations".
- Candidate cis-Regulatory Elements by ENCODE (SCREEN): "ENCODE Accession ID" and "cCRE Accession ID"
- CEDAR: "CEDAR ID" and "PubMed ID" have no description
- CGD: Clinical Genomic Database: "All annotations".
- CIViC: 3/4 columns have no description
- ClinVar: "Disease Names", "Preferred Disease Names", "HGVS"
- COSMIC: "Variant Count (Tissue)", "Transcript", and "Protein Change"
- Denovo-DB: "Phenotype"
- DGIdb: The Drug Interaction Database: "Drug name", "All annotations"
- ESM1b: "All annotations"
- EVE: "All annotations"
- ExAC Gene and CNV: "CNV Bias/Noise"
- FATHMM: "Prediction"
- FunSeq2: "All annotations"
- GeneHancer: "GeneHancer ID"
- GERP++: "Neutral Rate"
- gMVP: "All annotations"
- gnomAD: almost all columns have no description (it would be helpful to include the number of samples, which countries contributed to each gnomAD subset, etc.)
- gnomAD Gene: "Transcript" and "All annotations"
- gnomAD3: almost all columns have no description
- GRASP: almost all columns have no description
- GWAS Catalog: "All annotations"
- Human Phenotype Ontology: "All annotations"
- IntAct: "Raw data"
- InterPro: "All annotations"
- MetaRNN: "All annotations"
- miRBase: "Accession ID"
- MITOMAP: "Disease", "PubMed ID"
- MuPIT: "Link"
- Mutation Assessor: "All annotations"
- MutationTaster: "All annotations"
- MutPred: "Variant", "All annotations"
- MutPred2: "All annotations"
- OncoKB: 7 of 12 column descriptions are missing
- PanglaoDB: 5 of 8 column descriptions are missing
- PolyPhen-2: "All annotations"
- PrimateAI: "Rank Score"
- PROVEAN: Protein Variant Effect Analyzer: "UniProt Accession Number", "All annotations"
- Pseudogene: "Transcript"
- REVEL: "All annotations"
- SIFT: "All annotations"
- Swiss-Prot Binding: "UniProtKB Accession Number", "All annotations"
- Swiss-Prot Domains: "UniProtKB Accession Number", "All annotations"
- Swiss-Prot PTM: "UniProtKB Accession Number", "All annotations"
- VEST4: 4 of 6 column descriptions are missing
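A check like the one behind the two lists above could be partly automated. The following is a minimal sketch, not OpenCRAVAT's own tooling: it assumes each module's configuration has already been loaded into a Python dict and that columns are listed under an `output_columns` key with `name`, `title`, and `desc` fields. Those key names, the helper functions, and the sample modules are all assumptions made for illustration.

```python
from __future__ import annotations

from typing import Any


def columns_missing_desc(module_conf: dict[str, Any]) -> list[str]:
    """Return the titles of columns whose description is absent or blank.

    Assumes columns live under "output_columns" and carry "name", "title",
    and "desc" keys (hypothetical layout, adjust to the real config).
    """
    missing = []
    for col in module_conf.get("output_columns", []):
        desc = col.get("desc")
        if not isinstance(desc, str) or not desc.strip():
            # Prefer the human-readable title, fall back to the column name.
            missing.append(col.get("title") or col.get("name", "<unnamed>"))
    return missing


def audit(modules: dict[str, dict[str, Any]]) -> dict[str, list[str]]:
    """Map each module name to its undocumented columns, skipping clean modules."""
    report = {}
    for name, conf in modules.items():
        missing = columns_missing_desc(conf)
        if missing:
            report[name] = missing
    return report


# Made-up sample data; these are not real store modules.
example_modules = {
    "example_annotator": {
        "output_columns": [
            {"name": "score", "title": "Score", "type": "float"},
            {"name": "pred", "title": "Prediction", "desc": "Predicted effect class."},
            {"name": "transcript", "title": "Transcript", "desc": "  "},
        ]
    },
    "documented_annotator": {
        "output_columns": [
            {"name": "id", "title": "ID", "desc": "Record identifier."},
        ]
    },
}

if __name__ == "__main__":
    for module, cols in audit(example_modules).items():
        print(f"- {module}: " + ", ".join(f'"{c}"' for c in cols))
```

Treating a whitespace-only description as missing is a deliberate choice here, since an empty card looks the same to the user as an absent one. Whether a description is clear or too brief still needs a manual review.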
Suggestions for items classification
- Currently, "Input/Output" and "Reporters" appear among content-themed categories like "Literature", "Mendelian Disease", and "GWAS", which can be confusing. It would be more intuitive if these functional/system categories were grouped closer to the "Packages" section, as they relate more to tool operation than to biological themes.
- At the moment, the "Reporters" category contains only the "RData Reporter", while all reporters (including RData) are also listed under "Input/Output". This overlap may cause confusion.
- To make the menu more intuitive, categories could reflect OpenCRAVAT's own module types (such as Widgets, Webapps, Annotators, Converters, Reporters, etc.) rather than "Input/Output" and "Visualization".