QA of Genetics KP Ingest #341
Open
Description
QA Resources:
- KGX Summary report: https://docs.google.com/spreadsheets/d/199Qn0F_9X0Hq_D2HsbX7IlCFMjIBdzJGAbvbqyYl7_Y/edit?gid=1275530030#gid=1275530030
- RIG (original): https://github.com/NCATSTranslator/translator-ingests/blob/main/src/translator_ingest/ingests/geneticskp/geneticskp_rig.yaml
- RIG (PR with QA changes):
This is simply a pass-through ingest for now, and can move ahead into the refactored KG store.
A few specific issues that popped out immediately:
- uses a deprecated predicate (genetic_association).
- has edges going in both directions stating the same associations (BiologicalProcess - genetic association -> DiseaseOrPhenotypicFeature and DiseaseOrPhenotypicFeature - genetic association -> BiologicalProcess)
- I know Genetics KP provides some statistical scores (e.g. MAGMA p-values) that are useful to weight edges, but these are not provided in the current KG (perhaps because the edge property used is not in Biolink?)
- The latest KGX summary report for Genetics KP shows genetic association edges between Diseases/Phenotypes and lots of categories of entities (things like cellular components, molecular activities) and also between Genes and lots of categories of entities (things like procedures, chemicals, molecular activities, pathways, ICEs, clinical attributes). How are these generated? I reviewed an older Genetics KP presentation from the beginning of Phase 2 and nothing in there explains these types of edges.
- if there are any supporting data sources that they use in generating their association edges, these should be captured in the data.
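The bidirectional-edge issue above could be checked with something like the following. This is only a rough sketch: it assumes edges are available as dicts with KGX-style `subject`, `predicate`, and `object` keys, and the `EX:` identifiers and the `find_reciprocal_edges` helper are made up for illustration, not taken from the actual ingest.

```python
def find_reciprocal_edges(edges):
    """Return (first_edge, reverse_edge) pairs that state the same
    predicate between the same two nodes in opposite directions."""
    seen = {}   # (subject, predicate, object) -> first edge seen with that triple
    pairs = []
    for edge in edges:
        key = (edge["subject"], edge["predicate"], edge["object"])
        reverse = (edge["object"], edge["predicate"], edge["subject"])
        # Report each reciprocal pair once, the first time the reverse shows up.
        if reverse in seen and key not in seen:
            pairs.append((seen[reverse], edge))
        seen.setdefault(key, edge)
    return pairs


# Placeholder edges; the identifiers are hypothetical.
sample_edges = [
    {"subject": "EX:process1", "predicate": "biolink:genetic_association", "object": "EX:disease1"},
    {"subject": "EX:disease1", "predicate": "biolink:genetic_association", "object": "EX:process1"},
    {"subject": "EX:process1", "predicate": "biolink:genetic_association", "object": "EX:disease2"},
]

reciprocal = find_reciprocal_edges(sample_edges)
for first, second in reciprocal:
    print(first["subject"], "<->", second["subject"], "via", first["predicate"])
```

Whether a reciprocal pair is actually redundant depends on whether the predicate is meant to be symmetric, so the output is a list of candidates to review rather than edges to drop automatically.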
Some questions/Considerations:
- Does Genetics KP ingest data from external KBs like ClinVar, ClinGen, GenCC, Genebass? Documentation here suggests this is the case. If so, how is this data used? Are you an aggregator (i.e. you structure knowledge from these sources using Biolink-based edges and provide them in your KP), or are these supporting data sources that provide inputs for algorithms that calculate genetic associations between conditions and genes, pathways, etc.?
- We will want to consider modeling refactors / additions for GWAS-based gene-to-Condition and pathway-to-Condition associations, w.r.t. the predicates, qualifiers, and EPC support, to fully represent the nuance, utility, and provenance of these edges and make them more useful to Translator use cases. As we do this, we will want to be sure we fully understand the provenance, methods, and supporting data behind the edges, and the utility of this knowledge, to ensure it is adequately represented and leveraged.