QA of Genetics KP Ingest #341

**QA Resources:** 
- KGX Summary report:  https://docs.google.com/spreadsheets/d/199Qn0F_9X0Hq_D2HsbX7IlCFMjIBdzJGAbvbqyYl7_Y/edit?gid=1275530030#gid=1275530030
- RIG (original):  https://github.com/NCATSTranslator/translator-ingests/blob/main/src/translator_ingest/ingests/geneticskp/geneticskp_rig.yaml
- RIG (PR with QA changes): 

------

This is simply a pass-through ingest for now, and it can move ahead into the refactored KG store.

A few specific issues that popped out immediately:
- It uses a deprecated predicate (genetic_association).
- It has edges going in both directions stating the same associations (BiologicalProcess - _genetic association_ -> DiseaseOrPhenotypicFeature and DiseaseOrPhenotypicFeature - _genetic association_ -> BiologicalProcess).
- I know Genetics KP provides some statistical scores (e.g., MAGMA p-values) that are useful for weighting edges, but these are not provided in the current KG (perhaps because the edge property used is not in Biolink?).
- The latest KGX summary report for Genetics KP shows genetic association edges between Diseases/Phenotypes and many categories of entities (things like cellular components, molecular activities), and also between Genes and many categories of entities (things like procedures, chemicals, molecular activities, pathways, ICEs, clinical attributes). How are these generated? I reviewed an older Genetics KP presentation from the beginning of Phase 2, and nothing in it explains these types of edges.
- If any supporting data sources are used in generating the association edges, these should be captured in the data.
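The bidirectional-edge issue above could be checked with something like the following. This is only a minimal sketch: it assumes the edges have already been loaded as dictionaries carrying the KGX `subject`, `predicate`, and `object` fields, the function name `find_reciprocal_edges` is hypothetical, and the node IDs in the sample are made up for illustration rather than taken from the Genetics KP graph.

```python
def find_reciprocal_edges(edges, predicates):
    """Return (first, second) pairs of edges that state the same
    association in both directions for the given predicates."""
    seen = {}    # (subject, predicate, object) -> first edge seen with that triple
    pairs = []
    for edge in edges:
        subj, pred, obj = edge["subject"], edge["predicate"], edge["object"]
        if pred not in predicates or subj == obj:
            continue  # ignore other predicates and self-loops
        reverse = (obj, pred, subj)
        if reverse in seen:
            pairs.append((seen[reverse], edge))
        seen.setdefault((subj, pred, obj), edge)
    return pairs


# Illustrative sample only; these identifiers are placeholders.
sample_edges = [
    {"subject": "GO:0000001", "predicate": "biolink:genetic_association", "object": "MONDO:0000001"},
    {"subject": "MONDO:0000001", "predicate": "biolink:genetic_association", "object": "GO:0000001"},
    {"subject": "GO:0000002", "predicate": "biolink:genetic_association", "object": "MONDO:0000001"},
]
reciprocal = find_reciprocal_edges(sample_edges, {"biolink:genetic_association"})
print(len(reciprocal))
```

Whether one direction should be dropped, and which one, is a modeling decision for the ingest; the sketch only reports the pairs.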


Some questions/considerations:
- Does Genetics KP ingest data from external KBs like ClinVar, ClinGen, GenCC, Genebass? Documentation [here](https://github.com/NCATSTranslator/Translator-All/wiki/Genetics-Knowledge-Provider) suggests this is the case. If so, how is this data used? Are you an aggregator (i.e., you structure knowledge from these sources using Biolink-based edges and provide them in your KP), or are these supporting data sources that provide inputs for algorithms that calculate genetic associations between conditions and genes, pathways, etc.?
- We will want to consider modeling refactors/additions for GWAS-based gene-to-condition and pathway-to-condition associations, with respect to predicates, qualifiers, and EPC support, to fully represent the nuance, utility, and provenance of these edges and make them more useful to Translator use cases. As we do this, we will want to be sure we fully understand the provenance, methods, and supporting data behind the edges, and the utility of this knowledge, to ensure it is adequately represented and leveraged.



