QA of Genetics KP Ingest #341

@mbrush

QA Resources:


This is simply a pass-through ingest for now, and can move ahead into the refactored KG store.

A few specific issues that popped out immediately:

  • Uses a deprecated predicate (genetic_association).
  • Has edges going in both directions stating the same associations (BiologicalProcess - genetic association -> DiseaseOrPhenotypicFeature and DiseaseOrPhenotypicFeature - genetic association -> BiologicalProcess).
  • I know Genetics KP provides some statistical scores (e.g. MAGMA p-values) that are useful for weighting edges, but these are not provided in the current KG (perhaps because the edge property used is not in Biolink?).
  • The latest KGX summary report for Genetics KP shows genetic association edges between Diseases/Phenotypes and many categories of entities (e.g. cellular components, molecular activities), and also between Genes and many categories of entities (e.g. procedures, chemicals, molecular activities, pathways, ICEs, clinical attributes). How are these generated? I reviewed an older Genetics KP presentation from the beginning of Phase 2, and nothing in it explains these types of edges.
  • If there are any supporting data sources that they use in generating their association edges, these should be captured in the data.
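
For reference, the first two issues above could be flagged with a check along these lines. This is only a rough sketch, not part of the ingest code: it assumes edges are available as dicts with `subject`, `predicate`, and `object` keys (as in a KGX-style edge record), that the deprecated predicate appears as the CURIE `biolink:genetic_association`, and the `EX:` identifiers and function names are made up for illustration.

```python
# Sketch of two QA checks on KGX-style edge records (plain dicts).
# Assumption: the deprecated predicate is serialized with this CURIE.
DEPRECATED_PREDICATES = {"biolink:genetic_association"}


def find_deprecated(edges):
    """Return the edges whose predicate is in the deprecated set."""
    return [e for e in edges if e["predicate"] in DEPRECATED_PREDICATES]


def find_reciprocal_pairs(edges):
    """Return (first, second) pairs where the same predicate is asserted
    in both directions between the same two nodes."""
    seen = {}
    pairs = []
    for edge in edges:
        key = (edge["subject"], edge["predicate"], edge["object"])
        reverse = (edge["object"], edge["predicate"], edge["subject"])
        if reverse in seen:
            pairs.append((seen[reverse], edge))
        seen.setdefault(key, edge)
    return pairs


# Hypothetical sample edges; the EX: identifiers are placeholders,
# not real Genetics KP data.
edges = [
    {"id": "e1", "subject": "EX:process1",
     "predicate": "biolink:genetic_association", "object": "EX:disease1"},
    {"id": "e2", "subject": "EX:disease1",
     "predicate": "biolink:genetic_association", "object": "EX:process1"},
    {"id": "e3", "subject": "EX:gene1",
     "predicate": "biolink:related_to", "object": "EX:disease1"},
]

deprecated = find_deprecated(edges)
reciprocal = find_reciprocal_pairs(edges)
print("deprecated predicate edges:", [e["id"] for e in deprecated])
print("reciprocal pairs:", [(a["id"], b["id"]) for a, b in reciprocal])
```

Whether a reciprocal pair should be collapsed into one edge, and in which direction, depends on the modeling decision for the replacement predicate, so the sketch only reports the pairs.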

Some questions/considerations:

  • Does Genetics KP ingest data from external KBs like ClinVar, ClinGen, GenCC, Genebass? Documentation here suggests this is the case. If so, how is this data used? Are you an aggregator (i.e. you structure knowledge from these sources using Biolink-based edges and provide them in your KP), or are these supporting data sources that provide inputs for algorithms that calculate genetic associations between conditions and genes, pathways, etc.?
  • We will want to consider modeling refactors/additions for GWAS-based gene-to-Condition and pathway-to-Condition associations, w.r.t. the predicates, qualifiers, and EPC support, to fully represent the nuance, utility, and provenance of these edges and make them more useful to Translator use cases. As we do this, we will want to be sure we fully understand the provenance, methods, and supporting data behind the edges, and the utility of this knowledge, to ensure it is adequately represented and leveraged.
