Remove all mappings in custom_curies.yaml and code using it in data source ETL pipelines #482

@sujaypatil96

Description

Some mappings are still captured in the custom_curies.yaml file here: https://github.com/Knowledge-Graph-Hub/kg-microbe/blob/master/kg_microbe/transform_utils/custom_curies.yaml but none of them are being pulled into the KGX transformation output TSV files. This can be verified by checking the output for the presence or absence of the corresponding CURIE prefixes.
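
A minimal sketch of that prefix check, using only the Python standard library. It assumes the KGX output is tab-separated with the usual `id` column for nodes and `subject`/`object` columns for edges. The function names and the prefixes in the usage note are hypothetical, and the list of custom prefixes would have to be read from custom_curies.yaml separately, since its layout is not described here.

```python
import csv
from collections import Counter


def collect_prefixes(tsv_path, columns):
    """Count CURIE prefixes found in the given columns of a TSV file."""
    counts = Counter()
    with open(tsv_path, newline="", encoding="utf-8") as handle:
        # QUOTE_NONE: treat quote characters as plain text in the TSV.
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            for column in columns:
                value = (row.get(column) or "").strip()
                # A CURIE looks like "PREFIX:local_id"; skip values without a colon.
                if ":" in value:
                    counts[value.split(":", 1)[0]] += 1
    return counts


def unused_prefixes(custom_prefixes, *prefix_counts):
    """Return the custom prefixes that never appear in any of the outputs."""
    seen = set()
    for counts in prefix_counts:
        seen.update(counts)
    return sorted(set(custom_prefixes) - seen)
```

For example, `unused_prefixes(my_prefixes, collect_prefixes("nodes.tsv", ["id"]), collect_prefixes("edges.tsv", ["subject", "object"]))` would list the prefixes from `my_prefixes` (a hypothetical list) that appear in neither file.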

The reason is that the pipeline code is written "defensively". See, for example, how edges are constructed from the "keywords" information in the BacDive data source in the BacDive transformation pipeline: https://github.com/Knowledge-Graph-Hub/kg-microbe/blob/master/kg_microbe/transform_utils/bacdive/bacdive.py#L409-L505

We need to make sure that all the remaining "mappings" in the custom CURIEs YAML file are removed without loss of any information in the KGX transformation output files, and that the file is then deleted altogether. Once the YAML file has been removed, we need to make sure that the code in the transformation pipelines (e.g. BacDive, BactoTraits, Madin et al.) that uses that file is also removed.
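
One way to confirm that nothing is lost is to compare the output generated before and after the change. The following is only a sketch under stated assumptions: the helper names are hypothetical, the files are assumed to be KGX edge TSVs with `subject`, `predicate` and `object` columns, and rows are compared on those columns only.

```python
import csv


def load_rows(tsv_path, key_columns):
    """Load a TSV file as a set of tuples built from the key columns."""
    with open(tsv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        return {tuple(row.get(c) or "" for c in key_columns) for row in reader}


def compare_outputs(before_path, after_path,
                    key_columns=("subject", "predicate", "object")):
    """Report rows that exist in only one of the two output files."""
    before = load_rows(before_path, key_columns)
    after = load_rows(after_path, key_columns)
    return {
        "lost": sorted(before - after),    # present before, missing after
        "gained": sorted(after - before),  # new after the change
    }
```

If both `lost` and `gained` come back empty for every edge file (and for node files, with `key_columns=("id",)`), the removal did not change the compared columns.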

Metadata

Assignees

No one assigned

    Labels

    pipeline upgrades: Upgrades to data source transformation pipelines
