Fix KeyError when accessing lineage dictionary in OBO parser #233

Open
manvikri22 wants to merge 1 commit into althonos:master from manvikri22:feature/fix-graphdata-keyerror
@manvikri22

Description

This pull request addresses an issue where accessing the lineage dictionary resulted in a KeyError if the superentity was not found.

Changes Made

  • Implemented a conditional check to ensure that the superentity exists in the lineage dictionary before accessing it.
  • Added a warning message to inform users when a superentity is not found, improving the robustness of the parser.
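
A minimal sketch of the guarded lookup described above, assuming the lineage is a plain dictionary that maps each entity id to a record holding `sub` and `sup` sets. The `Lineage` class and this `symmetrize_lineage` signature are illustrative assumptions, not the parser's actual code:

```python
import warnings
from dataclasses import dataclass, field


@dataclass
class Lineage:
    # Hypothetical stand-in for the parser's lineage record.
    sub: set = field(default_factory=set)
    sup: set = field(default_factory=set)


def symmetrize_lineage(lineage):
    """Fill in `sub` sets from `sup` sets, warning on undeclared superentities.

    Returns the set of superentity ids that were referenced but not declared.
    """
    missing = set()
    for subentity, record in lineage.items():
        for superentity in record.sup:
            if superentity in lineage:
                # Declared superentity: register the reverse link.
                lineage[superentity].sub.add(subentity)
            else:
                # Undeclared superentity: warn instead of raising KeyError.
                warnings.warn(
                    f"{subentity} refers to undeclared superentity {superentity}",
                    stacklevel=2,
                )
                missing.add(superentity)
    return missing
```

Returning the missing ids, rather than only warning, lets a caller decide later whether to drop, stub, or report the dangling references.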

Motivation

This change enhances error handling in the OBO parser, making it more resilient to inconsistencies in the ontology files and providing clearer feedback to users during debugging.

- Added a check to ensure the superentity exists in the lineage dictionary before attempting to access it.
- Added a warning message for better debugging when a superentity is missing.
@althonos
Owner

althonos commented Oct 2, 2024

Hi @manvikri22, do you have an example where such an error happens? My impression is that the KeyError would happen here if a superentity is not declared for a subentity, so it may be better to resolve this more globally to support dangling entities (#225).

@manvikri22
Author

manvikri22 commented Oct 3, 2024

Hi @althonos,

Thank you for the feedback!

You're right that the KeyError could occur when a superentity is referenced but not declared (a dangling entity). In the case I encountered, I was working with an OBO file where certain is_a relationships pointed to terms that were either missing or improperly defined. Here's a minimal example that triggers the error:

    [Term]
    id: CL:0000540
    name: hematopoietic stem cell
    is_a: BFO:0000040 ! some undefined biological entity

In this case, the term BFO:0000040 is referenced as a superclass but is not declared in the file, leading to the KeyError when the symmetrize_lineage() function tries to access it. The fix I proposed ensures that the missing entity is handled gracefully by logging a warning, but you're right that this might need a broader resolution to support dangling entities in a more global way.
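
To make the failure mode concrete, here is a rough sketch that scans OBO text for `is_a` targets that are never declared as a `[Term]`. The helper name `find_dangling_is_a` is hypothetical and the code uses only the standard library rather than the parser's own API. It is deliberately simplified: it ignores imports, trailing modifiers, and non-`[Term]` stanzas.

```python
def find_dangling_is_a(obo_text):
    """Map each undeclared is_a target to the term ids that reference it."""
    declared = set()
    referenced = {}
    current = None      # id of the [Term] stanza being read, if any
    in_term = False
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are tracked here.
            in_term = line == "[Term]"
            current = None
        elif in_term and line.startswith("id:"):
            current = line[len("id:"):].strip()
            declared.add(current)
        elif current is not None and line.startswith("is_a:"):
            # Drop the trailing "! comment" part, keep only the target id.
            target = line[len("is_a:"):].split("!", 1)[0].strip()
            referenced.setdefault(target, []).append(current)
    return {t: subs for t, subs in referenced.items() if t not in declared}


example = """\
[Term]
id: CL:0000540
name: hematopoietic stem cell
is_a: BFO:0000040 ! some undefined biological entity
"""
dangling = find_dangling_is_a(example)
```

For the stanza above, `dangling` contains `BFO:0000040` as a key, which is the id the lineage lookup cannot find.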

If you'd like, I can update the pull request or work on a more comprehensive fix based on your guidance.
Do you have specific suggestions or references for the dangling entity resolution? I could also look into updating the parser to log missing entities while preserving the overall structure, or perhaps adding an option to skip undefined superentities gracefully.

Let me know what you think!
