Standardize SSSOM `object_id` normalization and `curie_map`-based parsing #352
Merged
Pull request overview
This PR standardizes how SSSOM object_id prefixes are derived across METPO scripts by introducing shared utilities for parsing curie_map metadata and extracting prefixes, and by normalizing output identifiers during SSSOM generation.
Changes:
- Added `metpo.utils.sssom_utils` with helpers to parse `# curie_map:` blocks and extract prefixes from CURIE/IRI identifiers.
- Updated multiple analysis/presentation scripts to use `curie_map`-driven prefix extraction instead of ad hoc hostname heuristics.
- Updated the ChromaDB semantic mapper to normalize `object_id` values (CURIE when safe) and emit a dynamic `curie_map` covering the prefixes actually used.
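As an illustration of the first bullet, here is a minimal sketch of what such helpers might look like. It assumes the SSSOM TSV convention of a YAML metadata header written in `#`-prefixed lines with a nested `curie_map:` block, and parses that block with the standard library only. The name `parse_curie_map` and the `extract_prefix(identifier, curie_map)` signature are hypothetical; the actual `sssom_utils` API in this PR may differ, and a real implementation could parse the header with a YAML library instead. The identifiers in the usage lines are illustrative.

```python
import re
from typing import Dict, Iterable, Optional

# An indented "KEY: value" entry nested under "curie_map:"; surrounding quotes are dropped.
_ENTRY = re.compile(r"^\s+([A-Za-z_][\w.\-]*)\s*:\s*['\"]?(\S+?)['\"]?\s*$")


def parse_curie_map(lines: Iterable[str]) -> Dict[str, str]:
    """Collect prefix -> IRI expansion pairs from a '# curie_map:' header block."""
    curie_map: Dict[str, str] = {}
    in_block = False
    for raw in lines:
        if not raw.startswith("#"):
            break  # the metadata header ends at the first non-comment line
        body = raw[1:].rstrip("\r\n")
        if body.startswith(" "):
            body = body[1:]  # drop the single space after "#" so top-level keys have no indent
        if body.strip() == "curie_map:":
            in_block = True
            continue
        if in_block:
            match = _ENTRY.match(body)
            if match:
                curie_map[match.group(1)] = match.group(2)
            else:
                in_block = False  # any non-indented key closes the block
    return curie_map


def extract_prefix(identifier: str, curie_map: Dict[str, str]) -> Optional[str]:
    """Return the prefix of a CURIE, or of an IRI via its longest curie_map expansion."""
    if identifier.startswith(("http://", "https://")):
        matches = [p for p, base in curie_map.items() if identifier.startswith(base)]
        # Longest expansion wins, e.g. ".../obo/GO_" over a generic ".../obo/".
        return max(matches, key=lambda p: len(curie_map[p]), default=None)
    if ":" in identifier:
        return identifier.split(":", 1)[0]
    return None  # neither a CURIE nor a recognizable IRI


header = [
    "# curie_map:\n",
    "#   GO: http://purl.obolibrary.org/obo/GO_\n",
    "#   METPO: http://purl.obolibrary.org/obo/METPO_\n",
    "# mapping_set_id: https://example.org/demo.sssom.tsv\n",
    "subject_id\tpredicate_id\tobject_id\n",
]
curie_map = parse_curie_map(header)
print(extract_prefix("http://purl.obolibrary.org/obo/GO_0008150", curie_map))  # GO
print(extract_prefix("METPO:1000602", curie_map))  # METPO
```

Deriving prefixes from the file's own `curie_map` rather than from hostnames means two ontologies served from the same host (such as OBO PURLs) still resolve to distinct prefixes.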
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| `metpo/utils/sssom_utils.py` | New shared helpers for parsing SSSOM `curie_map` headers and extracting prefixes from identifiers. |
| `metpo/utils/__init__.py` | Establishes `metpo.utils` as a package for shared utilities. |
| `metpo/presentations/analyze_primary_sources.py` | Switches SSSOM prefix counting to use `curie_map` and the shared `extract_prefix()`. |
| `metpo/pipeline/chromadb_semantic_mapper.py` | Normalizes written `object_id` values and writes a dynamically generated `curie_map`. |
| `metpo/analysis/extract_definitions_from_mappings.py` | Uses the parsed `curie_map` and shared prefix extraction when classifying `object_id` sources. |
| `metpo/analysis/analyze_ontology_value.py` | Uses the parsed `curie_map` and shared prefix extraction to compute `term_source`. |
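The `object_id` normalization described for `chromadb_semantic_mapper.py` could work roughly as follows. This is a sketch under stated assumptions, not the mapper's actual code: the helpers `normalize_object_id`, `build_curie_map` and `format_curie_map_header` are hypothetical, "safe" is assumed to mean a non-empty local ID containing none of `/`, `#`, `?`, `:` or whitespace, and when several expansions match an IRI the longest one is chosen.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical "unsafe" characters: a local ID containing any of these stays a full IRI.
_UNSAFE_CHARS = set("/#?: \t")


def normalize_object_id(identifier: str, prefix_map: Dict[str, str]) -> Tuple[str, Optional[str]]:
    """Return (normalized_id, prefix_used); prefix_used is None when no prefix applies."""
    if not identifier.startswith(("http://", "https://")):
        # Already a CURIE: keep it, and record its prefix only if we can expand it.
        prefix = identifier.split(":", 1)[0] if ":" in identifier else None
        return identifier, (prefix if prefix in prefix_map else None)
    matches = [p for p, base in prefix_map.items() if identifier.startswith(base)]
    if not matches:
        return identifier, None
    best = max(matches, key=lambda p: len(prefix_map[p]))  # longest expansion wins
    local_id = identifier[len(prefix_map[best]):]
    if not local_id or any(ch in _UNSAFE_CHARS for ch in local_id):
        return identifier, None  # compacting would be lossy or ambiguous
    return f"{best}:{local_id}", best


def build_curie_map(
    object_ids: Iterable[str], prefix_map: Dict[str, str]
) -> Tuple[List[str], Dict[str, str]]:
    """Normalize IDs and return a curie_map restricted to the prefixes actually used."""
    normalized: List[str] = []
    used = set()
    for oid in object_ids:
        new_id, prefix = normalize_object_id(oid, prefix_map)
        normalized.append(new_id)
        if prefix is not None:
            used.add(prefix)
    return normalized, {p: prefix_map[p] for p in sorted(used)}


def format_curie_map_header(curie_map: Dict[str, str]) -> str:
    """Render the curie_map as commented YAML lines for an SSSOM TSV header."""
    lines = ["# curie_map:"] + [f"#   {p}: {base}" for p, base in curie_map.items()]
    return "\n".join(lines)


prefix_map = {
    "OBO": "http://purl.obolibrary.org/obo/",
    "GO": "http://purl.obolibrary.org/obo/GO_",
    "METPO": "http://purl.obolibrary.org/obo/METPO_",
}
ids = [
    "http://purl.obolibrary.org/obo/GO_0008150",
    "https://example.org/terms/a/b",
    "METPO:1000602",
]
normalized, curie_map = build_curie_map(ids, prefix_map)
print(normalized)  # ['GO:0008150', 'https://example.org/terms/a/b', 'METPO:1000602']
print(format_curie_map_header(curie_map))
```

Restricting the emitted `curie_map` to prefixes that actually occur keeps the header consistent with the rows, while IRIs that cannot be compacted safely are written unchanged rather than turned into malformed CURIEs.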
Summary
Files changed
Validation
Scope note
This PR standardizes the current in-repo SSSOM handling and resolves review feedback. Full migration to `curies`/`sssom-py` and validator integration is tracked separately in #351.
Addresses #351