Standardize SSSOM object_id normalization and curie_map-based parsing #352

Merged: turbomam merged 4 commits into main from issue-351-sssom-curie-normalization on Feb 13, 2026

turbomam commented Feb 13, 2026

Summary

  • normalize SSSOM object_id values at write time in chromadb_semantic_mapper.py
  • emit dynamic curie_map metadata for prefixes actually used in output
  • add shared SSSOM utility helpers for parsing curie_map and extracting/normalizing identifiers
  • update key SSSOM consumer scripts to use shared curie_map-driven prefix extraction and centralized fallback logic
  • tighten curie_map parser boundaries to avoid treating non-curie metadata as prefix entries
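The parsing and prefix-extraction helpers described above might look roughly like the sketch below. It is not the code in metpo/utils/sssom_utils.py: the name `parse_curie_map`, both signatures, and the sample namespaces are assumptions made for illustration. It shows one way to keep the parser boundary tight (only indented lines under `# curie_map:` count as prefix entries) and to resolve an IRI to the prefix with the longest matching namespace.

```python
def parse_curie_map(header_lines):
    """Collect prefix -> namespace pairs from a commented `# curie_map:` block."""
    curie_map = {}
    in_block = False
    for raw in header_lines:
        if not raw.startswith("#"):
            break  # first non-comment line ends the metadata header
        body = raw[1:].rstrip()
        if body.startswith(" "):
            body = body[1:]  # drop the single space that usually follows '#'
        if body == "curie_map:":
            in_block = True
            continue
        if not in_block:
            continue
        if not body.startswith(" "):
            in_block = False  # a dedented key (e.g. mapping_set_id) closes the block
            continue
        prefix, sep, namespace = body.strip().partition(":")
        namespace = namespace.strip().strip("\"'")
        if sep and prefix and namespace:
            curie_map[prefix.strip()] = namespace
    return curie_map


def extract_prefix(identifier, curie_map):
    """Return the prefix of a CURIE or IRI, or None when it cannot be determined."""
    if identifier.startswith(("http://", "https://")):
        # Prefer the longest namespace so PATO_ wins over a generic OBO base.
        matches = [(len(ns), p) for p, ns in curie_map.items() if identifier.startswith(ns)]
        return max(matches)[1] if matches else None
    prefix, sep, _ = identifier.partition(":")
    return prefix if sep else None


# Illustrative header; the namespaces are assumptions, not taken from the PR.
header = [
    "# curie_map:\n",
    "#   METPO: https://w3id.org/metpo/\n",
    "#   OBO: http://purl.obolibrary.org/obo/\n",
    "#   PATO: http://purl.obolibrary.org/obo/PATO_\n",
    "# mapping_set_id: https://example.org/mappings/demo\n",
    "subject_id\tobject_id\n",
]
curie_map = parse_curie_map(header)
```

In this sketch `mapping_set_id` is ignored because it is not indented under `curie_map:`, which is the kind of boundary the last bullet refers to.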

Files changed

  • metpo/pipeline/chromadb_semantic_mapper.py
  • metpo/utils/sssom_utils.py
  • metpo/utils/__init__.py
  • metpo/analysis/analyze_ontology_value.py
  • metpo/analysis/extract_definitions_from_mappings.py
  • metpo/presentations/analyze_primary_sources.py

Validation

  • uv run ruff check metpo/utils/sssom_utils.py metpo/pipeline/chromadb_semantic_mapper.py metpo/analysis/analyze_ontology_value.py metpo/analysis/extract_definitions_from_mappings.py metpo/presentations/analyze_primary_sources.py
  • python3 -m py_compile metpo/utils/sssom_utils.py metpo/pipeline/chromadb_semantic_mapper.py metpo/analysis/analyze_ontology_value.py metpo/analysis/extract_definitions_from_mappings.py metpo/presentations/analyze_primary_sources.py

Scope note

This PR standardizes the current in-repo SSSOM handling and resolves review feedback. Full migration to curies/sssom-py and validator integration is tracked separately in #351.

Addresses #351

Copilot AI review requested due to automatic review settings February 13, 2026 18:59

Copilot AI left a comment

Pull request overview

This PR standardizes how SSSOM object_id prefixes are derived across METPO scripts by introducing shared utilities for parsing curie_map metadata and extracting prefixes, and by normalizing output identifiers during SSSOM generation.

Changes:

  • Added metpo.utils.sssom_utils with helpers to parse # curie_map: blocks and extract prefixes from CURIE/IRI identifiers.
  • Updated multiple analysis/presentation scripts to use curie_map-driven prefix extraction instead of ad hoc hostname heuristics.
  • Updated the ChromaDB semantic mapper to normalize object_id values (CURIE when safe) and emit a dynamic curie_map covering used prefixes.
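As a rough illustration of the write-time behavior summarized above, the sketch below compresses an IRI to a CURIE only when the local part is safe, then builds a curie_map limited to the prefixes that actually appear in the output. The function names, the safety rule (a conservative character whitelist), and the namespace table are all assumptions; the real logic in chromadb_semantic_mapper.py may differ.

```python
import re

# Hypothetical namespace table; entries are illustrative only.
KNOWN_NAMESPACES = {
    "METPO": "https://w3id.org/metpo/",
    "PATO": "http://purl.obolibrary.org/obo/PATO_",
    "ENVO": "http://purl.obolibrary.org/obo/ENVO_",
}


def normalize_object_id(object_id, namespaces):
    """Compress an IRI to a CURIE when the local part is safe; otherwise return it unchanged."""
    # Try longer namespaces first so the most specific prefix wins.
    for prefix, ns in sorted(namespaces.items(), key=lambda kv: len(kv[1]), reverse=True):
        if object_id.startswith(ns):
            local = object_id[len(ns):]
            if re.fullmatch(r"[A-Za-z0-9_.-]+", local):
                return f"{prefix}:{local}"
    return object_id


def build_curie_map(object_ids, namespaces):
    """Keep only the prefixes that occur in the CURIEs being written."""
    used = {
        oid.split(":", 1)[0]
        for oid in object_ids
        if ":" in oid and not oid.startswith(("http://", "https://"))
    }
    return {p: namespaces[p] for p in sorted(used) if p in namespaces}


def render_curie_map(curie_map):
    """Format the map as commented SSSOM header lines."""
    lines = ["# curie_map:"]
    lines += [f"#   {prefix}: {ns}" for prefix, ns in curie_map.items()]
    return "\n".join(lines)


raw_ids = [
    "http://purl.obolibrary.org/obo/PATO_0000001",
    "https://example.org/unknown/term 1",  # no known namespace: left as an IRI
    "http://purl.obolibrary.org/obo/PATO_00/01",  # unsafe local part: left as an IRI
    "METPO:1000059",  # already a CURIE
]
normalized = [normalize_object_id(i, KNOWN_NAMESPACES) for i in raw_ids]
header = render_curie_map(build_curie_map(normalized, KNOWN_NAMESPACES))
```

Here ENVO is in the namespace table but absent from the rendered header, since no output identifier uses it.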

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| metpo/utils/sssom_utils.py | New shared helpers for parsing SSSOM curie_map headers and extracting prefixes from identifiers. |
| metpo/utils/__init__.py | Establishes metpo.utils as a package for shared utilities. |
| metpo/presentations/analyze_primary_sources.py | Switches SSSOM prefix counting to use curie_map + shared extract_prefix(). |
| metpo/pipeline/chromadb_semantic_mapper.py | Normalizes written object_id values and writes a dynamically generated curie_map. |
| metpo/analysis/extract_definitions_from_mappings.py | Uses parsed curie_map + shared prefix extraction when classifying object_id sources. |
| metpo/analysis/analyze_ontology_value.py | Uses parsed curie_map + shared prefix extraction to compute term_source. |

turbomam merged commit 3123728 into main Feb 13, 2026
4 checks passed
turbomam deleted the issue-351-sssom-curie-normalization branch February 13, 2026 19:59