Context
The metpo repo now has a working deterministic resolver pipeline for MetaTraits -> KGX:
- fetch-metatraits: scrapes the MetaTraits trait catalog (2,860 cards)
- resolve-metatraits-in-sheets: resolves traits to METPO predicates/objects using metpo_sheet.tsv + metpo-properties.tsv
- demo-metatraits-mongo-to-kgx: demonstrates the full transform from MongoDB records to KGX TSV
This issue tracks the operational handoff to Anthea for production MetaTraits API integration and KGX transform in her external repos.
Key handoff artifacts
- Encoding examples: docs/metatraits_kgx_encoding_examples.md (worked examples for all edge patterns)
- External repo guide: docs/metatraits_external_repo_handoff.md (architecture constraints and code skeleton)
- Resolution table: data/mappings/metatraits_in_sheet_resolution.tsv (the deterministic lookup table)
- Coverage report: data/mappings/metatraits_in_sheet_resolution_report.md
Taxon list scoping
Anthea's current approach queries ~2.7M taxon IDs against the MetaTraits API. The MetaTraits species-level data covers ~55K NCBI species and ~65K GTDB species. Her query set should be scoped to ~120K IDs (the union of NCBI + GTDB species in MetaTraits) rather than the full NCBI taxonomy.
Reference crosswalk data is available in the local MongoDB as metatraits.ncbi2gtdb (92,711 entries).
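The scoping described above can be sketched as a namespaced set union. This is a minimal illustration only: the function name, the `"ncbi"`/`"gtdb"` tags, and the sample identifiers are assumptions, not part of the metpo repo or the MetaTraits API.

```python
from typing import Iterable, List, Tuple


def scope_taxon_ids(
    ncbi_ids: Iterable[str], gtdb_ids: Iterable[str]
) -> List[Tuple[str, str]]:
    """Return the de-duplicated, sorted union of NCBI and GTDB species IDs.

    Each ID is tagged with its source namespace (hypothetical tags), so an
    NCBI ID and a GTDB ID can never collide.
    """
    scoped = {("ncbi", str(i)) for i in ncbi_ids}
    scoped |= {("gtdb", str(i)) for i in gtdb_ids}
    return sorted(scoped)


# Illustrative input only; real lists would come from the MetaTraits
# species-level data rather than from the full NCBI taxonomy.
example = scope_taxon_ids(["562", "1280", "562"], ["s__Escherichia coli"])
```

Sorting the result keeps the query order stable between runs, which makes partial downloads easier to resume.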
Implementation ownership split
metpo repo provides (deterministic, tested):
- Trait -> predicate routing via resolution table
- Predicate positive/negative pair selection
- Object CURIE resolution (CHEBI, GO, EC)
- KGX edge schema with Biolink compliance
Anthea owns (in kg-microbe or KG-Microbe-search):
- API client design (endpoint selection, batching, rate limiting, retries)
- Taxon list scoping and download orchestration
- Persistence of API payloads
- Integration tests against fixture records
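As one possible starting point for the batching and retry items on the API client side, the sketch below splits an ID list into fixed-size batches and retries a failing call with exponential backoff. The `fetch` callable is a hypothetical stand-in for whatever MetaTraits endpoint is eventually chosen, and the batch size, attempt count, and delays are assumptions to be tuned.

```python
import time
from typing import Callable, Iterator, List, Sequence


def batched(ids: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` IDs."""
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


def fetch_with_retries(
    fetch: Callable[[List[str]], object],
    batch: List[str],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> object:
    """Call fetch(batch), retrying with exponential backoff on failure.

    `fetch` is a hypothetical client function. A real client would likely
    catch only transient errors (timeouts, HTTP 429/5xx), not Exception.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(batch)
        except Exception:
            if attempt == max_attempts:
                raise  # give up loudly after the last attempt
            sleep(base_delay * 2 ** (attempt - 1))
```

Passing `sleep` as a parameter lets integration tests replace it with a recorder, so retry behaviour can be checked without waiting.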
Architecture constraints
Per the handoff doc:
- Do NOT reuse legacy kg-microbe transform code/config as implementation base
- Preferred layered architecture: acquire->normalize->resolve->emit
- No fuzzy matching in predicate routing; use deterministic exact lookup
- Preserve positive/negative predicate distinction
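The constraints above could be laid out roughly as follows. This is a sketch under stated assumptions, not the code skeleton from the handoff doc: the record fields (`taxon_id`, `trait`, `value`), the `Resolution` type, and the `EX:` CURIEs are all hypothetical placeholders, and the real resolution table's columns may differ.

```python
from typing import Dict, List, NamedTuple


class Resolution(NamedTuple):
    positive_predicate: str
    negative_predicate: str
    object_curie: str


# Hypothetical row; the real lookup lives in
# data/mappings/metatraits_in_sheet_resolution.tsv.
RESOLUTION_TABLE: Dict[str, Resolution] = {
    "utilizes: glucose": Resolution("EX:utilizes", "EX:does_not_utilize", "EX:glucose"),
}


def normalize(raw: dict) -> dict:
    """Trim whitespace only; no case folding or fuzzy cleanup."""
    return {
        "taxon": str(raw["taxon_id"]).strip(),
        "trait": str(raw["trait"]).strip(),
        "positive": bool(raw["value"]),  # assumes value is already a bool
    }


def resolve(record: dict, table: Dict[str, Resolution]) -> dict:
    """Exact-key lookup; an unknown trait raises instead of falling back."""
    if record["trait"] not in table:
        raise KeyError(f"unresolved trait: {record['trait']!r}")
    hit = table[record["trait"]]
    predicate = hit.positive_predicate if record["positive"] else hit.negative_predicate
    return {"subject": record["taxon"], "predicate": predicate, "object": hit.object_curie}


def emit(edges: List[dict]) -> List[str]:
    """Render edges as tab-separated subject/predicate/object rows."""
    return ["\t".join((e["subject"], e["predicate"], e["object"])) for e in edges]


def transform(raw_records: List[dict]) -> List[str]:
    # The acquire layer is represented here by the raw_records argument.
    return emit([resolve(normalize(r), RESOLUTION_TABLE) for r in raw_records])


rows = transform([
    {"taxon_id": "NCBITaxon:562", "trait": "utilizes: glucose", "value": True},
    {"taxon_id": "NCBITaxon:562", "trait": "utilizes: glucose", "value": False},
])
```

Keeping each layer a plain function of its input means the resolve step can be tested against fixture records without touching the API.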
Acceptance criteria
- Deterministic outputs for fixed fixture input
- Composed traits preserve substrate/object when present
- Positive/negative assay outcomes route to correct predicate pair
- No fallback to generic predicates for unresolved categories (fail loudly)
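Two of these criteria, deterministic output and failing loudly, lend themselves to small fixture checks. The sketch below is illustrative: `UnresolvedTraitError`, `route_predicate`, `edges_to_tsv`, and the three-column layout are hypothetical names chosen for this example rather than anything defined in the metpo repo.

```python
import csv
import io
from typing import Dict, List, Tuple


class UnresolvedTraitError(LookupError):
    """Raised when a category has no predicate pair in the lookup table."""


def route_predicate(category: str, table: Dict[str, Tuple[str, str]]) -> Tuple[str, str]:
    """Return the (positive, negative) pair, or raise; never a generic fallback."""
    try:
        return table[category]
    except KeyError:
        raise UnresolvedTraitError(f"no predicate pair for {category!r}") from None


def edges_to_tsv(edges: List[Dict[str, str]]) -> str:
    """Serialize edges in sorted order so equal inputs give identical text."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=["subject", "predicate", "object"],
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    for edge in sorted(edges, key=lambda e: (e["subject"], e["predicate"], e["object"])):
        writer.writerow(edge)
    return buf.getvalue()
```

A fixture test can then assert that the TSV text is identical across runs and input orderings, and that an unresolved category raises rather than producing an edge.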
Cross-references
- CultureBotAI/KG-Microbe-search#1 — Anthea's API download issue
- Add METPO predicate pairs for high-volume unresolved MetaTraits categories #353 — predicate pairs gap (limits what can be expressed)
- define quality metrics esp regarding how mappings are used in KG-Microbe #204 — quality metrics for mapping coverage