prenatalppkt

A Python library for transforming raw prenatal sonography data into standardized GA4GH Phenopackets (v2) with clinically validated fetal growth references from NICHD and INTERGROWTH-21st.

Overview

prenatalppkt bridges the gap between clinical prenatal ultrasound measurements and machine-readable, ontology-aware phenotype representations. The library:

Standardizes biometric data from multiple ultrasound reporting systems (Observer JSON, ViewPoint Excel)
Evaluates measurements against authoritative growth references (NICHD and INTERGROWTH-21st)
Maps percentile classifications to Human Phenotype Ontology (HPO) terms
Generates GA4GH Phenopackets with complete provenance and metadata

This enables federated genomic repositories to integrate prenatal phenotype data with whole exome/genome sequencing (WES/WGS) results in a consistent, computationally tractable format.

Motivation

Clinical Context

Prenatal ultrasound biometry provides critical developmental markers for fetal health assessment. Key measurements include:

Head Circumference (HC): Marker for brain development
Biparietal Diameter (BPD): Skull width measurement
Abdominal Circumference (AC): Indicates fetal nutrition status
Femur Length (FL): Long bone growth indicator
Occipito-Frontal Diameter (OFD): Alternative skull measurement
Estimated Fetal Weight (EFW): Overall growth assessment

Technical Challenges

Data heterogeneity: Ultrasound systems export data in proprietary formats (ViewPoint Excel, Observer JSON)
Reference ambiguity: Multiple growth standards exist (NICHD, INTERGROWTH-21st) with different population bases
Ontology mapping: Converting numeric percentiles to standardized phenotype terms requires domain expertise
Genomic integration: Linking prenatal observations to genetic data demands structured, machine-readable formats
Maintainability: Hard-coded mapping logic becomes brittle as clinical guidelines evolve

Solution

prenatalppkt provides a unified, configuration-driven pipeline from raw measurements to Phenopackets, enabling:

Reproducible phenotype analysis across institutions
Integration with genomic variant interpretation workflows
Federated data sharing with privacy-preserving pseudonymization
Flexible ontology mapping through declarative YAML configuration
Longitudinal tracking of fetal development

Architecture

The system implements a data-driven, configuration-based architecture with clean separation between measurement evaluation, ontology mapping, and export logic.

Core Design Principles

Configuration over Code: HPO term mappings are defined declaratively in YAML, not hard-coded in Python classes
Dependency Injection: Measurement evaluators receive configuration at instantiation, enabling flexible testing and deployment
Single Responsibility: Each component has one well-defined purpose
Open/Closed Principle: New measurements and mapping rules are added via configuration files, not code changes

System Layers

flowchart LR
    subgraph L1["Layer 1: Configuration (YAML)"]
        YAML["data/mappings/biometry_hpo_mappings.yaml
        • Percentile ranges (min/max)
        • HPO term IDs and labels
        • Normal/abnormal flags"]
    end

    subgraph L2["Layer 2: Data Models"]
        PR["PercentileRange
        • min_percentile
        • max_percentile
        • contains(percentile)"]
        
        TB["TermBin
        • range: PercentileRange
        • hpo_id, hpo_label
        • normal: bool
        • category (auto-detected)"]
        
        TO["TermObservation
        • hpo_id, hpo_label
        • observed: bool
        • gestational_age
        • percentile"]
    end

    subgraph L3["Layer 3: Loading & Validation"]
        Loader["BiometryMappingLoader
        • load(yaml_path)
        • Parses YAML → TermBin objects
        • Validates ranges
        • Sorts by min_percentile"]
    end

    subgraph L4["Layer 4: Business Logic"]
        SM["SonographicMeasurement
        • measurement_type: str
        • term_bins: List[TermBin]
        • from_percentile() → TermObservation"]
        
        Factory["MeasurementEvaluation
        • Factory pattern
        • Loads all mappings once
        • get_measurement_mapper()"]
    end

    subgraph L5["Layer 5: Reference Data"]
        Ref["FetalGrowthPercentiles
        • NICHD / INTERGROWTH-21st tables
        • Percentile calculation
        • Z-score calculation"]
    end

    subgraph L6["Layer 6: Export"]
        Export["PhenotypicExporter
        • Phenopacket v2 assembly
        • QC validation
        • JSON serialization"]
    end

    YAML --> Loader
    Loader --> PR
    Loader --> TB
    
    Loader --> Factory
    Factory --> SM
    SM --> TO
    
    Ref --> SM
    TO --> Export

    classDef config fill:#fff4e6,stroke:#333,stroke-width:2px
    classDef model fill:#e3f2fd,stroke:#333,stroke-width:2px
    classDef logic fill:#f3e5f5,stroke:#333,stroke-width:2px
    classDef export fill:#e8f5e9,stroke:#333,stroke-width:2px
    
    class YAML config
    class PR,TB,TO model
    class Loader,SM,Factory logic
    class Ref logic
    class Export export

Key Architectural Choice: Configuration-Driven Mapping

OLD APPROACH (Hard-coded):

# Each measurement had its own class with hard-coded logic
class HeadCircumferenceMeasurement(SonographicMeasurement):
   def get_bin_to_term_mapping(self):
       return {
           "below_3p": MinimalTerm("HP:0000252", "Microcephaly"),
           "between_3p_5p": MinimalTerm("HP:0040195", "Decreased HC"),
           # ... 6 more hard-coded bins
       }

NEW APPROACH (Data-driven):

# data/mappings/biometry_hpo_mappings.yaml
head_circumference:
 - min: 0
   max: 3
   id: "HP:0000252"
   label: "Microcephaly"
   normal: false
 
 - min: 3
   max: 5
   id: "HP:0040195"
   label: "Decreased head circumference"
   normal: false
 # ... all 8 ranges covering 0-100 percentile

# Python code loads configuration, no hard-coding needed
factory = MeasurementEvaluation()  # Loads YAML once
mapper = factory.get_measurement_mapper("head_circumference")
observation = mapper.from_percentile(2.1, gestational_age)
# Returns: TermObservation(hpo_id="HP:0000252", hpo_label="Microcephaly", ...)

Parsing Observer (JSON)

The prenatalppkt package parses ultrasound ("Observer") JSON data into structured Python data transfer objects (DTOs). Each element of the top-level `fetuses` array represents one fetus, with standardized subkeys:

| JSON subkey | Parser | DTO | Purpose |
| --- | --- | --- | --- |
| `fetus` | `FetusFetusParser` | `FetusCoreData` | Core fetal metadata (GA, sex, presentation) |
| `anatomy_text` | `FetusAnatomyTextParser` | `hpo_term_list` (`List[SimpleTerm]`) | Qualitative HPO terms from anatomy report |
| `measurements` | `FetusMeasurementsParser` | `MeasurementsData` (`Measurement` list) | Quantitative biometric data |
| `ratios` | `FetusRatiosParser` | `FetusRatiosData` | Computed biometric ratios (e.g., HC/AC) |
| `efws` | `FetusEfwParser` | `FetusEfwData` (`EfwEntry` list) | Estimated fetal weights |

A central FetusParser coordinates all sub-parsers and assembles their results into a unified FetusData object. This modular architecture keeps each JSON subkey isolated, testable, and easy to extend for future Observer fields (e.g., placenta, bpp).

Package structure

graph TD
A[ExamDataParser] --> B[FetusParser]
B --> C1[FetusFetusParser]
B --> C2[FetusAnatomyTextParser]
B --> C3[FetusMeasurementsParser]
B --> C4[FetusRatiosParser]
B --> C5[FetusEfwParser]

C1 --> D1[FetusCoreData]
C2 --> D2[List of SimpleTerm]
C3 --> D3[MeasurementsData / Measurement]
C4 --> D4[FetusRatiosData / Ratio]
C5 --> D5[FetusEfwData / EfwEntry]

Each fetus_*_parser.py is responsible for interpreting a single JSON section and producing its corresponding DTO in prenatalppkt/dto/fetuses/. The FetusData class then aggregates all of them into a cohesive representation for one fetus.
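
As a rough illustration of the coordinator pattern described above, the sketch below dispatches each subkey of one fetus record to its own parsing function. The function names, the dictionary-based results, and the sample field names (`ga_weeks`, `sex`, `hc_mm`) are placeholders invented for this example; they are not the library's actual parsers or DTOs.

```python
from typing import Any, Callable, Dict


# Hypothetical stand-ins for the real sub-parsers; each handles one subkey.
def parse_core(section: Dict[str, Any]) -> Dict[str, Any]:
    return {"ga_weeks": section.get("ga_weeks"), "sex": section.get("sex")}


def parse_measurements(section: Dict[str, Any]) -> Dict[str, float]:
    # Keep only numeric entries so malformed values do not propagate.
    return {k: float(v) for k, v in section.items() if isinstance(v, (int, float))}


# Registry of subkey -> parser; supporting a new subkey means adding one entry.
SUB_PARSERS: Dict[str, Callable[[Dict[str, Any]], Any]] = {
    "fetus": parse_core,
    "measurements": parse_measurements,
}


def parse_fetus(record: Dict[str, Any]) -> Dict[str, Any]:
    """Dispatch each registered subkey to its parser; ignore unknown subkeys."""
    return {
        key: parser(record[key])
        for key, parser in SUB_PARSERS.items()
        if key in record
    }


record = {
    "fetus": {"ga_weeks": 20.86, "sex": "F"},
    "measurements": {"hc_mm": 175.3, "note": "n/a"},
    "placenta": {"position": "anterior"},  # no parser registered, so skipped
}
parsed = parse_fetus(record)
```

Because each sub-parser only sees its own section, it can be unit-tested with a small dictionary and no knowledge of the rest of the report.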

Fetus Parsing Flow

graph TD
 JSON[Observer JSON fetuses] --> FP[FetusParser]
 FP --> |fetus| Core[FetusFetusParser -> FetusCoreData]
 FP --> |anatomy_text| Anat[FetusAnatomyTextParser -> List of SimpleTerms]
 FP --> |measurements| Meas[FetusMeasurementsParser -> MeasurementsData]
 FP --> |ratios| Rat[FetusRatiosParser -> FetusRatiosData]
 FP --> |efws| Efw[FetusEfwParser -> FetusEfwData]

Parsing ViewPoint (VPL)

System Class Diagram

classDiagram
   %% Configuration Layer
   class BiometryMappingsYAML {
       <<Configuration>>
       head_circumference[]
       biparietal_diameter[]
       femur_length[]
       abdominal_circumference[]
       occipitofrontal_diameter[]
   }

   %% Data Models
   class PercentileRange {
       +min_percentile: float
       +max_percentile: float
       +contains(percentile: float) bool
   }

   class TermBin {
       +range: PercentileRange
       +hpo_id: str
       +hpo_label: str
       +normal: bool
       +fits(percentile: float) bool
       +category: str
   }

   class TermObservation {
       +hpo_id: str
       +hpo_label: str
       +category: str
       +observed: bool
       +gestational_age: GestationalAge
       +percentile: float
       +to_phenotypic_feature() dict
   }

   class GestationalAge {
       +weeks: int
       +days: int
       +from_weeks(float) GestationalAge
       +to_iso() dict
   }

   %% Loading Layer
   class BiometryMappingLoader {
       <<Service>>
       +load(path: Path) Dict[str, List[TermBin]]
   }

   %% Business Logic Layer
   class SonographicMeasurement {
       +measurement_type: str
       +term_bins: List[TermBin]
       +from_percentile(percentile, ga) TermObservation
       +name() str
   }

   class MeasurementEvaluation {
       <<Factory>>
       -_mappings: Dict[str, List[TermBin]]
       +__init__(mappings_path?)
       +get_measurement_mapper(type: str) SonographicMeasurement
   }

   %% Reference Data Layer
   class FetalGrowthPercentiles {
       +source: str
       +tables: Dict[str, DataFrame]
       +calculate_percentile(measurement, ga, value) float
       +get_z_score(measurement, ga, value) float
       +lookup_percentile(measurement, ga, value) float
   }

   %% Export Layer
   class PhenotypicExporter {
       +term_observations: List[TermObservation]
       +build_phenopacket() dict
       +to_json() str
       +validate() QCReport
   }

   class QCValidator {
       +validate_schema(json) List[Error]
       +validate_ontology_terms() List[Error]
       +check_completeness() List[Warning]
   }

   %% Relationships
   BiometryMappingsYAML ..> BiometryMappingLoader : reads
   BiometryMappingLoader --> PercentileRange : creates
   BiometryMappingLoader --> TermBin : creates
   TermBin *-- PercentileRange : contains
   
   BiometryMappingLoader --> MeasurementEvaluation : provides mappings
   MeasurementEvaluation --> SonographicMeasurement : creates
   SonographicMeasurement *-- TermBin : configured with
   SonographicMeasurement --> TermObservation : produces
   
   TermObservation *-- GestationalAge : includes
   FetalGrowthPercentiles ..> SonographicMeasurement : provides percentiles
   
   PhenotypicExporter *-- TermObservation : collects
   PhenotypicExporter --> QCValidator : uses

Data Flow

End-to-End Processing Pipeline

The new architecture streamlines the flow from raw measurement to Phenopacket:

sequenceDiagram
   participant User
   participant Parser as Input Parser
   participant GA as GestationalAge
   participant Ref as FetalGrowthPercentiles
   participant Factory as MeasurementEvaluation
   participant Mapper as SonographicMeasurement
   participant Export as PhenotypicExporter
   participant Output as Phenopacket JSON

   User->>Parser: Load ultrasound report (JSON/XLSX)
   Parser->>GA: Parse gestational age string
   GA-->>Parser: GestationalAge(weeks=20, days=6)

   Note over Factory: ONE-TIME INITIALIZATION
   Factory->>Factory: Load biometry_hpo_mappings.yaml
   Factory->>Factory: Create TermBins for all measurements

   Parser->>Ref: Request percentile for HC at 20w6d
   Ref->>Ref: Interpolate INTERGROWTH table
   Ref-->>Parser: percentile = 2.1

   Parser->>Factory: get_measurement_mapper("head_circumference")
   Factory-->>Parser: SonographicMeasurement(term_bins=[...])

   Parser->>Mapper: from_percentile(2.1, gestational_age)
   
   Note over Mapper: DATA-DRIVEN LOOKUP
   Mapper->>Mapper: Iterate through term_bins
   Mapper->>Mapper: Find TermBin where range.contains(2.1)
   Mapper->>Mapper: Found: [0, 3) -> HP:0000252 "Microcephaly"
   
   Mapper-->>Parser: TermObservation(
   Note right of Mapper: hpo_id="HP:0000252"
   Note right of Mapper: hpo_label="Microcephaly"
   Note right of Mapper: observed=True
   Note right of Mapper: category="lower_extreme_term"
   Note right of Mapper: percentile=2.1)

   Parser->>Export: Add TermObservation to export batch
   Export->>Export: Build Phenopacket structure
   Export->>Export: QC validation
   Export-->>Output: Write JSON file

   Output-->>User: Phenopacket with HPO term + provenance

Detailed Step-by-Step Flow

1. Configuration Loading (Happens Once)

  flowchart LR
    A["Application Startup"] --> B["MeasurementEvaluation.__init__()"]
    B --> C["BiometryMappingLoader.load('biometry_hpo_mappings.yaml')"]
    C --> D["Parse YAML<br/>→ Create PercentileRange objects"]
    D --> E["Create TermBin objects linking ranges to HPO terms"]
    E --> F["Store in dictionary:<br/>{ 'head_circumference': [TermBin(...), ...],<br/>'biparietal_diameter': [...], ... }"]

2. Measurement Processing (Per Observation)

  flowchart LR
    A["Raw Input:<br/>HC = 175mm at 20w6d"] --> B["GestationalAge.from_weeks(20.86)<br/>→ GestationalAge(weeks=20, days=6)"]
    B --> C["FetalGrowthPercentiles.calculate_percentile('head_circumference', 20.86, 175.0)<br/>→ Lookup INTERGROWTH table<br/>→ Interpolate between 20w and 21w<br/>→ Return percentile: 2.1"]
    C --> D["factory.get_measurement_mapper('head_circumference')<br/>→ Returns SonographicMeasurement with 8 TermBins"]
    D --> E["mapper.from_percentile(2.1, gestational_age)<br/>→ Finds TermBin [0,3): HP:0000252 'Microcephaly'"]
    E --> F["Creates TermObservation:<br/>• hpo_id=HP:0000252<br/>• observed=True<br/>• category='lower_extreme_term'"]
    F --> G["TermObservation.to_phenotypic_feature()<br/>→ Phenopacket JSON output"]

3. Multi-Measurement Workflow

  flowchart LR
    Input["Raw Ultrasound Report<br/>(JSON or Excel)"] --> Parse["Parse Measurements"]

    Parse --> HC["HC<br/>175 mm"]
    Parse --> BPD["BPD<br/>45 mm"]
    Parse --> FL["FL<br/>30 mm"]

    subgraph Processing["Parallel Processing"]
        direction LR
        HC --> HCMap["HC Mapper"] --> HCObs["TermObservation"]
        BPD --> BPDMap["BPD Mapper"] --> BPDObs["TermObservation"]
        FL --> FLMap["FL Mapper"] --> FLObs["TermObservation"]
    end

    HCObs --> Collect["Collect All Observations"]
    BPDObs --> Collect
    FLObs --> Collect

    Collect --> PP["Assemble Phenopacket components"]
    PP --> QC["Quality Control"]
    QC --> Output["Build Phenopacket: JSON Output"]

Module Breakdown

Core Architecture Modules

`src/prenatalppkt/measurements/term_bin.py`

Data structures for configuration-driven ontology mapping:

@dataclass
class PercentileRange:
   """Represents a percentile interval [min, max)."""
   min_percentile: float
   max_percentile: float
   
   def contains(self, percentile: float) -> bool:
       """Check if percentile falls within this range."""
       return self.min_percentile <= percentile < self.max_percentile


@dataclass
class TermBin:
   """Links a percentile range to an HPO term."""
   range: PercentileRange
   hpo_id: str
   hpo_label: str
   normal: bool  # Explicit flag: is this range considered normal?
   
   def fits(self, percentile: float) -> bool:
       """Check if percentile fits in this bin."""
       return self.range.contains(percentile)
   
   @property
   def category(self) -> str:
       """Auto-categorize based on boundaries."""
       if self.range.min_percentile == 0:
           return "lower_extreme_term"
       elif self.range.max_percentile == 100:
           return "upper_extreme_term"
       elif self.normal:
           return "normal_term"
       else:
           return "abnormal_term"

Purpose: Pure data structures with no business logic. Can be easily serialized, tested, and validated.
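
The half-open interval rule has two consequences worth checking: a value that sits exactly on a boundary belongs to the upper bin, and a value of exactly 100 falls outside a bin whose max is 100. The sketch below re-declares a minimal `PercentileRange` so it runs on its own. Whether the library ever produces a percentile of exactly 100 is not stated here, so treat the last check as an edge case to be aware of rather than a known defect.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PercentileRange:
    """Half-open percentile interval [min, max), mirroring the class above."""

    min_percentile: float
    max_percentile: float

    def contains(self, percentile: float) -> bool:
        return self.min_percentile <= percentile < self.max_percentile


low = PercentileRange(0, 3)
borderline = PercentileRange(3, 5)
top = PercentileRange(97, 100)

# 3.0 is excluded from [0, 3) and included in [3, 5).
on_boundary = (low.contains(3.0), borderline.contains(3.0))

# 100.0 is not inside [97, 100); callers may want to clamp or special-case it.
at_hundred = top.contains(100.0)
```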

`src/prenatalppkt/mapping_loader.py`

Handles all YAML parsing and TermBin construction:

class BiometryMappingLoader:
   """
   Loads HPO mappings from YAML configuration.
   Separates file I/O from measurement evaluation logic.
   """
   
   @staticmethod
   def load(path: Path) -> Dict[str, List[TermBin]]:
       """
       Load biometry-to-HPO mappings from YAML.
       
       Returns:
           Dictionary mapping measurement types to sorted lists of TermBins
           
       Example:
           {
               "head_circumference": [
                   TermBin(range=[0,3), id="HP:0000252", ...),
                   TermBin(range=[3,5), id="HP:0040195", ...),
                   ...
               ],
               "biparietal_diameter": [...]
           }
       """

Key Features:

Validates YAML structure
Creates PercentileRange and TermBin objects
Sorts bins by min_percentile for efficient lookup
Logs warnings for gaps or overlaps
Acts as the single place where configuration errors surface
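
Sorting by the lower bound makes a binary search possible. The sketch below is one way that could look, using plain tuples in place of `TermBin` objects and collapsing the middle bins into one for brevity. It is an alternative shown for illustration; with only a handful of bins per measurement, a linear scan is just as reasonable.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# (min, max, hpo_id) tuples standing in for sorted TermBin objects.
Bin = Tuple[float, float, str]

BINS: List[Bin] = [
    (0, 3, "HP:0000252"),
    (3, 5, "HP:0040195"),
    (5, 95, "HP:0000240"),  # middle bins collapsed for this example
    (95, 97, "HP:0040194"),
    (97, 100, "HP:0000256"),
]
MINS = [b[0] for b in BINS]


def find_bin(percentile: float) -> Optional[Bin]:
    """Locate the half-open bin [min, max) containing the percentile."""
    idx = bisect_right(MINS, percentile) - 1  # last bin whose min <= percentile
    if idx < 0:
        return None
    lo, hi, _ = BINS[idx]
    return BINS[idx] if lo <= percentile < hi else None
```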

`src/prenatalppkt/measurement_eval.py`

Factory pattern for creating measurement evaluators:

class MeasurementEvaluation:
   """
   Factory for measurement mappers.
   Loads configuration once, creates mappers on demand.
   """
   
   def __init__(self, mappings_path: Optional[Path] = None) -> None:
       """Initialize with YAML path (defaults to bundled config)."""
       self._mappings = BiometryMappingLoader.load(
           mappings_path or DEFAULT_MAPPINGS_FILE
       )
   
   def get_measurement_mapper(
       self,
       measurement_type: str
   ) -> Optional[SonographicMeasurement]:
       """
       Get a configured mapper for the specified measurement.
       
       Example:
           factory = MeasurementEvaluation()
           hc_mapper = factory.get_measurement_mapper("head_circumference")
           observation = hc_mapper.from_percentile(2.1, gestational_age)
       """

Design Pattern: Factory with load-once behavior (each instance parses the YAML at construction and reuses the mappings for every mapper it creates)
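
The load-once behavior can be demonstrated with a toy factory and a loader that counts its own calls. `CachingFactory` and `fake_loader` are made-up names for this sketch, and the mapping is reduced to a list of HPO IDs. Note that in this form the cache lives on the instance, so two factory objects would each load the file once.

```python
from typing import Callable, Dict, List


class CachingFactory:
    """Toy factory: calls the loader once, then serves requests from memory."""

    def __init__(self, loader: Callable[[], Dict[str, List[str]]]) -> None:
        self._mappings = loader()  # single load at construction time

    def get(self, measurement_type: str) -> List[str]:
        # Unknown measurement types yield an empty list instead of raising.
        return self._mappings.get(measurement_type, [])


load_calls: List[int] = []  # records each loader invocation


def fake_loader() -> Dict[str, List[str]]:
    load_calls.append(1)
    return {"head_circumference": ["HP:0000252", "HP:0000256"]}


factory = CachingFactory(fake_loader)
first = factory.get("head_circumference")
second = factory.get("head_circumference")
missing = factory.get("crown_rump_length")
```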

`src/prenatalppkt/sonographic_measurement.py`

Generic measurement mapper (no longer abstract, no subclasses needed):

class SonographicMeasurement:
   """
   Generic measurement mapper using configured TermBins.
   Replaces all measurement-specific subclasses.
   """
   
   def __init__(self, measurement_type: str, term_bins: List[TermBin]) -> None:
       """Configuration is INJECTED at instantiation."""
       self.measurement_type = measurement_type
       self.term_bins = term_bins
   
   def from_percentile(
       self,
       percentile: float,
       gestational_age: GestationalAge
   ) -> TermObservation:
       """
       Map a percentile to an HPO term observation.
       DATA-DRIVEN - no hard-coded if/elif chains!
       """
       for term_bin in self.term_bins:
           if term_bin.fits(percentile):
               return TermObservation(
                   hpo_id=term_bin.hpo_id,
                   hpo_label=term_bin.hpo_label,
                   category=term_bin.category,
                   observed=not term_bin.normal,
                   gestational_age=gestational_age,
                   percentile=percentile,
               )
       
       raise ValueError(
           f"No HPO mapping found for {self.measurement_type} "
           f"percentile {percentile:.1f}"
       )

Key Change: No more inheritance hierarchy! One generic class works for all measurements.

`src/prenatalppkt/term_observation.py`

Lightweight data holder (no complex logic or external dependencies):

@dataclass
class TermObservation:
   """HPO term observation with gestational age context."""
   hpo_id: str
   hpo_label: str
   category: str
   observed: bool
   gestational_age: GestationalAge
   percentile: Optional[float] = None
   
   def to_phenotypic_feature(self) -> Dict[str, object]:
       """Convert to Phenopacket v2 format."""
       ga_str = f"{self.gestational_age.weeks}w{self.gestational_age.days}d"
       
       return {
           "type": {"id": self.hpo_id, "label": self.hpo_label},
           "excluded": not self.observed,
           "onset": {"gestationalAge": self.gestational_age.to_iso()},
           "description": f"Measurement at {ga_str}"
       }

Removed Dependencies:

No longer depends on MinimalTerm from hpo-toolkit
No __post_init__ logic
No build_standard_bin_mapping() method

Reference Data Modules

`src/prenatalppkt/biometry_reference.py`

Unified interface for loading and querying fetal growth reference data:

class FetalGrowthPercentiles:
   """
   Load and query NICHD or INTERGROWTH-21st fetal growth references.
   
   Supports:
   - Percentile lookup by gestational age
   - Z-score calculation
   - Linear interpolation for non-integer gestational ages
   """
   
   def __init__(self, source: str = "intergrowth") -> None:
       """
       Initialize with reference data source.
       
       Args:
           source: "nihcd" or "intergrowth"
       """
   
   def calculate_percentile(
       self,
       measurement_type: str,
       gestational_age_weeks: float,
       value_mm: float
   ) -> float:
       """
       Calculate which percentile a measurement falls into.
       
       Returns:
           Percentile value (0-100)
       """

Key features:

Loads parsed TSV tables from data/parsed/
Handles gestational age interpolation
Supports both centile and z-score tables
Validates measurement types and ranges
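
To make the interpolation and z-score steps concrete, here is a generic sketch: it linearly interpolates a reference mean between two whole weeks, computes a z-score, and converts it to a percentile with the standard normal CDF. The helper names and every number are invented for illustration; they are not reference values, and the library's own calculation may differ (for example, by interpolating centile columns directly).

```python
from statistics import NormalDist


def interpolate(ga_weeks: float, lower_week: int, lower_value: float,
                upper_week: int, upper_value: float) -> float:
    """Linear interpolation between two tabulated gestational weeks."""
    frac = (ga_weeks - lower_week) / (upper_week - lower_week)
    return lower_value + frac * (upper_value - lower_value)


def z_score(value: float, mean: float, sd: float) -> float:
    if sd <= 0:
        raise ValueError("sd must be positive")
    return (value - mean) / sd


def z_to_percentile(z: float) -> float:
    """Standard normal CDF scaled to the 0-100 range."""
    return NormalDist().cdf(z) * 100.0


# Toy numbers only; NOT taken from any published growth reference.
mean_at_ga = interpolate(20.5, 20, 170.0, 21, 180.0)  # halfway between rows
z = z_score(value=165.0, mean=mean_at_ga, sd=5.0)
percentile = z_to_percentile(z)
```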

`src/prenatalppkt/gestational_age.py`

Represents gestational age with weeks + days:

@dataclass
class GestationalAge:
   """Gestational age representation."""
   weeks: int
   days: int
   
   @classmethod
   def from_weeks(cls, total_weeks: float) -> "GestationalAge":
       """Convert decimal weeks to weeks+days."""
       weeks = int(total_weeks)
       days = int((total_weeks - weeks) * 7)
       return cls(weeks=weeks, days=days)
   
   def to_iso(self) -> dict:
       """Convert to the Phenopacket GestationalAge structure (weeks and days)."""
       return {"weeks": self.weeks, "days": self.days}
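
Two related conversions are sketched below as standalone functions: parsing a compact string such as `20w6d`, and converting decimal weeks by rounding to whole days before splitting. Both helpers are hypothetical and not part of the class above. Rounding total days is an assumption chosen for this example; it differs from truncation for inputs that sit just under a whole week.

```python
import re
from typing import Tuple

_GA_PATTERN = re.compile(r"^\s*(\d{1,2})w(\d)d\s*$")


def parse_ga(text: str) -> Tuple[int, int]:
    """Parse strings such as '20w6d' into (weeks, days)."""
    match = _GA_PATTERN.match(text)
    if match is None:
        raise ValueError(f"unrecognized gestational age: {text!r}")
    weeks, days = int(match.group(1)), int(match.group(2))
    if days > 6:
        raise ValueError("days must be between 0 and 6")
    return weeks, days


def weeks_days_from_decimal(total_weeks: float) -> Tuple[int, int]:
    """Round to whole days first, then split into weeks and days."""
    total_days = round(total_weeks * 7)
    return divmod(total_days, 7)


parsed = parse_ga("20w6d")
from_decimal = weeks_days_from_decimal(20.86)
near_week_end = weeks_days_from_decimal(20.999)  # rounds up to 21w0d
```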

Data Parsing Modules

`scripts/parse_nichd_raw.py`

Parses NICHD raw text data into standardized TSV format:

def parse_nichd_raw(input_file: Path, output_dir: Path) -> None:
   """
   Parse NICHD fetal growth calculator text export.
   
   Handles:
   - Multi-word measurement names
   - Race/ethnicity categories
   - Multiple percentile columns
   - Header/footer junk lines
   """

`scripts/parse_intergrowth_txt_all.py`

Parses INTERGROWTH-21st centile and z-score tables:

def parse_intergrowth_tables(raw_dir: Path, out_dir: Path) -> None:
   """
   Parse INTERGROWTH centile (_ct_) and z-score (_zs_) tables.
   
   Handles:
   - Text file parsing
   - Gestational age range validation
   - Measure name normalization
   - Provenance metadata
   """

Export Modules

`src/prenatalppkt/phenotypic_export.py`

Assembles Phenopackets from TermObservations:

class PhenotypicExporter:
   """
   Build GA4GH Phenopackets v2 from term observations.
   """
   
   def __init__(self) -> None:
       self.term_observations: List[TermObservation] = []
   
   def add_observation(self, obs: TermObservation) -> None:
       """Add an observation to the export batch."""
       self.term_observations.append(obs)
   
   def build_phenopacket(
       self,
       subject_id: str,
       maternal_id: Optional[str] = None
   ) -> dict:
       """
       Build complete Phenopacket structure.
       
       Returns:
           Phenopacket v2 compliant dictionary
       """

Configuration Guide

YAML Mapping Structure

The data/mappings/biometry_hpo_mappings.yaml file defines how percentile values map to HPO terms:

# Template for each measurement
measurement_name:
 - min: <float>        # Minimum percentile (inclusive)
   max: <float>        # Maximum percentile (exclusive)
   id: "<HPO:ID>"      # HPO term identifier
   label: "<string>"   # Human-readable label
   normal: <boolean>   # Is this range considered normal?

Complete Example: Head Circumference

head_circumference:
 # Extreme low: <3rd percentile
 - min: 0
   max: 3
   id: "HP:0000252"
   label: "Microcephaly"
   normal: false
 
 # Borderline low: 3rd-5th percentile
 - min: 3
   max: 5
   id: "HP:0040195"
   label: "Decreased head circumference"
   normal: false
 
 # Mildly abnormal low: 5th-10th percentile
 - min: 5
   max: 10
   id: "HP:0000240"
   label: "Abnormality of skull size"
   normal: false
 
 # Normal range: 10th-50th percentile
 - min: 10
   max: 50
   id: "HP:0000240"
   label: "Abnormality of skull size"
   normal: true  # Marked as normal
 
 # Normal range: 50th-90th percentile
 - min: 50
   max: 90
   id: "HP:0000240"
   label: "Abnormality of skull size"
   normal: true
 
 # Mildly abnormal high: 90th-95th percentile
 - min: 90
   max: 95
   id: "HP:0000240"
   label: "Abnormality of skull size"
   normal: false
 
 # Borderline high: 95th-97th percentile
 - min: 95
   max: 97
   id: "HP:0040194"
   label: "Increased head circumference"
   normal: false
 
 # Extreme high: >97th percentile
 - min: 97
   max: 100
   id: "HP:0000256"
   label: "Macrocephaly"
   normal: false

Validation Rules

The system automatically validates:

Complete Coverage: Ranges must span [0, 100) with no gaps
No Overlaps: Each percentile value must map to exactly one bin
Sorted Order: Ranges must be in ascending order by min
Valid Percentiles: 0 <= min < max <= 100
HPO Term Format: IDs must match pattern HP:\d{7}
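
The rules above could be checked along the following lines. This is a sketch operating on already-parsed YAML (a list of dictionaries), not the loader's actual code; `validate_bins` and its message wording are invented for the example.

```python
import re
from typing import Any, Dict, List

HPO_ID = re.compile(r"^HP:\d{7}$")


def validate_bins(bins: List[Dict[str, Any]]) -> List[str]:
    """Return human-readable problems; an empty list means the bins pass."""
    problems: List[str] = []
    ordered = sorted(bins, key=lambda b: b["min"])
    if ordered != bins:
        problems.append("bins are not sorted by min")

    expected_start = 0.0  # where the next bin must begin for gap-free coverage
    for b in ordered:
        lo, hi = float(b["min"]), float(b["max"])
        if not (0 <= lo < hi <= 100):
            problems.append(f"invalid range [{lo}, {hi})")
        if lo > expected_start:
            problems.append(f"gap between {expected_start} and {lo}")
        elif lo < expected_start:
            problems.append(f"overlap below {expected_start}")
        expected_start = max(expected_start, hi)
        if not HPO_ID.match(str(b["id"])):
            problems.append(f"malformed HPO id {b['id']!r}")

    if expected_start < 100:
        problems.append(f"coverage stops at {expected_start}")
    return problems


good = [
    {"min": 0, "max": 3, "id": "HP:0000252"},
    {"min": 3, "max": 97, "id": "HP:0000240"},
    {"min": 97, "max": 100, "id": "HP:0000256"},
]
bad = [
    {"min": 0, "max": 3, "id": "HP:0000252"},
    {"min": 5, "max": 100, "id": "HP:256"},  # gap at [3, 5) and a short ID
]
good_report = validate_bins(good)
bad_report = validate_bins(bad)
```

Collecting all problems in one pass, rather than raising on the first, lets a configuration author fix several mistakes in a single edit.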

Adding a New Measurement

# 1. Add to biometry_hpo_mappings.yaml
estimated_fetal_weight:
 - min: 0
   max: 10
   id: "HP:0001518"
   label: "Small for gestational age"
   normal: false
 
 - min: 10
   max: 90
   id: "HP:0000118"  # Generic placeholder
   label: "Phenotypic abnormality"
   normal: true
 
 - min: 90
   max: 100
   id: "HP:0001520"
   label: "Large for gestational age"
   normal: false

# 2. Use immediately (no code changes needed!)
factory = MeasurementEvaluation()
efw_mapper = factory.get_measurement_mapper("estimated_fetal_weight")

Customizing Normal Ranges

Different clinical contexts may define "normal" differently:

# Conservative definition (narrower normal range)
head_circumference_conservative:
 - min: 0
   max: 5
   id: "HP:0000252"
   label: "Microcephaly"
   normal: false
 
 - min: 5
   max: 15    # More restrictive
   id: "HP:0040195"
   label: "Decreased head circumference"
   normal: false
 
 - min: 15
   max: 85    # Narrower normal range
   id: "HP:0000240"
   label: "Abnormality of skull size"
   normal: true
 
 # ... continue pattern

Load with:

factory = MeasurementEvaluation(
   mappings_path=Path("config/conservative_mappings.yaml")
)

Inputs and Outputs

Input Formats

1. Observer JSON

{
 "exam": {
   "patient_dob": "1990-01-15",
   "lmp_date": "2024-03-10",
   "exam_date": "2024-08-15",
   "icd10_codes": ["Z34.00"]
 },
 "fetuses": [
   {
     "fetus_id": 1,
     "measurements": {
       "bpd_mm": 45.2,
       "hc_mm": 175.3,
       "ac_mm": 150.1,
       "fl_mm": 32.5
     },
     "anatomy": {
       "cranium": "normal",
       "heart": "four_chamber_view_normal"
     }
   }
 ]
}

2. ViewPoint Excel (.xlsx)

| ExamDate | LMP | Fetus | BPD (mm) | HC (mm) | AC (mm) | FL (mm) |
| --- | --- | --- | --- | --- | --- | --- |
| 2024-08-15 | 2024-03-10 | 1 | 45.2 | 175.3 | 150.1 | 32.5 |

Note: ViewPoint uses proprietary dropdown lists (.vpl files) for anatomy findings. See docs/viewpoint_dropdown_options.md for conversion utilities.
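
A minimal sketch of turning one exported row into named measurements follows. It reads tab-separated text as a stand-in for the spreadsheet, since reading `.xlsx` directly needs a third-party reader. The header-to-name mapping and the `row_to_measurements` helper are assumptions made for this example and would need to match the real export columns.

```python
import csv
import io
from typing import Dict

# Assumed mapping from column headers to measurement names.
COLUMN_TO_MEASUREMENT = {
    "BPD (mm)": "biparietal_diameter",
    "HC (mm)": "head_circumference",
    "AC (mm)": "abdominal_circumference",
    "FL (mm)": "femur_length",
}


def row_to_measurements(row: Dict[str, str]) -> Dict[str, float]:
    """Convert one exported row into {measurement_name: value_mm}."""
    out: Dict[str, float] = {}
    for column, name in COLUMN_TO_MEASUREMENT.items():
        raw = (row.get(column) or "").strip()
        if raw:  # skip empty cells instead of failing on float("")
            out[name] = float(raw)
    return out


tsv = (
    "ExamDate\tLMP\tFetus\tBPD (mm)\tHC (mm)\tAC (mm)\tFL (mm)\n"
    "2024-08-15\t2024-03-10\t1\t45.2\t175.3\t150.1\t32.5\n"
)
rows = list(csv.DictReader(io.StringIO(tsv), delimiter="\t"))
measurements = row_to_measurements(rows[0])
```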

Output Format: Phenopacket v2

{
 "id": "prenatal-exam-20240815-fetus1",
 "subject": {
   "id": "FETUS_001",
   "timeAtLastEncounter": {
     "gestationalAge": {
       "weeks": 20,
       "days": 6
     }
   }
 },
 "phenotypicFeatures": [
   {
     "type": {
       "id": "HP:0000252",
       "label": "Microcephaly"
     },
     "excluded": false,
     "onset": {
       "gestationalAge": {
         "weeks": 20,
         "days": 6
       }
     },
     "description": "Measurement at 20w6d"
   },
   {
     "type": {
       "id": "HP:0000240",
       "label": "Abnormality of skull size"
     },
     "excluded": true,
     "onset": {
       "gestationalAge": {
         "weeks": 20,
         "days": 6
       }
     },
     "description": "Measurement within normal range for gestational age (20w6d)"
   }
 ],
 "measurements": [
   {
     "assay": {
       "id": "LOINC:11820-8",
       "label": "Head circumference"
     },
     "value": {
       "quantity": {
         "unit": {
           "id": "UCUM:mm",
           "label": "millimeter"
         },
         "value": 175.3
       }
     },
     "timeObserved": {
       "gestationalAge": {
         "weeks": 20,
         "days": 6
       }
     }
   }
 ],
 "metaData": {
   "created": "2024-08-15T14:30:00Z",
   "createdBy": "prenatalppkt-v0.1.0",
   "resources": [
     {
       "id": "hp",
       "name": "Human Phenotype Ontology",
       "url": "http://purl.obolibrary.org/obo/hp.owl",
       "version": "2024-04-26",
       "namespacePrefix": "HP",
       "iriPrefix": "http://purl.obolibrary.org/obo/HP_"
     },
     {
       "id": "intergrowth",
       "name": "INTERGROWTH-21st Standards",
       "url": "https://intergrowth21.tghn.org/",
       "version": "2014",
       "namespacePrefix": "INTERGROWTH"
     }
   ],
   "phenopacketSchemaVersion": "2.0"
 }
}

Installation

Prerequisites

Python 3.10 or higher
pip package manager

Install from Source

# Clone the repository
git clone https://github.com/P2GX/prenatalppkt.git
cd prenatalppkt

# Create virtual environment
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install in development mode
pip install -e ".[test]"

# Verify installation
python -c "import prenatalppkt; print(prenatalppkt.__version__)"

Install Dependencies Only

pip install -r requirements/requirements.txt

Optional: Documentation Build

pip install -e ".[docs]"
mkdocs serve  # View docs at http://localhost:8000

Usage Examples

Basic Workflow

from prenatalppkt.measurement_eval import MeasurementEvaluation
from prenatalppkt.biometry_reference import FetalGrowthPercentiles
from prenatalppkt.gestational_age import GestationalAge

# 1. Initialize (loads YAML configuration once)
factory = MeasurementEvaluation()
ref_data = FetalGrowthPercentiles(source="intergrowth")

# 2. Parse gestational age
ga = GestationalAge.from_weeks(20.86)  # 20 weeks, 6 days

# 3. Calculate percentile
percentile = ref_data.calculate_percentile(
   measurement_type="head_circumference",
   gestational_age_weeks=20.86,
   value_mm=175.0
)
# Returns: 2.1 (below the 3rd percentile)

# 4. Get measurement mapper
hc_mapper = factory.get_measurement_mapper("head_circumference")

# 5. Map to HPO term
observation = hc_mapper.from_percentile(percentile, ga)

# Result:
# TermObservation(
#     hpo_id="HP:0000252",
#     hpo_label="Microcephaly",
#     category="lower_extreme_term",
#     observed=True,
#     gestational_age=GestationalAge(weeks=20, days=6),
#     percentile=2.1
# )

# 6. Convert to Phenopacket format
phenotypic_feature = observation.to_phenotypic_feature()
# {
#     "type": {"id": "HP:0000252", "label": "Microcephaly"},
#     "excluded": False,
#     "onset": {"gestationalAge": {"weeks": 20, "days": 6}},
#     "description": "Measurement at 20w6d"
# }

Batch Processing Multiple Measurements

from prenatalppkt.measurement_eval import MeasurementEvaluation
from prenatalppkt.biometry_reference import FetalGrowthPercentiles
from prenatalppkt.gestational_age import GestationalAge

# Initialize once
factory = MeasurementEvaluation()
ref_data = FetalGrowthPercentiles(source="intergrowth")
ga = GestationalAge.from_weeks(22.5)

# Raw measurements from ultrasound
measurements = {
   "head_circumference": 196.3,
   "biparietal_diameter": 52.1,
   "femur_length": 35.8,
   "abdominal_circumference": 170.2
}

# Process all measurements
observations = []
for measurement_type, value_mm in measurements.items():
   # Calculate percentile
   percentile = ref_data.calculate_percentile(measurement_type, 22.5, value_mm)
   
   # Get mapper and create observation
   mapper = factory.get_measurement_mapper(measurement_type)
   obs = mapper.from_percentile(percentile, ga)
   observations.append(obs)

# Build Phenopacket
from prenatalppkt.phenotypic_export import PhenotypicExporter
exporter = PhenotypicExporter()
for obs in observations:
   exporter.add_observation(obs)

phenopacket = exporter.build_phenopacket(
   subject_id="FETUS_001",
   maternal_id="MOTHER_001"
)
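To give a rough idea of what the exported document might contain, the sketch below assembles a minimal Phenopacket v2-shaped dictionary by hand. `build_phenopacket_dict` is a hypothetical stand-in, not the implementation of `PhenotypicExporter.build_phenopacket()`; it ignores `maternal_id`, and the HPO resource version is a placeholder that must be replaced with the release actually used.

```python
import json
from datetime import datetime, timezone


def build_phenopacket_dict(subject_id, features, created_by="example-pipeline"):
    """Assemble a minimal Phenopacket v2-shaped dictionary (illustrative only)."""
    return {
        "id": f"{subject_id}-phenopacket",
        "subject": {"id": subject_id},
        "phenotypicFeatures": list(features),
        "metaData": {
            "created": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
            "createdBy": created_by,
            "resources": [
                {
                    "id": "hp",
                    "name": "human phenotype ontology",
                    "url": "http://purl.obolibrary.org/obo/hp.owl",
                    # Placeholder: record the HPO release actually used.
                    "version": "UNSPECIFIED",
                    "namespacePrefix": "HP",
                    "iriPrefix": "http://purl.obolibrary.org/obo/HP_",
                }
            ],
            "phenopacketSchemaVersion": "2.0",
        },
    }


# Feature dictionary copied from the Basic Workflow output above.
feature = {
    "type": {"id": "HP:0000252", "label": "Microcephaly"},
    "excluded": False,
    "onset": {"gestationalAge": {"weeks": 20, "days": 6}},
    "description": "Measurement at 20w6d",
}
packet = build_phenopacket_dict("FETUS_001", [feature])
print(json.dumps(packet, indent=2))
```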

Custom Configuration

from pathlib import Path
from prenatalppkt.gestational_age import GestationalAge
from prenatalppkt.measurement_eval import MeasurementEvaluation

# Use custom YAML configuration
custom_mappings = Path("config/custom_hpo_mappings.yaml")
factory = MeasurementEvaluation(mappings_path=custom_mappings)

# Rest of workflow is identical
ga = GestationalAge.from_weeks(22.5)  # example gestational age
mapper = factory.get_measurement_mapper("head_circumference")
observation = mapper.from_percentile(15.2, ga)

Testing with Mock Configuration

from prenatalppkt.measurements.term_bin import TermBin, PercentileRange
from prenatalppkt.sonographic_measurement import SonographicMeasurement
from prenatalppkt.gestational_age import GestationalAge

# Create mock configuration for testing
test_bins = [
   TermBin(
       range=PercentileRange(0, 10),
       hpo_id="HP:TEST001",
       hpo_label="Low test value",
       normal=False
   ),
   TermBin(
       range=PercentileRange(10, 90),
       hpo_id="HP:TEST002",
       hpo_label="Normal test value",
       normal=True
   ),
   TermBin(
       range=PercentileRange(90, 100),
       hpo_id="HP:TEST003",
       hpo_label="High test value",
       normal=False
   ),
]

# Create mapper with mock config
test_mapper = SonographicMeasurement("test_measurement", test_bins)

# Test with various percentiles
ga = GestationalAge(weeks=20, days=0)
obs_low = test_mapper.from_percentile(5.0, ga)
obs_normal = test_mapper.from_percentile(50.0, ga)
obs_high = test_mapper.from_percentile(95.0, ga)

assert obs_low.hpo_id == "HP:TEST001"
assert obs_low.observed

assert obs_normal.hpo_id == "HP:TEST002"
assert not obs_normal.observed  # Normal range

assert obs_high.hpo_id == "HP:TEST003"
assert obs_high.observed

Testing

Run All Tests

pytest -vv

Run Specific Test Module

pytest tests/test_term_bin.py -v

Run with Coverage

pytest --cov=prenatalppkt --cov-report=html

Linting and Formatting

# Format code
ruff format .

# Check for issues
ruff check .

# Auto-fix issues
ruff check . --fix

Test Coverage

The current test suite covers:

Core Functionality Tests

tests/test_term_bin.py

PercentileRange.contains() for various ranges
TermBin.fits() boundary conditions
Automatic category detection
Edge cases (boundary values, overlaps)

tests/test_mapping_loader.py

YAML file loading
TermBin object creation
Validation of range coverage
Error handling for malformed YAML
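As an illustration of what "validation of range coverage" could mean, the sketch below checks that a set of half-open `[min, max)` percentile ranges tiles 0 to 100 without gaps or overlaps. `validate_coverage` and the cut points in `EXAMPLE_RANGES` are hypothetical and chosen for illustration; the loader's real checks and the real YAML boundaries may differ.

```python
def validate_coverage(ranges, lower=0.0, upper=100.0):
    """Raise ValueError unless half-open [min, max) ranges tile [lower, upper)."""
    if not ranges:
        raise ValueError("no percentile ranges defined")
    ordered = sorted(ranges)
    for low, high in ordered:
        if high <= low:
            raise ValueError(f"empty or inverted range [{low}, {high})")
    if ordered[0][0] != lower:
        raise ValueError(f"coverage starts at {ordered[0][0]}, expected {lower}")
    # Each range must start exactly where the previous one ends.
    for (_, high), (next_low, _) in zip(ordered, ordered[1:]):
        if next_low < high:
            raise ValueError(f"overlap at percentile {next_low}")
        if next_low > high:
            raise ValueError(f"gap between {high} and {next_low}")
    if ordered[-1][1] != upper:
        raise ValueError(f"coverage ends at {ordered[-1][1]}, expected {upper}")


# Hypothetical cut points, for illustration only.
EXAMPLE_RANGES = [(0, 3), (3, 5), (5, 10), (10, 50), (50, 90), (90, 95), (95, 97), (97, 100)]
validate_coverage(EXAMPLE_RANGES)  # passes silently

try:
    validate_coverage([(0, 3), (5, 100)])
except ValueError as err:
    print(err)  # gap between 3 and 5
```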

tests/test_measurement_eval.py

Factory initialization
Mapper creation
Configuration caching
Missing measurement handling

tests/test_sonographic_measurement.py

Percentile-to-observation mapping
Data-driven lookup logic
Normal vs. abnormal classification
Edge percentiles (0.0, 99.9, etc.)

Reference Data Tests

tests/test_biometry_reference.py

NICHD table loading
INTERGROWTH table loading
Percentile interpolation accuracy
Z-score calculation
Cross-reference consistency
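One plausible reading of "percentile interpolation" is linear interpolation between tabulated centile thresholds. The sketch below is written under that assumption. `interpolate_percentile` is a hypothetical function, not the `calculate_percentile()` implementation; it clamps values outside the table, whereas the library may extrapolate. The thresholds are copied from the Test Data fixture in this README.

```python
from bisect import bisect_right


def interpolate_percentile(value, thresholds):
    """Linearly interpolate a percentile from (percentile, measurement) pairs.

    `thresholds` must be sorted by ascending measurement value.
    Values outside the table are clamped to the first or last percentile.
    """
    percentiles = [p for p, _ in thresholds]
    values = [v for _, v in thresholds]
    if value <= values[0]:
        return percentiles[0]
    if value >= values[-1]:
        return percentiles[-1]
    # Binary search for the first threshold strictly above the value.
    i = bisect_right(values, value)
    low_v, high_v = values[i - 1], values[i]
    low_p, high_p = percentiles[i - 1], percentiles[i]
    fraction = (value - low_v) / (high_v - low_v)
    return low_p + fraction * (high_p - low_p)


# Thresholds copied from the Test Data fixture in this README.
THRESHOLDS = [
    (3, 145.25), (5, 147.25), (10, 150.37), (50, 161.95),
    (90, 174.41), (95, 178.12), (97, 180.56),
]

print(interpolate_percentile(161.95, THRESHOLDS))  # 50.0
print(round(interpolate_percentile(148.81, THRESHOLDS), 2))  # 7.5
```

The binary search keeps each lookup logarithmic in the number of tabulated thresholds.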

Export Tests

tests/test_phenotypic_export.py

HPO term assignment correctness
Phenopacket JSON serialization
Batch export functionality
QC validation integration

Parsing Tests

tests/test_parse_nichd_raw.py

Header/junk line detection
Multi-word measurement parsing
Race/ethnicity field extraction
Percentile value extraction

tests/test_parse_intergrowth_txt_all.py

Data line identification
GA range validation
Measure name normalization
Provenance metadata addition

Test Data

Test fixtures use validated reference values:

# Example: NICHD BPD at 20.86 weeks (Non-Hispanic White)
NIHCD_BPD_20_86_WEEKS = {
   "3rd": 145.25,
   "5th": 147.25,
   "10th": 150.37,
   "50th": 161.95,
   "90th": 174.41,
   "95th": 178.12,
   "97th": 180.56
}

# Example: INTERGROWTH HC z-scores at 22 weeks
INTERGROWTH_HC_22_WEEKS_ZSCORES = {
   "-3 SD": 169.2,
   "-2 SD": 179.5,
   "-1 SD": 189.8,
   "0 SD": 200.1,
   "+1 SD": 210.4,
   "+2 SD": 220.7,
   "+3 SD": 231.0
}
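The z-score fixture can be related to percentiles with a short calculation. The sketch below assumes measurements are normally distributed at a given gestational age, so a z-score maps to a percentile through the standard normal CDF. That is an assumption made for illustration, not a statement of how `get_z_score()` or `calculate_percentile()` work. The mean and SD are read off the fixture above (0 SD and +1 SD rows).

```python
from statistics import NormalDist

# Read off the INTERGROWTH HC fixture above: 0 SD and +1 SD values in mm.
MEAN_MM = 200.1
SD_MM = 210.4 - 200.1


def z_score(value_mm, mean_mm=MEAN_MM, sd_mm=SD_MM):
    """Number of standard deviations between the value and the mean."""
    return (value_mm - mean_mm) / sd_mm


def z_to_percentile(z):
    """Percentile under a standard normal distribution (assumed model)."""
    return 100.0 * NormalDist().cdf(z)


z = z_score(179.5)  # the "-2 SD" row of the fixture
print(round(z, 2))  # -2.0
print(round(z_to_percentile(z), 2))  # 2.28
```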

Future Roadmap

Phase 1: Core Functionality (Current Release)

Reference data loading (NICHD, INTERGROWTH-21st)
Percentile-based evaluation
HPO term mapping via YAML configuration
Data-driven measurement architecture
TermBin and PercentileRange models

Phase 2: Input Parsing (In Progress)

Observer JSON parser
ViewPoint Excel parser
Gestational age calculation from LMP/exam dates
Multi-fetus handling
Anatomy finding extraction (using ViewPoint dropdown lists)

Phase 3: Quality Control (Planned)

Schema validation (JSON Schema, Protobuf)
Completeness checking (required fields, measurement coverage)
Range validation (biologically plausible values)
Anomaly detection (statistical outliers)
Cross-measurement consistency (e.g., BPD/HC ratio)

Phase 4: Phenopacket Builder (Planned)

Full Phenopacket v2 assembly
Family/pedigree integration (twins, triplets)
ICD-10 -> MONDO/OMIM mapping
Provenance tracking (pipeline version, analyst ID)
Batch export utilities

Phase 5: CLI and Web API (Not Planned Yet)

# Command-line interface
prenatalppkt parse --input exam_data.json --output results/ --reference intergrowth

# Web API
POST /api/v1/evaluate
{
 "gestational_age_weeks": 22.5,
 "measurements": {"hc_mm": 196.3, "bpd_mm": 52.1}
}
-> Returns Phenopacket JSON

Phase 6: Advanced Features (Not Planned Yet)

Longitudinal growth tracking (serial ultrasounds)
Growth velocity calculations
Multi-parameter risk scoring
Predictive modeling integration (machine learning)
DICOM integration (extract measurements from ultrasound images)

Contributing

We welcome contributions! Please see our Contributing Guidelines for details.

Development Workflow

# 1. Fork and clone
git clone https://github.com/YOUR_USERNAME/prenatalppkt.git
cd prenatalppkt
git remote add upstream https://github.com/P2GX/prenatalppkt.git

# 2. Create feature branch
git checkout -b feature/add-efw-support

# 3. Install development dependencies
pip install -e ".[test]"

# 4. Make changes and test
pytest -vv
ruff format .
ruff check . --fix

# 5. Commit with descriptive messages
git add .
git commit -m "feat: Add estimated fetal weight (EFW) measurement support"

# 6. Push and create pull request
git push origin feature/add-efw-support

Code Style Guidelines

Python: Follow PEP 8 (enforced by Ruff)
Docstrings: Use NumPy-style sections (Parameters, Returns), as in the example below
Type hints: Required for all public functions
Line length: 88 characters (Black-compatible)

Example:

def evaluate(
   self,
   gestational_age: GestationalAge,
   measurement_value: float,
   reference_range: ReferenceRange
) -> MeasurementResult:
   """
   Evaluate a raw measurement against the provided reference range.

   Parameters
   ----------
   gestational_age : GestationalAge
       The gestational age context for this measurement.
   measurement_value : float
       The observed measurement in millimeters.
   reference_range : ReferenceRange
       Percentile thresholds for this gestational age.

   Returns
   -------
   MeasurementResult
       Percentile bin classification for the measurement.
   """

License

This project is released under a **dual-license model**:

- **Academic / Non-Commercial License:** Free to use, modify, and distribute for research and educational purposes.
- **Commercial License:** Required for commercial or for-profit use. Please contact [varenyajj@gmail.com](mailto:varenyajj@gmail.com).

Attribution required: (C) 2025 Varenya Jain, Peter N. Robinson.

For complete terms, see the [LICENSE](./LICENSE) file.

Acknowledgments

Reference Standards

NICHD Fetal Growth Studies: U.S. National Institute of Child Health and Human Development
INTERGROWTH-21st Project: International consortium for fetal growth standards

Key Dependencies

HPO Toolkit: Human Phenotype Ontology integration
GA4GH Phenopackets: Standardized phenotype representation
PyPhetools: Phenotype analysis utilities from Monarch Initiative

Contributors

Citation

If you use prenatalppkt in your research, please cite:

@article{prenatalppkt,
  author = {Jain, Varenya and Robinson, Peter N.},
  title = {prenatalppkt: Standardized Prenatal Phenotype Representation},
  year = {2025},
  url = {https://github.com/P2GX/prenatalppkt},
  version = {0.1.dev}
}

And cite the relevant reference standards:

NICHD Fetal Growth Studies: Buck Louis GM, Grewal J, Albert PS, Sciscione A, Wing DA, Grobman WA, Newman RB, Wapner R, D'Alton ME, Skupski D, Nageotte MP, Ranzini AC, Owen J, Chien EK, Craigo S, Hediger ML, Kim S, Zhang C, Grantz KL. Racial/ethnic standards for fetal growth: the NICHD Fetal Growth Studies. Am J Obstet Gynecol. 2015 Oct;213(4):449.e1-449.e41. doi: 10.1016/j.ajog.2015.08.032. PMID: 26410205; PMCID: PMC4584427.
INTERGROWTH-21st: Papageorghiou AT, Kennedy SH, Salomon LJ, Altman DG, Ohuma EO, Stones W, Gravett MG, Barros FC, Victora C, Purwar M, Jaffer Y, Noble JA, Bertino E, Pang R, Cheikh Ismail L, Lambert A, Bhutta ZA, Villar J; International Fetal and Newborn Growth Consortium for the 21st Century (INTERGROWTH-21st). The INTERGROWTH-21st fetal growth standards: toward the global integration of pregnancy and pediatric care. Am J Obstet Gynecol. 2018 Feb;218(2S):S630-S640. doi: 10.1016/j.ajog.2018.01.011. PMID: 29422205.

@article{intergrowth2014,
  title={International standards for fetal growth based on serial ultrasound measurements: the INTERGROWTH-21st Project},
  author={Papageorghiou, Aris T and Ohuma, Eric O and others},
  journal={The Lancet},
  volume={384},
  number={9946},
  pages={869--879},
  year={2014},
  publisher={Elsevier}
}

@article{buck2015nichd,
  title={The NICHD Fetal Growth Studies: design, methods, and cohort description},
  author={Buck Louis, Germaine M and Grewal, Jagteshwar and others},
  journal={American Journal of Obstetrics and Gynecology},
  volume={213},
  number={4},
  pages={459--e1},
  year={2015}
}

Support

Documentation: https://github.com/P2GX/prenatalppkt/docs
Issue Tracker: https://github.com/P2GX/prenatalppkt/issues
Discussions: https://github.com/P2GX/prenatalppkt/discussions
Email: [Contact @VarenyaJ or @pnrobinson]

Visuals:

High-Level System Overview

flowchart LR
subgraph Input["Input Processing"]
  JSON["JSON/XLSX Input"] --> Parser["Parser Layer"]
  Parser --> GA["Gestational Age
  Calculation"]
  GA --> Measurements["Extract Measurements
  o BPD
  o HC
  o AC
  o FL
  o OFD"]
end

subgraph Config["Configuration Layer"]
  YAML["YAML Mappings
  o Percentile ranges
  o HPO term IDs
  o Normal flags"]
end

subgraph Reference["Reference Data"]
  NIH["NICHD Reference
  o Percentiles by race/ethnicity
  o Growth charts"]
  IG21["INTERGROWTH-21st
  o Z-scores
  o Centiles"]
end

subgraph Processing["Measurement Processing"]
  Measurements --> Factory["MeasurementEvaluation
  (Factory)"]
  Factory --> Mapper["SonographicMeasurement
  (Generic Mapper)"]
  Mapper --> Percentile["Calculate Percentile"]
  Percentile --> Direct["Direct Mapping
  percentile -> HPO term"]
end

subgraph Output["Output Generation"]
  Direct --> TermObs["TermObservation
  o HPO ID + Label
  o observed flag
  o percentile"]
  TermObs --> Phenopacket["Phenopacket Builder"]
  Phenopacket --> QC["QC Validation"]
  QC --> Final["Final Phenopackets"]
end

YAML -.-> Factory
NIH --> Percentile
IG21 --> Percentile

classDef input fill:#a8d5ff,stroke:#333,stroke-width:2px,color:#000000
classDef config fill:#fff4e6,stroke:#333,stroke-width:2px,color:#000000
classDef reference fill:#ffe6cc,stroke:#333,stroke-width:2px,color:#000000
classDef process fill:#d5ffa8,stroke:#333,stroke-width:2px,color:#000000
classDef output fill:#ffafcc,stroke:#333,stroke-width:2px,color:#000000

class JSON,Parser,GA,Measurements input
class YAML config
class NIH,IG21 reference
class Factory,Mapper,Percentile,Direct process
class TermObs,Phenopacket,QC,Final output

Key Differences from Legacy Architecture:

YAML Configuration Layer: Central source of truth for HPO mappings
Factory Pattern: MeasurementEvaluation creates mappers on demand
Generic Mapper: Single SonographicMeasurement class (no subclasses)
Direct Mapping: No intermediate MeasurementResult - percentile maps directly to TermObservation
Explicit Normal Flags: YAML defines what's normal, not hard-coded logic

Detailed Module-Level Architecture

flowchart LR
subgraph Input["1 Input Sources"]
  JSON["data/EVMS_SAMPLE.json"]
  XLSX["ViewPoint Excel (.xlsx)"]
end

subgraph Parsing["2 Parsing & Gestational Age"]
  PARSER["biometry.py / parse_viewpoint.py
  o Reads JSON/XLSX
  o Extracts raw measurements"]
  GA["gestational_age.py
  o GestationalAge(weeks, days)
  o from_weeks() converter"]
end

subgraph Config["3 Configuration Loading"]
  YAML["data/mappings/
  biometry_hpo_mappings.yaml
  o Percentile ranges [min, max)
  o HPO IDs and labels
  o normal: true/false"]
  LOADER["mapping_loader.py
  BiometryMappingLoader
  o Parses YAML
  o Creates TermBin objects
  o Validates coverage"]
end

subgraph Reference["4 Reference Standards"]
  REF["biometry_reference.py
  FetalGrowthPercentiles
  o Loads NICHD/INTERGROWTH tables
  o calculate_percentile()
  o get_z_score()"]
end

subgraph Factory["5 Mapper Factory"]
  FACTORY["measurement_eval.py
  MeasurementEvaluation
  o Loads YAML once at init
  o get_measurement_mapper()
  o Returns configured mappers"]
end

subgraph Measurement["6 Generic Measurement Mapper"]
  MAPPER["sonographic_measurement.py
  SonographicMeasurement
  o measurement_type: str
  o term_bins: List[TermBin]
  o from_percentile() -> TermObservation"]
  TERMBIN["measurements/term_bin.py
  o PercentileRange
  o TermBin
  o category auto-detection"]
end

subgraph Observation["7 Ontology Observation"]
  OBS["term_observation.py
  TermObservation
  o hpo_id, hpo_label
  o observed: bool
  o percentile: float
  o to_phenotypic_feature()"]
end

subgraph Export["8 Phenopacket Export"]
  EXPORT["phenotypic_export.py
  PhenotypicExporter
  o Collects TermObservations
  o build_phenopacket()
  o to_json()"]
end

subgraph QC["9 Quality Control"]
  VALID["qc/validator.py (planned)
  o Schema validation
  o Ontology term checks
  o Completeness reports"]
end

subgraph Output["Outputs"]
  PP["Phenopackets v2 JSON
  o Subject metadata
  o phenotypicFeatures[]
  o measurements[]
  o metaData"]
  LOGS["QC Reports
  o Validation results
  o Provenance tracking"]
end

%% Connections
JSON --> PARSER
XLSX --> PARSER
PARSER --> GA
PARSER --> REF

YAML --> LOADER
LOADER --> TERMBIN
LOADER --> FACTORY

FACTORY --> MAPPER
TERMBIN -.-> MAPPER

REF --> MAPPER
GA --> MAPPER

MAPPER --> OBS
OBS --> EXPORT
EXPORT --> VALID

VALID --> PP
VALID --> LOGS

classDef input fill:#a8d5ff,stroke:#333,stroke-width:2px,color:#000000
classDef config fill:#fff4e6,stroke:#333,stroke-width:2px,color:#000000
classDef reference fill:#ffe6cc,stroke:#333,stroke-width:2px,color:#000000
classDef process fill:#d5ffa8,stroke:#333,stroke-width:2px,color:#000000
classDef output fill:#ffafcc,stroke:#333,stroke-width:2px,color:#000000

class JSON,XLSX,PARSER,GA input
class YAML,LOADER,FACTORY config
class REF,TERMBIN reference
class MAPPER,OBS,EXPORT process
class VALID,PP,LOGS output

Architecture Highlights:

Component	Old Approach	New Approach
Ontology Mapping	Hard-coded in subclasses	Declarative YAML configuration
Measurement Classes	One subclass per measurement (HC, BPD, FL...)	Single generic `SonographicMeasurement`
Normal Range Logic	Hard-coded if/else chains	Explicit `normal: true/false` in YAML
Intermediate Results	`MeasurementResult` with string bin keys	Direct percentile -> `TermObservation`
Dependencies	`MinimalTerm` from hpo-toolkit	Simple strings (hpo_id, hpo_label)
Extensibility	Code changes for new measurements	YAML edits only

Core Design Principles

Configuration over Code: HPO term mappings are defined declaratively in YAML, not hard-coded in Python classes

Key Architectural Change: Configuration-Driven Mapping

Transformation Overview:

flowchart LR
subgraph Old["Old Architecture (Hard-coded)"]
  direction TB
  O1["Raw Measurement
  HC = 175mm @ 20w6d"]
  O2["ReferenceRange.evaluate()"]
  O3["MeasurementResult
  bin_key = 'below_3p'"]
  O4["Hard-coded if/elif
  'below_3p' -> Microcephaly"]
  O5["TermObservation
  with MinimalTerm object"]
 
  O1 --> O2 --> O3 --> O4 --> O5
end

subgraph New["New Architecture (Data-driven)"]
  direction LR
  N1["Raw Measurement
  HC = 175mm @ 20w6d"]
  N2["FetalGrowthPercentiles
  calculate_percentile()"]
  N3["Percentile = 2.1"]
  N4["YAML Lookup
  [0, 3) -> HP:0000252"]
  N5["TermObservation
  hpo_id, hpo_label"]
 
  N1 --> N2 --> N3 --> N4 --> N5
end

style Old fill:#ffe6e6,stroke:#cc0000,stroke-width:2px
style New fill:#e6ffe6,stroke:#00cc00,stroke-width:2px

OLD APPROACH (Hard-coded):

# Each measurement had its own class with hard-coded logic
class HeadCircumferenceMeasurement(SonographicMeasurement):
    def get_bin_to_term_mapping(self):
        return {
            "below_3p": MinimalTerm("HP:0000252", "Microcephaly"),
            "between_3p_5p": MinimalTerm("HP:0040195", "Decreased HC"),
            # ... 6 more hard-coded bins
        }

NEW APPROACH (Data-driven):

# data/mappings/biometry_hpo_mappings.yaml
head_circumference:
  - min: 0
    max: 3
    id: "HP:0000252"
    label: "Microcephaly"
    normal: false

  - min: 3
    max: 5
    id: "HP:0040195"
    label: "Decreased head circumference"
    normal: false
  # ... all 8 ranges covering 0-100 percentile

# Python code loads configuration, no hard-coding needed
factory = MeasurementEvaluation()  # Loads YAML once
mapper = factory.get_measurement_mapper("head_circumference")
observation = mapper.from_percentile(2.1, gestational_age)
# Returns: TermObservation(hpo_id="HP:0000252", hpo_label="Microcephaly", ...)

Component Interaction Comparison

flowchart LR
subgraph Legacy["Legacy Architecture"]
  direction TB
  L1["HeadCircumferenceMeasurement
  (Subclass)"]
  L2["BiparietalDiameterMeasurement
  (Subclass)"]
  L3["FemurLengthMeasurement
  (Subclass)"]
  L4["Hard-coded mappings
  in each class"]
  L5["MeasurementResult
  (bin_key strings)"]
  L6["MinimalTerm objects
  from hpo-toolkit"]
 
  L1 --> L4
  L2 --> L4
  L3 --> L4
  L4 --> L5
  L5 --> L6
 
  style L1 fill:#ffcccc
  style L2 fill:#ffcccc
  style L3 fill:#ffcccc
  style L4 fill:#ffcccc
  style L5 fill:#ffcccc
  style L6 fill:#ffcccc
end

subgraph Modern["Modern Architecture"]
  direction TB
  M1["biometry_hpo_mappings.yaml
  (Single source of truth)"]
  M2["BiometryMappingLoader
  (Parses YAML)"]
  M3["MeasurementEvaluation
  (Factory)"]
  M4["SonographicMeasurement
  (Generic - works for ALL)"]
  M5["TermBin objects
  (Percentile ranges)"]
  M6["TermObservation
  (Simple data holder)"]
 
  M1 --> M2
  M2 --> M3
  M3 --> M4
  M4 --> M5
  M5 --> M6
 
  style M1 fill:#ccffcc
  style M2 fill:#ccffcc
  style M3 fill:#ccffcc
  style M4 fill:#ccffcc
  style M5 fill:#ccffcc
  style M6 fill:#ccffcc
end

Legacy -.->|Refactored to| Modern

Benefits of New Architecture:

Aspect	Legacy	Modern	Improvement
Lines of Code	~500 LOC across subclasses	~200 LOC + YAML	60% reduction
Adding Measurements	Write new Python class	Add YAML entry	No code changes
Changing Thresholds	Edit hard-coded values	Edit YAML values	Non-technical edits
Testing	Mock entire class hierarchy	Inject test config	Easier unit tests
Dependencies	Tight coupling to hpo-toolkit	Simple strings	Looser coupling
Maintainability	Changes require code review	Config can be validated	Faster iteration

Decision Flow: Processing a Single Measurement

flowchart LR
Start([Ultrasound Measurement]) --> Parse{Parse Input}
Parse -->|Success| ExtractGA[Extract Gestational Age]
Parse -->|Fail| Error1[Error: Invalid Input]

ExtractGA --> ExtractMeas[Extract Measurement Value]
ExtractMeas --> ValidateRange{Value in
Valid Range?}

ValidateRange -->|No| Error2[Error: Out of Range]
ValidateRange -->|Yes| InitFactory[Initialize Factory]

InitFactory --> LoadYAML{YAML Already
Loaded?}
LoadYAML -->|Yes| GetMapper
LoadYAML -->|No| LoadConfig[Load biometry_hpo_mappings.yaml]
LoadConfig --> ValidateYAML{YAML Valid?}
ValidateYAML -->|No| Error3[Error: Invalid Config]
ValidateYAML -->|Yes| GetMapper[Get Measurement Mapper]

GetMapper --> MapperExists{Mapper Exists
for Type?}
MapperExists -->|No| Error4[Error: Unknown Measurement]
MapperExists -->|Yes| CalcPercentile[Calculate Percentile
from Reference Data]

CalcPercentile --> InterpolateGA{GA in
Table?}
InterpolateGA -->|No| Interpolate[Interpolate Between Rows]
InterpolateGA -->|Yes| LookupThresholds
Interpolate --> LookupThresholds[Lookup Percentile Thresholds]

LookupThresholds --> CompareValue[Compare Value to Thresholds]
CompareValue --> FindPercentile[Determine Percentile Value]

FindPercentile --> IterateBins[Iterate Through TermBins]
IterateBins --> CheckFit{Percentile Fits
in Bin?}

CheckFit -->|No| NextBin[Try Next Bin]
NextBin --> MoreBins{More Bins
to Check?}
MoreBins -->|Yes| CheckFit
MoreBins -->|No| Error5[Error: No Matching Bin]

CheckFit -->|Yes| CreateObs[Create TermObservation]
CreateObs --> SetHPO[Set HPO ID & Label]
SetHPO --> SetObserved{normal
flag?}

SetObserved -->|true| SetNormal[observed = False
excluded = True]
SetObserved -->|false| SetAbnormal[observed = True
excluded = False]

SetNormal --> AddMetadata[Add Gestational Age
& Percentile]
SetAbnormal --> AddMetadata

AddMetadata --> ToFeature[Convert to
Phenotypic Feature]
ToFeature --> Success([TermObservation Ready])

style Start fill:#a8d5ff
style Success fill:#ccffcc
style Error1 fill:#ffcccc
style Error2 fill:#ffcccc
style Error3 fill:#ffcccc
style Error4 fill:#ffcccc
style Error5 fill:#ffcccc
style LoadYAML fill:#fff4e6
style GetMapper fill:#d5ffa8
style CalcPercentile fill:#ffe6cc
style CreateObs fill:#e3f2fd

Key Decision Points:

Input Validation: Ensures data format is correct
Range Validation: Checks biological plausibility (e.g., HC not negative)
Configuration Loading: One-time YAML load, then cached
Mapper Resolution: Factory pattern creates appropriate mapper
Percentile Calculation: Reference data lookup with interpolation
Bin Matching: Data-driven iteration through TermBins
Observation Creation: Sets HPO term and observed/excluded flags
Feature Export: Converts to Phenopacket-compliant format
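The bin-matching and flag-setting steps above can be sketched in a few lines. `Bin` and `classify` are hypothetical stand-ins, not the library's `TermBin` or `SonographicMeasurement.from_percentile()`. The half-open `[min, max)` convention is taken from the architecture diagram; how the library treats a percentile of exactly 100 is not stated, so this sketch simply raises. The bins reuse the mock values from "Testing with Mock Configuration".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bin:
    """Hypothetical stand-in for a percentile bin with an HPO term."""
    low: float
    high: float
    hpo_id: str
    hpo_label: str
    normal: bool

    def fits(self, percentile: float) -> bool:
        # Half-open interval: low is included, high is not.
        return self.low <= percentile < self.high


def classify(percentile, bins):
    """Return the first matching bin as a plain dict, or raise ValueError."""
    for b in bins:
        if b.fits(percentile):
            return {
                "hpo_id": b.hpo_id,
                "hpo_label": b.hpo_label,
                "observed": not b.normal,  # abnormal bins are observed
                "excluded": b.normal,      # normal bins exclude the finding
                "percentile": percentile,
            }
    raise ValueError(f"no bin matches percentile {percentile}")


# Mock bins mirroring the test configuration shown earlier.
BINS = [
    Bin(0, 10, "HP:TEST001", "Low test value", normal=False),
    Bin(10, 90, "HP:TEST002", "Normal test value", normal=True),
    Bin(90, 100, "HP:TEST003", "High test value", normal=False),
]

print(classify(5.0, BINS)["hpo_id"])   # HP:TEST001
print(classify(10.0, BINS)["hpo_id"])  # HP:TEST002 (lower bound is inclusive)
```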

Initialization vs. Runtime Phases

Knowing which steps run once at startup and which run for every measurement helps with both performance tuning and debugging:

flowchart LR
subgraph Init["Initialization Phase (Once)"]
  direction TB
  I1["Load YAML Configuration"]
  I2["Parse into PercentileRange
  & TermBin objects"]
  I3["Create mapping dictionary
  {measurement_type: [TermBins]}"]
  I4["Store in Factory"]
 
  I1 --> I2 --> I3 --> I4
 
  style I1 fill:#fff4e6
  style I2 fill:#fff4e6
  style I3 fill:#fff4e6
  style I4 fill:#fff4e6
end

subgraph Runtime["Runtime Phase (Per Measurement)"]
  direction TB
  R1["Get mapper from Factory
  O(1) dictionary lookup"]
  R2["Calculate percentile
  from reference data"]
  R3["Iterate through TermBins
  (typically 8 bins)"]
  R4["Create TermObservation
  when bin matches"]
  R5["Convert to Phenopacket
  feature"]
 
  R1 --> R2 --> R3 --> R4 --> R5
 
  style R1 fill:#d5ffa8
  style R2 fill:#d5ffa8
  style R3 fill:#d5ffa8
  style R4 fill:#d5ffa8
  style R5 fill:#d5ffa8
end

Init ==>|One-time cost| Runtime

Note1["YAML parsing happens ONCE
at application startup, not per measurement"]
Note2["Runtime is pure in-memory
operations - very fast"]

Init -.-> Note1
Runtime -.-> Note2

Performance Characteristics:

Phase	Operation	Complexity	Frequency
Init	Load & parse YAML	O(n) where n = total mappings	Once per application start
Init	Create TermBin objects	O(n)	Once per application start
Runtime	Get mapper	O(1) dictionary lookup	Per measurement
Runtime	Calculate percentile	O(log n) with interpolation	Per measurement
Runtime	Find matching bin	O(k) where k ≈ 8 bins	Per measurement
Runtime	Create observation	O(1)	Per measurement

Memory Footprint:

YAML File: ~11 KB on disk
Loaded Mappings: ~50 KB in memory (all 5 measurements)
Per Observation: ~1 KB (TermObservation object)
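The split between one-time initialization and per-measurement lookup can be illustrated with a small load-once factory. `CachedMapperFactory` and `fake_loader` are hypothetical names used for this sketch; the real `MeasurementEvaluation` may cache differently.

```python
class CachedMapperFactory:
    """Hypothetical factory: loads mappings once, then serves dict lookups."""

    def __init__(self, load_mappings):
        # The (potentially slow) loader runs exactly once, at construction.
        self._mappings = load_mappings()

    def get_measurement_mapper(self, measurement_type):
        try:
            return self._mappings[measurement_type]  # O(1) dictionary lookup
        except KeyError:
            raise ValueError(f"unknown measurement type: {measurement_type!r}") from None


load_calls = []


def fake_loader():
    """Stand-in for YAML parsing; records each invocation."""
    load_calls.append(1)
    return {"head_circumference": ["bin-a", "bin-b"], "femur_length": ["bin-c"]}


factory = CachedMapperFactory(fake_loader)
for _ in range(1000):
    factory.get_measurement_mapper("head_circumference")

print(len(load_calls))  # 1
```

Passing the loader in as a callable also makes it easy to inject a test configuration without touching the file system.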
