Resonance: Genomic Annotation Version Control & Semantic Reconciliation Platform

@Arnav261 released this 30 Jan 03:51 (commit 7cf2ca2)

Genomic Coordinate Liftover with ML Confidence Prediction

Version: 5.0.1
Release date: 2026-01-30
Status: Active development
Demo available at https://genomic-annotation-version-controller.onrender.com

Highlights

  • Parallelized batch liftover and streaming VCF support for improved throughput on large files.
  • Official CLI (liftover-cli) for reproducible local batch jobs and pipeline integration.
  • Docker image and recommended run patterns for reproducible demos and deployments.
  • Expanded ML training/validation pipeline and SHAP-compatible explainability export for per-variant interpretation.
  • Optional LightGBM backend supported for faster training/inference.
  • Basic CI and smoke tests added to improve reliability of core functionality.

New features

  • Parallel batch processing
    • Configurable worker count for batch liftover jobs.
    • Chunked processing mode to control memory footprint for large VCFs.
  • Streaming VCF liftover endpoint
    • POST /liftover/stream accepts VCF upload and streams transformed variants back, reducing temporary disk use.
  • ML / explainability
    • SHAP-compatible per-variant explainability export (JSON) added to inference path.
    • Option to use LightGBM backend (faster training/inference; drop-in flag in training/inference configs).
  • Expanded validation dataset
    • Scripts and manifest for assembling a larger RefSeq-derived training/validation set included (see docs/validation/).
    • NOTE: public dataset downloads are scripted; some sources require direct download and are NOT bundled in the repo.
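
The chunked, worker-based batch mode described above can be pictured with the following minimal sketch. It is not the project's implementation: `chunked`, `lift_in_chunks`, and the `lift_one` callback are hypothetical names, variants are simplified to `(chromosome, position)` tuples, and threads stand in for whatever worker type the service actually uses. The sketch keeps at most `workers` chunks in flight, which is one way a chunk size and worker count could bound memory on large inputs.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Tuple

Variant = Tuple[str, int]  # (chromosome, position); simplified for illustration


def chunked(items: Iterable, size: int) -> Iterator[List]:
    """Yield consecutive lists of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def lift_in_chunks(
    variants: Iterable[Variant],
    lift_one: Callable[[Variant], Variant],  # hypothetical per-variant liftover
    workers: int = 4,
    chunk_size: int = 1000,
) -> Iterator[Variant]:
    """Lift variants chunk by chunk, preserving input order."""

    def lift_chunk(chunk: List[Variant]) -> List[Variant]:
        return [lift_one(v) for v in chunk]

    pending = deque()  # futures in submission order
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for chunk in chunked(variants, chunk_size):
            pending.append(pool.submit(lift_chunk, chunk))
            # Cap the number of chunks in flight at `workers` so the whole
            # input is never materialized at once.
            if len(pending) >= workers:
                yield from pending.popleft().result()
        # Drain the chunks that are still running.
        while pending:
            yield from pending.popleft().result()
```

Because results are consumed in submission order, the output order matches the input order. For CPU-bound liftover a process pool may be a better fit than threads; that choice is outside what these notes specify.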

Improvements / Enhancements

  • Improved API
    • Health endpoint expanded to include ML model readiness, recent validation artifact timestamp, and worker pool status.
    • Batch endpoint supports asynchronous job submission with job status polling.
  • Performance and robustness
    • I/O and memory improvements for streaming and chunked VCF processing.
    • Worker pool is resilient to individual variant failures; failures are recorded per-variant and do not abort the entire job.
  • Documentation
    • Reworked README sections for Quick Start (virtualenv and Docker), CLI examples, and upgrade notes.
    • New reproducible training runbook (small dataset) and example notebooks added under docs/examples/.
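
To illustrate the per-variant failure handling noted under "Performance and robustness", here is a minimal sketch of recording a failure next to the variant that caused it instead of aborting the job. `VariantResult`, `lift_all`, and the `lift_one` callback are hypothetical names used only for illustration; the actual record format in the service may differ.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple

Variant = Tuple[str, int]  # (chromosome, position); simplified for illustration


@dataclass
class VariantResult:
    """Outcome for one input variant: a lifted variant or an error message."""
    source: Variant
    lifted: Optional[Variant] = None
    error: Optional[str] = None

    @property
    def ok(self) -> bool:
        return self.error is None


def lift_all(
    variants: Iterable[Variant],
    lift_one: Callable[[Variant], Variant],  # hypothetical per-variant liftover
) -> List[VariantResult]:
    """Lift every variant, recording failures per variant and continuing."""
    results: List[VariantResult] = []
    for variant in variants:
        try:
            results.append(VariantResult(source=variant, lifted=lift_one(variant)))
        except Exception as exc:
            # One bad variant is recorded and the loop moves on.
            message = f"{type(exc).__name__}: {exc}"
            results.append(VariantResult(source=variant, error=message))
    return results
```

A caller can then split the results on `ok` to report lifted and failed variants separately.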

Bug fixes

  • Fix: liftover chain parsing bug that produced incorrect chain agreement counts in rare chain overlap cases.
  • Fix: off-by-one coordinate handling in VCF streaming that affected indel normalization in some edge cases.
  • Fix: API error responses standardized (consistent JSON schema with error, code, details).
  • Fix: model calibration step now respects seed control for reproducible calibration artifacts.
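
For clients that consume the standardized error responses, a minimal parsing sketch follows. Only the three key names (`error`, `code`, `details`) come from these notes; the value types, the assumption that all three keys are always present, and the names `ApiError` and `parse_error` are illustrative guesses rather than the documented API.

```python
import json
from dataclasses import dataclass
from typing import Any

REQUIRED_KEYS = ("error", "code", "details")


@dataclass
class ApiError:
    """Client-side view of an error response; value types are assumptions."""
    error: Any
    code: Any
    details: Any


def parse_error(body: str) -> ApiError:
    """Parse an error response body, rejecting payloads that lack a key."""
    payload = json.loads(body)
    if not isinstance(payload, dict):
        raise ValueError("error response is not a JSON object")
    missing = [key for key in REQUIRED_KEYS if key not in payload]
    if missing:
        raise ValueError(f"error response is missing keys: {missing}")
    return ApiError(
        error=payload["error"],
        code=payload["code"],
        details=payload["details"],
    )
```

Checking for the keys up front lets a pipeline fail loudly if it is pointed at an older server that still returns a different error shape.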

Breaking changes / Migration notes

  • Config change: the default chain-files path remains app/data/chains/, but the Docker examples now state the recommended container mount point explicitly. If you use a different layout, update your scripts or set the LIFTOVER_CHAIN_DIR environment variable.
  • SHAP output: explainability export format is JSON with a new schema (explainability.version: "1.0"). If you parse previous explainability tables, update your parsers to accept the new JSON schema.
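
Both migration notes can be handled defensively in downstream scripts. The sketch below makes two assumptions: that LIFTOVER_CHAIN_DIR, when set, takes precedence over the default path, and that `explainability.version` denotes a `version` key nested under a top-level `explainability` object. The function names `resolve_chain_dir` and `load_explainability` are hypothetical, and no other part of the schema is assumed.

```python
import json
import os
from pathlib import Path
from typing import Mapping, Optional

DEFAULT_CHAIN_DIR = "app/data/chains/"
SUPPORTED_EXPLAINABILITY_VERSION = "1.0"


def resolve_chain_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the chain directory, preferring LIFTOVER_CHAIN_DIR when set."""
    env = os.environ if env is None else env
    # An unset or empty variable falls back to the default path.
    return Path(env.get("LIFTOVER_CHAIN_DIR") or DEFAULT_CHAIN_DIR)


def load_explainability(text: str) -> dict:
    """Parse an explainability export and check its schema version."""
    doc = json.loads(text)
    # Assumed layout: {"explainability": {"version": "1.0", ...}, ...}
    version = doc.get("explainability", {}).get("version")
    if version != SUPPORTED_EXPLAINABILITY_VERSION:
        raise ValueError(f"unsupported explainability schema version: {version!r}")
    return doc
```

Failing on an unexpected version, rather than parsing on a best-effort basis, avoids silently misreading exports written in the older tabular format.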

Known issues

  • ML training reproducibility: training scripts are included, but the full-size RefSeq downloads are large and are not bundled in full, so exact AUC and calibration numbers depend on the final curated dataset and the random seeds used.
  • Multi-species liftover: experimental and partial. Human hg19↔hg38 is the primary supported path.
  • Large structural variants and complex rearrangements may still fail liftover or receive low-confidence scores; these are flagged but not automatically resolved.
  • Security / privacy: the tool is not hardened for protected human data. Users must ensure compliance with local privacy policies before processing real human genomic data.

Documentation & examples

  • Quick Start, CLI examples, and Docker usage: see README.md.
  • Reproducible small training run and example notebooks: docs/examples/.
  • Validation scripts and manifests: docs/validation/.
  • API reference: available at runtime via the FastAPI OpenAPI docs (default: http://localhost:8000/docs).