Refactor extract_read_features_from_bam to support pysam and samtools backends #221

Merged
jkmckenna merged 1 commit into 0.3.0 from codex/refactor-extract_read_features_from_bam
Jan 16, 2026


@jkmckenna

Motivation

  • Bring extract_read_features_from_bam in line with other bam_functions utilities by supporting both a Python (pysam) and CLI (samtools) backend.
  • Make the function more robust to missing qualities and provide consistent numeric outputs for downstream QC steps.

Description

  • Changed function signature to extract_read_features_from_bam(bam_file_path: str | Path, samtools_backend: str | None = "auto") -> Dict[str, List[float]] and added type hints and a docstring.
  • Implemented a Python backend using _require_pysam() that reads via pysam.AlignmentFile and computes median base quality, mapped length via get_blocks(), and reference lengths from bam_file.references/lengths.
  • Implemented a CLI backend that parses samtools view -H for @SQ lengths and streams samtools view -F 4 output, computing mapped length from CIGAR strings and converting ASCII qualities to Phred scores.
  • Added defensive handling for missing qualities/sequences and normalized returned metric types to floats.
  • Added unit tests for both the python and cli backend paths in tests/unit/informatics/test_tool_backends.py to cover the new behavior.
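
The per-read arithmetic in the CLI backend can be illustrated with a minimal sketch. This is not the merged implementation: the helper names mapped_length_from_cigar and median_phred are hypothetical, the choice to count only M, = and X operations as mapped length is an assumption (made to mirror what pysam's get_blocks() covers), and returning NaN for a missing quality string is likewise an assumption about how "defensive handling" might look. The Phred conversion assumes the standard SAM offset of 33.

```python
import re
from statistics import median

# One CIGAR operation: a run length followed by an op code from the SAM spec.
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Assumption: only alignment match/mismatch ops contribute to mapped length.
_ALIGNED_OPS = {"M", "=", "X"}


def mapped_length_from_cigar(cigar: str) -> float:
    """Sum the lengths of aligned (M, =, X) operations in a CIGAR string."""
    if not cigar or cigar == "*":  # "*" means the CIGAR is unavailable
        return 0.0
    return float(
        sum(int(length) for length, op in _CIGAR_OP.findall(cigar) if op in _ALIGNED_OPS)
    )


def median_phred(qual: str) -> float:
    """Convert an ASCII (Phred+33) quality string to its median Phred score."""
    if not qual or qual == "*":  # "*" means qualities were not stored
        return float("nan")  # assumption: missing qualities become NaN
    return float(median(ord(char) - 33 for char in qual))
```

For example, the CIGAR string 5S10M2I3D20M yields a mapped length of 30.0 under these assumptions, because the soft clip, insertion and deletion are excluded.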

Testing

  • Added unit tests test_extract_read_features_from_bam_python_backend and test_extract_read_features_from_bam_cli_backend to cover both backends; these tests were committed with the change.
  • Ran pytest tests/unit/informatics/test_tool_backends.py -q, which failed during test collection with importlib.metadata.PackageNotFoundError: No package metadata was found for smftools. Automated test execution therefore did not complete in this environment.
  • The new tests are self-contained and mock pysam / subprocess calls; they can be executed locally or in CI once the package is installable in the test environment (e.g. pip install -e . before running pytest).
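
As a rough illustration of how a CLI-backend test can avoid needing samtools on the machine, the sketch below patches subprocess.run and feeds it a fake header. It is not the test committed in this PR: reference_lengths_via_samtools and the test function are hypothetical names, and the header contents are made up for the example.

```python
import subprocess
from typing import Dict
from unittest import mock


def reference_lengths_via_samtools(bam_path: str) -> Dict[str, int]:
    """Hypothetical helper: read @SQ name/length pairs from `samtools view -H`."""
    result = subprocess.run(
        ["samtools", "view", "-H", bam_path],
        capture_output=True,
        text=True,
        check=True,
    )
    lengths: Dict[str, int] = {}
    for line in result.stdout.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Header fields are tab-separated TAG:VALUE pairs after the record type.
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        if "SN" in fields and "LN" in fields:
            lengths[fields["SN"]] = int(fields["LN"])
    return lengths


def test_reference_lengths_with_mocked_samtools() -> None:
    # Made-up header text standing in for real samtools output.
    fake_header = (
        "@HD\tVN:1.6\tSO:coordinate\n"
        "@SQ\tSN:chr1\tLN:1000\n"
        "@SQ\tSN:chr2\tLN:500\n"
    )
    fake_result = subprocess.CompletedProcess(
        args=[], returncode=0, stdout=fake_header, stderr=""
    )
    # Replace subprocess.run so no external binary is invoked.
    with mock.patch.object(subprocess, "run", return_value=fake_result) as fake_run:
        lengths = reference_lengths_via_samtools("example.bam")
    assert lengths == {"chr1": 1000, "chr2": 500}
    fake_run.assert_called_once()
```

Because the external call is patched, a test of this shape only needs the package itself to be importable, which is consistent with the collection failure described above.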

Codex Task

@jkmckenna merged commit 675a61d into 0.3.0 on Jan 16, 2026
7 of 8 checks passed
@jkmckenna deleted the codex/refactor-extract_read_features_from_bam branch on January 20, 2026 21:29