
feat: embed reference M5 checksums in BAM headers #93

Open
jayhesselberth wants to merge 1 commit into main from feat/bam-header-m5-checksums


@jayhesselberth
Member

Summary

  • Add ref_dict rule in aatrnaseq-reference.smk that runs samtools dict to generate a sequence dictionary with MD5 checksums from the validated reference
  • Modify bwa_align in aatrnaseq-process.smk to reheader aligned BAMs, replacing @SQ lines with those from the dict (adding M5 and UR fields) while preserving @HD and @PG lines
  • M5 fields propagate automatically to downstream BAMs (inject_ubam_tags → classify_charging → transfer_bam_tags → finalize_bam) since those rules copy headers via pysam
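The header merge described above can be sketched in plain Python. This is an illustration only, not the code in this PR: the function name merge_sq_lines and the example header strings are hypothetical, and the actual rule may do the same thing with shell tools. It takes the existing BAM header text and the samtools dict output, drops the old @SQ lines, and puts the dict's @SQ lines in their place while leaving @HD and @PG untouched.

```python
def merge_sq_lines(bam_header: str, dict_header: str) -> str:
    """Return bam_header with its @SQ lines replaced by those in dict_header.

    Hypothetical helper for illustration; all non-@SQ lines of bam_header
    (e.g. @HD, @PG, @RG, @CO) are kept in their original order.
    """
    # Only the @SQ lines of the dictionary are used; its own @HD is ignored.
    dict_sq = [ln for ln in dict_header.splitlines() if ln.startswith("@SQ")]

    merged = []
    inserted = False
    for line in bam_header.splitlines():
        if line.startswith("@SQ"):
            # Emit the dictionary @SQ block once, where the first old @SQ was.
            if not inserted:
                merged.extend(dict_sq)
                inserted = True
            continue
        merged.append(line)

    if not inserted:
        # No @SQ lines in the original header: place them right after @HD.
        pos = 1 if merged and merged[0].startswith("@HD") else 0
        merged[pos:pos] = dict_sq

    return "\n".join(merged) + "\n"


# Made-up example headers (tab-separated, as in SAM text).
bam_header = (
    "@HD\tVN:1.6\tSO:unsorted\n"
    "@SQ\tSN:tRNA-Ala\tLN:76\n"
    "@PG\tID:bwa\tPN:bwa\n"
)
dict_header = (
    "@HD\tVN:1.0\tSO:unsorted\n"
    "@SQ\tSN:tRNA-Ala\tLN:76\tM5:0123abcd\tUR:file:///ref.fa\n"
)
merged_header = merge_sq_lines(bam_header, dict_header)
```

Keeping the BAM's own @HD rather than the dictionary's preserves the sort-order information written by the aligner, which matches the "preserving @HD and @PG lines" behaviour described in the summary.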

Test plan

  • pixi run dry-run — DAG resolves, ref_dict appears before bwa_align
  • snakefmt passes
  • pixi run test — full test run with test data
  • Verify final BAM headers: samtools view -H .tests/outputs/bam/final/test_sample/test_sample.bam | grep '^@SQ' shows M5: and UR: fields
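The last check could also be automated rather than inspected by eye. The sketch below is an assumption about how one might do that, not part of this PR: sq_lines_missing_tags is a hypothetical helper that takes header text (for example, the output of the samtools view -H command above) and reports any @SQ line lacking an M5 or UR field.

```python
def sq_lines_missing_tags(header_text: str, required=("M5", "UR")) -> dict:
    """Map each reference name to the required tags absent from its @SQ line.

    Hypothetical helper for illustration; an empty dict means every @SQ line
    carries all required tags.
    """
    missing = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        fields = line.split("\t")[1:]
        # Tag names are the part before the first colon, e.g. "M5" in "M5:abc".
        tags = {field.split(":", 1)[0] for field in fields}
        name = next((f[3:] for f in fields if f.startswith("SN:")), "?")
        absent = [tag for tag in required if tag not in tags]
        if absent:
            missing[name] = absent
    return missing


# Made-up header text: the second reference lacks both fields.
example_header = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@SQ\tSN:tRNA-Ala\tLN:76\tM5:0123abcd\tUR:file:///ref.fa\n"
    "@SQ\tSN:tRNA-Gly\tLN:74\n"
)
report = sq_lines_missing_tags(example_header)
```

A test could then simply assert that the report is empty for the final BAM header.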

🤖 Generated with Claude Code

Add ref_dict rule to generate sequence dictionary via samtools dict,
then reheader aligned BAMs to include M5 and UR fields on @SQ lines.