Generate sequence structure mapping files for any organism #53

Closed
jmbilodeaux wants to merge 120 commits into seq2struct from main

Conversation

@jmbilodeaux
Collaborator

The test data are charged sequencing data from a T4 infection of WT E. coli.

test.py adds the charged adapter sequence to the 5' or 3' end of each sequence. This needs to be fixed, since we use 'charged' and 'uncharged' adapters that have different 3' sequences.

The input is a converted .sto file (esl-reformat, sto -> afa) containing the conserved secondary structure across tRNAs, including phage tRNAs. The reference sequence is used to build a mapping from sequence position to structure position, which can be used to convert positions in nanopore sequence space to structural positions.
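
As a rough illustration of the mapping idea described above, here is a minimal sketch. It assumes the reference row of the aligned FASTA marks gaps with `-` or `.`, and that the structural position is simply the 1-based alignment column. The function name `seq_to_struct_map` and the toy sequence are hypothetical and are not taken from this PR.

```python
def seq_to_struct_map(aligned_ref, gap_chars="-."):
    """Map 1-based ungapped sequence positions to 1-based alignment columns.

    Assumption: the alignment column stands in for the structural position.
    """
    mapping = {}
    seq_pos = 0
    for col, base in enumerate(aligned_ref, start=1):
        if base in gap_chars:
            continue  # gap column: this sequence has no residue here
        seq_pos += 1
        mapping[seq_pos] = col
    return mapping


# Toy aligned reference row (hypothetical, not real tRNA data)
aligned = "GC-AU..GG"
mapping = seq_to_struct_map(aligned)

for seq_pos, struct_pos in mapping.items():
    print(f"sequence position {seq_pos} -> structural position {struct_pos}")
```

A position reported in sequence space (for example, from a nanopore alignment to the ungapped reference) could then be looked up in `mapping` to obtain its structural coordinate.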

@jayhesselberth
Member

I am a little confused. This PR seems to only have recent changes that I have already merged into main.

@jmbilodeaux
Collaborator Author

@jayhesselberth Hmm, okay, I'm not sure what went wrong. Let me retry.

jayhesselberth and others added 4 commits January 31, 2026 04:06
- Convert inline samples list to structured tables showing sample
  run paths and pipeline configuration parameters from config
- Combine charging summary into single cross-sample table
- Move reference sequence similarity section to appendix
- Fix empty "Top tRNAs by Read Count" tabs by removing conflicting
  tbl-cap option and switching to gt::as_raw_html() for reliable
  rendering in Quarto panel-tabset loops

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Replace sample table with inline list and link to detailed appendix
- Remove redundant Pipeline Configuration section (used overlay config)
- Add Sample Details section in appendix with absolute run paths
- Consolidate three manifest tables into single annotated gt table
  with row groups (Execution, Pipeline, Configuration, Tools)
- Add reference mode/source to manifest and report (build vs validate)
- Resolve TSV sample paths to absolute in setup chunk
- Remove .appendix class so section renders in TOC

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Pass raw POD5 directories directly to WarpDemuX and pod5 filter
instead of merging ~1TB of data into a single file first. Both tools
natively support directory/multi-file inputs, avoiding unnecessary
data duplication.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
jayhesselberth and others added 11 commits January 31, 2026 10:34
Apply snakefmt to common.smk and aatrnaseq-report.smk. Fix
TestProcessBam tests to pass adapter_3p as list of (name, seq) tuples
matching the process_bam signature after the multi-adapter change.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace 6 individual facet_wrap figures with a single Per-Sample QC
Overview tabset where each tab shows a 6-panel patchwork (alignment
funnel, charging density, tRNA abundance, tRNA charging, error
frequency, position coverage). Remove Base Calling Quality section
(figures moved into patchwork). Filter tRNA-Und from charging/CPM
data. Fix isodecoder regex to handle numbered amino acids.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
RNA reads should never map to the reverse strand. Change samtools view
flag from -F 4 (unmapped only) to -F 20 (unmapped + reverse strand)
in bwa_align to discard these reads immediately after alignment.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
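
For readers unfamiliar with SAM flags, the change from -F 4 to -F 20 in the commit above can be sketched as a bitmask check. In the SAM specification, bit 0x4 marks an unmapped read and bit 0x10 marks a reverse-strand alignment, so 20 is the union of the two. The helper `is_excluded` is a hypothetical name that only mimics the exclusion semantics of `samtools view -F`; it is not part of the pipeline.

```python
UNMAPPED = 0x4   # SAM flag bit: read is unmapped
REVERSE = 0x10   # SAM flag bit: read aligns to the reverse strand


def is_excluded(flag, exclude_mask):
    """Mimic `samtools view -F`: drop a read if any bit in the mask is set."""
    return (flag & exclude_mask) != 0


# Three illustrative flag values
examples = {0: "mapped, forward", 4: "unmapped", 16: "mapped, reverse"}

kept_old = [f for f in examples if not is_excluded(f, UNMAPPED)]
kept_new = [f for f in examples if not is_excluded(f, UNMAPPED | REVERSE)]

print("mask value:", UNMAPPED | REVERSE)
print("kept with -F 4: ", kept_old)
print("kept with -F 20:", kept_new)
```

With the wider mask, reverse-strand alignments are dropped at the same step as unmapped reads.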
Remove hardcoded awk filter ($4 <= 25) that pre-filtered reads by
mapping position. This is redundant with the proper 5' truncation
filter in filter_reads.py (-5 24) and was fragile (raw SAM text
parsing with an undocumented magic number).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Parallel rebasecall GPU jobs downloading modified bases models (pseU,
m5C, inosine_m6A) simultaneously into the shared BeeGFS models
directory causes extraction failures. Add a download_mod_models
localrule that runs once on the submission node before any rebasecall
jobs, ensuring models are pre-downloaded.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add documentation for the generate_squiggy_session rule and script
to CLAUDE.md, outputs guide, scripts reference, and rules reference.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
jayhesselberth and others added 6 commits February 9, 2026 18:34
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add v0.1.1 CHANGELOG entry covering 34 commits since v0.1.0 including
Quarto QC report, per-tRNA odds ratios, reference similarity QC, and
classify_charging CPU migration. Update rules-reference, overview,
outputs, scripts-reference, and README to document new rules, updated
commands, and new scripts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
jayhesselberth and others added 4 commits February 13, 2026 10:15
)

* fix: decouple dorado tags from bwa alignment to resolve OOM at 48GB

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: fix snakefmt formatting in inject_ubam_tags rule

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: support multi-adapter validation for dual-adapter references

The validate_reference rule only checked against a single 3' adapter,
causing validation failure for references with both charged (v2) and
uncharged (v1) adapter sequences. Now passes all configured adapters
and validates each sequence against any matching adapter.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: update unit tests for multi-adapter validate_reference() signature

Wrap 3' adapter arg in a list for all 7 TestValidateReference calls to
match the updated function signature from ebcbba2.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
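
The multi-adapter validation fix in the commit above ("validates each sequence against any matching adapter") can be sketched as follows. This is only an illustration under stated assumptions: `find_unmatched` is a hypothetical helper, not the real validate_reference() function, the adapters are assumed to be passed as a plain list of strings, and the adapter and reference sequences are placeholders rather than the actual charged (v2) or uncharged (v1) adapters.

```python
def find_unmatched(references, adapters_3p):
    """Return names of reference sequences that end with none of the adapters.

    Hypothetical helper: a sequence passes if it ends with ANY configured
    3' adapter, rather than being checked against a single adapter.
    """
    suffixes = tuple(adapters_3p)  # str.endswith accepts a tuple of suffixes
    return [name for name, seq in references.items() if not seq.endswith(suffixes)]


# Placeholder adapters and references (not real sequences)
adapters = ["AAACCC", "AAAGGG"]
refs = {
    "tRNA-a": "GGCCAAACCC",  # ends with the first adapter
    "tRNA-b": "GGCCAAAGGG",  # ends with the second adapter
    "tRNA-c": "GGCCTTTT",    # ends with neither
}

unmatched = find_unmatched(refs, adapters)
print("sequences failing validation:", unmatched)
```

Checking against a single adapter would flag every reference carrying the other adapter as invalid, which matches the failure described for dual-adapter references.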
- Increase bwa_align SLURM resources (96GB mem, 12h runtime, 16 cpus)
  to handle larger samples that OOM at 64GB
- Use multithreaded samtools sort (-@ 4) with reduced per-thread memory
- Add generate_squiggy_session to localrules
- Add --no-checksums to squiggy session generation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove the separate `demux` pixi environment by merging WarpDemuX
dependencies into the default env and unified setup script. Rename
warpdemux.smk to demux.smk and add edx_concordance rule for
cross-checking WDX barcode assignment vs 3' EDX adapter identity.
Update all docs and config to reflect the simplified workflow.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

3 participants