Generate sequence structure mapping files for any organism by jmbilodeaux · Pull Request #53 · rnabioco/aa-tRNA-seq-pipeline

jmbilodeaux · 2025-03-10T16:36:33Z

The test data are charged tRNA sequencing data from T4 phage infection of wild-type E. coli.

`test.py` adds the charged adapter sequence to the 5' or 3' end of each sequence. This needs to be fixed, since we use "charged" and "uncharged" adapters that have different 3' sequences.
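One way to handle two 3' adapters is to emit one reference record per (tRNA, 3' adapter) pair instead of a single charged-adapter record. The sketch below assumes a 5' adapter plus a list of `(name, seq)` 3' adapters; the adapter sequences, the function name `add_adapters`, and the `|<adapter name>` record suffix are hypothetical placeholders, not the pipeline's real values.

```python
from typing import Dict, List, Tuple

# PLACEHOLDER sequences for illustration only; substitute the real charged
# and uncharged 3' adapters from the pipeline configuration.
ADAPTER_5P = "TT"
ADAPTERS_3P: List[Tuple[str, str]] = [
    ("charged", "AAAACCCC"),
    ("uncharged", "AAAAGGGG"),
]


def add_adapters(
    trnas: Dict[str, str],
    adapter_5p: str,
    adapters_3p: List[Tuple[str, str]],
) -> Dict[str, str]:
    """Return one reference record per (tRNA, 3' adapter) combination.

    Record names get a hypothetical "|<adapter name>" suffix so reads
    aligned to either variant can be traced back to their adapter.
    """
    out: Dict[str, str] = {}
    for name, seq in trnas.items():
        for adapter_name, adapter_seq in adapters_3p:
            out[f"{name}|{adapter_name}"] = adapter_5p + seq + adapter_seq
    return out
```

For a single input tRNA this yields two records that differ only in their 3' ends, so reads carrying either adapter have a full-length target to align to.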

The input is a Stockholm (.sto) alignment converted to aligned FASTA (.afa) with `esl-reformat`; it captures the conserved secondary structure across tRNAs, including phage tRNAs. The reference sequence is used to map each sequence position to its structural position, and that mapping can then convert positions in nanopore sequence space to structural positions.
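A minimal sketch of the sequence-to-structure mapping, assuming each alignment column in the .afa file corresponds to one structural position. The function names and the gap alphabet (`-`, `.`, `~`) are illustrative assumptions, not the pipeline's actual code.

```python
from typing import Dict, List

# Characters treated as alignment gaps (assumption: covers "-" as well as
# Stockholm-style "." gaps that may survive conversion to afa).
GAP_CHARS = frozenset("-.~")


def read_afa(text: str) -> Dict[str, str]:
    """Parse aligned FASTA text into {record name: aligned sequence}."""
    records: Dict[str, str] = {}
    name = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name = line[1:].split()[0]  # keep only the ID, drop description
            chunks = []
        else:
            chunks.append(line)  # aligned rows may wrap over several lines
    if name is not None:
        records[name] = "".join(chunks)
    return records


def seq_to_struct_map(aligned: str) -> List[int]:
    """Map each ungapped residue (0-based index) to its 1-based alignment column."""
    return [col for col, ch in enumerate(aligned, start=1) if ch not in GAP_CHARS]


def struct_position(aligned: str, seq_pos: int) -> int:
    """Structural (alignment-column) position of 0-based sequence position seq_pos."""
    return seq_to_struct_map(aligned)[seq_pos]
```

For example, the aligned row `GC-GA..U` maps sequence positions 0 to 4 onto columns `[1, 2, 4, 5, 8]`. If nanopore coordinates are measured on a reference that also carries a 5' adapter (an assumption here), the adapter length would need to be subtracted before the lookup.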


jayhesselberth · 2025-03-10T16:51:16Z

I am a little confused. This PR seems to only have recent changes that I have already merged into main.

jmbilodeaux · 2025-03-10T19:32:12Z

@jayhesselberth Hmm, okay, I'm not sure what went wrong. Let me retry.






- Convert inline samples list to structured tables showing sample run paths and pipeline configuration parameters from config
- Combine charging summary into single cross-sample table
- Move reference sequence similarity section to appendix
- Fix empty "Top tRNAs by Read Count" tabs by removing conflicting `tbl-cap` option and switching to `gt::as_raw_html()` for reliable rendering in Quarto panel-tabset loops

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Replace sample table with inline list and link to detailed appendix
- Remove redundant Pipeline Configuration section (used overlay config)
- Add Sample Details section in appendix with absolute run paths
- Consolidate three manifest tables into single annotated gt table with row groups (Execution, Pipeline, Configuration, Tools)
- Add reference mode/source to manifest and report (build vs validate)
- Resolve TSV sample paths to absolute in setup chunk
- Remove `.appendix` class so section renders in TOC

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Pass raw POD5 directories directly to WarpDemuX and pod5 filter instead of merging ~1TB of data into a single file first. Both tools natively support directory/multi-file inputs, avoiding unnecessary data duplication. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Apply snakefmt to common.smk and aatrnaseq-report.smk. Fix TestProcessBam tests to pass adapter_3p as list of (name, seq) tuples matching the process_bam signature after the multi-adapter change. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Replace 6 individual facet_wrap figures with a single Per-Sample QC Overview tabset where each tab shows a 6-panel patchwork (alignment funnel, charging density, tRNA abundance, tRNA charging, error frequency, position coverage). Remove Base Calling Quality section (figures moved into patchwork). Filter tRNA-Und from charging/CPM data. Fix isodecoder regex to handle numbered amino acids. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
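The isodecoder fix can be illustrated with a regex that allows an optional number after the amino acid (as in a name like `tRNA-Ile2-CAT-1-1`) and a filter that drops `tRNA-Und` entries. This is a Python sketch under an assumed `tRNA-<AA>[number]-<anticodon>` naming scheme; the report itself is not shown here, and the pattern and function names are hypothetical.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Assumed naming scheme: tRNA-<amino acid><optional digits>-<anticodon>[-...]
# The \d* after the letters is what lets "Ile2"-style names parse.
ISODECODER_RE = re.compile(r"^tRNA-([A-Za-z]+\d*)-([ACGTUN]{3})")


def parse_isodecoder(name: str) -> Optional[Tuple[str, str]]:
    """Return (amino acid label, anticodon), or None if the name does not match."""
    m = ISODECODER_RE.match(name)
    if m is None:
        return None
    return m.group(1), m.group(2)


def drop_undetermined(names: Iterable[str]) -> List[str]:
    """Remove tRNA-Und (undetermined) entries before charging/CPM summaries."""
    return [n for n in names if not n.startswith("tRNA-Und")]
```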

RNA reads should never map to the reverse strand. Change samtools view flag from -F 4 (unmapped only) to -F 20 (unmapped + reverse strand) in bwa_align to discard these reads immediately after alignment. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
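The flag value 20 is the bitwise OR of the SAM FLAG bits for "unmapped" (0x4) and "reverse strand" (0x10), and `samtools view -F` drops any record with any of those bits set. A small sketch of that arithmetic (the helper name is illustrative):

```python
# SAM FLAG bits from the SAM specification.
FLAG_UNMAPPED = 0x4   # segment unmapped
FLAG_REVERSE = 0x10   # SEQ reverse complemented, i.e. read on the reverse strand

# samtools view -F <mask> excludes records that have ANY of these bits set.
EXCLUDE_MASK = FLAG_UNMAPPED | FLAG_REVERSE


def kept_by_exclude_mask(flag: int, mask: int = EXCLUDE_MASK) -> bool:
    """True if a record with this FLAG survives `samtools view -F mask`."""
    return (flag & mask) == 0
```

So a forward-strand primary alignment (FLAG 0) is kept, while FLAG 4 (unmapped), FLAG 16 (reverse), and FLAG 272 (secondary on the reverse strand) are all dropped.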

Remove hardcoded awk filter ($4 <= 25) that pre-filtered reads by mapping position. This is redundant with the proper 5' truncation filter in filter_reads.py (-5 24) and was fragile (raw SAM text parsing with an undocumented magic number). Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
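Why the awk filter was redundant can be checked with coordinate arithmetic: SAM column 4 (POS) is 1-based, so `$4 <= 25` keeps alignments starting at 0-based offset 24 or earlier. The sketch below assumes that `filter_reads.py -5 24` keeps reads whose alignment starts no more than 24 nt from the reference 5' end; that reading of the flag and both helper names are assumptions, not taken from the script.

```python
def awk_filter_keeps(pos_1based: int) -> bool:
    """The removed awk filter: keep SAM lines whose column 4 (POS) is <= 25."""
    return pos_1based <= 25


def truncation_filter_keeps(reference_start_0based: int, max_5p_trunc: int = 24) -> bool:
    """Assumed semantics of `filter_reads.py -5 24`: allow at most
    max_5p_trunc nt missing from the reference 5' end."""
    return reference_start_0based <= max_5p_trunc


def to_sam_pos(reference_start_0based: int) -> int:
    """SAM POS is the 0-based start plus one."""
    return reference_start_0based + 1
```

Under this assumption both filters accept exactly the same reads, so the awk pass only added a fragile text-parsing step.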

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>

Parallel rebasecall GPU jobs downloading modified bases models (pseU, m5C, inosine_m6A) simultaneously into the shared BeeGFS models directory causes extraction failures. Add a download_mod_models localrule that runs once on the submission node before any rebasecall jobs, ensuring models are pre-downloaded. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
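The "download once, then let parallel jobs reuse it" pattern can be sketched as a function that fetches every model serially and then writes a marker file that downstream jobs depend on. The download step is injected as a callable because the exact dorado invocation is not shown here; the function name is hypothetical, and the marker name mirrors the `mod_models_ready` file mentioned in the commit history.

```python
from pathlib import Path
from typing import Callable, Iterable


def ensure_models(
    models_dir: Path,
    model_names: Iterable[str],
    download: Callable[[str, Path], None],
    marker_name: str = "mod_models_ready",
) -> bool:
    """Download all models once, then write a marker file.

    Returns True if downloads ran, False if the marker already existed.
    Intended to run in a single process (e.g. on the submission node) so
    that parallel jobs never extract into the shared directory at once.
    """
    models_dir.mkdir(parents=True, exist_ok=True)
    marker = models_dir / marker_name
    if marker.exists():
        return False
    for name in model_names:
        download(name, models_dir)  # serial: one extraction at a time
    marker.touch()  # written only after every download succeeded
    return True
```

Writing the marker last means a failed download leaves no marker, so the step reruns instead of letting GPU jobs start against a partial models directory.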


Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add v0.1.1 CHANGELOG entry covering 34 commits since v0.1.0 including Quarto QC report, per-tRNA odds ratios, reference similarity QC, and classify_charging CPU migration. Update rules-reference, overview, outputs, scripts-reference, and README to document new rules, updated commands, and new scripts. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: decouple dorado tags from bwa alignment to resolve OOM at 48GB

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: fix snakefmt formatting in inject_ubam_tags rule

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: support multi-adapter validation for dual-adapter references

  The validate_reference rule only checked against a single 3' adapter, causing validation failure for references with both charged (v2) and uncharged (v1) adapter sequences. Now passes all configured adapters and validates each sequence against any matching adapter.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: update unit tests for multi-adapter validate_reference() signature

  Wrap 3' adapter arg in a list for all 7 TestValidateReference calls to match the updated function signature from ebcbba2.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
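Validating against "any matching adapter" can be sketched as: for each reference record, accept it if it ends with any configured 3' adapter, and report the records that match none. This sketch uses a list of `(name, seq)` tuples like the `adapter_3p` convention described for `process_bam`; the function name `check_3p_adapters` and the adapter sequences are hypothetical placeholders, not the pipeline's `validate_reference()`.

```python
from typing import Dict, List, Tuple

# PLACEHOLDER adapters for illustration only (not the real v1/v2 sequences).
ADAPTERS_3P: List[Tuple[str, str]] = [
    ("charged", "AAAACCCC"),
    ("uncharged", "AAAAGGGG"),
]


def check_3p_adapters(
    records: Dict[str, str], adapters_3p: List[Tuple[str, str]]
) -> Dict[str, str]:
    """Return {record name: matching adapter name}.

    Raises ValueError listing every record that ends with none of the
    configured 3' adapters. If one adapter is a suffix of another, the
    first one listed wins.
    """
    matched: Dict[str, str] = {}
    failures: List[str] = []
    for name, seq in records.items():
        hit = next(
            (a_name for a_name, a_seq in adapters_3p if seq.endswith(a_seq)),
            None,
        )
        if hit is None:
            failures.append(name)
        else:
            matched[name] = hit
    if failures:
        raise ValueError("no configured 3' adapter found on: " + ", ".join(failures))
    return matched
```

With a single-adapter check, every record built with the other adapter would fail; checking against the full list lets a dual-adapter reference pass while still catching records with no recognizable 3' end.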

- Increase bwa_align SLURM resources (96GB mem, 12h runtime, 16 cpus) to handle larger samples that OOM at 64GB
- Use multithreaded samtools sort (`-@ 4`) with reduced per-thread memory
- Add generate_squiggy_session to localrules
- Add `--no-checksums` to squiggy session generation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Remove the separate `demux` pixi environment by merging WarpDemuX dependencies into the default env and unified setup script. Rename warpdemux.smk to demux.smk and add edx_concordance rule for cross-checking WDX barcode assignment vs 3' EDX adapter identity. Update all docs and config to reflect the simplified workflow. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
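The concordance check can be sketched as a per-read tally: given each read's WarpDemuX barcode and its 3' EDX adapter identity, compare against the adapter each barcode is expected to carry. The input shapes, the `expected` mapping, and the function name are hypothetical; the actual edx_concordance rule may compute and report this differently.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def edx_concordance(
    assignments: Iterable[Tuple[str, str]],
    expected: Dict[str, str],
) -> Tuple[float, Counter]:
    """Tally per-read agreement between WDX barcode and 3' EDX adapter.

    assignments: (wdx_barcode, edx_adapter) for each read.
    expected: barcode -> adapter that barcode should carry (hypothetical
    sample-sheet mapping).
    Returns (concordant / (concordant + discordant), counts by category).
    """
    tally: Counter = Counter()
    for barcode, adapter in assignments:
        if barcode not in expected:
            tally["unassigned"] += 1  # barcode not in the sample sheet
        elif expected[barcode] == adapter:
            tally["concordant"] += 1
        else:
            tally["discordant"] += 1
    assessed = tally["concordant"] + tally["discordant"]
    rate = tally["concordant"] / assessed if assessed else float("nan")
    return rate, tally
```

Keeping unassigned reads out of the denominator separates demultiplexing failures from genuine barcode/adapter disagreement.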

jayhesselberth added 3 commits February 25, 2025 12:07

Make yeast mito tRNA ref

d7c0cd8

Remove linters for now

0b8e677

Too chirpy. Run `snakefmt workflow/rules/*`

ruff format

7be3485

jmbilodeaux requested a review from jayhesselberth March 10, 2025 16:36

jayhesselberth added 24 commits March 12, 2025 21:31

Download and install dorado and model (#56)

e553083

Call modified bases

dafa2c0

Update model download strategy

c62e4a9

Install modkit (#59)

8020289

Use cli for cargo install

642d452

Install binary modkit on linux x86

2bdfcc8

use bare version number

0a1b719

Eliminate support of FAST5 files (#43)

d287cdf

Closes #40 * Set file extension * Eliminate input_format option

Remove optional bam inputs

b8823ca

Increase threads for pod5 merge

d45e40a

Add modkit rules

59c9af6

And reorganize rule files and summary outputs

Simplify modkit install

e2f8399

Always build from source. The linux dist doesn't run on bodhi because glibc is out of date.

add bedtools

33154b4

update rule

4e848b6

modkit testing

e4f006c

Merge branch 'main' of ssh://github.com/rnabioco/aa-tRNA-seq-pipeline

9f0bfc9

Reorganize output directory

fb135b6

Rename charging tags during transfer

bc2a5d8

Update tag for charging table

2664321

And snakefmt

Update readme

24faab9

Reduce dorado verbosity

4dac8fc

Restore -v option in dorado

dacba62

Removing it causes it to fail.

Add rule for modkit extract full

1b57916

Move comments out of shell call

c1efe37

jayhesselberth and others added 4 commits January 31, 2026 04:06

ignore quarto build files

8cff88c

jayhesselberth temporarily deployed to github-pages January 31, 2026 17:17 with GitHub Actions

jayhesselberth and others added 11 commits January 31, 2026 10:34

fix: download dorado mod base models

ca4c646

fix: get pipeline commit gracefully

875f0c3

feat: add pairwise modification odds ratios (#85)

b9ff61c

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>

fix: removed "protected" directives

567223b

add pheatmap to pixi

95ddb3f

docs: document squiggy-session.json across all docs

8c0a805

Add documentation for the generate_squiggy_session rule and script to CLAUDE.md, outputs guide, scripts reference, and rules reference. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

jayhesselberth temporarily deployed to github-pages February 10, 2026 01:30 with GitHub Actions

jayhesselberth and others added 6 commits February 9, 2026 18:34

style: fix snakefmt formatting in aatrnaseq-modifications.smk

e719702

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fix: gitignore mod_models_ready marker file

736497d

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

refactor: switch classify_charging to CPU with parallel workers

7779a5f

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

refactor: rewrite odds ratios as per-tRNA analysis using modkit calls

651b530

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

feat: add nvitop GPU monitoring dependency

b1dadcf

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

jayhesselberth temporarily deployed to github-pages February 11, 2026 12:19 with GitHub Actions

jayhesselberth and others added 4 commits February 13, 2026 10:15

chore: bump alloc for modkit rule

4d13970

jayhesselberth temporarily deployed to github-pages February 18, 2026 12:15 with GitHub Actions

jayhesselberth closed this Feb 18, 2026


jmbilodeaux wants to merge 120 commits into seq2struct from main


3 participants
