Generate sequence structure mapping files for any organism #53
Closed
jmbilodeaux wants to merge 120 commits into seq2struct from
Too chirpy. Run `snakefmt workflow/rules/*`
Member
I am a little confused. This PR seems to only have recent changes that I have already merged into main.
Collaborator
Author
@jayhesselberth hmm, okay, I'm not sure what went wrong. Let me retry.
Closes #40
* Set file extension
* Eliminate input_format option
And reorganize rule files and summary outputs
Always build from source. The linux dist doesn't run on bodhi because glibc is out of date.
And snakefmt
Removing it causes it to fail.
- Convert inline samples list to structured tables showing sample run paths and pipeline configuration parameters from config
- Combine charging summary into single cross-sample table
- Move reference sequence similarity section to appendix
- Fix empty "Top tRNAs by Read Count" tabs by removing conflicting tbl-cap option and switching to gt::as_raw_html() for reliable rendering in Quarto panel-tabset loops
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Replace sample table with inline list and link to detailed appendix
- Remove redundant Pipeline Configuration section (used overlay config)
- Add Sample Details section in appendix with absolute run paths
- Consolidate three manifest tables into single annotated gt table with row groups (Execution, Pipeline, Configuration, Tools)
- Add reference mode/source to manifest and report (build vs validate)
- Resolve TSV sample paths to absolute in setup chunk
- Remove .appendix class so section renders in TOC
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Pass raw POD5 directories directly to WarpDemuX and pod5 filter instead of merging ~1TB of data into a single file first. Both tools natively support directory/multi-file inputs, avoiding unnecessary data duplication. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Apply snakefmt to common.smk and aatrnaseq-report.smk. Fix TestProcessBam tests to pass adapter_3p as list of (name, seq) tuples matching the process_bam signature after the multi-adapter change. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace 6 individual facet_wrap figures with a single Per-Sample QC Overview tabset where each tab shows a 6-panel patchwork (alignment funnel, charging density, tRNA abundance, tRNA charging, error frequency, position coverage). Remove Base Calling Quality section (figures moved into patchwork). Filter tRNA-Und from charging/CPM data. Fix isodecoder regex to handle numbered amino acids. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
RNA reads should never map to the reverse strand. Change samtools view flag from -F 4 (unmapped only) to -F 20 (unmapped + reverse strand) in bwa_align to discard these reads immediately after alignment. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
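The flag change in the commit above can be checked with a little bit arithmetic. This is only an illustrative sketch: the bit values come from the SAM specification, while `is_excluded` is a hypothetical helper that imitates the `samtools view -F` semantics (drop a read if any bit of the mask is set) and is not part of the pipeline.

```python
# SAM FLAG bits as defined by the SAM specification.
UNMAPPED = 0x4   # read is unmapped
REVERSE = 0x10   # read aligned to the reverse strand


def is_excluded(flag: int, exclude_mask: int) -> bool:
    """Imitate `samtools view -F MASK`: drop a read if any masked bit is set.

    Hypothetical helper for illustration only.
    """
    return (flag & exclude_mask) != 0


old_mask = UNMAPPED            # -F 4: drop unmapped reads only
new_mask = UNMAPPED | REVERSE  # -F 20: also drop reverse-strand reads

# Compare both masks on a forward mapped, an unmapped, and a reverse-strand read.
for flag in (0, 4, 16):
    print(flag, is_excluded(flag, old_mask), is_excluded(flag, new_mask))
```

With the old mask a reverse-strand read (flag 16) passes the filter; with the new mask it is discarded along with unmapped reads.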
Remove hardcoded awk filter ($4 <= 25) that pre-filtered reads by mapping position. This is redundant with the proper 5' truncation filter in filter_reads.py (-5 24) and was fragile (raw SAM text parsing with an undocumented magic number). Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Parallel rebasecall GPU jobs downloading modified bases models (pseU, m5C, inosine_m6A) simultaneously into the shared BeeGFS models directory causes extraction failures. Add a download_mod_models localrule that runs once on the submission node before any rebasecall jobs, ensuring models are pre-downloaded. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add documentation for the generate_squiggy_session rule and script to CLAUDE.md, outputs guide, scripts reference, and rules reference. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add v0.1.1 CHANGELOG entry covering 34 commits since v0.1.0 including Quarto QC report, per-tRNA odds ratios, reference similarity QC, and classify_charging CPU migration. Update rules-reference, overview, outputs, scripts-reference, and README to document new rules, updated commands, and new scripts. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: decouple dorado tags from bwa alignment to resolve OOM at 48GB
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* style: fix snakefmt formatting in inject_ubam_tags rule
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: support multi-adapter validation for dual-adapter references
  The validate_reference rule only checked against a single 3' adapter, causing validation failure for references with both charged (v2) and uncharged (v1) adapter sequences. Now passes all configured adapters and validates each sequence against any matching adapter.
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: update unit tests for multi-adapter validate_reference() signature
  Wrap 3' adapter arg in a list for all 7 TestValidateReference calls to match the updated function signature from ebcbba2.
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
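The multi-adapter check described in this commit might look roughly like the sketch below. It is not the actual `validate_reference()` code, whose signature is not shown here. The function names, the `(name, seq)` tuple layout for adapters, and the adapter sequences are all assumptions made up for illustration.

```python
def find_matching_adapter(seq, adapters):
    """Return the name of the first adapter that `seq` ends with, else None.

    `adapters` is assumed to be a list of (name, sequence) tuples.
    """
    seq = seq.upper()
    for name, adapter_seq in adapters:
        if seq.endswith(adapter_seq.upper()):
            return name
    return None


def sequences_without_adapter(records, adapters):
    """Return the names of reference records that match none of the adapters."""
    return [
        name
        for name, seq in records.items()
        if find_matching_adapter(seq, adapters) is None
    ]


# Placeholder adapter sequences, NOT the real charged/uncharged adapters.
adapters = [("charged", "GGAA"), ("uncharged", "CCTT")]
records = {
    "tRNA-1-charged": "ACGTGGAA",
    "tRNA-1-uncharged": "ACGTCCTT",
    "tRNA-2-bad": "ACGTACGT",
}
print(sequences_without_adapter(records, adapters))
```

A reference passes validation when this list is empty, so a dual-adapter reference no longer fails just because half of its sequences carry the second adapter.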
- Increase bwa_align SLURM resources (96GB mem, 12h runtime, 16 cpus) to handle larger samples that OOM at 64GB
- Use multithreaded samtools sort (-@ 4) with reduced per-thread memory
- Add generate_squiggy_session to localrules
- Add --no-checksums to squiggy session generation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove the separate `demux` pixi environment by merging WarpDemuX dependencies into the default env and unified setup script. Rename warpdemux.smk to demux.smk and add edx_concordance rule for cross-checking WDX barcode assignment vs 3' EDX adapter identity. Update all docs and config to reflect the simplified workflow. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The test data is charged sequencing data from a T4 infection of WT E. coli.
test.py currently adds the charged adapter sequence to the 5' or 3' end of each sequence. This needs to be fixed, since we use 'charged' and 'uncharged' adapters that have different 3' sequences.
The input is a .sto file converted to aligned FASTA (esl-reformat sto -> afa) that contains the conserved secondary structure across tRNAs, including phage tRNAs. The reference sequence is used to build a mapping from sequence position to structure position, which can then be used to convert nanopore sequence-space positions to structural positions.
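A minimal sketch of the sequence-to-structure mapping idea, under stated assumptions: the "structure position" is taken to be the 1-based column in the aligned FASTA, gaps are written as `-` or `.`, and the function names and toy alignment are hypothetical. The mapping files produced by this PR may use a different numbering, and any offset from adapter sequence in the nanopore reference is not handled here.

```python
GAP_CHARS = set("-.")


def read_aligned_fasta(text):
    """Parse aligned FASTA text into {record_name: gapped_sequence}."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def sequence_to_structure_positions(aligned_seq):
    """Map 1-based ungapped sequence positions to 1-based alignment columns."""
    mapping = {}
    seq_pos = 0
    for column, base in enumerate(aligned_seq, start=1):
        if base in GAP_CHARS:
            continue  # gap: this column has no residue in this sequence
        seq_pos += 1
        mapping[seq_pos] = column
    return mapping


# Toy alignment, made up for illustration.
example = ">tRNA-A\nGC-GA.UU\n>tRNA-B\nGCAGA-UU\n"
records = read_aligned_fasta(example)
mapping = sequence_to_structure_positions(records["tRNA-A"])
print(mapping)
```

In the toy alignment, the third base of tRNA-A sits in alignment column 4 because column 3 is a gap, so a signal observed at sequence position 3 would be reported at structure position 4.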