Commit 7d651c9

docs: update CHANGELOG, rules, and output docs for v0.1.1
Add v0.1.1 CHANGELOG entry covering 34 commits since v0.1.0 including Quarto QC report, per-tRNA odds ratios, reference similarity QC, and classify_charging CPU migration.

Update rules-reference, overview, outputs, scripts-reference, and README to document new rules, updated commands, and new scripts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent b1dadcf commit 7d651c9

6 files changed, 396 additions, 20 deletions

CHANGELOG.md

Lines changed: 27 additions & 0 deletions

````diff
@@ -2,6 +2,33 @@
 
 All notable changes to the aa-tRNA-seq pipeline are documented in this file.
 
+## [v0.1.1] - 2026-02-11
+
+### Added
+- Quarto QC report with per-sample tabs (#81)
+- Per-tRNA pairwise modification odds ratios (#85)
+- Reference sequence similarity QC (#84)
+- Squiggy session JSON export for Positron IDE
+- Utility to collapse redundant GtRNAdb FASTA sequences
+- Multiple 3' adapter support for PT tag detection
+- Skip mode for reference validation
+- Pre-download dorado mod base models rule (avoids race conditions)
+- nvitop GPU monitoring dependency
+
+### Changed
+- `classify_charging` switched from GPU to CPU with parallel workers (8 threads)
+- WarpDemuX workflow simplified: eliminated `merge_pods_for_demux`, passes raw POD5 dirs directly
+- `bwa_align` filtering changed from `-F 4` to `-F 20` (also excludes reverse-strand reads)
+- Removed redundant awk position filter from `bwa_align`
+- Removed `protected()` directive from `rebasecall` output
+
+### Fixed
+- Race condition when parallel GPU jobs download dorado modification models simultaneously
+- Reverse-strand reads not filtered at alignment step
+- Redundant awk position filter in `bwa_align` superseded by adapter-based filtering
+- Graceful fallback for `get_pipeline_commit` when git unavailable
+- Various snakefmt formatting and test corrections
+
 ## [v0.1.0] - 2025-01-16
 
 ### Added
````
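The `bwa_align` change from `-F 4` to `-F 20` is SAM FLAG bit arithmetic. The value 20 is 4 (read unmapped) plus 16 (read reverse-complemented), so a record with either bit set is dropped. The sketch below mirrors `-F` semantics in Python. It assumes the filter is applied with `samtools view` in the rule's pipe, which this diff does not show. `passes_filter` is a hypothetical helper, not a pipeline function.

```python
"""Decode the SAM FLAG bits behind `samtools view -F 4` vs `-F 20`."""

# Bit values from the SAM specification (FLAG field).
FLAG_UNMAPPED = 0x4  # 4: segment unmapped
FLAG_REVERSE = 0x10  # 16: SEQ reverse complemented (read on reverse strand)
FLAG_SECONDARY = 0x100  # 256: secondary alignment (not tested by -F 20)

# 4 | 16 == 20: the mask passed to `-F 20`.
EXCLUDE_MASK = FLAG_UNMAPPED | FLAG_REVERSE


def passes_filter(flag: int, exclude_mask: int = EXCLUDE_MASK) -> bool:
    """Mimic `-F <mask>`: keep a record only if none of the masked bits are set."""
    return (flag & exclude_mask) == 0


if __name__ == "__main__":
    examples = {
        0: "mapped, forward strand",
        FLAG_UNMAPPED: "unmapped",
        FLAG_REVERSE: "mapped, reverse strand",
        FLAG_SECONDARY: "secondary, forward strand",
    }
    for flag, label in examples.items():
        old = passes_filter(flag, FLAG_UNMAPPED)  # previous `-F 4`
        new = passes_filter(flag)  # current `-F 20`
        print(f"{flag:>4}  {label:<26} -F 4 keeps: {old!s:<5}  -F 20 keeps: {new}")
```

`-F 20` does not test the secondary (256) or supplementary (2048) bits. Forward-strand secondary and supplementary alignments therefore still pass this filter.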

README.md

Lines changed: 5 additions & 1 deletion

````diff
@@ -123,6 +123,10 @@ flowchart TD
 G --> J[bcerror<br/>basecalling errors]
 G --> K[align_stats]
 G --> L[modkit pileups]
+L -.-> M[odds_ratios<br/>pairwise mod ORs]
+H -.-> M
+K -.-> N[qc_report<br/>Quarto HTML]
+H -.-> N
 end
 
 POD5 -.-> W
@@ -141,7 +145,7 @@ Given a directory of POD5 files, this pipeline:
 
 The classification generates ML tag values (0-255) indicating the likelihood of aminoacylation. By default, ML values of 200-255 are treated as charged, and values <200 as uncharged. This threshold can be adjusted via the `ml-threshold` parameter in the `get_cca_trna_cpm` rule.
 
-The final steps of the pipeline calculate a number of outputs that may be useful for analysis and visualization, including normalized counts for charged and uncharged tRNA (`get_cca_trna_cpm`), basecalling error values (`bcerror`), alignment statistics (`align_stats`) and information on raw nanopore signal from Remora (`remora_signal_stats`).
+The final steps of the pipeline calculate a number of outputs that may be useful for analysis and visualization, including normalized counts for charged and uncharged tRNA (`get_cca_trna_cpm`), basecalling error values (`bcerror`), alignment statistics (`align_stats`), information on raw nanopore signal from Remora (`remora_signal_stats`), per-tRNA pairwise modification odds ratios (`compute_odds_ratios`), reference sequence similarity QC (`compute_reference_similarity`), and a combined Quarto QC report (`render_combined_qc_report`).
 
 ### Remora classification
 
````
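The README's ML-threshold rule reduces to one comparison per read: ML 200-255 is charged and ML below 200 is uncharged. Below is a minimal sketch under three assumptions:

- reads arrive as (tRNA name, ML value) pairs;
- CPM is taken over all classified reads in the sample (`get_cca_trna_cpm` may normalize differently);
- `is_charged` and `charging_cpm` are hypothetical helpers, and the tRNA names in the usage block are illustrative.

```python
"""Sketch: threshold ML tag values into charged/uncharged and compute CPM."""
from collections import Counter
from typing import Dict, Iterable, Tuple

# Default from the README: ML 200-255 -> charged, ML < 200 -> uncharged.
ML_THRESHOLD = 200


def is_charged(ml_value: int, threshold: int = ML_THRESHOLD) -> bool:
    """Classify one read from its 8-bit ML value (0-255)."""
    if not 0 <= ml_value <= 255:
        raise ValueError(f"ML values are 0-255, got {ml_value}")
    return ml_value >= threshold


def charging_cpm(
    reads: Iterable[Tuple[str, int]], threshold: int = ML_THRESHOLD
) -> Dict[Tuple[str, str], float]:
    """Counts per million for each (tRNA, charging state) group.

    Assumption: the CPM denominator is every classified read in the sample.
    """
    counts: Counter = Counter()
    for trna, ml_value in reads:
        state = "charged" if is_charged(ml_value, threshold) else "uncharged"
        counts[(trna, state)] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {group: n / total * 1_000_000 for group, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical reads: (reference tRNA name, ML value).
    reads = [
        ("tRNA-Ala-AGC-1", 230),
        ("tRNA-Ala-AGC-1", 120),
        ("tRNA-Gly-GCC-2", 200),  # exactly at the threshold -> charged
        ("tRNA-Gly-GCC-2", 199),
    ]
    # Each of the four (tRNA, state) groups holds 1 of 4 reads: 250000.0 CPM.
    for (trna, state), cpm in sorted(charging_cpm(reads).items()):
        print(f"{trna}\t{state}\t{cpm:.1f}")
```

The `threshold` argument plays the role of the `ml-threshold` parameter. Raising it makes the charged call stricter.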
docs/user-guide/outputs.md

Lines changed: 67 additions & 2 deletions

````diff
@@ -11,7 +11,9 @@ This guide documents all output files produced by the pipeline.
 ├── fq/ # Extracted FASTQ files
 ├── summary/ # Analysis outputs
 │ ├── tables/ # Tabular summaries
-│ └── modkit/ # Modification calling
+│ ├── modkit/ # Modification calling
+│ └── qc/ # Reference QC metrics
+├── reports/ # Rendered QC reports
 ├── demux/ # Demultiplexing outputs (if enabled)
 ├── logs/ # Rule execution logs
 └── squiggy-session.json # Squiggy session file for Positron
@@ -37,11 +39,18 @@ flowchart TB
 subgraph Outputs
 H[summary/tables/<br/>Charging & Stats]
 I[summary/modkit/<br/>Modifications]
+J[summary/qc/<br/>Reference similarity]
+K[summary/tables/<br/>Odds ratios]
+L[reports/<br/>QC report]
 end
 
 A --> B --> C --> D --> E --> F --> G
 G --> H
 G --> I
+G --> J
+I --> K
+H --> K
+H --> L
 ```
 
 ## Core Outputs
@@ -209,6 +218,59 @@ Individual modification calls per read.
 
 Comprehensive modification information including all modkit fields.
 
+## Reference Similarity Matrix
+
+`summary/qc/reference_similarity.tsv`
+
+Pairwise sequence similarity matrix for the reference FASTA, useful for identifying potential cross-mapping issues.
+
+!!! info "Separate invocation"
+    This rule is not part of the default pipeline outputs. Run it explicitly:
+    ```bash
+    pixi run snakemake compute_reference_similarity --configfile=config/config.yml
+    ```
+
+**Format:** Square TSV matrix with sequence names as row and column headers, values are percent identity (0-100).
+
+## Modification Odds Ratios
+
+`summary/tables/{sample}/{sample}.odds_ratios.tsv.gz`
+
+Per-tRNA pairwise modification odds ratios testing whether modification at one position is correlated with modification at another position (or with charging status).
+
+!!! info "Separate invocation"
+    This rule is not part of the default pipeline outputs. Run it explicitly:
+    ```bash
+    pixi run snakemake compute_odds_ratios --configfile=config/config.yml
+    ```
+
+| Column | Description |
+|--------|-------------|
+| `tRNA` | Reference tRNA name |
+| `pos1` | First position |
+| `pos2` | Second position (999 = charging) |
+| `n00`, `n01`, `n10`, `n11` | 2x2 contingency table counts |
+| `total_obs` | Total observations |
+| `odds_ratio` | Odds ratio |
+| `log_odds_ratio` | Log odds ratio |
+| `se_log_or` | Standard error of log OR |
+| `ci_lower`, `ci_upper` | 95% confidence interval |
+| `fisher_or` | Fisher's exact test OR |
+| `p_value` | Fisher's exact test p-value |
+| `p_adjusted` | BH-adjusted p-value |
+
+## QC Report
+
+`reports/qc_report.html`
+
+A combined Quarto HTML report with per-sample QC tabs, including alignment statistics, charging distributions, and basecalling error metrics.
+
+!!! info "Separate invocation"
+    This report requires the `report` pixi environment:
+    ```bash
+    pixi run -e report snakemake render_combined_qc_report --configfile=config/config.yml
+    ```
+
 ## Squiggy Session File
 
 `squiggy-session.json`
@@ -258,7 +320,7 @@ Merged POD5 file containing all raw signal data for the sample.
 
 `bam/rebasecall/{sample}/{sample}.rbc.bam`
 
-Dorado output with basecalls and move tables. Protected output (not deleted).
+Dorado output with basecalls and move tables.
 
 ### Aligned BAM
 
@@ -325,6 +387,9 @@ Approximate file sizes for a typical sample:
 | Charging CPM | 10-50 KB |
 | Charging Prob | 1-10 MB |
 | Modkit pileup | 1-5 MB |
+| Odds ratios | 100 KB-1 MB |
+| Reference similarity | 10-500 KB |
+| QC report (HTML) | 1-5 MB |
 
 ## Cleanup
 
````
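The odds-ratio columns in `outputs.md` line up with the textbook 2x2 formulas:

- OR = (n11 * n00) / (n10 * n01);
- SE(log OR) = sqrt(1/n00 + 1/n01 + 1/n10 + 1/n11) (Woolf);
- a Wald 95% CI of exp(log OR ± 1.96 * SE);
- Benjamini-Hochberg adjustment across all tests.

The sketch below is not the pipeline's script. It assumes `n<a><b>` counts reads with state a at `pos1` and state b at `pos2`, where 0 means unmodified or uncharged and 1 means modified or charged. Under that reading, `pos2 = 999` compares against charging status. It also assumes a 0.5 continuity correction whenever a cell is zero. Both choices are assumptions, and `odds_ratio_stats` and `benjamini_hochberg` are hypothetical names.

```python
"""Sketch: 2x2 odds ratio statistics and Benjamini-Hochberg adjustment."""
import math
from typing import Dict, List, Sequence

Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% CI


def odds_ratio_stats(n00: int, n01: int, n10: int, n11: int) -> Dict[str, float]:
    """Odds ratio, log OR, Woolf SE and Wald 95% CI for one 2x2 table.

    Assumed layout: n<a><b> = reads with state a at pos1 and state b at pos2
    (0 = unmodified/uncharged, 1 = modified/charged). Assumption: when any cell
    is zero, 0.5 is added to every cell (Haldane-Anscombe correction).
    """
    cells = [n00, n01, n10, n11]
    if 0 in cells:
        n00, n01, n10, n11 = (c + 0.5 for c in cells)
    odds_ratio = (n11 * n00) / (n10 * n01)
    log_or = math.log(odds_ratio)
    se = math.sqrt(1 / n00 + 1 / n01 + 1 / n10 + 1 / n11)
    return {
        "odds_ratio": odds_ratio,
        "log_odds_ratio": log_or,
        "se_log_or": se,
        "ci_lower": math.exp(log_or - Z_95 * se),
        "ci_upper": math.exp(log_or + Z_95 * se),
    }


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """BH-adjusted p-values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Step down from the largest p-value so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    stats = odds_ratio_stats(n00=40, n01=10, n10=10, n11=40)  # hypothetical counts
    print({k: round(v, 3) for k, v in stats.items()})
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```

`scipy.stats.fisher_exact` on the same 2x2 table is one way to obtain the `fisher_or` and `p_value` columns. It reports the sample odds ratio, which differs from the conditional MLE that R's `fisher.test` returns.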
docs/workflow/overview.md

Lines changed: 36 additions & 4 deletions

````diff
@@ -13,6 +13,8 @@ flowchart TB
 B[aatrnaseq-charging.smk<br/>Charging analysis]
 C[aatrnaseq-qc.smk<br/>Quality control]
 D[aatrnaseq-modifications.smk<br/>Modification calling]
+OR[aatrnaseq-odds-ratios.smk<br/>Odds ratio analysis]
+R[aatrnaseq-report.smk<br/>QC report]
 E[warpdemux.smk<br/>Demultiplexing<br/><i>conditional</i>]
 end
 
@@ -64,6 +66,14 @@ flowchart TB
 P[modkit_extract_full<br/>Full export]
 end
 
+subgraph OddsRatios[aatrnaseq-odds-ratios.smk]
+Q[compute_odds_ratios<br/>Pairwise OR]
+end
+
+subgraph Report[aatrnaseq-report.smk]
+R[render_combined_qc_report<br/>QC report]
+end
+
 A --> B --> C --> D --> E --> F --> G --> G2
 
 G2 --> H --> I
@@ -74,6 +84,12 @@ flowchart TB
 G2 --> N
 G2 --> O
 G2 --> P
+O --> Q
+H --> Q
+J --> R
+H --> R
+I --> R
+K --> R
 ```
 
 ### With Demultiplexing (WarpDemuX)
@@ -111,7 +127,7 @@ Core data processing from raw signal to classified reads:
 | `ubam_to_fastq` | Extract reads for alignment | No |
 | `bwa_idx` | Build BWA index | No |
 | `bwa_align` | Align reads to reference | No |
-| `classify_charging` | ML charging classification | Yes |
+| `classify_charging` | ML charging classification | No |
 | `transfer_bam_tags` | Rename ML→CL tags | No |
 | `add_adapter_tags` | Add PT tags for adapter positions | No |
 
@@ -130,6 +146,7 @@ Generate QC metrics and statistics:
 
 | Rule | Purpose |
 |------|---------|
+| `compute_reference_similarity` | Pairwise reference sequence similarity matrix |
 | `base_calling_error` | Per-position error frequencies |
 | `align_stats` | Read counts through pipeline |
 | `remora_signal_stats` | Raw signal metrics |
@@ -145,6 +162,22 @@ RNA modification calling with Modkit:
 | `modkit_extract_calls` | Per-read modification calls |
 | `modkit_extract_full` | Comprehensive modification export |
 
+### Odds Ratio Rules
+
+Per-tRNA pairwise modification odds ratios:
+
+| Rule | Purpose |
+|------|---------|
+| `compute_odds_ratios` | Pairwise modification odds ratios per tRNA |
+
+### Report Rules
+
+QC report generation:
+
+| Rule | Purpose |
+|------|---------|
+| `render_combined_qc_report` | Combined Quarto QC report with per-sample tabs |
+
 ### Demultiplexing Rules
 
 Optional WarpDemuX barcode demultiplexing:
@@ -181,8 +214,7 @@ Dorado re-basecalls with:
 BWA MEM with RNA-optimized parameters:
 
 - `-x ont2d` preset for ONT reads
-- Position filtering (read start ≤ 25)
-- Unmapped read removal
+- `-F 20`: Unmapped and reverse-strand read removal
 
 ### 4. Charging Classification
 
@@ -224,14 +256,14 @@ These rules require GPU access:
 | Rule | Typical Runtime | Memory |
 |------|-----------------|--------|
 | `rebasecall` | 30-60 min/sample | 24 GB |
-| `classify_charging` | 10-30 min/sample | 24 GB |
 
 ### CPU-Intensive Rules
 
 | Rule | Threads | Memory |
 |------|---------|--------|
 | `merge_pods` | 12 | 16 GB |
 | `bwa_align` | 12 | 24 GB |
+| `classify_charging` | 8 | 24 GB |
 | `modkit_extract_full` | 12 | 48 GB |
 
 ### Memory-Intensive Rules
````
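`compute_reference_similarity` is documented as producing a pairwise percent-identity matrix. The diff does not show how that rule aligns sequences, so the sketch below swaps in plain Needleman-Wunsch global alignment with match +1, mismatch -1 and gap -1. It defines identity as identical aligned positions divided by alignment length, gaps included. The scoring scheme, the identity definition, the empty top-left header cell, and the function names are all assumptions. Pairs that score close to 100 are the ones most prone to cross-mapping.

```python
"""Sketch: pairwise percent-identity matrix for reference sequences."""
from itertools import combinations
from typing import Dict, Tuple

MATCH, MISMATCH, GAP = 1, -1, -1  # assumed linear scoring scheme


def global_align(a: str, b: str) -> Tuple[str, str]:
    """Needleman-Wunsch global alignment; returns the two gapped strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * GAP
    for j in range(1, m + 1):
        score[0][j] = j * GAP

    def sub(i: int, j: int) -> int:
        return MATCH if a[i - 1] == b[j - 1] else MISMATCH

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # (mis)match
                score[i - 1][j] + GAP,  # gap in b
                score[i][j - 1] + GAP,  # gap in a
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + GAP:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a: str, b: str) -> float:
    """Identical aligned positions / alignment length (gaps included) x 100."""
    aln_a, aln_b = global_align(a, b)
    matches = sum(x == y and x != "-" for x, y in zip(aln_a, aln_b))
    return 100.0 * matches / len(aln_a)


def similarity_tsv(refs: Dict[str, str]) -> str:
    """Square TSV: names as row/column headers, percent identity as values."""
    names = list(refs)
    pid = {(r, r): 100.0 for r in names}
    for r, c in combinations(names, 2):
        pid[(r, c)] = pid[(c, r)] = percent_identity(refs[r], refs[c])
    lines = ["\t".join([""] + names)]
    for r in names:
        lines.append("\t".join([r] + [f"{pid[(r, c)]:.1f}" for c in names]))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical reference fragments; real input would be parsed from the FASTA.
    refs = {
        "ref_a": "GGGCCCGTAGCTCAGCC",
        "ref_b": "GGGCCCGTAGCTCAGCA",
        "ref_c": "GCATTGGTGGTTCAGTG",
    }
    print(similarity_tsv(refs), end="")
```

Each unordered pair is aligned once and mirrored, which keeps the matrix symmetric. The pure-Python alignment is quadratic per pair. That is tolerable for tRNA-length sequences but would warrant a compiled aligner for large references.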
