Commit 60a9784

docs: update documentation for v1.3.1 release
- Update dorado version from 0.9.1 to 1.3.1 in setup scripts
- Fix setup command from `pixi run setup-tools` to `pixi run setup`
- Update output paths to nested sample structure ({sample}/{sample}.bam)
- Document add_adapter_tags rule and PT tags throughout
- Fix SLURM profile path in first-analysis guide
- Update installation docs to reflect modkit via pixi, remora via uv
- Add adapter position tagging to workflow diagrams and rule reference

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
1 parent 144e941 commit 60a9784

11 files changed

+125
-79
lines changed

CLAUDE.md

Lines changed: 4 additions & 3 deletions
@@ -94,7 +94,7 @@ workflow/

 ```
 POD5 files → merge_pods → rebasecall (Dorado) → ubam_to_fastq → bwa_align →
-classify_charging (Remora) → transfer_bam_tags → Summary tables
+classify_charging (Remora) → transfer_bam_tags → add_adapter_tags → Summary tables
 ```

 ### Core Processing Pipeline (aatrnaseq-process.smk)
@@ -104,7 +104,8 @@ classify_charging (Remora) → transfer_bam_tags → Summary tables
 3. **ubam_to_fastq**: Extract reads from unmapped BAM to FASTQ
 4. **bwa_align**: Align reads to tRNA + adapter reference with BWA MEM
 5. **classify_charging**: Use Remora model to classify charged vs uncharged reads (adds ML tag to BAM)
-6. **transfer_bam_tags**: Transfer alignment tags back to classified BAM
+6. **transfer_bam_tags**: Transfer alignment tags back to classified BAM (ML→CL, MM→CM)
+7. **add_adapter_tags**: Detect adapter positions and add PT tags with 5'/3' boundaries

 ### Summary Generation

@@ -263,4 +264,4 @@ Outputs go to directory specified by `output_dir` in config. Test outputs: `.tes
 Key outputs per sample:
 - `summary/tables/{sample}/{sample}.charging.cpm.tsv.gz` - CPM-normalized charging counts
 - `summary/tables/{sample}/{sample}.charging_prob.tsv.gz` - Per-read charging probabilities
-- `bam/final/{sample}/{sample}.bam` - Final BAM with CL/CM charging tags
+- `bam/final/{sample}/{sample}.bam` - Final BAM with CL/CM (charging) and PT (adapter positions) tags

README.md

Lines changed: 2 additions & 2 deletions
@@ -28,9 +28,9 @@ cd aa-tRNA-seq-pipeline
 # Install environment
 pixi install

-# Download test data and setup tools (first time only)
+# One-time setup: download tools, models, and test data
+pixi run setup
 pixi run dl-test-data
-pixi run setup-tools

 # Dry run
 pixi run dry-run

docs/getting-started/first-analysis.md

Lines changed: 2 additions & 2 deletions
@@ -103,7 +103,7 @@ For production runs, use cluster execution:
 === "SLURM"

 ```bash
-pixi run snakemake --profile cluster/generic --configfile=config/config-myproject.yml
+pixi run snakemake --profile cluster/slurm --configfile=config/config-myproject.yml
 ```

 See [Cluster Setup](../cluster/lsf-setup.md) for detailed cluster configuration.
@@ -175,7 +175,7 @@ ls -la results/myproject/summary/tables/

 3. **Final BAM** - Verify charging tags:
 ```bash
-samtools view results/myproject/bam/final/sample1.bam | head -1
+samtools view results/myproject/bam/final/sample1/sample1.bam | head -1
 ```

 ## Troubleshooting

docs/getting-started/installation.md

Lines changed: 18 additions & 30 deletions
@@ -67,28 +67,20 @@ This creates a `.pixi` directory with all required packages including:

 ## Install External Tools

-The pipeline requires Dorado (ONT basecaller) and Modkit (modification toolkit). Install them with:
+The pipeline requires several external tools. Install them with a single command:

 ```bash
-pixi run setup-tools
+pixi run setup
 ```

 This downloads and installs:

-- **Dorado** v0.9.1 - Oxford Nanopore basecaller
-- **Modkit** v0.4.3 - Modification calling toolkit
+- **Dorado** v1.3.1 - Oxford Nanopore basecaller
+- **Dorado model** - `rna004_130bps_sup@v5.1.0` basecalling model
+- **Remora** - ONT signal analysis for charging classification
+- **WarpDemuX** - Barcode demultiplexing (optional, for multiplexed samples)

-Tools are installed to `resources/tools/` and automatically added to PATH when running the pipeline.
-
-## Download Basecalling Model
-
-Download the Dorado basecalling model:
-
-```bash
-pixi run snakemake dorado_model --cores 1
-```
-
-This downloads `rna004_130bps_sup@v5.1.0` to `resources/models/`.
+Dorado and models are installed to `resources/tools/` and `resources/models/`. Modkit is managed by pixi (installed via conda).

 ## Download Test Data (Optional)

@@ -109,10 +101,10 @@ Verify everything is installed correctly:
 pixi run snakemake --version

 # Check Dorado installation
-resources/tools/dorado/0.9.1/bin/dorado --version
+resources/tools/dorado/1.3.1/bin/dorado --version

-# Check Modkit installation
-resources/tools/modkit/0.4.3/bin/modkit --version
+# Check Modkit installation (managed by pixi)
+pixi run modkit --version

 # Dry run with test config
 pixi run dry-run
@@ -122,11 +114,11 @@ pixi run dry-run

 ```
 aa-tRNA-seq-pipeline/
-├── .pixi/ # Pixi environment
+├── .pixi/ # Pixi environment (includes modkit, remora)
 ├── resources/
 │ ├── tools/
-│ │ ├── dorado/0.9.1/ # Dorado binaries
-│ │ └── modkit/0.4.3/ # Modkit binaries
+│ │ ├── dorado/1.3.1/ # Dorado binaries
+│ │ └── WarpDemuX/ # WarpDemuX (if demux enabled)
 │ ├── models/
 │ │ ├── rna004_130bps_sup@v5.1.0/ # Basecalling model
 │ │ └── cca_classifier.pt # Remora charging model
@@ -150,7 +142,7 @@ pixi install # Update dependencies if pixi.lock changed
 To update external tools, modify the version in `config/config-base.yml` and rerun:

 ```bash
-pixi run setup-tools
+pixi run setup
 ```

 ## Troubleshooting
@@ -171,17 +163,13 @@ If Dorado fails to detect GPU:
 2. Verify CUDA_VISIBLE_DEVICES is set correctly
 3. Ensure GPU drivers are up to date

-### Modkit Build Fails
+### Remora Installation Issues

-Modkit is built from source and requires Rust. If installation fails:
+If Remora fails to install with CUDA/PyTorch errors:

 ```bash
-# Install Rust manually
-curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
-source ~/.cargo/env
-
-# Retry setup
-pixi run setup-tools
+# Manually specify CUDA version
+CUDA_VERSION=cu121 pixi run setup
 ```

 ## Next Steps

docs/getting-started/quickstart.md

Lines changed: 15 additions & 8 deletions
@@ -8,7 +8,7 @@ Complete the [Installation](installation.md) guide first:

 ```bash
 pixi install
-pixi run setup-tools
+pixi run setup
 pixi run dl-test-data
 ```

@@ -59,14 +59,20 @@ After completion, outputs are in `.tests/outputs/`:
 ```
 .tests/outputs/
 ├── pod5/
-│ └── sample1/sample1.pod5 # Merged POD5
+│ └── sample1/
+│ └── sample1.pod5 # Merged POD5
 ├── bam/
-│ ├── rebasecall/sample1/ # Basecalled BAM
-│ ├── aln/sample1/ # Aligned BAM
-│ ├── charging/sample1.charging.bam # Remora classification
-│ └── final/sample1.bam # Final BAM with CL/CM tags
+│ ├── rebasecall/sample1/
+│ │ └── sample1.rbc.bam # Basecalled BAM
+│ ├── aln/sample1/
+│ │ └── sample1.aln.bam # Aligned BAM
+│ ├── charging/sample1/
+│ │ └── sample1.charging.bam # Remora classification
+│ └── final/sample1/
+│ └── sample1.bam # Final BAM with CL/CM/PT tags
 ├── fq/
-│ └── sample1.fq.gz # Extracted FASTQ
+│ └── sample1/
+│ └── sample1.fq.gz # Extracted FASTQ
 └── summary/
 ├── tables/sample1/
 │ ├── sample1.charging_prob.tsv.gz # Per-read charging
@@ -119,11 +125,12 @@ Output columns:
 The final BAM contains charging classification in tags:

 ```bash
-samtools view .tests/outputs/bam/final/sample1.bam | head -1 | tr '\t' '\n' | grep -E "^(CL|CM):"
+samtools view .tests/outputs/bam/final/sample1/sample1.bam | head -1 | tr '\t' '\n' | grep -E "^(CL|CM|PT):"
 ```

 - `CL:B:C` - Charging likelihood (ML tag renamed to avoid conflict)
 - `CM:Z` - Charging model metadata (MM tag renamed)
+- `PT:Z` - Adapter positions (5' and 3' adapter boundaries)

 ## Run on Cluster

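Note on the hunk above: the `samtools view ... | grep` one-liner filters the `CL`, `CM` and `PT` fields of the first record. The same filtering can be sketched in Python for a single SAM text line. This is a minimal sketch, assuming only the standard SAM layout (11 mandatory tab-separated columns followed by `TAG:TYPE:VALUE` fields). The function name `extract_tags` and the example record are hypothetical, and the tag values are placeholders, not real pipeline output.

```python
from typing import Dict, Iterable


def extract_tags(sam_line: str, wanted: Iterable[str] = ("CL", "CM", "PT")) -> Dict[str, str]:
    """Return {tag: 'TYPE:VALUE'} for the requested optional fields of one SAM record."""
    wanted = set(wanted)
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    # Columns 1-11 of a SAM record are mandatory; optional fields start at column 12.
    for field in fields[11:]:
        tag, _, rest = field.partition(":")
        if tag in wanted:
            tags[tag] = rest
    return tags


# Made-up record for illustration only; the CM and PT values are placeholders.
example = "\t".join([
    "read1", "0", "tRNA-Ala", "1", "60", "4M", "*", "0", "0", "ACGT", "IIII",
    "NM:i:0", "CL:B:C,200", "CM:Z:placeholder", "PT:Z:placeholder",
])

if __name__ == "__main__":
    for tag, value in extract_tags(example).items():
        print(f"{tag}:{value}")
```

To run it against a real file, the text lines from `samtools view` can be passed in one at a time; the layout of the `PT` value itself is not described in this commit, so the sketch returns it as an uninterpreted string.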
docs/index.md

Lines changed: 3 additions & 3 deletions
@@ -69,8 +69,8 @@ cd aa-tRNA-seq-pipeline
 # Install environment
 pixi install

-# Download tools and test data
-pixi run setup-tools
+# One-time setup: download tools, models, and test data
+pixi run setup
 pixi run dl-test-data

 # Run test pipeline
@@ -124,7 +124,7 @@ The pipeline produces several key output files per sample:

 | Output | Description |
 |--------|-------------|
-| `bam/final/{sample}.bam` | Final BAM with charging tags (CL/CM) |
+| `bam/final/{sample}/{sample}.bam` | Final BAM with charging tags (CL/CM/PT) |
 | `summary/tables/{sample}/{sample}.charging.cpm.tsv.gz` | CPM-normalized charging counts per tRNA |
 | `summary/tables/{sample}/{sample}.charging_prob.tsv.gz` | Per-read charging probabilities |
 | `summary/modkit/{sample}/{sample}.pileup.bed.gz` | Modification pileup consensus |
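Note on the table above: every per-sample output now sits in a `{sample}/` subdirectory, so downstream scripts that hard-code the old flat `bam/final/{sample}.bam` path will miss the file. A minimal sketch of a path helper, assuming the four path templates shown in the table; the function name `sample_outputs` and the dictionary keys are hypothetical and not part of the pipeline.

```python
from pathlib import PurePosixPath
from typing import Dict


def sample_outputs(output_dir: str, sample: str) -> Dict[str, PurePosixPath]:
    """Build the nested per-sample output paths listed in the docs table."""
    root = PurePosixPath(output_dir)
    tables = root / "summary" / "tables" / sample
    return {
        # bam/final/{sample}/{sample}.bam
        "final_bam": root / "bam" / "final" / sample / f"{sample}.bam",
        # summary/tables/{sample}/{sample}.charging.cpm.tsv.gz
        "charging_cpm": tables / f"{sample}.charging.cpm.tsv.gz",
        # summary/tables/{sample}/{sample}.charging_prob.tsv.gz
        "charging_prob": tables / f"{sample}.charging_prob.tsv.gz",
        # summary/modkit/{sample}/{sample}.pileup.bed.gz
        "modkit_pileup": root / "summary" / "modkit" / sample / f"{sample}.pileup.bed.gz",
    }


if __name__ == "__main__":
    for name, path in sample_outputs("results/myproject", "sample1").items():
        print(f"{name}: {path}")
```

Keeping the templates in one place means a future layout change touches a single function instead of every script.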

docs/user-guide/outputs.md

Lines changed: 12 additions & 11 deletions
@@ -25,12 +25,12 @@ flowchart TB
 end

 subgraph Processing
-B[pod5/{sample}.pod5<br/>Merged POD5]
-C[bam/rebasecall/{sample}.rbc.bam<br/>Basecalled]
-D[fq/{sample}.fq.gz<br/>FASTQ]
-E[bam/aln/{sample}.aln.bam<br/>Aligned]
-F[bam/charging/{sample}.charging.bam<br/>Classified]
-G[bam/final/{sample}.bam<br/>Final BAM]
+B[pod5/{sample}/{sample}.pod5<br/>Merged POD5]
+C[bam/rebasecall/{sample}/{sample}.rbc.bam<br/>Basecalled]
+D[fq/{sample}/{sample}.fq.gz<br/>FASTQ]
+E[bam/aln/{sample}/{sample}.aln.bam<br/>Aligned]
+F[bam/charging/{sample}/{sample}.charging.bam<br/>Classified]
+G[bam/final/{sample}/{sample}.bam<br/>Final BAM]
 end

 subgraph Outputs
@@ -47,21 +47,22 @@ flowchart TB

 ### Final BAM

-`bam/final/{sample}.bam`
+`bam/final/{sample}/{sample}.bam`

-The final BAM file with charging classification tags.
+The final BAM file with charging classification and adapter position tags.

 **Tags:**

 | Tag | Type | Description |
 |-----|------|-------------|
 | `CL` | `B:C` | Charging likelihood (0-255 scale) |
 | `CM` | `Z` | Charging model metadata |
+| `PT` | `Z` | Adapter positions (5' and 3' boundaries) |

 **View tags:**

 ```bash
-samtools view results/bam/final/sample1.bam | head -1 | tr '\t' '\n' | grep -E "^(CL|CM):"
+samtools view results/bam/final/sample1/sample1.bam | head -1 | tr '\t' '\n' | grep -E "^(CL|CM|PT):"
 ```

 !!! note "Tag Renaming"
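Note on the tag table above: `CL` is described only as a "0-255 scale". Because `CL` is the renamed `ML` tag, one plausible reading is the SAM `ML` convention, where integer N stands for the probability interval from N/256 to (N+1)/256. That mapping is an assumption here, not something this commit states. Under that assumption, a minimal sketch of the conversion follows; the helper names `parse_cl_array` and `cl_to_probability` are hypothetical.

```python
from typing import List


def parse_cl_array(value: str) -> List[int]:
    """Parse the TYPE:VALUE part of a CL field, e.g. 'B:C,200,13' -> [200, 13]."""
    type_code, _, payload = value.partition(":")
    if type_code != "B" or not payload.startswith("C"):
        raise ValueError(f"expected a B:C array, got {value!r}")
    # payload looks like 'C,200,13'; the first item is the array subtype.
    return [int(item) for item in payload.split(",")[1:]]


def cl_to_probability(value: int) -> float:
    """Map a 0-255 value to the midpoint of its interval (assumed ML convention)."""
    if not 0 <= value <= 255:
        raise ValueError(f"value out of range: {value}")
    return (value + 0.5) / 256


if __name__ == "__main__":
    for raw in parse_cl_array("B:C,200,13"):
        print(raw, cl_to_probability(raw))
```

Using the interval midpoint keeps the result strictly between 0 and 1; a plain `value / 255` is an equally common shortcut if exact endpoints are preferred.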
@@ -231,13 +232,13 @@ BWA MEM alignment output.

 ### Charging BAM

-`bam/charging/{sample}.charging.bam`
+`bam/charging/{sample}/{sample}.charging.bam`

 Remora classification output with ML/MM tags (before renaming).

 ### FASTQ

-`fq/{sample}.fq.gz`
+`fq/{sample}/{sample}.fq.gz`

 Extracted reads for alignment.

docs/workflow/overview.md

Lines changed: 19 additions & 9 deletions
@@ -43,6 +43,7 @@ flowchart TB
 E[bwa_align<br/>Align to reference]
 F[classify_charging<br/>Remora ML]
 G[transfer_bam_tags<br/>Rename tags]
+G2[add_adapter_tags<br/>PT tags]
 end

 subgraph Charging[aatrnaseq-charging.smk]
@@ -63,16 +64,16 @@ flowchart TB
 P[modkit_extract_full<br/>Full export]
 end

-A --> B --> C --> D --> E --> F --> G
+A --> B --> C --> D --> E --> F --> G --> G2

-G --> H --> I
-G --> J
-G --> K
-G --> L
-G --> M
-G --> N
-G --> O
-G --> P
+G2 --> H --> I
+G2 --> J
+G2 --> K
+G2 --> L
+G2 --> M
+G2 --> N
+G2 --> O
+G2 --> P
 ```

 ### With Demultiplexing (WarpDemuX)
@@ -113,6 +114,7 @@ Core data processing from raw signal to classified reads:
 | `bwa_align` | Align reads to reference | No |
 | `classify_charging` | ML charging classification | Yes |
 | `transfer_bam_tags` | Rename ML→CL tags | No |
+| `add_adapter_tags` | Add PT tags for adapter positions | No |

 ### Charging Analysis Rules

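Note on the `transfer_bam_tags` row above: the rename (ML→CL, and per the other files MM→CM) can be illustrated on a single SAM text line. This is a minimal sketch of the idea, not the rule's actual implementation, which this commit does not show. It assumes the standard SAM layout of 11 mandatory columns followed by `TAG:TYPE:VALUE` fields; `rename_tags`, `RENAMES` and the example record are hypothetical.

```python
from typing import Dict

# Mapping taken from the docs in this commit: ML -> CL, MM -> CM.
RENAMES: Dict[str, str] = {"ML": "CL", "MM": "CM"}


def rename_tags(sam_line: str, renames: Dict[str, str] = RENAMES) -> str:
    """Return the SAM record with selected optional tags renamed; values are untouched."""
    fields = sam_line.rstrip("\n").split("\t")
    renamed = fields[:11]  # mandatory columns pass through unchanged
    for field in fields[11:]:
        tag, sep, rest = field.partition(":")
        renamed.append(renames.get(tag, tag) + sep + rest)
    return "\t".join(renamed)


# Made-up record; the MM value is a placeholder, not real Remora output.
example = "\t".join([
    "read1", "0", "tRNA-Ala", "1", "60", "4M", "*", "0", "0", "ACGT", "IIII",
    "NM:i:0", "ML:B:C,200", "MM:Z:placeholder",
])

if __name__ == "__main__":
    print(rename_tags(example))
```

Only the two-letter tag name changes, so the type code and payload of each field stay byte-identical.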
@@ -207,6 +209,14 @@ Original Remora tags are renamed to avoid conflicts:
 - `ML` → `CL` (charging likelihood)
 - `MM` → `CM` (charging metadata)

+### 6. Adapter Position Tagging
+
+The `add_adapter_tags` rule adds PT tags with adapter boundaries:
+
+- Uses parasail Smith-Waterman alignment
+- Detects 5' and 3' adapter positions
+- Can infer 5' adapter from alignment position when truncated
+
 ## Resource Requirements

 ### GPU Rules
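Note on the "Adapter Position Tagging" section added in the hunk above: the docs say the rule uses parasail Smith-Waterman alignment to find adapter boundaries. To show what a local alignment reports, here is a dependency-free sketch of Smith-Waterman with a linear gap penalty. It is not the rule's code and does not call parasail; the scoring values, the function name `locate_adapter`, and the sequences are all assumptions for illustration. Inferring a truncated 5' adapter from the alignment position, as the docs mention, is not covered.

```python
from typing import List, Tuple


def locate_adapter(adapter: str, read: str,
                   match: int = 2, mismatch: int = -1, gap: int = -2) -> Tuple[int, int, int]:
    """Smith-Waterman local alignment of adapter against read.

    Returns (score, start, end) where read[start:end] is the aligned span
    (0-based, end exclusive). A score of 0 means no alignment was found.
    """
    rows, cols = len(adapter) + 1, len(read) + 1
    H: List[List[int]] = [[0] * cols for _ in range(rows)]
    best, best_i, best_j = 0, 0, 0

    def sub(i: int, j: int) -> int:
        return match if adapter[i - 1] == read[j - 1] else mismatch

    # Fill the scoring matrix; cells never drop below zero (local alignment).
    for i in range(1, rows):
        for j in range(1, cols):
            H[i][j] = max(0, H[i - 1][j - 1] + sub(i, j),
                          H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_i, best_j = H[i][j], i, j

    # Trace back from the best cell until the score reaches zero.
    i, j = best_i, best_j
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + sub(i, j):
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, j, best_j


if __name__ == "__main__":
    # Toy sequences, not real adapters.
    print(locate_adapter("GATTACA", "CCCCGATTACATTTT"))
```

A production rule would more likely rely on a vectorised library such as parasail, as the docs state, since a pure-Python O(n·m) loop is slow on full sequencing runs.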
