docs: update documentation for v1.3.1 release

jayhesselberth · claude · jayhesselberth · commit 60a9784731b8 · 2026-01-16T10:26:26.000-07:00
- Update dorado version from 0.9.1 to 1.3.1 in setup scripts
- Fix setup command from `pixi run setup-tools` to `pixi run setup`
- Update output paths to nested sample structure ({sample}/{sample}.bam)
- Document add_adapter_tags rule and PT tags throughout
- Fix SLURM profile path in first-analysis guide
- Update installation docs: modkit is now installed via pixi, remora via uv
- Add adapter position tagging to workflow diagrams and rule reference
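
For readers updating downstream scripts, here is a minimal sketch (not part of this commit) of the nested output layout and the tag check that the updated docs describe. The helper names `final_bam_path` and `select_tags` are hypothetical, and the sketch assumes a standard SAM text record with 11 mandatory columns followed by optional `TAG:TYPE:VALUE` fields. The value format of the PT tag is not shown in this diff, so the sketch treats it as an opaque string.

```python
from pathlib import PurePosixPath

# Tags named in the updated docs: CL/CM (charging) and PT (adapter positions).
TAGS_OF_INTEREST = ("CL", "CM", "PT")


def final_bam_path(output_dir: str, sample: str) -> str:
    """Build the nested final-BAM path: <output_dir>/bam/final/<sample>/<sample>.bam."""
    return str(PurePosixPath(output_dir) / "bam" / "final" / sample / f"{sample}.bam")


def select_tags(sam_line: str, wanted=TAGS_OF_INTEREST) -> dict:
    """Return {tag: "TYPE:VALUE"} for the wanted optional fields of one SAM record."""
    fields = sam_line.rstrip("\n").split("\t")
    selected = {}
    # The first 11 SAM columns are mandatory; optional fields start at index 11.
    for field in fields[11:]:
        tag, _, rest = field.partition(":")
        if tag in wanted:
            selected[tag] = rest
    return selected
```

Calling `final_bam_path("results/myproject", "sample1")` yields `results/myproject/bam/final/sample1/sample1.bam`, matching the corrected path in the first-analysis guide. Unlike the `tr '\t' '\n' | grep -E "^(CL|CM|PT):"` pipeline in the docs, `select_tags` inspects only the optional fields, so a mandatory column that happens to start with `CL:` would not be reported.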

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -94,7 +94,7 @@ workflow/
 
 ```
 POD5 files → merge_pods → rebasecall (Dorado) → ubam_to_fastq → bwa_align →
-classify_charging (Remora) → transfer_bam_tags → Summary tables
+classify_charging (Remora) → transfer_bam_tags → add_adapter_tags → Summary tables
 ```
 
 ### Core Processing Pipeline (aatrnaseq-process.smk)
@@ -104,7 +104,8 @@ classify_charging (Remora) → transfer_bam_tags → Summary tables
 3. **ubam_to_fastq**: Extract reads from unmapped BAM to FASTQ
 4. **bwa_align**: Align reads to tRNA + adapter reference with BWA MEM
 5. **classify_charging**: Use Remora model to classify charged vs uncharged reads (adds ML tag to BAM)
-6. **transfer_bam_tags**: Transfer alignment tags back to classified BAM
+6. **transfer_bam_tags**: Transfer alignment tags back to classified BAM (ML→CL, MM→CM)
+7. **add_adapter_tags**: Detect adapter positions and add PT tags with 5'/3' boundaries
 
 ### Summary Generation
 
@@ -263,4 +264,4 @@ Outputs go to directory specified by `output_dir` in config. Test outputs: `.tes
 Key outputs per sample:
 - `summary/tables/{sample}/{sample}.charging.cpm.tsv.gz` - CPM-normalized charging counts
 - `summary/tables/{sample}/{sample}.charging_prob.tsv.gz` - Per-read charging probabilities
-- `bam/final/{sample}/{sample}.bam` - Final BAM with CL/CM charging tags
+- `bam/final/{sample}/{sample}.bam` - Final BAM with CL/CM (charging) and PT (adapter positions) tags
diff --git a/README.md b/README.md
@@ -28,9 +28,9 @@ cd aa-tRNA-seq-pipeline
 # Install environment
 pixi install
 
-# Download test data and setup tools (first time only)
+# One-time setup: download tools, models, and test data
+pixi run setup
 pixi run dl-test-data
-pixi run setup-tools
 
 # Dry run
 pixi run dry-run
diff --git a/docs/getting-started/first-analysis.md b/docs/getting-started/first-analysis.md
@@ -103,7 +103,7 @@ For production runs, use cluster execution:
 === "SLURM"
 
     ```bash
-    pixi run snakemake --profile cluster/generic --configfile=config/config-myproject.yml
+    pixi run snakemake --profile cluster/slurm --configfile=config/config-myproject.yml
     ```
 
 See [Cluster Setup](../cluster/lsf-setup.md) for detailed cluster configuration.
@@ -175,7 +175,7 @@ ls -la results/myproject/summary/tables/
 
 3. **Final BAM** - Verify charging tags:
    ```bash
-   samtools view results/myproject/bam/final/sample1.bam | head -1
+   samtools view results/myproject/bam/final/sample1/sample1.bam | head -1
    ```
 
 ## Troubleshooting
diff --git a/docs/getting-started/installation.md b/docs/getting-started/installation.md
@@ -67,28 +67,20 @@ This creates a `.pixi` directory with all required packages including:
 
 ## Install External Tools
 
-The pipeline requires Dorado (ONT basecaller) and Modkit (modification toolkit). Install them with:
+The pipeline requires several external tools. Install them with a single command:
 
 ```bash
-pixi run setup-tools
+pixi run setup
 ```
 
 This downloads and installs:
 
-- **Dorado** v0.9.1 - Oxford Nanopore basecaller
-- **Modkit** v0.4.3 - Modification calling toolkit
+- **Dorado** v1.3.1 - Oxford Nanopore basecaller
+- **Dorado model** - `rna004_130bps_sup@v5.1.0` basecalling model
+- **Remora** - ONT signal analysis for charging classification
+- **WarpDemuX** - Barcode demultiplexing (optional, for multiplexed samples)
 
-Tools are installed to `resources/tools/` and automatically added to PATH when running the pipeline.
-
-## Download Basecalling Model
-
-Download the Dorado basecalling model:
-
-```bash
-pixi run snakemake dorado_model --cores 1
-```
-
-This downloads `rna004_130bps_sup@v5.1.0` to `resources/models/`.
+Dorado and models are installed to `resources/tools/` and `resources/models/`. Modkit is managed by pixi (installed via conda).
 
 ## Download Test Data (Optional)
 
@@ -109,10 +101,10 @@ Verify everything is installed correctly:
 pixi run snakemake --version
 
 # Check Dorado installation
-resources/tools/dorado/0.9.1/bin/dorado --version
+resources/tools/dorado/1.3.1/bin/dorado --version
 
-# Check Modkit installation
-resources/tools/modkit/0.4.3/bin/modkit --version
+# Check Modkit installation (managed by pixi)
+pixi run modkit --version
 
 # Dry run with test config
 pixi run dry-run
@@ -122,11 +114,11 @@ pixi run dry-run
 
 ```
 aa-tRNA-seq-pipeline/
-├── .pixi/                    # Pixi environment
+├── .pixi/                    # Pixi environment (includes modkit, remora)
 ├── resources/
 │   ├── tools/
-│   │   ├── dorado/0.9.1/    # Dorado binaries
-│   │   └── modkit/0.4.3/    # Modkit binaries
+│   │   ├── dorado/1.3.1/    # Dorado binaries
+│   │   └── WarpDemuX/       # WarpDemuX (if demux enabled)
 │   ├── models/
 │   │   ├── rna004_130bps_sup@v5.1.0/  # Basecalling model
 │   │   └── cca_classifier.pt          # Remora charging model
@@ -150,7 +142,7 @@ pixi install  # Update dependencies if pixi.lock changed
 To update external tools, modify the version in `config/config-base.yml` and rerun:
 
 ```bash
-pixi run setup-tools
+pixi run setup
 ```
 
 ## Troubleshooting
@@ -171,17 +163,13 @@ If Dorado fails to detect GPU:
 2. Verify CUDA_VISIBLE_DEVICES is set correctly
 3. Ensure GPU drivers are up to date
 
-### Modkit Build Fails
+### Remora Installation Issues
 
-Modkit is built from source and requires Rust. If installation fails:
+If Remora fails to install with CUDA/PyTorch errors:
 
 ```bash
-# Install Rust manually
-curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
-source ~/.cargo/env
-
-# Retry setup
-pixi run setup-tools
+# Manually specify CUDA version
+CUDA_VERSION=cu121 pixi run setup
 ```
 
 ## Next Steps
diff --git a/docs/getting-started/quickstart.md b/docs/getting-started/quickstart.md
@@ -8,7 +8,7 @@ Complete the [Installation](installation.md) guide first:
 
 ```bash
 pixi install
-pixi run setup-tools
+pixi run setup
 pixi run dl-test-data
 ```
 
@@ -59,14 +59,20 @@ After completion, outputs are in `.tests/outputs/`:
 ```
 .tests/outputs/
 ├── pod5/
-│   └── sample1/sample1.pod5          # Merged POD5
+│   └── sample1/
+│       └── sample1.pod5              # Merged POD5
 ├── bam/
-│   ├── rebasecall/sample1/           # Basecalled BAM
-│   ├── aln/sample1/                  # Aligned BAM
-│   ├── charging/sample1.charging.bam # Remora classification
-│   └── final/sample1.bam             # Final BAM with CL/CM tags
+│   ├── rebasecall/sample1/
+│   │   └── sample1.rbc.bam           # Basecalled BAM
+│   ├── aln/sample1/
+│   │   └── sample1.aln.bam           # Aligned BAM
+│   ├── charging/sample1/
+│   │   └── sample1.charging.bam      # Remora classification
+│   └── final/sample1/
+│       └── sample1.bam               # Final BAM with CL/CM/PT tags
 ├── fq/
-│   └── sample1.fq.gz                 # Extracted FASTQ
+│   └── sample1/
+│       └── sample1.fq.gz             # Extracted FASTQ
 └── summary/
     ├── tables/sample1/
     │   ├── sample1.charging_prob.tsv.gz  # Per-read charging
@@ -119,11 +125,12 @@ Output columns:
 The final BAM contains charging classification in tags:
 
 ```bash
-samtools view .tests/outputs/bam/final/sample1.bam | head -1 | tr '\t' '\n' | grep -E "^(CL|CM):"
+samtools view .tests/outputs/bam/final/sample1/sample1.bam | head -1 | tr '\t' '\n' | grep -E "^(CL|CM|PT):"
 ```
 
 - `CL:B:C` - Charging likelihood (ML tag renamed to avoid conflict)
 - `CM:Z` - Charging model metadata (MM tag renamed)
+- `PT:Z` - Adapter positions (5' and 3' adapter boundaries)
 
 ## Run on Cluster
 
diff --git a/docs/index.md b/docs/index.md
@@ -69,8 +69,8 @@ cd aa-tRNA-seq-pipeline
 # Install environment
 pixi install
 
-# Download tools and test data
-pixi run setup-tools
+# One-time setup: download tools, models, and test data
+pixi run setup
 pixi run dl-test-data
 
 # Run test pipeline
@@ -124,7 +124,7 @@ The pipeline produces several key output files per sample:
 
 | Output | Description |
 |--------|-------------|
-| `bam/final/{sample}.bam` | Final BAM with charging tags (CL/CM) |
+| `bam/final/{sample}/{sample}.bam` | Final BAM with charging tags (CL/CM/PT) |
 | `summary/tables/{sample}/{sample}.charging.cpm.tsv.gz` | CPM-normalized charging counts per tRNA |
 | `summary/tables/{sample}/{sample}.charging_prob.tsv.gz` | Per-read charging probabilities |
 | `summary/modkit/{sample}/{sample}.pileup.bed.gz` | Modification pileup consensus |
diff --git a/docs/user-guide/outputs.md b/docs/user-guide/outputs.md
@@ -25,12 +25,12 @@ flowchart TB
     end
 
     subgraph Processing
-        B[pod5/{sample}.pod5<br/>Merged POD5]
-        C[bam/rebasecall/{sample}.rbc.bam<br/>Basecalled]
-        D[fq/{sample}.fq.gz<br/>FASTQ]
-        E[bam/aln/{sample}.aln.bam<br/>Aligned]
-        F[bam/charging/{sample}.charging.bam<br/>Classified]
-        G[bam/final/{sample}.bam<br/>Final BAM]
+        B[pod5/{sample}/{sample}.pod5<br/>Merged POD5]
+        C[bam/rebasecall/{sample}/{sample}.rbc.bam<br/>Basecalled]
+        D[fq/{sample}/{sample}.fq.gz<br/>FASTQ]
+        E[bam/aln/{sample}/{sample}.aln.bam<br/>Aligned]
+        F[bam/charging/{sample}/{sample}.charging.bam<br/>Classified]
+        G[bam/final/{sample}/{sample}.bam<br/>Final BAM]
     end
 
     subgraph Outputs
@@ -47,21 +47,22 @@ flowchart TB
 
 ### Final BAM
 
-`bam/final/{sample}.bam`
+`bam/final/{sample}/{sample}.bam`
 
-The final BAM file with charging classification tags.
+The final BAM file with charging classification and adapter position tags.
 
 **Tags:**
 
 | Tag | Type | Description |
 |-----|------|-------------|
 | `CL` | `B:C` | Charging likelihood (0-255 scale) |
 | `CM` | `Z` | Charging model metadata |
+| `PT` | `Z` | Adapter positions (5' and 3' boundaries) |
 
 **View tags:**
 
 ```bash
-samtools view results/bam/final/sample1.bam | head -1 | tr '\t' '\n' | grep -E "^(CL|CM):"
+samtools view results/bam/final/sample1/sample1.bam | head -1 | tr '\t' '\n' | grep -E "^(CL|CM|PT):"
 ```
 
 !!! note "Tag Renaming"
@@ -231,13 +232,13 @@ BWA MEM alignment output.
 
 ### Charging BAM
 
-`bam/charging/{sample}.charging.bam`
+`bam/charging/{sample}/{sample}.charging.bam`
 
 Remora classification output with ML/MM tags (before renaming).
 
 ### FASTQ
 
-`fq/{sample}.fq.gz`
+`fq/{sample}/{sample}.fq.gz`
 
 Extracted reads for alignment.
 
diff --git a/docs/workflow/overview.md b/docs/workflow/overview.md
@@ -43,6 +43,7 @@ flowchart TB
         E[bwa_align<br/>Align to reference]
         F[classify_charging<br/>Remora ML]
         G[transfer_bam_tags<br/>Rename tags]
+        G2[add_adapter_tags<br/>PT tags]
     end
 
     subgraph Charging[aatrnaseq-charging.smk]
@@ -63,16 +64,16 @@ flowchart TB
         P[modkit_extract_full<br/>Full export]
     end
 
-    A --> B --> C --> D --> E --> F --> G
+    A --> B --> C --> D --> E --> F --> G --> G2
 
-    G --> H --> I
-    G --> J
-    G --> K
-    G --> L
-    G --> M
-    G --> N
-    G --> O
-    G --> P
+    G2 --> H --> I
+    G2 --> J
+    G2 --> K
+    G2 --> L
+    G2 --> M
+    G2 --> N
+    G2 --> O
+    G2 --> P
 ```
 
 ### With Demultiplexing (WarpDemuX)
@@ -113,6 +114,7 @@ Core data processing from raw signal to classified reads:
 | `bwa_align` | Align reads to reference | No |
 | `classify_charging` | ML charging classification | Yes |
 | `transfer_bam_tags` | Rename ML→CL tags | No |
+| `add_adapter_tags` | Add PT tags for adapter positions | No |
 
 ### Charging Analysis Rules
 
@@ -207,6 +209,14 @@ Original Remora tags are renamed to avoid conflicts:
 - `ML` → `CL` (charging likelihood)
 - `MM` → `CM` (charging metadata)
 
+### 6. Adapter Position Tagging
+
+The `add_adapter_tags` rule adds PT tags with adapter boundaries:
+
+- Uses parasail Smith-Waterman alignment
+- Detects 5' and 3' adapter positions
+- Can infer 5' adapter from alignment position when truncated
+
 ## Resource Requirements
 
 ### GPU Rules
diff --git a/docs/workflow/rules-reference.md b/docs/workflow/rules-reference.md
diff --git a/scripts/setup-env.sh b/scripts/setup-env.sh
diff --git a/scripts/setup-tools.sh b/scripts/setup-tools.sh