Generating Sprinzl coordinates
The data-raw/sprinzl.R script assigns Sprinzl coordinates to tRNA sequences using the tRNAscan-SE bacterial covariance model and Infernal's cmalign. This page documents how to generate Sprinzl coordinate files for new organisms.
You will need:
- Infernal (`cmalign` on PATH)
- The bacterial CM file at `data-raw/structures/TRNAinf-bact.cm` (bundled in the repo)
- A FASTA file of mature tRNA sequences (DNA or RNA, no introns, no adapters)
The simplest approach is to add your organism to data-raw/sprinzl.R and re-run:

    pixi run Rscript data-raw/sprinzl.R

This regenerates all Sprinzl coordinate files in inst/extdata/sprinzl/.
You need mature tRNA sequences in FASTA format. Sources include:
- From the clover pipeline: Use the `trna_only.fa.gz` output, which contains tRNA body sequences without adapters.
- From GtRNAdb: Download mature tRNA sequences for your organism.
- From MODOMICS: The script can extract plain sequences from cached MODOMICS data (stripping modification codes). See the `extract_modomics_fasta()` function in `data-raw/sprinzl.R`.
tRNA names should follow the convention {prefix}-tRNA-{AA}-{anticodon} (e.g., phage-tRNA-Pro-TGG or host-tRNA-Glu-TTC-1-1). The host- and phage- prefixes are stripped automatically.
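
As a rough illustration of the naming convention, the sketch below strips the `host-`/`phage-` prefix and rewrites the anticodon in RNA letters, matching the `trna_id` style used in the output files. `normalize_trna_name()` is a hypothetical helper written for this page, not a function from data-raw/sprinzl.R, and the real script may treat suffixes such as `-1-1` differently.

```r
# Hypothetical helper (not part of data-raw/sprinzl.R): normalise tRNA names
# of the form {prefix}-tRNA-{AA}-{anticodon}[-suffix].
normalize_trna_name <- function(x) {
  # Drop a leading "host-" or "phage-" prefix, if present
  x <- sub("^(host|phage)-", "", x)
  parts <- strsplit(x, "-", fixed = TRUE)
  out <- vapply(parts, function(p) {
    # After prefix removal the fields are: "tRNA", amino acid, anticodon, ...
    if (length(p) >= 3) p[3] <- chartr("T", "U", p[3])
    paste(p, collapse = "-")
  }, character(1))
  unname(out)
}

normalize_trna_name(c("phage-tRNA-Pro-TGG", "host-tRNA-Glu-TTC-1-1"))
# "tRNA-Pro-UGG" "tRNA-Glu-UUC-1-1"
```

Only the anticodon field is converted, so amino acid names that contain a `T` (for example `Thr`) are left untouched.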
Add a new section at the bottom of data-raw/sprinzl.R:

    # cm_file and out_dir are assumed to be defined earlier in the script
    cli::cli_h1("My organism")
    my_fasta <- "path/to/my_tRNAs.fa"
    my_coords <- generate_sprinzl_coords(my_fasta, cm_file)
    fname <- "myOrganism_global_coords.tsv.gz"
    readr::write_tsv(my_coords, file.path(out_dir, fname))
    cli::cli_inform(
      "Saved {nrow(my_coords)} position{?s} to {fname}."
    )

Re-run the script:

    pixi run Rscript data-raw/sprinzl.R

Then load the package and check the new file:

    devtools::load_all()
    coords <- read_sprinzl_coords(
      clover_example("sprinzl/myOrganism_global_coords.tsv.gz")
    )
    # Check anticodon positions (should be at Sprinzl 34-36)
    coords[coords$sprinzl_label %in% c("34", "35", "36"), ]
    # Check CCA tail (should be at Sprinzl 74-76)
    coords[coords$sprinzl_label %in% c("74", "75", "76"), ]
    # Check for NA labels (structural anomalies)
    coords[is.na(coords$sprinzl_label), ]

The script:
- Converts input DNA sequences to RNA
- Runs `cmalign --notrunc` against the bacterial tRNAscan-SE CM
- Parses the Stockholm alignment output
- Maps each CM consensus column to a Sprinzl position using a fixed 93-column table
- Assigns insertion labels for D-loop positions (17a, 20a, 20b) and variable region (47)
- For type II tRNAs (Leu, Ser with long variable arms), assigns e-positions in the variable stem
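
The parsing and column-mapping steps above can be sketched in base R. This is a minimal illustration on a made-up alignment, not the code in data-raw/sprinzl.R: `parse_stockholm()` and `map_columns()` are hypothetical names, and the sketch assumes that consensus columns are the non-`.` positions of the `#=GC RF` line, as in Infernal's Stockholm output. In the real script the consensus column number would then index the fixed 93-column Sprinzl table, which is not reproduced here.

```r
# Toy Stockholm alignment (made-up data, not real cmalign output).
sto <- c(
  "# STOCKHOLM 1.0",
  "",
  "seq1         GCG.AU-C",
  "seq2         GCGaAUUC",
  "#=GC RF      xxx.xxxx",
  "//"
)

# Hypothetical parser: returns the RF line and one aligned string per sequence.
parse_stockholm <- function(lines) {
  lines <- trimws(lines)
  lines <- lines[nzchar(lines) & lines != "//"]
  is_markup <- startsWith(lines, "#")
  rf_pattern <- "^#=GC[[:space:]]+RF[[:space:]]+"
  rf_lines <- grep(rf_pattern, lines, value = TRUE)
  rf <- paste(sub(rf_pattern, "", rf_lines), collapse = "")
  fields <- strsplit(lines[!is_markup], "[[:space:]]+")
  ids <- vapply(fields, `[`, character(1), 1)
  aln <- vapply(fields, `[`, character(1), 2)
  # Join blocks of the same sequence in case the alignment is interleaved
  blocks <- split(aln, factor(ids, levels = unique(ids)))
  list(rf = rf, seqs = vapply(blocks, paste, character(1), collapse = ""))
}

# Hypothetical mapper: one row per residue, with its consensus column
# (NA for residues that sit in insert columns).
map_columns <- function(rf, aligned) {
  rf_chars <- strsplit(rf, "")[[1]]
  res <- strsplit(aligned, "")[[1]]
  is_consensus <- rf_chars != "."
  cons_col <- ifelse(is_consensus, cumsum(is_consensus), NA_integer_)
  is_residue <- !(res %in% c(".", "-"))
  data.frame(
    seq_index = seq_len(sum(is_residue)),
    residue = toupper(res[is_residue]),
    consensus_col = cons_col[is_residue],
    stringsAsFactors = FALSE
  )
}

aln <- parse_stockholm(sto)
map_columns(aln$rf, aln$seqs[["seq2"]])
```

In this toy case the fourth residue of `seq2` falls in an insert column, so its `consensus_col` is `NA`; residues like that are the ones that need an insertion label or end up unlabelled.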
The output TSV has these columns (compatible with read_sprinzl_coords()):
| Column | Description |
|---|---|
| `trna_id` | tRNA identifier with RNA anticodon (e.g., tRNA-Pro-UGG) |
| `seq_index` | 1-based position in the tRNA sequence |
| `sprinzl_label` | Canonical Sprinzl position (e.g., "1", "20a", "e14") |
| `global_index` | Alignment column position (cross-tRNA coordinate) |
| `region` | Structural region (acceptor-stem, D-stem, D-loop, etc.) |
| `residue` | Reference nucleotide (DNA: A, C, G, T) |
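
A quick structural check of a coordinate table can catch a malformed file before it is used downstream. The sketch below is an assumption-laden example: `check_sprinzl_coords()` is a hypothetical helper, the toy data frame is invented, and the two rules it tests (1-based `seq_index`, no repeated `seq_index` within a tRNA) are inferred from the column descriptions rather than taken from the package.

```r
# Column names as listed in the table above
expected_cols <- c(
  "trna_id", "seq_index", "sprinzl_label",
  "global_index", "region", "residue"
)

# Hypothetical checker: returns a character vector of problems (empty if none).
check_sprinzl_coords <- function(coords) {
  missing_cols <- setdiff(expected_cols, names(coords))
  if (length(missing_cols) > 0) {
    return(paste("missing column:", missing_cols))
  }
  problems <- character(0)
  if (any(coords$seq_index < 1)) {
    problems <- c(problems, "seq_index must be 1-based")
  }
  if (any(duplicated(coords[, c("trna_id", "seq_index")]))) {
    problems <- c(problems, "duplicate seq_index within a tRNA")
  }
  problems
}

# Invented three-row example, for illustration only
toy <- data.frame(
  trna_id = rep("tRNA-Pro-UGG", 3),
  seq_index = 1:3,
  sprinzl_label = c("1", "2", "3"),
  global_index = 1:3,
  region = rep("acceptor-stem", 3),
  residue = c("C", "G", "G"),
  stringsAsFactors = FALSE
)

check_sprinzl_coords(toy)
# character(0)
```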

Known limitations:
- Only supports the bacterial CM (`TRNAinf-bact.cm`). For eukaryotic tRNAs, use the existing tRNAs-in-space workflow or adapt the script to use `TRNAinf-euk.cm` (90 consensus columns with a different mapping table).
- Positions at unusual insert locations (e.g., extra bases in the D-stem or acceptor stem) receive `NA` labels.
- The variable arm e-position numbering follows the convention where e14:e24 is always the innermost pair.
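
Because unusual inserts end up with `NA` labels, a per-tRNA count of unlabelled positions is a convenient way to spot sequences that deserve a closer look. The following is a small sketch on invented data; `count_unlabelled()` is a hypothetical helper, and it only assumes the `trna_id` and `sprinzl_label` columns described on this page.

```r
# Hypothetical helper: number of NA-labelled positions per tRNA,
# keeping only tRNAs that have at least one, sorted from most to fewest.
count_unlabelled <- function(coords) {
  per_trna <- split(is.na(coords$sprinzl_label), coords$trna_id)
  n_na <- vapply(per_trna, sum, integer(1))
  sort(n_na[n_na > 0], decreasing = TRUE)
}

# Invented example: one unlabelled position in the Leu tRNA
toy <- data.frame(
  trna_id = c("tRNA-Leu-UAA", "tRNA-Leu-UAA", "tRNA-Pro-UGG"),
  sprinzl_label = c("1", NA, "1"),
  stringsAsFactors = FALSE
)

count_unlabelled(toy)
```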