Generating Sprinzl coordinates

Jay Hesselberth edited this page Feb 25, 2026 · 1 revision

The data-raw/sprinzl.R script assigns Sprinzl coordinates to tRNA sequences using the tRNAscan-SE bacterial covariance model and Infernal's cmalign. This page documents how to generate Sprinzl coordinate files for new organisms.

Prerequisites

  • Infernal (cmalign on PATH)
  • The bacterial CM file at data-raw/structures/TRNAinf-bact.cm (bundled in the repo)
  • A FASTA file of mature tRNA sequences (DNA or RNA, no introns, no adapters)
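Before running anything, it can help to confirm the first two prerequisites. The following is a minimal sketch, not part of data-raw/sprinzl.R: check_prereqs() is a hypothetical helper name, and the default path assumes the working directory is the repository root.

```r
# Hypothetical pre-flight check (not part of data-raw/sprinzl.R).
# Assumes the working directory is the repository root.
check_prereqs <- function(cm_file = "data-raw/structures/TRNAinf-bact.cm",
                          exe = "cmalign") {
  # Sys.which() returns "" when the executable is not found on PATH
  exe_path <- unname(Sys.which(exe))
  c(
    cmalign_on_path = nzchar(exe_path),
    cm_file_exists = file.exists(cm_file)
  )
}

status <- check_prereqs()
print(status)
```

If either value is FALSE, install Infernal or check the CM file path before running the script.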

Quick start

The simplest approach is to add your organism to data-raw/sprinzl.R and re-run:

pixi run Rscript data-raw/sprinzl.R

This regenerates all Sprinzl coordinate files in inst/extdata/sprinzl/.

Step-by-step: adding a new organism

1. Prepare a FASTA file

You need mature tRNA sequences in FASTA format. Sources include:

  • From the clover pipeline: Use the trna_only.fa.gz output, which contains tRNA body sequences without adapters.
  • From GtRNAdb: Download mature tRNA sequences for your organism.
  • From MODOMICS: The script can extract plain sequences from cached MODOMICS data (stripping modification codes). See the extract_modomics_fasta() function in data-raw/sprinzl.R.

tRNA names should follow the convention {prefix}-tRNA-{AA}-{anticodon} (e.g., phage-tRNA-Pro-TGG or host-tRNA-Glu-TTC-1-1). The host- and phage- prefixes are stripped automatically.
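To check FASTA headers against this convention before running the script, something like the following sketch may be useful. parse_trna_names() is a hypothetical helper and the regular expression is an assumption about the naming pattern; the script's own prefix stripping may be implemented differently.

```r
# Hypothetical helper (not from data-raw/sprinzl.R): strip the host-/phage-
# prefix and pull out the amino acid and anticodon from each name.
parse_trna_names <- function(ids) {
  core <- sub("^(host|phage)-", "", ids)
  # Assumed pattern: tRNA-{AA}-{anticodon}, optionally followed by a suffix
  pattern <- "^tRNA-([A-Za-z]+)-([ACGTUN]{3})(-.*)?$"
  parts <- regmatches(core, regexec(pattern, core))
  # regmatches() yields character(0) for names that do not match
  pick <- function(i) {
    vapply(parts, function(p) if (length(p) > i) p[i + 1] else NA_character_,
           character(1))
  }
  data.frame(
    id = ids,
    stripped = core,
    amino_acid = pick(1),
    anticodon = pick(2),
    stringsAsFactors = FALSE
  )
}

parsed <- parse_trna_names(
  c("phage-tRNA-Pro-TGG", "host-tRNA-Glu-TTC-1-1", "unnamed_sequence")
)
print(parsed)
```

Rows with NA in amino_acid or anticodon point to headers that do not follow the convention and are worth renaming first.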

2. Add to the script

Add a new section at the bottom of data-raw/sprinzl.R:

cli::cli_h1("My organism")

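# Input FASTA of mature tRNA sequences for the new organism
my_fasta <- "path/to/my_tRNAs.fa"
# cm_file and out_dir are assumed to be defined earlier in the script
my_coords <- generate_sprinzl_coords(my_fasta, cm_file)
fname <- "myOrganism_global_coords.tsv.gz"
readr::write_tsv(my_coords, file.path(out_dir, fname))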
cli::cli_inform(
  "Saved {nrow(my_coords)} position{?s} to {fname}."
)

3. Run the script

pixi run Rscript data-raw/sprinzl.R

4. Verify the output

devtools::load_all()
coords <- read_sprinzl_coords(
  clover_example("sprinzl/myOrganism_global_coords.tsv.gz")
)

# Check anticodon positions (should be at Sprinzl 34-36)
coords[coords$sprinzl_label %in% c("34", "35", "36"), ]

# Check CCA tail (should be at Sprinzl 74-76)
coords[coords$sprinzl_label %in% c("74", "75", "76"), ]

# Check for NA labels (structural anomalies)
coords[is.na(coords$sprinzl_label), ]
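
The checks above inspect all tRNAs at once. To find which individual tRNAs lack an expected position, a per-tRNA check along these lines may help. This is a sketch: missing_labels() is a hypothetical helper, and the toy data frame stands in for the result of read_sprinzl_coords(), using only the trna_id and sprinzl_label columns described on this page.

```r
# Hypothetical helper: list the required Sprinzl labels each tRNA is missing.
missing_labels <- function(coords, required = c("34", "35", "36")) {
  by_trna <- split(coords$sprinzl_label, coords$trna_id)
  missing <- lapply(by_trna, function(labels) setdiff(required, labels))
  # Keep only the tRNAs that lack at least one required label
  missing[lengths(missing) > 0]
}

# Toy stand-in for real coordinates (values are made up for illustration)
toy <- data.frame(
  trna_id = c(rep("tRNA-Pro-UGG", 3), rep("tRNA-Glu-UUC", 3)),
  sprinzl_label = c("34", "35", "36", "34", "35", NA),
  stringsAsFactors = FALSE
)

incomplete <- missing_labels(toy)
print(incomplete)
```

An empty list means every tRNA carries all required labels. The same helper can be pointed at the CCA tail by passing required = c("74", "75", "76").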

How it works

The script:

  1. Converts input DNA sequences to RNA
  2. Runs cmalign --notrunc against the bacterial tRNAscan-SE CM
  3. Parses the Stockholm alignment output
  4. Maps each CM consensus column to a Sprinzl position using a fixed 93-column table
  5. Assigns insertion labels for D-loop positions (17a, 20a, 20b) and the variable region (47)
  6. For type II tRNAs (Leu, Ser with long variable arms), assigns e-positions in the variable stem
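
Steps 1 to 3 can be sketched as follows. This is an illustration under stated assumptions, not the code in data-raw/sprinzl.R: the function names are hypothetical, the Stockholm text is a made-up toy alignment, and the parser assumes that insert columns are marked with "." on the #=GC RF line. The --notrunc flag comes from this page; -o names the file that cmalign writes the alignment to.

```r
# Step 1: convert DNA to RNA (T -> U), preserving case.
dna_to_rna <- function(seqs) chartr("Tt", "Uu", seqs)

# Step 2: build the cmalign arguments (hypothetical helper).
cmalign_args <- function(cm_file, fasta, out_sto) {
  c("--notrunc", "-o", out_sto, cm_file, fasta)
}
# To run it (requires Infernal on PATH):
# system2("cmalign", cmalign_args(cm_file, "tRNAs_rna.fa", "tRNAs.sto"))

# Step 3: minimal Stockholm parser (a sketch, not a full implementation).
parse_stockholm <- function(lines) {
  rf_lines <- lines[grepl("^#=GC\\s+RF\\s", lines)]
  rf <- paste(sub("^#=GC\\s+RF\\s+", "", rf_lines), collapse = "")
  # Sequence lines are non-empty and start with neither "#" nor "//"
  body <- lines[!grepl("^(#|//)", lines) & nzchar(trimws(lines))]
  fields <- strsplit(trimws(body), "\\s+")
  ids <- vapply(fields, `[`, character(1), 1)
  aln <- vapply(fields, `[`, character(1), 2)
  # Concatenate blocks per sequence in case the alignment is interleaved
  blocks <- split(aln, factor(ids, levels = unique(ids)))
  list(rf = rf, seqs = vapply(blocks, paste, character(1), collapse = ""))
}

# Toy alignment, made up for illustration only
sto <- c(
  "# STOCKHOLM 1.0",
  "",
  "seq1         GCGg.AUC",
  "seq2         GC-.aAUC",
  "#=GC RF      GCG..AUC",
  "//"
)
aln <- parse_stockholm(sto)

# Assumption: every non-"." column on the RF line is a consensus column
consensus_cols <- which(strsplit(aln$rf, "")[[1]] != ".")
print(consensus_cols)
```

The consensus column indices are what a fixed mapping table, as in step 4, would translate into Sprinzl positions. Residues that fall in the remaining columns are insertions.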

Output format

The output TSV has these columns (compatible with read_sprinzl_coords()):

Column          Description
--------------  ------------------------------------------------------------
trna_id         tRNA identifier with RNA anticodon (e.g., tRNA-Pro-UGG)
seq_index       1-based position in the tRNA sequence
sprinzl_label   Canonical Sprinzl position (e.g., "1", "20a", "e14")
global_index    Alignment column position (cross-tRNA coordinate)
region          Structural region (acceptor-stem, D-stem, D-loop, etc.)
residue         Reference nucleotide (DNA: A, C, G, T)
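
A quick structural check of a coordinate table against these columns might look like the sketch below. validate_coords() is a hypothetical helper, and the toy rows are invented for illustration rather than taken from a real alignment.

```r
# Column names as listed on this page
expected_cols <- c("trna_id", "seq_index", "sprinzl_label",
                   "global_index", "region", "residue")

# Hypothetical validator: returns a character vector of problems (empty if none)
validate_coords <- function(coords) {
  problems <- character(0)
  missing <- setdiff(expected_cols, names(coords))
  if (length(missing) > 0) {
    problems <- c(problems, paste("missing column:", missing))
  }
  if ("seq_index" %in% names(coords) &&
      any(coords$seq_index < 1, na.rm = TRUE)) {
    problems <- c(problems, "seq_index must be 1-based")
  }
  if ("residue" %in% names(coords) &&
      !all(coords$residue %in% c("A", "C", "G", "T"))) {
    problems <- c(problems, "residue must be one of A, C, G, T")
  }
  problems
}

# Toy rows, made up for illustration
toy_coords <- data.frame(
  trna_id = "tRNA-Pro-UGG",
  seq_index = 1:3,
  sprinzl_label = c("1", "2", "3"),
  global_index = 1:3,
  region = "acceptor-stem",
  residue = c("C", "G", "G"),
  stringsAsFactors = FALSE
)

print(validate_coords(toy_coords))
```

An empty result means the table has the expected shape. It does not confirm that the Sprinzl assignments themselves are correct.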

Limitations

  • The script supports only the bacterial CM (TRNAinf-bact.cm). For eukaryotic tRNAs, use the existing tRNAs-in-space workflow or adapt the script to use TRNAinf-euk.cm (90 consensus columns with a different mapping table).
  • Positions at unusual insert locations (e.g., extra bases in the D-stem or acceptor stem) receive NA labels.
  • The variable arm e-position numbering follows the convention where e14:e24 is always the innermost pair.