
🚀🍀 tRNAs in space 🍀🚀


A standardized approach to generate shared tRNA coordinates for plotting.

Quick Start

Using pre-computed coordinates:

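import pandas as pd

# Load E. coli nuclear tRNAs
df = pd.read_csv('outputs/ecoliK12_global_coords.tsv', sep='\t')

# Create alignment matrix: one row per tRNA, one column per global_index.
# 'residue' holds nucleotide letters, so keep the single value in each cell
# (the default aggfunc='mean' fails on strings). Absent positions are NaN.
alignment = df.pivot_table(
    index='trna_id',
    columns='global_index',
    values='residue',
    aggfunc='first'
)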

Note: One unified coordinate file per organism for nuclear tRNAs. Mitochondrial tRNAs have separate files (see Mitochondrial tRNAs).

Installation:

# Clone the repository
git clone https://github.com/lkwhite/tRNAs-in-space.git
cd tRNAs-in-space

# Install as a package
pip install -e .

# Or install with visualization tools
pip install -e ".[viz]"

Generate coordinates from your own data:

# After running R2DT on your FASTA files
python scripts/trnas_in_space.py ./r2dt_output_dir/ my_output.tsv

# For mitochondrial tRNAs (separate coordinate system)
python scripts/trnas_in_space.py ./r2dt_output_dir/ my_mito_output.tsv --mito

See examples/01_basic_visualization.ipynb for detailed usage examples.

Modomics Modification Annotations

Pre-computed modification data from MODOMICS is included, mapped to the global coordinate system. This enables comparison of experimental modification detection against known reference modifications from mass spectrometry data.

Available species:

  • E. coli: 261 modification positions across 14 tRNAs (12 modification types)
  • S. cerevisiae: 131 modification positions across 10 tRNAs (18 modification types)
  • H. sapiens: 43 modification positions across 16 tRNAs (16 modification types)

Usage example:

import pandas as pd

# Load Modomics annotations
mods = pd.read_csv('outputs/modomics/modomics_to_sprinzl_mapping.tsv', sep='\t')

# Join with your global coordinates
coords = pd.read_csv('outputs/ecoliK12_global_coords.tsv', sep='\t')
annotated = coords.merge(
    mods[['gtRNAdb_trna_id', 'position_gtRNAdb', 'modification_short_name']],
    left_on=['trna_id', 'seq_index'],
    right_on=['gtRNAdb_trna_id', 'position_gtRNAdb'],
    how='left'
)
# Now 'annotated' includes modification_short_name for known modifications
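Because the join only works when IDs and positions match exactly, it can help to check which Modomics rows failed to attach to a residue. A minimal sketch using toy tables (hypothetical IDs, positions and modification names) with the same column names as the example above; pandas' merge(..., indicator=True) labels every row as left_only, right_only or both:

```python
import pandas as pd

# Toy stand-ins for the coordinate table and the Modomics mapping
# (made-up values; column names follow the example above).
coords = pd.DataFrame({
    "trna_id": ["tRNA-Ala-UGC-1-1", "tRNA-Ala-UGC-1-1"],
    "seq_index": [1, 2],
})
mods = pd.DataFrame({
    "gtRNAdb_trna_id": ["tRNA-Ala-UGC-1-1", "tRNA-Ala-TGC-1-1"],
    "position_gtRNAdb": [2, 2],
    "modification_short_name": ["modA", "modA"],
})

# Outer join keeps unmatched rows from both sides; '_merge' says which side.
check = coords.merge(
    mods,
    left_on=["trna_id", "seq_index"],
    right_on=["gtRNAdb_trna_id", "position_gtRNAdb"],
    how="outer",
    indicator=True,
)

# Modomics rows that found no matching residue in the coordinate table.
unmatched = check[check["_merge"] == "right_only"]
print(unmatched[["gtRNAdb_trna_id", "position_gtRNAdb"]])
```

In the toy data the DNA-alphabet ID (TGC) does not match the RNA-alphabet ID (UGC) used in the coordinate files, so that row is reported as unmatched.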

See docs/archive/modomics-integration/MODOMICS_INTEGRATION.md for implementation details and alignment methodology.

Global Coordinate System

This project provides a standardized coordinate system for nuclear elongator tRNAs that enables comparative structural analysis across different tRNA families. The coordinate system supports both Type I (standard) and Type II (extended variable arm) tRNAs, so the same structural position can be compared directly across amino acid families instead of being renumbered separately for each tRNA.

Research Capabilities

Cross-tRNA Comparative Analysis:

  • Compare modification patterns across amino acid families
  • Analyze structural domain conservation (acceptor stem, anticodon loop, T-arm)
  • Study extended variable arm differences between Leu/Ser/Tyr and other tRNAs
  • Generate multi-tRNA heatmaps and statistical comparisons

Position-Specific Studies:

  • Map modification frequencies to standardized structural positions
  • Identify hotspots of evolutionary conservation or variation
  • Correlate structural features with experimental modification data
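Mapping modification frequencies onto shared positions reduces to a group-by on global_index. A minimal sketch, assuming per-tRNA modification calls in a long table with a boolean modified column (a hypothetical schema; the toy values are made up):

```python
import pandas as pd

def modification_frequency(calls: pd.DataFrame) -> pd.Series:
    """Fraction of tRNAs called as modified at each global_index.

    Assumes one row per (trna_id, global_index) and a boolean 'modified' column.
    """
    return calls.groupby("global_index")["modified"].mean()

# Toy calls for two tRNAs at two shared positions.
calls = pd.DataFrame({
    "trna_id": ["t1", "t2", "t1", "t2"],
    "global_index": [10, 10, 11, 11],
    "modified": [True, False, True, True],
})
freq = modification_frequency(calls)
print(freq)
```

Only tRNAs that actually have a residue at a given global_index contribute to that position's denominator; positions a tRNA lacks are absent rather than counted as unmodified.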

Type I vs Type II Analysis:

  • Compare standard tRNAs (76 nt) with extended variable arm tRNAs (~90 nt)
  • Study structural adaptations in Leucine, Serine, and Tyrosine tRNAs
  • Analyze how extended arms affect surrounding structural regions

System Scope

✅ Supported tRNA Types:

  • Nuclear elongator tRNAs: Standard cytoplasmic tRNAs used in protein synthesis
  • Type I: Alanine, Phenylalanine, Glycine, and most other amino acids (standard structure)
  • Type II: Leucine, Serine, Tyrosine (extended variable arms with e1-e24 positions)

🚫 Excluded from Nuclear Coordinates:

  • Selenocysteine tRNAs: Structurally incompatible (~95 nt with unique binding requirements)
  • Initiator methionine tRNAs: Modified structure for specialized ribosome binding

📦 Separate Coordinate System:

  • Mitochondrial tRNAs: Different architecture requires separate coordinates (use --mito flag)

Usage Guidelines

High-Confidence Analyses:

  • Cross-amino-acid modification comparisons
  • Structural domain analysis (acceptor, anticodon, T-regions)
  • Type I vs Type II extended variable arm studies

Moderate-Confidence Analyses:

  • Position-specific studies (validation recommended for critical positions)
  • Inter-species comparative analysis

Alternative Approaches Recommended:

  • Fine-grained analysis within single tRNA families (use individual tRNA coordinates)
  • Studies requiring selenocysteine or mitochondrial tRNAs (specialized analysis needed)

For detailed analysis guidelines, see ANALYSIS_GUIDELINES.md. For technical implementation details, see docs/archive/coordinate-fixes/COORDINATE_SYSTEM_SCOPE.md.

Coordinate File Organization

Each organism has one unified coordinate file for nuclear tRNAs that includes both Type I (standard) and Type II (extended variable arm) tRNAs. Mitochondrial tRNAs have separate coordinate files due to their different structural architecture.

Available Coordinate Files

| Organism | Nuclear file | Nuclear tRNAs | Mito file | Mito tRNAs |
|---|---|---|---|---|
| E. coli K12 | ecoliK12_global_coords.tsv | 82 | n/a | n/a |
| S. cerevisiae | sacCer_global_coords.tsv | 267 | sacCer_mito_global_coords.tsv | 18 |
| H. sapiens | hg38_global_coords.tsv | 416 | hg38_mito_global_coords.tsv | 22 |

tRNA Types in Unified Files

The unified coordinate files contain both structural types:

  • Type I (short variable loop): Ala, Arg, Asn, Asp, Cys, Gln, Glu, Gly, His, Ile, Lys, Met, Phe, Pro, Thr, Trp, Val
  • Type II (extended variable arm): Leu, Ser, Tyr

Both types share the same global_index coordinate space, with extended variable arm positions (e1-e24) assigned to dedicated indices between positions 45 and 48.
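Because both types share one global_index space, Type II tRNAs can be picked out from the labels themselves. A minimal sketch, assuming extended-arm residues carry sprinzl_label values of the form e1 to e24 (check this against your own files); classify_types is a hypothetical helper, not part of this repository, and the toy rows are made up:

```python
import pandas as pd

def classify_types(coords: pd.DataFrame) -> pd.Series:
    """Return 'II' for each trna_id with any e-position label, otherwise 'I'.

    Assumes extended-arm residues are labeled e1 to e24 in sprinzl_label.
    """
    labels = coords["sprinzl_label"].astype(str)
    is_extended = labels.str.fullmatch(r"[eE]\d+")
    has_extended = is_extended.groupby(coords["trna_id"]).any()
    return has_extended.map({True: "II", False: "I"})

# Toy rows for one tRNA with extended-arm labels and one without.
coords = pd.DataFrame({
    "trna_id": ["tRNA-Leu-CAA-1-1"] * 3 + ["tRNA-Ala-UGC-1-1"] * 2,
    "sprinzl_label": ["45", "e1", "e2", "45", "46"],
})
types = classify_types(coords)
print(types)
```

Deriving the type from the data rather than from a fixed amino acid list means the classification follows whatever labels R2DT actually assigned.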

Mitochondrial tRNAs

Mitochondrial tRNAs have different structural architecture (60-75 nt, variable features) and require a separate coordinate system. Generate mito coordinates with the --mito flag:

python scripts/trnas_in_space.py outputs/hg38_jsons outputs/hg38_mito_global_coords.tsv --mito

Note on yeast mito tRNAs: R2DT lacks fungal mitochondrial-specific models, so some alignments may need verification. See Reinsch and Garcia 2025 (References) for curated yeast mito tRNA alignments.


This README documents how to go from tRNA reference sequences → coordinate files for plotting and cross‑isodecoder comparisons.

Pre-computed coordinate files are available in the outputs/ directory for E. coli K12, S. cerevisiae, and H. sapiens nuclear elongator tRNAs, plus mitochondrial tRNAs for yeast and human.

The documentation and code in this repository can be used to generate coordinate files for your own tRNA sequences.

The Problem

[Figure: tRNA secondary structure annotated with Sprinzl position numbers]

tRNA biologists have classically used the Sprinzl positions pictured above (named after M. Sprinzl, see References) instead of consecutive numbering within each isodecoder[1]. This system ensures that homologous structural features line up across different tRNAs. For instance, the anticodon is always assigned to positions 34-36 regardless of whether a particular tRNA sequence is longer or shorter.

This convention is biologically meaningful, but it introduces problems for data integration:

  1. Non-contiguous across isodecoders: not every tRNA contains every Sprinzl position, so some positions are absent depending on sequence length or loop structure

  2. Unequal spacing: gaps in Sprinzl numbering create irregular axes, making it difficult to generate heatmaps or plots that assume equally spaced positions

  3. Non-integer labels: insertion labels such as 17a, 20a, and 20b, plus the e-notation used in the variable arm, further complicate the use of Sprinzl numbering as a common coordinate system
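The non-integer labels are also why plain string sorting fails: "9" sorts after "20b". A minimal sketch of a numeric-then-suffix sort key, assuming labels look like 20, 20a or 20A; e-notation labels are deliberately rejected because their placement needs separate rules, and sprinzl_sort_key is a hypothetical helper:

```python
import re
from typing import Tuple

# A Sprinzl position number with an optional single-letter insertion suffix.
_LABEL = re.compile(r"^(\d+)([A-Za-z]?)$")

def sprinzl_sort_key(label: str) -> Tuple[int, str]:
    """Sort by position number first, then by insertion suffix ('' < 'a' < 'b')."""
    match = _LABEL.match(label)
    if match is None:
        raise ValueError(f"not a numeric Sprinzl label: {label!r}")
    number, suffix = match.groups()
    return int(number), suffix.lower()

labels = ["20b", "9", "17", "20", "17a", "20a"]
print(sorted(labels))                        # ['17', '17a', '20', '20a', '20b', '9']
print(sorted(labels, key=sprinzl_sort_key))  # ['9', '17', '17a', '20', '20a', '20b']
```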

R2DT 2.0 partly addresses this by embedding Sprinzl numbering in its structural templates, allowing researchers to annotate secondary structure images using positional annotations relevant to tRNA biology. This is very useful for RNA structure visualization. But for downstream analysis, a more unified coordinate system is needed.

A Global Index

For tRNA sequencing (or other positionally anchored assays), it is often more useful to work in a global coordinate system:

  • Each nucleotide position is assigned a consecutive integer index (1, 2, 3, ...).

  • The index is shared across all tRNAs, so every column of a heatmap sits on the same equally spaced axis.

  • Missing Sprinzl positions can be interpolated or left blank without breaking the regular grid.
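The idea can be sketched in a few lines of pandas: take the union of labels seen in any tRNA, sort it once, and number it 1..K; a tRNA without a given label simply gets NaN in that column. A minimal sketch with toy data; build_global_index is a hypothetical helper, and order_key=int only works for plain numeric labels (suffixed labels such as 20a need a Sprinzl-aware key):

```python
from typing import Callable, Dict

import pandas as pd

def build_global_index(per_trna: Dict[str, Dict[str, str]],
                       order_key: Callable[[str], object]) -> Dict[str, int]:
    """Number the union of all labels consecutively (1..K) in sorted order."""
    all_labels = {label for residues in per_trna.values() for label in residues}
    ordered = sorted(all_labels, key=order_key)
    return {label: i for i, label in enumerate(ordered, start=1)}

# Toy data: label -> residue for each tRNA; tRNA-B lacks position 2.
per_trna = {
    "tRNA-A": {"1": "G", "2": "C", "3": "A"},
    "tRNA-B": {"1": "G", "3": "U"},
}
gidx = build_global_index(per_trna, order_key=int)

# Long table -> wide matrix; missing positions become NaN, not a shifted column.
rows = [
    {"trna_id": trna, "global_index": gidx[label], "residue": res}
    for trna, residues in per_trna.items()
    for label, res in residues.items()
]
wide = pd.DataFrame(rows).pivot(index="trna_id", columns="global_index", values="residue")
print(wide)
```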

Here's an example from our own work, where we attempted to align nuclear and mitochondrial tRNAs from budding yeast using Sprinzl coordinates. Some positions appear as "missing" (gray); the large gray region between Sprinzl positions 48 and 49 reflects variable loop length, since none of the tRNAs displayed contains sequence covering the full set of variable loop positions.

[Figure: heatmap of budding yeast nuclear and mitochondrial tRNAs aligned on Sprinzl coordinates, with missing positions in gray]

However, there are still a few issues with the heatmap above.

  • The distribution of "missing" positions in the variable loop doesn't line up correctly with their Sprinzl annotations, because the Sprinzl annotations along the x-axis are only added during the plotting step

  • Not all tRNAs are pictured, because these structural alignments were generated from a non-comprehensive .afa file

Note: This repository now includes properly aligned Modomics modification data mapped to the global coordinate system (see outputs/modomics/), which resolves these alignment issues for downstream analysis.

By introducing a global index, we eliminate spacing irregularities and enable cross-isodecoder comparison in a clean, standardized coordinate space.

Implementation

Goal: To convert heterogeneous Sprinzl-style labels from R2DT output into a unified coordinate system, we need to:

  • Keep per‑nucleotide sequence order (5′→3′).

  • Preserve canonical Sprinzl labels (e.g., 20, 20A).

  • Fill unlabeled residues deterministically with fractional positions.

  • Generate a global_index (1..K) so all tRNAs plot on the same x‑axis; missing positions show as NA.
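One deterministic way to fill unlabeled residues is to space them evenly between their labeled neighbors, so a single unlabeled residue between 20 and 21 becomes 20.5. This is a minimal sketch of that idea, not necessarily the rule trnas_in_space.py applies; fill_fractional is a hypothetical name, and runs at the 5' or 3' end with no labeled neighbor on one side are left as None:

```python
from typing import List, Optional

def fill_fractional(positions: List[Optional[float]]) -> List[Optional[float]]:
    """Give unlabeled residues (None) evenly spaced values between labeled neighbors.

    Input is one tRNA's positions in 5'->3' order. Runs touching either end stay None.
    """
    filled = list(positions)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        # Find the maximal run of unlabeled residues [i, j).
        j = i
        while j < n and filled[j] is None:
            j += 1
        if i > 0 and j < n:
            left, right = filled[i - 1], filled[j]
            step = (right - left) / (j - i + 1)
            for k in range(i, j):
                filled[k] = left + step * (k - i + 1)
        i = j
    return filled

print(fill_fractional([20, None, 21]))           # [20, 20.5, 21]
print(fill_fractional([1, None, None, None, 2]))  # [1, 1.25, 1.5, 1.75, 2]
```

Once every tRNA has fractional positions, ranking the distinct values across all tRNAs yields consecutive global_index values.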

Inputs: A FASTA file of mature tRNA sequences used for alignment/reference. For consistency, trim adapters out of these sequences if present.

Outputs:

  • One unified TSV file per organism with per-base fields:

    • trna_id, seq_index, sprinzl_index, sprinzl_label, residue

    • sprinzl_ordinal, sprinzl_continuous, global_index, region
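The per-base layout can be collapsed into one gapped string per tRNA, which is handy for eyeballing an alignment or writing an aligned FASTA. A minimal sketch, assuming each (trna_id, global_index) pair occurs at most once (pivot raises otherwise); aligned_strings is a hypothetical helper and the toy rows are made up:

```python
import pandas as pd

def aligned_strings(coords: pd.DataFrame, gap: str = "-") -> pd.Series:
    """One gapped sequence per trna_id, one character per global_index column."""
    wide = coords.pivot(index="trna_id", columns="global_index", values="residue")
    wide = wide.sort_index(axis=1)  # columns run left to right in global_index order
    return wide.fillna(gap).apply(lambda row: "".join(row), axis=1)

# Toy per-base rows; tRNA-B has no residue at global_index 2.
coords = pd.DataFrame({
    "trna_id": ["tRNA-A", "tRNA-A", "tRNA-A", "tRNA-B", "tRNA-B"],
    "global_index": [1, 2, 3, 1, 3],
    "residue": ["G", "C", "A", "G", "U"],
})
seqs = aligned_strings(coords)
print(seqs)
```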

Anticodon alphabet convention: tRNA IDs use RNA alphabet (U) in anticodons (e.g., tRNA-Ala-UGC-1-1). This is biologically correct since tRNA is RNA. Consumers using DNA-based pipelines may need to convert U→T for joins (e.g., tRNA-Ala-TGC-1-1).
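For such joins, only the anticodon field needs converting. A minimal sketch, assuming IDs follow the pattern shown above with the anticodon as the third dash-separated field; to_dna_anticodon is a hypothetical helper:

```python
def to_dna_anticodon(trna_id: str) -> str:
    """Convert U to T in the anticodon field only, e.g. tRNA-Ala-UGC-1-1 -> tRNA-Ala-TGC-1-1.

    Assumes the anticodon is the third '-'-separated field; other IDs are returned unchanged.
    """
    parts = trna_id.split("-")
    if len(parts) < 3:
        return trna_id
    parts[2] = parts[2].replace("U", "T")
    return "-".join(parts)

print(to_dna_anticodon("tRNA-Ala-UGC-1-1"))  # tRNA-Ala-TGC-1-1
```

Applied column-wise, e.g. df['trna_id'].map(to_dna_anticodon), it produces a DNA-alphabet join key while leaving the original trna_id column untouched. Restricting the replacement to the anticodon field keeps the rest of the ID as-is.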

Prerequisites

  • Docker (for R2DT)

  • Python 3.9+ with pandas

  • Your tRNA reference FASTA file

  • trnas_in_space.py (from this repository): extracts per-nucleotide indices/labels from R2DT-produced .enriched.json files, fills Sprinzl gaps, builds global label order, assigns fractional/global indices, and annotates structural regions.

Step 1: Run R2DT

Note: Make sure Docker Desktop (or another Docker engine) is running before you start. On Mac/Windows you should see the 🐳 whale icon in your menu bar/system tray. You can test with docker ps: if it prints a table (even an empty one), you’re good.

From the project root (with your FASTA files in fastas/), run:

docker run --rm \
  -v "$(pwd):/data" \
  rnacentral/r2dt \
  r2dt.py gtrnadb draw /data/fastas/yourtRNAreference.fa /data/outputs/yourprefix_jsons

This runs R2DT in gtrnadb draw mode, using covariance models and tRNAscan-SE outputs to annotate tRNAs with structural information. It creates a folder like outputs/yourprefix_jsons/ containing R2DT .enriched.json files that include these fields:

  • templateResidueIndex = plain numeric Sprinzl positions

  • templateNumberingLabel = full Sprinzl label as a string (numbers + any special suffixes)
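To inspect these fields yourself without depending on the exact nesting inside .enriched.json, one option is to walk the parsed JSON and collect every object that carries both keys. A minimal sketch; find_template_records and load_labels are hypothetical helpers, and it assumes residues appear in the file in 5'→3' order:

```python
import json
from pathlib import Path
from typing import Any, Iterator, List, Tuple

FIELDS = ("templateResidueIndex", "templateNumberingLabel")

def find_template_records(node: Any) -> Iterator[dict]:
    """Yield, in document order, every dict in a parsed JSON tree holding both fields."""
    if isinstance(node, dict):
        if all(field in node for field in FIELDS):
            yield node
        for value in node.values():
            yield from find_template_records(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_template_records(item)

def load_labels(path: Path) -> List[Tuple[Any, Any]]:
    """(templateResidueIndex, templateNumberingLabel) pairs from one JSON file."""
    with open(path) as fh:
        data = json.load(fh)
    return [(r["templateResidueIndex"], r["templateNumberingLabel"])
            for r in find_template_records(data)]

# Toy structure standing in for a parsed file (the real nesting may differ).
toy = {"molecules": [{"residues": [
    {"name": "G", "info": {"templateResidueIndex": 1, "templateNumberingLabel": "1"}},
    {"name": "C", "info": {"templateResidueIndex": 20, "templateNumberingLabel": "20a"}},
]}]}
print([r["templateNumberingLabel"] for r in find_template_records(toy)])  # ['1', '20a']
```

On real output, something like load_labels(p) for each p in sorted(Path('outputs/yourprefix_jsons').rglob('*.enriched.json')) should work; adjust the pattern if your R2DT output is laid out differently.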

Step 2: Build global coordinates

You can then extract information from the above by running the following script on your R2DT output directory:

python scripts/trnas_in_space.py outputs/yourprefix_jsons yourprefix_global_coords.tsv

This generates a unified coordinate file with all nuclear elongator tRNAs. The script fills missing sprinzl_index values using neighboring positions and assigns structural regions such as anticodon-loop and acceptor-stem. Unresolvable cases keep a sprinzl_index of -1 and a region value of unknown.
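Region assignment amounts to a lookup from Sprinzl position to a named interval. A minimal sketch using the standard Sprinzl domain boundaries; apart from acceptor-stem, anticodon-loop and unknown, the region names are illustrative and may not match the region column written by trnas_in_space.py, and fractional positions are truncated to the preceding integer position:

```python
# (first, last, region) in Sprinzl numbering. Names other than 'acceptor-stem',
# 'anticodon-loop' and 'unknown' are illustrative, not the script's own labels.
REGIONS = [
    (1, 7, "acceptor-stem"),
    (8, 9, "linker"),
    (10, 13, "D-stem"),
    (14, 21, "D-loop"),
    (22, 25, "D-stem"),
    (26, 26, "linker"),
    (27, 31, "anticodon-stem"),
    (32, 38, "anticodon-loop"),
    (39, 43, "anticodon-stem"),
    (44, 48, "variable-loop"),
    (49, 53, "T-stem"),
    (54, 60, "T-loop"),
    (61, 65, "T-stem"),
    (66, 72, "acceptor-stem"),
    (73, 73, "discriminator"),
    (74, 76, "CCA-end"),
]

def assign_region(sprinzl_index: float) -> str:
    """Map a (possibly fractional) Sprinzl index to a region; -1 means unresolved."""
    if sprinzl_index == -1:
        return "unknown"
    pos = int(sprinzl_index)  # e.g. 20.5, an insertion after 20, is treated as 20
    for first, last, name in REGIONS:
        if first <= pos <= last:
            return name
    return "unknown"

print(assign_region(34))    # anticodon-loop
print(assign_region(20.5))  # D-loop
```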

For mitochondrial tRNAs, add the --mito flag:

python scripts/trnas_in_space.py outputs/yourprefix_jsons yourprefix_mito_global_coords.tsv --mito


Citation

If you use tRNAs in space in your research, please cite:

BibTeX:

@software{trnas_in_space,
  author = {White, Laura K.},
  title = {tRNAs in space: Standardized coordinates for tRNA analysis},
  year = {2025},
  url = {https://github.com/lkwhite/tRNAs-in-space},
  license = {MIT}
}

Related publication:

@article{white2024comparative,
  author = {White, Laura K. and Dobson, K. and Del Pozo, S. and others},
  title = {Comparative analysis of 43 distinct RNA modifications by nanopore tRNA sequencing},
  journal = {bioRxiv},
  year = {2024},
  doi = {10.1101/2024.07.23.604651}
}

Contributing

Contributions are welcome! Please feel free to:

  • Report issues or bugs
  • Suggest new features or improvements
  • Submit pull requests

See CONTRIBUTING.md for guidelines on how to contribute.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Footnotes

  • Isodecoders: tRNAs that share the same anticodon

  • Isoacceptors: tRNAs charged by the same amino acid

Implementation History

  • v1.0 (January 2025): Initial release with grouped files by offset and type.
  • v1.1 (December 2025): Unified coordinate files ({species}_global_coords.tsv) with improved sort algorithms that eliminate position collisions. Added mitochondrial tRNA support via --mito flag.
  • v1.2 (December 2025): Fixed-slot alignment for insertions (tRNAs with different insertion counts now share same global_index columns). Auto-fill missing Sprinzl labels for positions systematically unlabeled by R2DT templates. Yeast coordinates reduced from 133 to 115 unique positions.

References

Cappannini A., Ray A., Purta E., Mukherjee S., Boccaletto P., Moafinejad S.N., Lechner A., Barchet C., Klaholz B.P., Stefaniak F., Bujnicki J.M. (2023). MODOMICS: a database of RNA modifications and related information. 2023 update. Nucleic Acids Research 51(D1):D155–D163. https://doi.org/10.1093/nar/gkad1083 Resource: https://genesilico.pl/modomics/

Chan P.P., Lowe T.M. (2016). GtRNAdb 2.0: an expanded database of transfer RNA genes identified in complete and draft genomes. Nucleic Acids Research 44(Database issue):D184–D189. https://doi.org/10.1093/nar/gkv1309 Resource: https://gtrnadb.org

Chan P.P., Lin B.Y., Mak A.J., Lowe T.M. (2021). tRNAscan-SE 2.0: improved detection and functional classification of transfer RNA genes. Nucleic Acids Research 49(16):9077–9096. https://doi.org/10.1093/nar/gkab688

McCann H., Meade C.D., Williams L.D., Petrov A.S., Johnson P.Z., Simon A.E., Hoksza D., Nawrocki E.P., Chan P.P., Lowe T.M., Ribas C.E., Sweeney B.A., Madeira F., Anyango S., Appasamy S.D., Deshpande M., Varadi M., Velankar S., Zirbel C.L., Naiden A., Jossinet F., Petrov A.I. (2025). R2DT: a comprehensive platform for visualizing RNA secondary structure. Nucleic Acids Research 53(4):gkaf032. https://doi.org/10.1093/nar/gkaf032 Resource: https://r2dt.bio

Reinsch J.L., Garcia D.M. (2025). Concurrent detection of chemically modified bases in yeast mitochondrial tRNAs by nanopore direct RNA sequencing. bioRxiv [Preprint]. 2025 May 9:2025.05.09.653160. https://doi.org/10.1101/2025.05.09.653160

Sprinzl M., Horn C., Brown M., Ioudovitch A., Steinberg S. (1998). Compilation of tRNA sequences and sequences of tRNA genes. Nucleic Acids Research 26(1):148–153. https://doi.org/10.1093/nar/26.1.148

White L.K., Dobson K., Del Pozo S., Bilodeaux J.M., Andersen S.E., Baldwin A., Barrington C., Körtel N., Martinez-Seidel F., Strugar S.M., Watt K.E.N., Mukherjee N., Hesselberth J.R. (2024). Comparative analysis of 43 distinct RNA modifications by nanopore tRNA sequencing. bioRxiv [Preprint]. 2024 Jul 24:2024.07.23.604651. https://doi.org/10.1101/2024.07.23.604651
