A standardized approach to generate shared tRNA coordinates for plotting.
Using pre-computed coordinates:
import pandas as pd
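# Load E. coli K12 tRNA coordinates
df = pd.read_csv('outputs/ecoliK12_global_coords.tsv', sep='\t')
# Create alignment matrix (one row per tRNA, one column per global position).
# Residues are strings, so take the first value per cell rather than the
# default mean aggregation, which fails on non-numeric data.
alignment = df.pivot_table(
    index='trna_id',
    columns='global_index',
    values='residue',
    aggfunc='first'
)

Note: One unified coordinate file per organism for nuclear tRNAs. Mitochondrial tRNAs have separate files (see Mitochondrial tRNAs).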
Installation:
# Clone the repository
git clone https://github.com/lkwhite/tRNAs-in-space.git
cd tRNAs-in-space
# Install as a package
pip install -e .
# Or install with visualization tools
pip install -e ".[viz]"

Generate coordinates from your own data:
# After running R2DT on your FASTA files
python scripts/trnas_in_space.py ./r2dt_output_dir/ my_output.tsv
# For mitochondrial tRNAs (separate coordinate system)
python scripts/trnas_in_space.py ./r2dt_output_dir/ my_mito_output.tsv --mito

See examples/01_basic_visualization.ipynb for detailed usage examples.
Pre-computed modification data from MODOMICS is included, mapped to the global coordinate system. This enables comparison of experimental modification detection against known reference modifications from mass spectrometry data.
Available species:
- E. coli: 261 modification positions across 14 tRNAs (12 modification types)
- S. cerevisiae: 131 modification positions across 10 tRNAs (18 modification types)
- H. sapiens: 43 modification positions across 16 tRNAs (16 modification types)
Usage example:
import pandas as pd
# Load Modomics annotations
mods = pd.read_csv('outputs/modomics/modomics_to_sprinzl_mapping.tsv', sep='\t')
# Join with your global coordinates
coords = pd.read_csv('outputs/ecoliK12_global_coords.tsv', sep='\t')
annotated = coords.merge(
mods[['gtRNAdb_trna_id', 'position_gtRNAdb', 'modification_short_name']],
left_on=['trna_id', 'seq_index'],
right_on=['gtRNAdb_trna_id', 'position_gtRNAdb'],
how='left'
)
# Now 'annotated' includes modification_short_name for known modifications

See docs/archive/modomics-integration/MODOMICS_INTEGRATION.md for implementation details and alignment methodology.
This project provides a standardized coordinate system for nuclear elongator tRNAs that enables comparative structural analysis across different tRNA families. The coordinate system supports both Type I (standard) and Type II (extended variable arm) tRNAs, allowing researchers to perform analyses that were not previously possible with individual tRNA studies.
Cross-tRNA Comparative Analysis:
- Compare modification patterns across amino acid families
- Analyze structural domain conservation (acceptor stem, anticodon loop, T-arm)
- Study extended variable arm differences between Leu/Ser/Tyr and other tRNAs
- Generate multi-tRNA heatmaps and statistical comparisons
Position-Specific Studies:
- Map modification frequencies to standardized structural positions
- Identify hotspots of evolutionary conservation or variation
- Correlate structural features with experimental modification data
Type I vs Type II Analysis:
- Compare standard tRNAs (76 nt) with extended variable arm tRNAs (~90 nt)
- Study structural adaptations in Leucine, Serine, and Tyrosine tRNAs
- Analyze how extended arms affect surrounding structural regions
✅ Supported tRNA Types:
- Nuclear elongator tRNAs: Standard cytoplasmic tRNAs used in protein synthesis
- Type I: Alanine, Phenylalanine, Glycine, and most other amino acids (standard structure)
- Type II: Leucine, Serine, Tyrosine (extended variable arms with e1-e24 positions)
🚫 Excluded from Nuclear Coordinates:
- Selenocysteine tRNAs: Structurally incompatible (~95 nt with unique binding requirements)
- Initiator methionine tRNAs: Modified structure for specialized ribosome binding
📦 Separate Coordinate System:
- Mitochondrial tRNAs: Different architecture requires separate coordinates (use the --mito flag)
High-Confidence Analyses:
- Cross-amino-acid modification comparisons
- Structural domain analysis (acceptor, anticodon, T-regions)
- Type I vs Type II extended variable arm studies
Moderate-Confidence Analyses:
- Position-specific studies (validation recommended for critical positions)
- Inter-species comparative analysis
Alternative Approaches Recommended:
- Fine-grained analysis within single tRNA families (use individual tRNA coordinates)
- Studies requiring selenocysteine or mitochondrial tRNAs (specialized analysis needed)
For detailed analysis guidelines, see ANALYSIS_GUIDELINES.md. For technical implementation details, see docs/archive/coordinate-fixes/COORDINATE_SYSTEM_SCOPE.md.
Each organism has one unified coordinate file for nuclear tRNAs that includes both Type I (standard) and Type II (extended variable arm) tRNAs. Mitochondrial tRNAs have separate coordinate files due to their different structural architecture.
| Organism | Nuclear File | tRNAs | Mito File | Mito tRNAs |
|---|---|---|---|---|
| E. coli K12 | ecoliK12_global_coords.tsv | 82 | — | — |
| S. cerevisiae | sacCer_global_coords.tsv | 267 | sacCer_mito_global_coords.tsv | 18 |
| H. sapiens | hg38_global_coords.tsv | 416 | hg38_mito_global_coords.tsv | 22 |
The unified coordinate files contain both structural types:
- Type I (short variable loop): Ala, Arg, Asn, Asp, Cys, Gln, Glu, Gly, His, Ile, Lys, Met, Phe, Pro, Thr, Trp, Val
- Type II (extended variable arm): Leu, Ser, Tyr
Both types share the same global_index coordinate space, with extended variable arm positions (e1-e24) assigned to dedicated indices between positions 45 and 48.
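As a small illustration of working with both structural types in one file, the sketch below tags each tRNA as Type I or Type II from its ID. It assumes IDs follow the `tRNA-Ala-UGC-1-1` pattern (amino acid in the second field) and uses the Leu/Ser/Tyr list given above. The helper name `structural_type` and the toy rows are hypothetical and not part of this repository.

```python
import pandas as pd

# Type II amino acids as listed above; everything else is treated as Type I.
TYPE_II_AMINO_ACIDS = {"Leu", "Ser", "Tyr"}


def structural_type(trna_id: str) -> str:
    """Return 'II' for Leu/Ser/Tyr tRNAs and 'I' otherwise.

    Assumes IDs look like 'tRNA-Ala-UGC-1-1' (amino acid is the second field).
    """
    amino_acid = trna_id.split("-")[1]
    return "II" if amino_acid in TYPE_II_AMINO_ACIDS else "I"


# Toy stand-in for a coordinate file (real files have many more columns and rows)
coords = pd.DataFrame({
    "trna_id": ["tRNA-Ala-UGC-1-1", "tRNA-Leu-CAA-1-1", "tRNA-Ser-GCU-1-1"],
    "global_index": [1, 1, 1],
})
coords["structural_type"] = coords["trna_id"].map(structural_type)

# Count tRNAs (not rows) per structural type
type_counts = (
    coords.drop_duplicates("trna_id")["structural_type"].value_counts().to_dict()
)
```

With a real coordinate file, the same column can be used to split a heatmap into Type I and Type II panels while keeping a shared x-axis.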
Mitochondrial tRNAs have different structural architecture (60-75 nt, variable features) and require a separate coordinate system. Generate mito coordinates with the --mito flag:
python scripts/trnas_in_space.py outputs/hg38_jsons outputs/hg38_mito_global_coords.tsv --mito

Note on yeast mito tRNAs: R2DT lacks fungal mitochondrial-specific models, so some alignments may need verification. See Reinsch and Garcia 2025 (References) for curated yeast mito tRNA alignments.
This README documents how to go from tRNA reference sequences → coordinate files for plotting and cross‑isodecoder comparisons.
Pre-computed coordinate files are available in the outputs/ directory for E. coli K12, S. cerevisiae, and H. sapiens nuclear elongator tRNAs, plus mitochondrial tRNAs for yeast and human.
The documentation and code in this repository can be used to generate coordinate files for your own tRNA sequences.
tRNA biologists have classically used the Sprinzl positions pictured above (named after M. Sprinzl, see References) instead of consecutive numbering within each isodecoder[1]. This system ensures that homologous structural features line up across different tRNAs. For instance, the anticodon is always assigned to positions 34-36 regardless of whether a particular tRNA sequence is longer or shorter.
This convention is biologically meaningful, but introduces problems for data integration:
- Non-contiguous across isodecoders: not every tRNA contains every Sprinzl position, so some positions are absent depending on sequence length or loop structure
- Unequal spacing: gaps in Sprinzl numbering create irregular axes, making it difficult to generate heatmaps or plots that assume equally spaced positions
- Non-integer labels like 17a, 20a, 20b and the e-notations in the variable loop further complicate use of Sprinzl as a common coordinate system
R2DT 2.0 partly addresses this by embedding Sprinzl numbering in its structural templates, allowing researchers to annotate secondary structure images using positional annotations relevant to tRNA biology. This is very useful for RNA structure visualization. But for downstream analysis, a more unified coordinate system is needed.
For tRNA sequencing (or other positionally anchored assays), it is often more useful to work in a global coordinate system:
- Each nucleotide position is assigned a consecutive integer index (1, 2, 3, ...).
- The index is consistent across all tRNAs, so every position in a heatmap falls on an equally spaced axis.
- Missing Sprinzl positions can be interpolated or left blank without breaking the regular grid.
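The regular grid described above can be sketched with a toy long-format table. The tRNA names and residues here are made up, and only the `trna_id`, `global_index`, and `residue` column names come from this repository's output format. Positions a tRNA lacks simply show up as NA.

```python
import pandas as pd

# Toy long-format table: tRNA-B has no residue at global positions 2 and 3
long_table = pd.DataFrame({
    "trna_id": ["tRNA-A", "tRNA-A", "tRNA-A", "tRNA-B", "tRNA-B"],
    "global_index": [1, 2, 4, 1, 4],
    "residue": ["G", "C", "A", "G", "U"],
})

# One row per tRNA, one column per occupied global position
grid = long_table.pivot_table(
    index="trna_id", columns="global_index", values="residue", aggfunc="first"
)

# Reindex so every integer 1..K is a column, even if no tRNA occupies it
k = int(long_table["global_index"].max())
grid = grid.reindex(columns=range(1, k + 1))
```

Because the columns are consecutive integers, the resulting matrix can be passed to a heatmap function without any special handling of gaps.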
Here's an example from our own work where we attempted to align nuclear and mitochondrial tRNAs from budding yeast using Sprinzl coordinates. Some positions appear as "missing" (gray), with the large gray region between Sprinzl positions 48 and 49 reflecting variable loop length: none of the tRNAs displayed contains sequence covering the full set of variable loop positions.
However, there are still a few issues with the heatmap above.
- The distribution of "missing" positions in the variable loop doesn't line up correctly with the Sprinzl annotations, because the annotations along the x-axis are only added during the plotting step
- Not all tRNAs are pictured, because these structural alignments were generated from a non-comprehensive .afa file

Note: This repository now includes properly aligned MODOMICS modification data mapped to the global coordinate system (see outputs/modomics/), which resolves these alignment issues for downstream analysis.
By introducing a global index, we eliminate spacing irregularities and enable cross-isodecoder comparison in a clean, standardized coordinate space.
Goal: To convert heterogeneous Sprinzl-style labels from R2DT output into a unified coordinate system we need to:
- Keep per‑nucleotide sequence order (5′→3′).
- Preserve canonical Sprinzl labels (e.g., 20, 20A).
- Fill unlabeled residues deterministically with fractional positions.
- Generate a global_index (1..K) so all tRNAs plot on the same x‑axis; missing positions show as NA.
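One way to picture the last step is to take the union of labels seen across all tRNAs, sort them, and number them 1..K. The sketch below does exactly that and nothing more. It is not the algorithm implemented in trnas_in_space.py, which also handles fractional positions and fixed insertion slots. The function names are hypothetical, and placing e-labels immediately after position 45 is an assumption made for this illustration.

```python
import re


def label_sort_key(label: str) -> tuple:
    """Order Sprinzl-style labels: 20 < 20A < 20B < 21, with e-labels after 45.

    The placement of e-labels right after 45 is an assumption of this sketch.
    """
    match = re.fullmatch(r"e(\d+)", label)
    if match:
        return (45, 1, int(match.group(1)), "")
    match = re.fullmatch(r"(\d+)([A-Za-z]?)", label)
    if match is None:
        raise ValueError(f"unrecognized label: {label!r}")
    return (int(match.group(1)), 0, 0, match.group(2).upper())


def build_global_index(labels_per_trna: dict) -> dict:
    """Map every label seen in any tRNA to a consecutive integer 1..K."""
    all_labels = {label for labels in labels_per_trna.values() for label in labels}
    ordered = sorted(all_labels, key=label_sort_key)
    return {label: i for i, label in enumerate(ordered, start=1)}


# Toy input: a short Type I-like and a Type II-like label set
labels_per_trna = {
    "tRNA-Ala": ["19", "20", "21", "45", "46"],
    "tRNA-Leu": ["19", "20", "20A", "21", "45", "e1", "e2", "46"],
}
global_index = build_global_index(labels_per_trna)
```

In this toy case tRNA-Ala has no residue at the indices assigned to 20A, e1, and e2, so those cells would be NA in the alignment matrix.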
Inputs: A FASTA file of mature tRNA sequences used for alignment/reference. For consistency, trim adapters from these sequences if present.
Outputs:
- One unified TSV file per organism with per-base fields:
  - trna_id, seq_index, sprinzl_index, sprinzl_label, residue
  - sprinzl_ordinal, sprinzl_continuous, global_index, region
- Anticodon alphabet convention: tRNA IDs use the RNA alphabet (U) in anticodons (e.g., tRNA-Ala-UGC-1-1). This is biologically correct since tRNA is RNA. Consumers using DNA-based pipelines may need to convert U→T for joins (e.g., tRNA-Ala-TGC-1-1).
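For pipelines that need the DNA alphabet, a conversion along these lines may be enough. The helper name `anticodon_to_dna` is hypothetical, and the sketch assumes every ID has the anticodon as its third dash-separated field, as in the example above. Only that field is rewritten, so the rest of the ID is left untouched.

```python
def anticodon_to_dna(trna_id: str) -> str:
    """Convert the anticodon of an ID like 'tRNA-Ala-UGC-1-1' to the DNA alphabet."""
    parts = trna_id.split("-")
    # parts is ['tRNA', amino acid, anticodon, ...]; only the anticodon changes
    parts[2] = parts[2].replace("U", "T")
    return "-".join(parts)


rna_ids = ["tRNA-Ala-UGC-1-1", "tRNA-Leu-CAA-1-1"]
dna_ids = [anticodon_to_dna(trna_id) for trna_id in rna_ids]
```

With pandas, the same function can be applied to a whole column via `df["trna_id"].map(anticodon_to_dna)` before a merge.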
- Python 3.9+ with pandas.
- Your tRNA reference FASTA.
- trnas_in_space.py (from this repository): extracts per-nucleotide indices/labels from R2DT-produced .enriched.json files, fills Sprinzl gaps, builds global label order, assigns fractional/global indices, and annotates structural regions.
Note: Make sure Docker Desktop (or another Docker engine) is running before you start. On Mac/Windows you should see the 🐳 whale icon in your menu bar/system tray. You can test with docker ps: if it prints a table (even an empty one), you're good.
From the project root (with your FASTA files in fastas/), run:
docker run --rm \
-v "$(pwd):/data" \
rnacentral/r2dt \
r2dt.py gtrnadb draw /data/fastas/yourtRNAreference.fa /data/outputs/yourprefix_jsons
This runs R2DT in gtrnadb draw mode, using covariance models and tRNAscan-SE outputs to annotate tRNAs with structural information. It creates a folder like outputs/yourprefix_jsons/ containing R2DT .enriched.json files that include these fields:
-
- templateResidueIndex = plain numeric Sprinzl positions
- templateNumberingLabel = full Sprinzl label as a string (numbers + any special suffixes)
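If you want to peek at these two fields without relying on the exact layout of the .enriched.json files, a structure-agnostic walk over the JSON tree is one option. The sketch below only assumes that the two field names appear together as keys of some nested object. The nesting in the toy document is invented for illustration and is not the real R2DT layout, and the function name is hypothetical.

```python
import json


def find_template_annotations(node):
    """Yield (templateResidueIndex, templateNumberingLabel) pairs found anywhere in the tree."""
    if isinstance(node, dict):
        if "templateResidueIndex" in node:
            yield node["templateResidueIndex"], node.get("templateNumberingLabel")
        for value in node.values():
            yield from find_template_annotations(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_template_annotations(item)


# Toy document: this nesting is made up, not the actual R2DT schema
toy_json = """
{"residues": [
  {"name": "G", "info": {"templateResidueIndex": 1, "templateNumberingLabel": "1"}},
  {"name": "A", "info": {"templateResidueIndex": 20, "templateNumberingLabel": "20A"}},
  {"name": "C", "info": {}}
]}
"""
annotations = list(find_template_annotations(json.loads(toy_json)))
```

For a real file, replace the toy string with `json.load(open(path))`. The third toy residue has no template annotation, which is the kind of unlabeled position the script later fills.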
You can then extract this information by running the following script on your R2DT output directory:
python scripts/trnas_in_space.py outputs/yourprefix_jsons outputs/yourprefix_global_coords.tsv
This generates a unified coordinate file with all nuclear elongator tRNAs. The script fills missing sprinzl_index values using neighboring positions, and assigns structural regions such as anticodon-loop, acceptor-stem, etc. Unresolvable cases retain sprinzl_index of -1 and a region value of unknown.
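To give a feel for neighbor-based filling, here is a deliberately conservative sketch. It fills a missing index only when the residues on both sides are labeled and differ by exactly 2, and leaves everything else at -1. This rule is an assumption chosen for illustration and is not the script's actual logic. The function name is hypothetical.

```python
def fill_single_gaps(sprinzl_index: list) -> list:
    """Fill -1 entries whose labeled neighbors differ by exactly 2; leave the rest as -1."""
    filled = list(sprinzl_index)
    for i in range(1, len(filled) - 1):
        if filled[i] != -1:
            continue
        left, right = filled[i - 1], filled[i + 1]
        if left != -1 and right != -1 and right - left == 2:
            filled[i] = left + 1
    return filled


# Position 17 is recoverable; the two-residue gap after 19 stays unresolved
example = fill_single_gaps([16, -1, 18, 19, -1, -1, 23])
```

Rows that remain at -1 after filling correspond to the unresolvable cases described above and would carry a region value of unknown.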
For mitochondrial tRNAs, add the --mito flag:
python scripts/trnas_in_space.py outputs/yourprefix_jsons outputs/yourprefix_mito_global_coords.tsv --mito
- OUTPUT_FORMAT.md - Detailed specification of output TSV columns
- FAQ.md - Frequently asked questions and practical tips
- examples/01_basic_visualization.ipynb - Interactive visualization tutorial
- CHANGELOG.md - Version history and release notes
- ANALYSIS_GUIDELINES.md - Guidelines for using the coordinate system in research
- docs/archive/ - Historical documentation and completed development notes
If you use tRNAs in space in your research, please cite:
BibTeX:
@software{trnas_in_space,
author = {White, Laura K.},
title = {tRNAs in space: Standardized coordinates for tRNA analysis},
year = {2025},
url = {https://github.com/lkwhite/tRNAs-in-space},
license = {MIT}
}

Related publication:
@article{white2024comparative,
author = {White, Laura K. and Dobson, K. and Del Pozo, S. and others},
title = {Comparative analysis of 43 distinct RNA modifications by nanopore tRNA sequencing},
journal = {bioRxiv},
year = {2024},
doi = {10.1101/2024.07.23.604651}
}

Contributions are welcome! Please feel free to:
- Report issues or bugs
- Suggest new features or improvements
- Submit pull requests
See CONTRIBUTING.md for guidelines on how to contribute.
This project is licensed under the MIT License - see the LICENSE file for details.
- Isodecoders: tRNAs that share the same anticodon
- Isoacceptors: tRNAs charged with the same amino acid
- v1.0 (January 2025): Initial release with grouped files by offset and type.
- v1.1 (December 2025): Unified coordinate files ({species}_global_coords.tsv) with improved sort algorithms that eliminate position collisions. Added mitochondrial tRNA support via the --mito flag.
- v1.2 (December 2025): Fixed-slot alignment for insertions (tRNAs with different insertion counts now share the same global_index columns). Auto-fill missing Sprinzl labels for positions systematically unlabeled by R2DT templates. Yeast coordinates reduced from 133 to 115 unique positions.
Cappannini A., Ray A., Purta E., Mukherjee S., Boccaletto P., Moafinejad S.N., Lechner A., Barchet C., Klaholz B.P., Stefaniak F., Bujnicki J.M. (2024). MODOMICS: a database of RNA modifications and related information. 2023 update. Nucleic Acids Research 52(D1):D239–D244. https://doi.org/10.1093/nar/gkad1083 Resource: https://genesilico.pl/modomics/
Chan P.P., Lowe T.M. (2016). GtRNAdb 2.0: an expanded database of transfer RNA genes identified in complete and draft genomes. Nucleic Acids Research 44(Database issue):D184–D189. https://doi.org/10.1093/nar/gkv1309 Resource: https://gtrnadb.org
Chan P.P., Lin B.Y., Mak A.J., Lowe T.M. (2021). tRNAscan-SE 2.0: improved detection and functional classification of transfer RNA genes. Nucleic Acids Research 49(16):9077–9096. https://doi.org/10.1093/nar/gkab688
McCann H., Meade C.D., Williams L.D., Petrov A.S., Johnson P.Z., Simon A.E., Hoksza D., Nawrocki E.P., Chan P.P., Lowe T.M., Ribas C.E., Sweeney B.A., Madeira F., Anyango S., Appasamy S.D., Deshpande M., Varadi M., Velankar S., Zirbel C.L., Naiden A., Jossinet F., Petrov A.I. (2025). R2DT: a comprehensive platform for visualizing RNA secondary structure. Nucleic Acids Research 53(4):gkaf032. https://doi.org/10.1093/nar/gkaf032 Resource: https://r2dt.bio
Reinsch J.L., Garcia D.M. (2025). Concurrent detection of chemically modified bases in yeast mitochondrial tRNAs by nanopore direct RNA sequencing. bioRxiv [Preprint]. 2025 May 9:2025.05.09.653160. https://doi.org/10.1101/2025.05.09.653160
Sprinzl M., Horn C., Brown M., Ioudovitch A., Steinberg S. (1998). Compilation of tRNA sequences and sequences of tRNA genes. Nucleic Acids Research 26(1):148–153. https://doi.org/10.1093/nar/26.1.148
White L.K., Dobson K., Del Pozo S., Bilodeaux J.M., Andersen S.E., Baldwin A., Barrington C., Körtel N., Martinez-Seidel F., Strugar S.M., Watt K.E.N., Mukherjee N., Hesselberth J.R. (2024). Comparative analysis of 43 distinct RNA modifications by nanopore tRNA sequencing. bioRxiv [Preprint]. 2024 Jul 24:2024.07.23.604651. https://doi.org/10.1101/2024.07.23.604651