Summary
Add functionality to fetch tRNA modification annotations from the MODOMICS database, map them onto reference tRNA sequences by alignment, and integrate the results into the clover data model.
Context
read_mod_annotations() exists, but it is only a minimal TSV reader; clover has no way to generate the annotation data it reads. We need a way to programmatically download known tRNA modifications from MODOMICS and map them onto our reference sequences.
Proposed Approach
New file: R/modomics.R
Exported function: fetch_modomics_mods(fasta, organism, cache_dir = NULL, min_identity = 0.7)
- Fetch the MODOMICS modification dictionary (maps Unicode characters → modification names) via /api/modifications?format=json
- Fetch tRNA sequences for the organism via /api/sequences?RNAtype=tRNA&organism={org}&format=json
- Strip modification codes to get alignable RNA sequences
- Align to the reference FASTA using Biostrings::pairwiseAlignment(type = "local")
- Transfer modification positions through the alignment
- Return a tibble with columns ref, pos, mod_full, mod1 (compatible with read_mod_annotations())
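A minimal base-R sketch of the stripping and position-extraction steps. It assumes that each modified residue in a MODOMICS sequence is a single character, that the modification dictionary has been reduced to a data frame with hypothetical columns code, mod_full, mod1 and parent (the unmodified base), and that "stripping" means substituting the parent base rather than deleting the character. The toy codes are made up for illustration and are not MODOMICS's actual alphabet.

```r
# Toy modification dictionary. ASSUMPTION: the MODOMICS dictionary has been
# reduced to one row per single-character code with the full name, a short
# label (mod1) and the unmodified parent base. These codes are illustrative
# only and are not taken from MODOMICS.
toy_dict <- data.frame(
  code     = c("D", "P", "7"),
  mod_full = c("dihydrouridine", "pseudouridine", "7-methylguanosine"),
  mod1     = c("D", "Y", "7"),
  parent   = c("U", "U", "G"),
  stringsAsFactors = FALSE
)

# Replace each modification code with its parent base. This is a 1:1
# substitution, so the stripped sequence keeps the same length and numbering.
.strip_modifications <- function(seq, dict) {
  chars <- strsplit(seq, "")[[1]]
  hit <- match(chars, dict$code)
  is_mod <- !is.na(hit)
  chars[is_mod] <- dict$parent[hit[is_mod]]
  paste(chars, collapse = "")
}

# Record the 1-based position and annotation of every modification code.
.extract_mod_positions <- function(seq, dict) {
  chars <- strsplit(seq, "")[[1]]
  hit <- match(chars, dict$code)
  pos <- which(!is.na(hit))
  data.frame(
    pos      = pos,
    mod_full = dict$mod_full[hit[pos]],
    mod1     = dict$mod1[hit[pos]],
    stringsAsFactors = FALSE
  )
}

modomics_seq <- "GCPDA7U"
.strip_modifications(modomics_seq, toy_dict)    # "GCUUAGU"
.extract_mod_positions(modomics_seq, toy_dict)  # modifications at positions 3, 4, 6
```

Because stripping is a one-for-one substitution, positions extracted from the MODOMICS sequence are also valid positions in the stripped sequence, which is what allows them to be carried through the alignment afterwards.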
Internal helpers: .fetch_modomics_modifications(), .fetch_modomics_sequences(), .strip_modifications(), .extract_mod_positions(), .align_modomics_to_ref(), .match_modomics_to_refs(), .load_or_fetch() (cache helper)
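Transferring positions through a local alignment can be sketched without Biostrings by walking the two gapped aligned strings column by column. This assumes the aligned MODOMICS (query) and reference strings, with "-" for gaps, and the 1-based start of the aligned region in each sequence have already been pulled out of the alignment result. The names .transfer_positions() and .alignment_identity() are hypothetical, and the identity definition used here (identical non-gap columns over all alignment columns) is only one of several reasonable choices for comparing against min_identity.

```r
# Split a gapped alignment string into single characters.
.aln_chars <- function(x) strsplit(x, "")[[1]]

# Fraction of alignment columns that are identical, non-gap residues.
# ASSUMPTION: this is one of several reasonable identity definitions.
.alignment_identity <- function(aligned_query, aligned_ref) {
  q <- .aln_chars(aligned_query)
  r <- .aln_chars(aligned_ref)
  stopifnot(length(q) == length(r))
  mean(q == r & q != "-")
}

# Map 1-based positions on the (stripped) MODOMICS sequence to 1-based
# reference positions. aligned_query/aligned_ref are the gapped aligned
# region ("-" = gap); query_start/ref_start are where that region begins in
# each ungapped sequence, since a local alignment may skip either end.
# Positions outside the region or opposite a gap map to NA.
.transfer_positions <- function(aligned_query, aligned_ref,
                                query_start, ref_start, query_pos) {
  q <- .aln_chars(aligned_query)
  r <- .aln_chars(aligned_ref)
  stopifnot(length(q) == length(r))
  # Ungapped coordinate reached at each alignment column.
  q_idx <- as.integer(query_start) - 1L + cumsum(q != "-")
  r_idx <- as.integer(ref_start) - 1L + cumsum(r != "-")
  # Only columns with a residue on both sides carry a position across.
  paired <- q != "-" & r != "-"
  r_idx[paired][match(query_pos, q_idx[paired])]
}

# Example: MODOMICS residues 1-7 align to reference residues 3-8, with a
# one-base gap in the reference opposite MODOMICS position 4.
aq <- "GCUUAGU"
ar <- "GCU-AGU"
if (.alignment_identity(aq, ar) >= 0.7) {  # 0.7 = proposed min_identity default
  .transfer_positions(aq, ar, query_start = 1, ref_start = 3,
                      query_pos = c(3, 4, 6))  # returns c(5L, NA, 7L)
}
```

Modifications that fall outside the locally aligned region, or opposite a gap in the reference, come back as NA and would presumably be dropped before building the ref/pos/mod_full/mod1 tibble.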
Other changes
- R/clover-se.R: Update read_mod_annotations() to also accept a tibble (not just a file path)
- DESCRIPTION: Add httr2 and jsonlite to Suggests
- R/globals.R: Add new global variables
- tests/testthat/test-modomics.R: Unit tests (mock data) + integration test (skip_if_offline())
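One way read_mod_annotations() could accept either input is to branch on is.data.frame(), which is TRUE for tibbles. The sketch below uses a hypothetical name, base R in place of readr/tibble, and assumes the TSV has a header row with the same four columns that fetch_modomics_mods() returns; the real function's file format may differ.

```r
# Hypothetical sketch (not the current clover implementation): accept either
# a path to a TSV file or an in-memory data frame/tibble.
read_mod_annotations_sketch <- function(x) {
  required <- c("ref", "pos", "mod_full", "mod1")
  if (is.data.frame(x)) {
    # tibbles inherit from data.frame, so they take this branch too
    df <- as.data.frame(x)
  } else if (is.character(x) && length(x) == 1L) {
    # ASSUMPTION: the TSV has a header row naming the required columns
    df <- utils::read.delim(x, stringsAsFactors = FALSE)
  } else {
    stop("`x` must be a file path or a data frame", call. = FALSE)
  }
  absent <- setdiff(required, names(df))
  if (length(absent) > 0L) {
    stop("missing columns: ", paste(absent, collapse = ", "), call. = FALSE)
  }
  df$pos <- as.integer(df$pos)
  df[required]
}

# Illustrative annotation row; the reference name is made up.
ann <- data.frame(ref = "tRNA-Ala-1", pos = 34, mod_full = "inosine", mod1 = "I")
read_mod_annotations_sketch(ann)
```

Validating the columns in both branches keeps the in-memory path from silently accepting a table that the file path would reject.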
Design Decisions
- Local alignment handles different 5'/3' ends and CCA tails between MODOMICS and reference sequences
- Pre-filter candidates by amino acid before alignment to reduce cost
- httr2/jsonlite as Suggests (gated by rlang::check_installed())
- Optional cache_dir for offline use via saveRDS()/readRDS()
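A possible shape for the .load_or_fetch() cache helper, assuming it takes a cache key, a zero-argument fetch function, and the optional cache_dir. The argument order and the `<key>.rds` file naming are assumptions made for this sketch.

```r
# Return the cached object from cache_dir if present; otherwise call
# fetch_fn() and, when cache_dir is set, save the result with saveRDS().
.load_or_fetch <- function(key, fetch_fn, cache_dir = NULL) {
  stopifnot(is.character(key), length(key) == 1L)
  if (is.null(cache_dir)) {
    return(fetch_fn())  # no caching requested
  }
  dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)
  # ASSUMPTION: key is a filename-safe string, e.g. "modifications"
  path <- file.path(cache_dir, paste0(key, ".rds"))
  if (file.exists(path)) {
    return(readRDS(path))
  }
  value <- fetch_fn()
  saveRDS(value, path)
  value
}

# Usage with a stand-in for the network call.
fetch_count <- 0
fake_fetch <- function() {
  fetch_count <<- fetch_count + 1
  list(ok = TRUE)
}
cache <- file.path(tempdir(), "modomics-cache")
.load_or_fetch("modifications", fake_fetch, cache)  # calls fake_fetch()
.load_or_fetch("modifications", fake_fetch, cache)  # read back from the .rds file
```

The key for the sequence request would need to include the organism (for example, a sanitized organism name) so that caches for different organisms do not collide. This sketch never refreshes a cached file; deleting it forces a new download.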
Organisms
S. cerevisiae, E. coli, H. sapiens, and M. musculus initially; any organism available in MODOMICS should be supported.
Verification
library(clover)
fa <- clover_example("yeast/trna-ref.fa.gz")
mods <- fetch_modomics_mods(fa, "Saccharomyces cerevisiae")
mods