
Add MODOMICS modification annotation support #11

@jayhesselberth

Summary

Add functionality to fetch tRNA modification annotations from the MODOMICS database, align them to reference tRNA sequences, and integrate them into the clover data model.

Context

read_mod_annotations() exists but is a minimal TSV reader with no way to generate annotation data. We need a way to programmatically download known tRNA modifications from MODOMICS and map them onto our reference sequences.

Proposed Approach

New file: R/modomics.R

Exported function: fetch_modomics_mods(fasta, organism, cache_dir = NULL, min_identity = 0.7)

  1. Fetch the MODOMICS modification dictionary (maps Unicode characters → modification names) via /api/modifications?format=json
  2. Fetch tRNA sequences for organism via /api/sequences?RNAtype=tRNA&organism={org}&format=json
  3. Strip modification codes to get alignable RNA sequences
  4. Align to reference FASTA using Biostrings::pairwiseAlignment(type = "local")
  5. Transfer modification positions through the alignment
  6. Return a tibble with columns ref, pos, mod_full, mod1 (compatible with read_mod_annotations())
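
Steps 3 and 5 depend on recovering a plain RNA sequence and the positions of its modified residues from a MODOMICS-encoded string. The sketch below is one possible base R approach, not the proposed implementation. It assumes each modification is encoded as a single character, and the toy dictionary (its column names code, base, name and its two entries) is made up for illustration rather than taken from the MODOMICS API response. How code and name would map onto mod1 and mod_full is left open here.

```r
# Toy dictionary. Column names and entries are illustrative assumptions,
# not the schema returned by the MODOMICS API.
mod_dict <- data.frame(
  code = c("D", "P"),
  base = c("U", "U"),
  name = c("dihydrouridine", "pseudouridine"),
  stringsAsFactors = FALSE
)

# Split a string into single characters. strsplit() with "" splits by
# character (not byte) for properly encoded UTF-8 input.
split_chars <- function(x) strsplit(x, "")[[1]]

# Replace every modification code with its unmodified base.
# The replacement is one-to-one, so coordinates are preserved.
strip_modifications <- function(seq, dict) {
  chars <- split_chars(seq)
  idx <- match(chars, dict$code)
  is_mod <- !is.na(idx)
  chars[is_mod] <- dict$base[idx[is_mod]]
  paste(chars, collapse = "")
}

# Record the 1-based position, code, and name of each modified residue.
extract_mod_positions <- function(seq, dict) {
  chars <- split_chars(seq)
  idx <- match(chars, dict$code)
  hit <- which(!is.na(idx))
  data.frame(
    pos = hit,
    code = chars[hit],
    name = dict$name[idx[hit]],
    stringsAsFactors = FALSE
  )
}

modomics_seq <- "GCDAGPUC"  # made-up sequence with two modified residues
stripped <- strip_modifications(modomics_seq, mod_dict)
mod_pos <- extract_mod_positions(modomics_seq, mod_dict)
```

Because the stripped sequence has the same length as the encoded one, positions found in the encoded string can be used directly as coordinates in the alignable sequence.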

Internal helpers: .fetch_modomics_modifications(), .fetch_modomics_sequences(), .strip_modifications(), .extract_mod_positions(), .align_modomics_to_ref(), .match_modomics_to_refs(), .load_or_fetch() (cache helper)
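
To make the cache helper concrete, here is a minimal sketch of what a function like .load_or_fetch() could look like. The signature (key, fetch_fn, cache_dir) and the one-file-per-key layout are assumptions for illustration; the issue only specifies that caching uses saveRDS()/readRDS() and an optional cache_dir.

```r
# Hypothetical cache helper: return a cached value if present,
# otherwise call fetch_fn() and store the result.
load_or_fetch <- function(key, fetch_fn, cache_dir = NULL) {
  # Without a cache directory, always fetch
  if (is.null(cache_dir)) {
    return(fetch_fn())
  }
  path <- file.path(cache_dir, paste0(key, ".rds"))
  if (file.exists(path)) {
    return(readRDS(path))
  }
  dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)
  value <- fetch_fn()
  saveRDS(value, path)
  value
}

# Usage with a stand-in fetcher that counts how often it is called
n_calls <- 0
fake_fetch <- function() {
  n_calls <<- n_calls + 1
  list(source = "stand-in for an API response")
}
cache <- tempfile("modomics-cache-")
first <- load_or_fetch("modifications", fake_fetch, cache)
second <- load_or_fetch("modifications", fake_fetch, cache)
```

The second call reads the .rds file instead of invoking the fetcher, which is the behavior that would allow offline use once the cache is populated.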

Other changes

  • R/clover-se.R: Update read_mod_annotations() to also accept a tibble (not just file path)
  • DESCRIPTION: Add httr2 and jsonlite to Suggests
  • R/globals.R: Add new global variables
  • tests/testthat/test-modomics.R: Unit tests (mock data) + integration test (skip_if_offline())
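
As an illustration of the read_mod_annotations() change, the sketch below shows one way a reader could accept either a file path or a data frame (a tibble is a data frame). The function name read_mod_annotations_sketch() is hypothetical, and treating ref, pos, mod_full, and mod1 as required columns is an assumption based on the return value described above, not on the current implementation.

```r
# Hypothetical stand-in for the updated reader; not the clover function.
read_mod_annotations_sketch <- function(x) {
  if (is.data.frame(x)) {
    mods <- x
  } else if (is.character(x) && length(x) == 1L) {
    # Read everything as character so single-letter codes such as "T"
    # are not converted to logical values
    mods <- utils::read.delim(x, colClasses = "character")
  } else {
    stop("`x` must be a file path or a data frame", call. = FALSE)
  }
  required <- c("ref", "pos", "mod_full", "mod1")
  absent <- setdiff(required, names(mods))
  if (length(absent) > 0) {
    stop("Missing columns: ", paste(absent, collapse = ", "), call. = FALSE)
  }
  mods$pos <- as.integer(mods$pos)
  mods
}

# Made-up annotation rows, used both directly and via a TSV file
toy <- data.frame(
  ref = c("tRNA-1", "tRNA-1"),
  pos = c(16L, 55L),
  mod_full = c("dihydrouridine", "pseudouridine"),
  mod1 = c("D", "P"),
  stringsAsFactors = FALSE
)
tsv <- tempfile(fileext = ".tsv")
write.table(toy, tsv, sep = "\t", quote = FALSE, row.names = FALSE)

from_df <- read_mod_annotations_sketch(toy)
from_file <- read_mod_annotations_sketch(tsv)
```

With this shape, the output of fetch_modomics_mods() could be passed straight to the reader without a round trip through a file.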

Design Decisions

  • Local alignment tolerates differing 5'/3' ends and CCA tails between MODOMICS and reference sequences
  • Pre-filter candidates by amino acid before alignment to reduce the number of pairwise alignments
  • httr2/jsonlite as Suggests (gated by rlang::check_installed())
  • Optional cache_dir for offline use via saveRDS()/readRDS()
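
Because a local alignment can start at different offsets in the two sequences and may contain gaps, modification positions cannot simply be copied over. The sketch below shows one way to transfer positions, given the two gapped alignment strings and their start offsets as one might read them off a pairwise alignment result. The function name and arguments are hypothetical, and the sequences are made up; Biostrings is deliberately not used so the example runs in base R.

```r
# Map 1-based query positions to reference positions through a gapped
# local alignment. Positions that fall outside the aligned region, or
# opposite a gap in the reference, come back as NA.
transfer_positions <- function(aln_query, aln_ref,
                               query_start, ref_start, query_pos) {
  q <- strsplit(aln_query, "")[[1]]
  r <- strsplit(aln_ref, "")[[1]]
  stopifnot(length(q) == length(r))
  # Coordinate of each alignment column in the ungapped sequences
  q_coord <- query_start - 1L + cumsum(q != "-")
  r_coord <- ref_start - 1L + cumsum(r != "-")
  # Only columns with a residue in both sequences can carry a position
  paired <- q != "-" & r != "-"
  r_coord[paired][match(query_pos, q_coord[paired])]
}

# The reference has two extra 5' bases and one inserted base
ref_pos <- transfer_positions(
  aln_query = "GCUA-UUC",
  aln_ref = "GCUAGUUC",
  query_start = 1, ref_start = 3,
  query_pos = c(3, 5, 6)
)
```

In this toy case the three query positions land on reference positions 5, 8, and 9: the offset of two comes from the differing 5' end and the extra shift of one from the gap.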

Organisms

Initially S. cerevisiae, E. coli, H. sapiens, and M. musculus; any organism available in MODOMICS should be supported.

Verification

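# Reference FASTA bundled with the package
fa <- clover_example("yeast/trna-ref.fa.gz")
mods <- fetch_modomics_mods(fa, "Saccharomyces cerevisiae")
# Should print a tibble with columns ref, pos, mod_full, mod1
mods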
