
Module: IO RNA Structure

Jörg Winkler edited this page Mar 6, 2018 · 6 revisions

Project goals

This project aims to implement I/O routines for:

  • fixed interactions with pseudoknot support
  • base pair probability matrices (read only)
  • alignments with a consensus structure

User scenario 1:

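// Open a dot-bracket file and request the structured sequence and the energy.
StructureFile sf("file.db", STRSEQ | ENERGY);
for (auto rec : sf)
{
    // Sequence and structure are stored together in one container.
    structured_rna<rna4, dot_bracket3> structured_sequence = get<STRSEQ>(rec);
    std::cout << get<ENERGY>(rec) << std::endl;
}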

User scenario 2:

// Open a ViennaRNA dot plot and request the base pair probability matrix.
StructureFile sf("bpp.ps", BPP);
for (auto rec : sf)
{
    std::vector<std::vector<double>> matrix = get<BPP>(rec);
}
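
To make scenario 2 more concrete, the following is a minimal sketch of how the probability matrix could be filled from a ViennaRNA dot plot. It assumes that the probabilities appear on lines of the form `i j sqrt(p) ubox` with 1-based positions, and that the sequence length is already known to the caller. The function name `read_bpp` and the alias `bpp_matrix` are hypothetical and not part of any existing SeqAn interface.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

using bpp_matrix = std::vector<std::vector<double>>;

// Hypothetical helper: fills a symmetric length x length matrix.
// Assumption: probability lines look like "i j sqrt(p) ubox" (1-based i, j).
bpp_matrix read_bpp(std::istream & in, std::size_t length)
{
    bpp_matrix matrix(length, std::vector<double>(length, 0.0));
    assert(matrix.size() == length);

    std::string line;
    while (std::getline(in, line))
    {
        std::istringstream fields(line);
        std::size_t i = 0, j = 0;
        double sqrt_p = 0.0;
        std::string tag;

        // Skip everything that is not a probability entry (PostScript code, lbox lines).
        if (!(fields >> i >> j >> sqrt_p >> tag) || tag != "ubox")
            continue;
        // Skip entries that do not fit into the matrix.
        if (i < 1 || j < 1 || i > length || j > length)
            continue;

        double const p = sqrt_p * sqrt_p;  // the file stores the square root
        matrix[i - 1][j - 1] = p;
        matrix[j - 1][i - 1] = p;
    }
    return matrix;
}
```

Lines that do not match the assumed pattern are ignored, which corresponds to the "skip information that is not stored" idea from the work packages.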

Design ideas

A configuration enum lets the user specify which data fields to obtain. Each record is returned as a tuple of all required data, and its elements can be accessed through the same enum values.
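
The enum and tuple idea could look roughly like the sketch below. The enumerators `SEQ`, `STRUCTURE`, `ENERGY` and `BPP`, the alias `record` and the helpers `flag` and `selected` are placeholders chosen for illustration; the actual field list is one of the open work packages. The enum is left unscoped so that its values convert implicitly to tuple indices and `std::get<ENERGY>(rec)` works directly.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical field list; the real enumerators are still to be decided.
// Unscoped on purpose: the enumerators convert implicitly to tuple indices.
enum field : std::size_t { SEQ, STRUCTURE, ENERGY, BPP, FIELD_COUNT };

using bpp_matrix = std::vector<std::vector<double>>;

// One tuple slot per enumerator, in declaration order.
using record = std::tuple<std::string, std::string, double, bpp_matrix>;

static_assert(std::tuple_size_v<record> == FIELD_COUNT,
              "the record needs exactly one slot per field");

// Bit flag for a field, so that selections can be combined with |.
constexpr unsigned flag(field f) { return 1u << static_cast<unsigned>(f); }

// True if the selection mask contains the given field.
constexpr bool selected(unsigned mask, field f) { return (mask & flag(f)) != 0; }

// Usage sketch: fill a record and access it through the enumerators.
void usage_example()
{
    record rec{"GCAU", "(..)", -1.5, bpp_matrix{}};
    unsigned const wanted = flag(SEQ) | flag(ENERGY);

    assert(std::get<SEQ>(rec) == "GCAU");
    assert(std::get<ENERGY>(rec) == -1.5);
    assert(selected(wanted, ENERGY) && !selected(wanted, BPP));
}
```

Using the enumerators both as tuple indices and, via `flag`, as selection bits is one way to reconcile the `STRSEQ | ENERGY` syntax of scenario 1 with tuple access; other designs are possible.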

Work packages

  • Develop the enum. Which data fields do we need? (see rna_record in SeqAn2)
  • Implement the tuple to be returned. Its length is the number of elements in the enum.
  • Write the constructor: support the dot-bracket, Stockholm and ViennaRNA ps formats as a start (FASTA is also needed, but it belongs to sequence I/O).
  • Implement the read() function depending on the file type. Skip information that is not stored.
  • Implement the write() function. Check whether required data fields are missing. Calculate values or set defaults for missing optional information (sequence length etc.).
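
As an illustration of the read() work package, here is a minimal sketch for a single dot-bracket structure line. It assumes the line holds the structure, optionally followed by an energy in parentheses, and that pseudoknots are written with additional bracket types (`[]`, `{}`, `<>`). The names `structure_record`, `interactions` and `read_structure_line` are hypothetical and only serve this example.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

using pair_list = std::vector<std::pair<std::size_t, std::size_t>>;

// Hypothetical record for one structure line.
struct structure_record
{
    std::string structure;         // dot-bracket string
    std::optional<double> energy;  // empty if the line carries no energy
    pair_list pairs;               // 0-based (i, j) with i < j
};

// Matches brackets with one stack per bracket type, so that crossing
// pairs of different types (pseudoknots) are resolved correctly.
pair_list interactions(std::string const & db)
{
    static std::string const opening = "([{<";
    static std::string const closing = ")]}>";
    std::vector<std::vector<std::size_t>> stacks(opening.size());
    pair_list pairs;

    for (std::size_t pos = 0; pos < db.size(); ++pos)
    {
        std::size_t const open_idx = opening.find(db[pos]);
        std::size_t const close_idx = closing.find(db[pos]);
        if (open_idx != std::string::npos)
        {
            stacks[open_idx].push_back(pos);
        }
        else if (close_idx != std::string::npos)
        {
            if (stacks[close_idx].empty())
                throw std::invalid_argument("unbalanced closing bracket");
            pairs.emplace_back(stacks[close_idx].back(), pos);
            stacks[close_idx].pop_back();
            assert(pairs.back().first < pairs.back().second);
        }
        else if (db[pos] != '.')
        {
            throw std::invalid_argument("unexpected character in structure");
        }
    }
    for (auto const & stack : stacks)
        if (!stack.empty())
            throw std::invalid_argument("unbalanced opening bracket");
    return pairs;
}

// Splits "structure (energy)" into its parts; the energy is optional.
structure_record read_structure_line(std::string const & line)
{
    structure_record rec;
    std::size_t const end = line.find_first_of(" \t");
    rec.structure = line.substr(0, end);
    rec.pairs = interactions(rec.structure);

    if (end != std::string::npos)
    {
        std::size_t const lpar = line.find('(', end);
        std::size_t const rpar = line.rfind(')');
        if (lpar != std::string::npos && rpar != std::string::npos && rpar > lpar)
            rec.energy = std::stod(line.substr(lpar + 1, rpar - lpar - 1));
    }
    return rec;
}
```

For example, `read_structure_line("((..[[..))..]] (-3.50)")` would yield four base pairs and an energy, while a line without the trailing parenthesized value leaves `energy` empty, so a later write() could detect the missing field and fall back to a default.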

Files to be created

  • seqan3/io/structure/ (directory)
  • structure_file.hpp
  • dot_bracket3.hpp
  • bpp_matrix.hpp
  • stockholm.hpp
