
Module: Alphabets

Hannes Hauswedell edited this page Mar 8, 2017 · 10 revisions

Possible Layout

alph/alphabet.hpp               // Alphabet concept
alph/alphabet_container.hpp     // generic alphabet traits for basic_string adaptation; string->ostream adapters

alph/nucl/dna4.hpp                   // dna alphabet definition; alias from dna4 to dna
alph/nucl/dna4_container.hpp         // dna traits specialization; aliases for vector; literal

alph/nucl/dna5.hpp                   // plus N
alph/nucl/dna5_container.hpp

alph/nucl/dna16.hpp                  // full IUPAC code
alph/nucl/dna16_container.hpp

alph/nucl/rna4.hpp                   // rna4 alphabet definition; alias from rna4 to rna; inherits dna4
alph/nucl/rna4_container.hpp         // rna traits specialization; aliases for vector; literal

alph/nucl/rna5.hpp                   // ...
alph/nucl/rna5_container.hpp

alph/nucl/rna16.hpp                  // ...
alph/nucl/rna16_container.hpp

alph/nucl/conversion.hpp             // code for converting between different nucl alphabets and containers
alph/nucl/conversion_container.hpp   // code for converting containers; view implementation


alph/aminoacid/aa27.hpp                   // amino acid (27 letter code)
alph/aminoacid/aa27_container.hpp

alph/aminoacid/aa10murphy.hpp             // murphy reduction (10 letter code)
alph/aminoacid/aa10murphy_container.hpp

alph/aminoacid/conversion.hpp             // code for converting between different amino acid alphabets and containers
alph/aminoacid/conversion_container.hpp   // code for converting containers; view implementations

alph/quality/phred.hpp              // phred quality scores      

alph/translation.hpp            // code for translating nucl -> amino acid

General notes

  • the rna* alphabets inherit from the corresponding dna* alphabets and only redefine the value_to_char static member
  • dna will be an alias for dna4 and rna an alias for rna4 (TODO: should the defaults be the *16 alphabets instead?)
  • quality is an independent alphabet (it likely won't be implemented during the retreat)
  • there will be a compound_alphabet concept, where one character consists of multiple characters, e.g. dna5 and phred; by default this uses two bytes, but bit-compressing containers may/shall compress it to less than a byte
  • we support general containers like std::vector; std::string will work, but only in a limited fashion, and it isn't recommended
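
To make the notes above more concrete, here is a minimal sketch (assuming C++17) of how rna4 could inherit from dna4 and only redefine value_to_char, and how a dna5 + phred compound character could be stored in two bytes and packed more tightly. Only dna4, rna4, dna5, phred and value_to_char come from these notes; value, value_size, to_char, sequence_to_string, dna5_phred, pack and unpack are hypothetical names, and the phred range of 42 values is an assumption. This is not the prototype implementation.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical sketch, assumes C++17 (static constexpr members are implicitly inline).

struct dna4
{
    std::uint8_t value{0};                        // rank in [0, 4)
    static constexpr std::uint8_t value_size{4};
    static constexpr char value_to_char[4]{'A', 'C', 'G', 'T'};
};

// rna4 reuses dna4's storage and only hides the character table.
struct rna4 : dna4
{
    static constexpr char value_to_char[4]{'A', 'C', 'G', 'U'};
};

struct dna5
{
    std::uint8_t value{0};                        // rank in [0, 5), N is last (assumed order)
    static constexpr std::uint8_t value_size{5};
    static constexpr char value_to_char[5]{'A', 'C', 'G', 'T', 'N'};
};

struct phred
{
    std::uint8_t value{0};                        // assumed score range 0..41
    static constexpr std::uint8_t value_size{42};
};

// Free function, so the table is looked up in the most derived type.
template <typename alphabet_t>
char to_char(alphabet_t const letter)
{
    return alphabet_t::value_to_char[letter.value];
}

// Works on a general container (std::vector) of any alphabet with a table.
template <typename alphabet_t>
std::string sequence_to_string(std::vector<alphabet_t> const & sequence)
{
    std::string out;
    for (alphabet_t const letter : sequence)
        out.push_back(to_char(letter));
    return out;
}

// Default compound representation: one byte per component, two bytes in total.
struct dna5_phred
{
    dna5  base;
    phred quality;
};

// What a bit-compressing container might store instead: one combined rank.
// 5 * 42 = 210 combinations fit into a single byte.
constexpr std::uint8_t pack(dna5_phred const c)
{
    return static_cast<std::uint8_t>(c.base.value * phred::value_size + c.quality.value);
}

constexpr dna5_phred unpack(std::uint8_t const rank)
{
    return dna5_phred{dna5{static_cast<std::uint8_t>(rank / phred::value_size)},
                      phred{static_cast<std::uint8_t>(rank % phred::value_size)}};
}
```

With these definitions, a std::vector<rna4> holding ranks 0 and 3 prints as "AU", while the same ranks in a std::vector<dna4> print as "AT". The sketch only packs down to one byte; going below a byte, as the notes envisage, would need a smaller combined value range than the one assumed here.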

Open Questions

  • should the default dna be dna4 or dna16?

Prototype implementation
