Skip to content

Search Module

cpockrandt edited this page Dec 14, 2018 · 8 revisions

Search Module

This gives an overview over the search module and is a good start to get familiar with the basics of it.

Overview

The search module covers to areas: string indices (such as FM indices, suffix arrays or k-mer indices) and search algorithms (those that need a string index and those who don't). Currently only FM indices and search algorithms for FM indices are implemented.

Indices

The string indices in SeqAn3 are based on the SDSL (Succinct Data Structure Library) and so far only cover FM indices (i.e. the Burrows-Wheeler-Transform and a sampled suffix array). In the SDSL FM indices are referred to as compressed suffix arrays (CSAs). SeqAn3 provides wrapper classes for indices to have a consistent interface with other SeqAn3 classes and for Cereal Support (Serialization). It also takes care of indexing string collections (formerly known as string sets), e.g. a vector of DNA sequences.

SeqAn3 index wrappers only construct and hold the data structures and allow serialization. The actual search is performed by iterators. These iterators (which actually do not behave like actual iterators. Thus they will be renamed to cursors soon) can search a sequence in the indexed text. The text is searched character by character, either to the right (or to the left if the index is bidirectional).

Algorithms

TODO

Code changes to SeqAn2

Unfortunately commits have been squashed so the algorithmic differences of the search schemes from SeqAn2 and SeqAn3 are not visible in the commits anymore. This might be helpful if unexpected behaviour occurs. Some if clauses were superfluous and the behaviour for insertions/deletion at the beginning and end of a sequene changed. All changes refer to search/algorithm/detail/search_scheme_algorithm.hpp.

inline bool search_ss_deletion(...)
{

     ...

     // Insert deletions into the current block as long as possible
-    if (!(search.pi[block_id] == 1 && !go_right) && max_error_left_in_block > 0 &&
-        error_left.total > 0 && error_left.deletion > 0 &&
+    if (!(search.pi[block_id] == 1 && !go_right) &&              // Do not allow deletions at the beginning of the leftmost block
+        !(search.pi[block_id] == search.blocks() && go_right) && // Do not allow deletions at the end of the rightmost block
+        max_error_left_in_block > 0 && error_left.total > 0 && error_left.deletion > 0 &&

     ...

}
inline bool search_ss_children(...)
{

     ...

-    // TODO: incorporate error_left.deletion into formula and simplify a bit
-    if (error_left.deletion == 0 && min_error_left_in_block > 0 && chars_left + delta < min_error_left_in_block + 1u)
+    // TODO: incorporate error_left.deletion into formula
+    if (error_left.deletion == 0 && chars_left + delta < min_error_left_in_block + 1u)

     ...

     // Deletion
-    if (error_left.deletion > 0)
+    // TODO: check whether the conditions for deletions at the beginning/end of the query are really necessary
+    if (error_left.deletion > 0 &&
+        !(go_right && (rb == 1 || rb == query.size() + 1)) && // No deletion at the beginning of the leftmost block.
+        !(!go_right && (lb == 0 || lb == query.size())))      // No deletion at the end of the rightmost block.

     ...

}

Clone this wiki locally