Performance improvement ideas

Although AccSeq does well in some tasks, there is still room to improve its performance. Several ideas came up during the implementation of this project, but most of them are still immature and have not been added to the algorithm.

Ideas:
1. Reduce the suffix array to an array of gene indices. Each entry in the suffix array represents a location in the original file, so we can create an auxiliary data structure that holds the index of the gene at that location. The voting procedure can then use this auxiliary structure directly, which should save a lot of memory and make voting faster.
1. For many tasks, we need a run-length encoding (RLE) library that supports random access. The auxiliary data structure mentioned above, and many other structures, could be compressed using RLE.
1. The LC-hash algorithm could probably be improved further. Can we figure out an equation for "forward search"? If so, LC-hash could help with queries shorter than the hash length. Moreover, if we replace the hash table with one that holds longer hashes for only the most frequently used queries, we may be able to fit the algorithm to any hash length. The most frequently used queries could be identified during the indexing phase or at runtime. A runtime LC-hash would be the ideal case.
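
The first two ideas can be sketched roughly as follows. This is only an illustration in Python, not AccSeq code: the names `build_gene_index`, `RunLengthArray`, and `vote` are hypothetical, and it assumes the genes are concatenated into one text with their start offsets available as a sorted list. How well RLE compresses the gene-index array depends on how often neighbouring suffix-array entries fall in the same gene, which is not measured here.

```python
from bisect import bisect_right
from collections import Counter


def build_gene_index(suffix_array, gene_starts):
    """Map each suffix-array entry (a text offset) to the gene containing it.

    gene_starts is assumed to be the sorted list of start offsets of each
    gene in the concatenated text (hypothetical layout).
    """
    return [bisect_right(gene_starts, pos) - 1 for pos in suffix_array]


class RunLengthArray:
    """Run-length-encoded sequence with O(log r) random access (r = number of runs)."""

    def __init__(self, values):
        self.run_starts = []  # index of the first element of each run
        self.run_values = []  # value repeated throughout each run
        self.length = 0
        for v in values:
            if not self.run_values or self.run_values[-1] != v:
                self.run_starts.append(self.length)
                self.run_values.append(v)
            self.length += 1

    def __len__(self):
        return self.length

    def __getitem__(self, i):
        if not 0 <= i < self.length:
            raise IndexError(i)
        # Binary search for the last run that starts at or before i.
        return self.run_values[bisect_right(self.run_starts, i) - 1]


def vote(gene_index, lo, hi):
    """Count votes per gene over the suffix-array interval [lo, hi)."""
    return Counter(gene_index[i] for i in range(lo, hi))
```

For the toy text `ACGTACGA` with two genes starting at offsets 0 and 4, the suffix array is `[7, 4, 0, 5, 1, 6, 2, 3]` and the gene-index array is `[1, 1, 0, 1, 0, 1, 0, 0]`. Voting over the first three entries (the suffixes starting with `A`) gives two votes for gene 1 and one for gene 0, without touching the original offsets.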
