
Conversation

@RaphaelBouvet

Hi,
Thank you for developing and releasing Tranception :)

While testing the model, I noticed that MSA_processing in tranception/utils/msa_utils.py becomes a limiting step when the MSA is large.

This PR adds a Fast_MSA_processing class with improved speed 🔥 at the cost of higher memory usage.
For example, for an MSA with 21k sequences:

    fast processing: 13 sec
    base processing: 472 sec

Instead of the sequence-by-sequence comparisons in the original code, I vectorize the calculation so many comparisons run at once.
For large MSAs, doing all comparisons in a single pass is not feasible, so I split the work into sub-arrays and process them one chunk at a time.
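To make the idea concrete, here is a minimal NumPy sketch of this kind of chunked, vectorized sequence re-weighting. This is not the code from the PR: the function name `compute_sequence_weights_chunked`, the 0.8 identity threshold, and the `chunk_size` default are illustrative assumptions, not the names or defaults used in tranception/utils/msa_utils.py.

```python
import numpy as np

def compute_sequence_weights_chunked(msa_array, identity_threshold=0.8, chunk_size=200):
    """Chunked, vectorized sequence re-weighting (illustrative sketch).

    msa_array: (num_seqs, seq_len) integer-encoded alignment, with gaps
    mapped to a dedicated index. identity_threshold and chunk_size are
    assumed defaults for illustration only.
    """
    num_seqs, seq_len = msa_array.shape
    num_neighbors = np.zeros(num_seqs, dtype=np.int64)

    # Compare a chunk of sequences against the full alignment in one
    # broadcasted operation instead of looping sequence by sequence.
    # Peak memory scales with chunk_size * num_seqs * seq_len booleans,
    # so chunk_size trades speed against RAM.
    for start in range(0, num_seqs, chunk_size):
        chunk = msa_array[start:start + chunk_size]              # (c, L)
        matches = chunk[:, None, :] == msa_array[None, :, :]     # (c, N, L) bools
        identity = matches.mean(axis=2)                          # pairwise identity, (c, N)
        num_neighbors[start:start + chunk_size] = (identity >= identity_threshold).sum(axis=1)

    # Inverse-neighborhood-size weights; each sequence counts itself,
    # so num_neighbors is always >= 1.
    return 1.0 / num_neighbors

if __name__ == "__main__":
    # Toy example: 2,000 random sequences of length 100 over a 21-letter alphabet.
    rng = np.random.default_rng(0)
    toy_msa = rng.integers(0, 21, size=(2000, 100), dtype=np.int8)
    weights = compute_sequence_weights_chunked(toy_msa, chunk_size=200)
    print(weights.shape, float(weights.sum()))
```

The key design choice is the same as described above: the chunk dimension bounds the size of the intermediate comparison array, so memory can be tuned by adjusting `chunk_size` while all comparisons within a chunk stay vectorized.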

  • In my tests, the resulting weights are identical to the pre-released weights.
  • Memory usage can be adjusted manually by changing the size of the sub-arrays (this could perhaps be set automatically based on the user's available RAM).
  • The code might not work if there are empty sequences in the MSA (not tested).

I am sure there is a better/faster way to do this calculation, but this method worked well for me.

Do not hesitate to reach out if you have any questions.
Best wishes,
Raphaël
