MMseqs2 Release 18 Error in fwbw Alignment (linclust function, Uniref50 dataset)

Hello, I am deploying MMseqs2 on Databricks for data pipelining work and ran into an issue while testing the "linclust" function (run on its own rather than through easy-linclust, for testing). The test dataset is Uniref50 [https://www.uniprot.org/uniref?query=%28identity%3A0.5%29].

When running on the current release (cf9e9f2605b15436eea129dc42961ec411e57af4), clustering failed on the fwbw alignment step (shortened logging below). 

I then pinned to release version 17 (b804fbe384e6f6c9fe96322ec0e92d48bccd0a42) and the same commands ran normally, which suggests a regression in the new fwbw alignment step when processing Uniref50.

For now I can continue using a pinned version, but this would be good to resolve in the future. 

**NOTE:** After a number of tries, the current version succeeded a couple of times, so the failure appears to be nondeterministic. The majority of runs still hit the alignment error.
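
To put a number on how often the failure occurs, the linclust call could be repeated in a loop. The following is a minimal sketch under stated assumptions, not part of the original pipeline: `count_failures` and `run_linclust` are hypothetical helper names, and a fresh working directory is used per run on the assumption that this keeps intermediate results from being reused. The linclust arguments are the ones from this report.

```shell
#!/bin/sh
# count_failures N CMD [ARGS...]
# Runs CMD N times and prints how many of the runs exited non-zero.
# Hypothetical helper for illustration; not part of MMseqs2.
count_failures() {
    runs=$1
    shift
    failed=0
    i=1
    while [ "$i" -le "$runs" ]; do
        if ! "$@" >/dev/null 2>&1; then
            failed=$((failed + 1))
        fi
        i=$((i + 1))
    done
    echo "$failed"
}

# One clustering run with the parameters from this report.
# Each run gets its own scratch directory (assumption: enough free
# space under TMPDIR), which is deleted again afterwards.
run_linclust() {
    workdir=$(mktemp -d) || return 1
    mmseqs linclust uniref50_db "$workdir/clusters" "$workdir/tmp" \
        --min-seq-id 0.5 -c 0.5 --similarity-type 2 --cov-mode 1
    status=$?
    rm -rf "$workdir"
    return "$status"
}

# Example (commented out because each run is expensive):
# count_failures 10 run_linclust
```

The printed value is the number of failed runs out of N, which could be compared between the two commits mentioned above.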

**Code (personal paths removed)**
mmseqs createdb uniref50.fasta uniref50_db
**mmseqs linclust uniref50_db uniref50_clusters tmpA --min-seq-id 0.5 -c 0.5 --similarity-type 2 --cov-mode 1**
mmseqs createtsv uniref50_db uniref50_db uniref50_clusters uniref50_clusters.tsv

**Logs (config and error)**
MMseqs Version:                           cf9e9f2605b15436eea129dc42961ec411e57af4
Cluster mode                              0
Max connected component depth             1000
Similarity type                           2
Threads                                   32
Compressed                                0
Verbosity                                 3
Weight file name                          
Cluster Weight threshold                  0.9
Set mode                                  false
Substitution matrix                       aa:blosum62.out,nucl:nucleotide.out
Add backtrace                             false
Alignment mode                            2
Alignment mode                            0
Allow wrapped scoring                     false
E-value threshold                         0.001
Seq. id. threshold                        0.5
Min alignment length                      0
Seq. id. mode                             0
Alternative alignments                    0
Coverage threshold                        0.5
Coverage mode                             1
Max sequence length                       65535
Compositional bias                        1
Compositional bias scale                  1
Max reject                                2147483647
Max accept                                2147483647
Include identical seq. id.                false
Preload mode                              0
Pseudo count a                            substitution:1.100,context:1.400
Pseudo count b                            substitution:4.100,context:5.800
Score bias                                0
Realign hits                              false
Realign score bias                        -0.2
Realign max seqs                          2147483647
Correlation score weight                  0
Gap open cost                             aa:11,nucl:5
Gap extension cost                        aa:1,nucl:2
Zdrop                                     40
Alphabet size                             aa:21,nucl:5
k-mers per sequence                       21
Spaced k-mers                             0
Spaced k-mer pattern                      
Scale k-mers per sequence                 aa:0.000,nucl:0.200
Adjust k-mer length                       false
Mask residues                             0
Mask residues probability                 0.9
Mask lower case residues                  0
Mask lower letter repeating N times       0
k-mer length                              0
Shift hash                                67
Split memory limit                        0
Include only extendable                   false
Skip repeating k-mers                     false
Rescore mode                              0
Remove hits by seq. id. and coverage      false
Sort results                              0
Remove temporary files                    false
Force restart with latest tmp             false
MPI runner                                

Compute score and coverage
Query database size: 48398278 type: Aminoacid
Target database size: 48398278 type: Aminoacid
Calculation of alignments
r.word: 2
bests_reverse.first.score: 14565
r.score1: 14563
Score of forward/backward SW differ. This should not happen.
Start: Q: 16052, T: 14692. End: Q: 19494, T 18103
**[===============================Error: Alignment step died**
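
If the Q/T values in the log above are residue positions, the failing pair involves sequences at least roughly 18,000 to 19,500 residues long. One way to look for a smaller reproduction set would be to keep only very long sequences from the FASTA file. This is a sketch under that assumption: `filter_long_fasta` is a hypothetical helper, the 18000 threshold is only illustrative, and it is unknown whether the reduced set still triggers the error.

```shell
#!/bin/sh
# filter_long_fasta MINLEN < in.fasta > out.fasta
# Keeps only records whose sequence has at least MINLEN residues.
# Each kept sequence is written on a single line (line wrapping is dropped).
# Hypothetical helper for illustration; not part of MMseqs2.
filter_long_fasta() {
    awk -v min="$1" '
        # Print the buffered record if it is long enough.
        function emit() {
            if (header != "" && length(seq) >= min) {
                print header
                print seq
            }
        }
        /^>/ { emit(); header = $0; seq = ""; next }
        { seq = seq $0 }
        END { emit() }
    '
}

# Example (file names other than uniref50.fasta are placeholders):
# filter_long_fasta 18000 < uniref50.fasta > uniref50_long.fasta
# mmseqs createdb uniref50_long.fasta uniref50_long_db
```

The resulting database could then be clustered with the same linclust parameters as in the report.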

MMseqs2 Release 18 Error in fwbw Alignment (linclust function, Uniref50 dataset) #1060
