
MMseqs2 Release 18 Error in fwbw Alignment (linclust function, Uniref50 dataset) #1060

@JaceWA99

Description


Hello, I am deploying MMseqs2 on Databricks for data pipeline work and ran into an issue while testing the "linclust" workflow (I split easy-linclust into its separate steps for testing). The test dataset is UniRef50 [https://www.uniprot.org/uniref?query=%28identity%3A0.5%29].

On the current release (cf9e9f2), clustering failed at the fwbw alignment step (abbreviated log below).

I then pinned to release 17 (b804fbe), and the same commands completed normally. This suggests the problem lies in the new fwbw alignment step when processing UniRef50.

For now I can keep using the pinned version, but it would be good to resolve this in a future release.

NOTE: Over a number of runs, the current version succeeded a couple of times, so the error appears to be intermittent rather than deterministic. The majority of runs hit the alignment error.

Commands (personal paths removed)
mmseqs createdb uniref50.fasta uniref50_db
mmseqs linclust uniref50_db uniref50_clusters tmpA --min-seq-id 0.5 -c 0.5 --similarity-type 2 --cov-mode 1
mmseqs createtsv uniref50_db uniref50_db uniref50_clusters uniref50_clusters.tsv

Logs (config and error)
MMseqs Version:       cf9e9f2605b15436eea129dc42961ec411e57af4
Cluster mode       0
Max connected component depth       1000
Similarity type       2
Threads       32
Compressed       0
Verbosity       3
Weight file name       
Cluster Weight threshold       0.9
Set mode       false
Substitution matrix       aa:blosum62.out,nucl:nucleotide.out
Add backtrace       false
Alignment mode       2
Alignment mode       0
Allow wrapped scoring       false
E-value threshold       0.001
Seq. id. threshold       0.5
Min alignment length       0
Seq. id. mode       0
Alternative alignments       0
Coverage threshold       0.5
Coverage mode       1
Max sequence length       65535
Compositional bias       1
Compositional bias scale       1
Max reject       2147483647
Max accept       2147483647
Include identical seq. id.       false
Preload mode       0
Pseudo count a       substitution:1.100,context:1.400
Pseudo count b       substitution:4.100,context:5.800
Score bias       0
Realign hits       false
Realign score bias       -0.2
Realign max seqs       2147483647
Correlation score weight       0
Gap open cost       aa:11,nucl:5
Gap extension cost       aa:1,nucl:2
Zdrop       40
Alphabet size       aa:21,nucl:5
k-mers per sequence       21
Spaced k-mers       0
Spaced k-mer pattern       
Scale k-mers per sequence       aa:0.000,nucl:0.200
Adjust k-mer length       false
Mask residues       0
Mask residues probability       0.9
Mask lower case residues       0
Mask lower letter repeating N times       0
k-mer length       0
Shift hash       67
Split memory limit       0
Include only extendable       false
Skip repeating k-mers       false
Rescore mode       0
Remove hits by seq. id. and coverage      false
Sort results       0
Remove temporary files       false
Force restart with latest tmp       false
MPI runner       

Compute score and coverage
Query database size: 48398278 type: Aminoacid
Target database size: 48398278 type: Aminoacid
Calculation of alignments
r.word: 2
bests_reverse.first.score: 14565
r.score1: 14563
Score of forward/backward SW differ. This should not happen.
Start: Q: 16052, T: 14692. End: Q: 19494, T 18103
[===============================Error: Alignment step died
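For context on what the failing check asserts: my understanding is that the aligner runs a forward Smith-Waterman pass to find the best score and its end coordinates, then a backward pass over the reversed prefixes to recover the start coordinates. Both passes should report the same score, but in my log they differ by 2 (14565 vs. 14563). The Python sketch below illustrates that invariant only. It is not MMseqs2's implementation: the function names, the toy match/mismatch scores, and the gap-cost convention (gap_open + (k - 1) * gap_extend for a gap of length k) are my own assumptions, and the real aligner uses BLOSUM62 with a vectorized (striped) algorithm.

```python
def sw_affine(a, b, match=2, mismatch=-1, gap_open=3, gap_extend=1):
    """Smith-Waterman local alignment with affine gaps (Gotoh recurrences).

    Returns (best_score, end_a, end_b), where end_a/end_b are the lengths of
    the prefixes of a and b at which the best local alignment ends.
    Assumed gap convention: a gap of length k costs gap_open + (k - 1) * gap_extend.
    """
    n, m = len(a), len(b)
    neg = float("-inf")
    H = [[0] * (m + 1) for _ in range(n + 1)]    # best score ending at (i, j)
    E = [[neg] * (m + 1) for _ in range(n + 1)]  # ending with b[j-1] against a gap
    F = [[neg] * (m + 1) for _ in range(n + 1)]  # ending with a[i-1] against a gap
    best, end_a, end_b = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(H[i][j - 1] - gap_open, E[i][j - 1] - gap_extend)
            F[i][j] = max(H[i - 1][j] - gap_open, F[i - 1][j] - gap_extend)
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, E[i][j], F[i][j])
            if H[i][j] > best:  # keep the first cell that reaches the best score
                best, end_a, end_b = H[i][j], i, j
    return best, end_a, end_b


def fwbw_check(a, b, **scoring):
    """Forward pass finds the best score and end; a backward pass over the
    reversed prefixes must reproduce the same score, and its end gives the start."""
    fwd, end_a, end_b = sw_affine(a, b, **scoring)
    bwd, rev_a, rev_b = sw_affine(a[:end_a][::-1], b[:end_b][::-1], **scoring)
    start_a, start_b = end_a - rev_a, end_b - rev_b  # 0-based start positions
    return fwd == bwd, fwd, bwd, (start_a, start_b), (end_a, end_b)


if __name__ == "__main__":
    ok, fwd, bwd, start, end = fwbw_check("HEAGAWGHEE", "PAWHEAE")
    print(ok, fwd, bwd, start, end)
```

Because every local alignment inside the prefixes ending at (end_a, end_b) is also an alignment of the full sequences, and the best one is contained in those prefixes, the two scores must be equal for exact arithmetic. A mismatch like the one in the log therefore points at the implementation rather than the input data. That would be consistent with the failure being intermittent, though I have not confirmed the cause.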
