Description
Hello, I am deploying MMseqs on Databricks for data pipelining work and ran into an issue while testing the "linclust" function (run separately from easy-linclust for testing). The test dataset is Uniref50 [https://www.uniprot.org/uniref?query=%28identity%3A0.5%29].
When running on the current release (cf9e9f2), clustering failed at the fwbw alignment step (abbreviated log below).
I then pinned to release version 17 (b804fbe) and the function ran normally, which suggests that the problem lies in the new fwbw alignment step when processing Uniref50.
For now I can continue using a pinned version, but this would be good to resolve in the future.
NOTE: After a number of tries, the current version succeeded a couple of times, so the error appears to be intermittent. The majority of runs hit the alignment error.
Code (personal paths removed)
mmseqs createdb uniref50.fasta uniref50_db
mmseqs linclust uniref50_db uniref50_clusters tmpA --min-seq-id 0.5 -c 0.5 --similarity-type 2 --cov-mode 1
mmseqs createtsv uniref50_db uniref50_db uniref50_clusters uniref50_clusters.tsv
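Because the failure is intermittent, it may help to measure how often it occurs. Below is a minimal bash sketch that runs an arbitrary command several times and counts the non-zero exits. The names `count_failures`, `run_once`, and `LOG_DIR` are hypothetical helpers introduced here and are not part of MMseqs; the sketch also assumes that a failed alignment step makes `mmseqs` exit with a non-zero status.

```shell
#!/usr/bin/env bash
# count_failures N command [args...]
# Runs the command N times, saves each run's output to
# $LOG_DIR/run_<i>.log (default: current directory), and prints
# how many runs exited with a non-zero status.
count_failures() {
    local runs="$1"; shift
    local log_dir="${LOG_DIR:-.}"
    local failed=0
    local i
    for ((i = 1; i <= runs; i++)); do
        if ! "$@" >"${log_dir}/run_${i}.log" 2>&1; then
            failed=$((failed + 1))
        fi
    done
    echo "${failed}/${runs} runs failed"
}

# Hypothetical usage with the linclust call from this report. Each run
# gets a fresh output prefix and tmp directory (an assumption, so that
# no run can pick up results left behind by an earlier one):
#
# run_once() {
#     local d
#     d=$(mktemp -d) || return 1
#     mmseqs linclust uniref50_db "$d/clusters" "$d/tmp" \
#         --min-seq-id 0.5 -c 0.5 --similarity-type 2 --cov-mode 1
# }
# count_failures 5 run_once
```

The per-run log files make it possible to compare the "Start"/"End" coordinates printed before each failure and check whether the same sequence pair triggers the error every time.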
Logs (config and error)
MMseqs Version: cf9e9f2605b15436eea129dc42961ec411e57af4
Cluster mode 0
Max connected component depth 1000
Similarity type 2
Threads 32
Compressed 0
Verbosity 3
Weight file name
Cluster Weight threshold 0.9
Set mode false
Substitution matrix aa:blosum62.out,nucl:nucleotide.out
Add backtrace false
Alignment mode 2
Alignment mode 0
Allow wrapped scoring false
E-value threshold 0.001
Seq. id. threshold 0.5
Min alignment length 0
Seq. id. mode 0
Alternative alignments 0
Coverage threshold 0.5
Coverage mode 1
Max sequence length 65535
Compositional bias 1
Compositional bias scale 1
Max reject 2147483647
Max accept 2147483647
Include identical seq. id. false
Preload mode 0
Pseudo count a substitution:1.100,context:1.400
Pseudo count b substitution:4.100,context:5.800
Score bias 0
Realign hits false
Realign score bias -0.2
Realign max seqs 2147483647
Correlation score weight 0
Gap open cost aa:11,nucl:5
Gap extension cost aa:1,nucl:2
Zdrop 40
Alphabet size aa:21,nucl:5
k-mers per sequence 21
Spaced k-mers 0
Spaced k-mer pattern
Scale k-mers per sequence aa:0.000,nucl:0.200
Adjust k-mer length false
Mask residues 0
Mask residues probability 0.9
Mask lower case residues 0
Mask lower letter repeating N times 0
k-mer length 0
Shift hash 67
Split memory limit 0
Include only extendable false
Skip repeating k-mers false
Rescore mode 0
Remove hits by seq. id. and coverage false
Sort results 0
Remove temporary files false
Force restart with latest tmp false
MPI runner
Compute score and coverage
Query database size: 48398278 type: Aminoacid
Target database size: 48398278 type: Aminoacid
Calculation of alignments
r.word: 2
bests_reverse.first.score: 14565
r.score1: 14563
Score of forward/backward SW differ. This should not happen.
Start: Q: 16052, T: 14692. End: Q: 19494, T 18103
[===============================Error: Alignment step died