Unable to reproduce Precision-Recall plot Supp. Fig. 5

![foldseek_prec_recall_screenshot](https://github.com/user-attachments/assets/ff422ca8-dc11-4276-990b-d79d86038955)

I have tried to reproduce Supp. Fig. 5 using both the scripts in this repository and my own code, but I get quite different results. For example, as shown in the figure above (Supp. Fig. 5 on the left, my results on the right), I find that TM-align performs better than DALI at the superfamily level across the entire curve, whereas your figure shows TM-align to be substantially worse. My plot on the right was generated using your data and scripts as follows.

The hits were downloaded from https://wwwuser.gwdguser.de/~compbiol/foldseek/scop.benchmark.result.tar.gz

<pre>
sort -rgk3 ../alns/TMalign.txt > TMalign.sorted.txt
sort -rgk3 ../alns/dali.txt > dali.sorted.txt
bench.fdr.noselfhit.awk TMalign.sorted.txt scop_lookup.fix.tsv <(cat TMalign.sorted.txt) > TM-align.rocx
bench.fdr.noselfhit.awk dali.sorted.txt scop_lookup.fix.tsv <(cat dali.sorted.txt) > dali.rocx
</pre>

I then plotted column `PREC_SFAM` (y axis) against `RECALL_SFAM` (x axis) from each `.rocx` file.
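For anyone repeating this, the two series can be pulled out of a `.rocx` file by column name rather than by position. This is a minimal sketch, assuming the file is whitespace-separated with a single header line that names its columns; the helper name `extract_cols` is hypothetical and not part of the repository:

```shell
# Hypothetical helper (not from the repository): print two named columns
# as "x y" pairs from a file whose first line is a whitespace-separated header.
extract_cols() {
  # $1 = x-axis column name, $2 = y-axis column name, $3 = input file
  awk -v xname="$1" -v yname="$2" '
    NR == 1 {
      # Find the requested columns in the header line
      for (i = 1; i <= NF; i++) {
        if ($i == xname) xcol = i
        if ($i == yname) ycol = i
      }
      if (!xcol || !ycol) {
        print "extract_cols: column not found" > "/dev/stderr"
        exit 1
      }
      next
    }
    { print $xcol, $ycol }  # one data row per output line
  ' "$3"
}
```

For example, `extract_cols RECALL_SFAM PREC_SFAM TM-align.rocx > TM-align.xy` writes one `recall precision` pair per line, a layout that gnuplot, matplotlib or R can read directly. Selecting columns by header name avoids plotting the wrong column if the column order ever differs between files.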

As you can see, the DALI curve matches your figure, but the TM-align curve is very different.

Any help in resolving this discrepancy will be much appreciated.

Also, the calculation of precision and recall in `bench.fdr.noselfhit.awk` appears to apply corrections to the standard formulas, but I don't understand how it works. Can you clarify? In particular, what is the variable `norm` doing? Thanks!
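For comparison, the standard (uncorrected) definitions are precision = TP / (TP + FP) and recall = TP / P, where P is the total number of true pairs. The sketch below computes them cumulatively along a ranked hit list at the superfamily level, so its output can be set against the `PREC_SFAM`/`RECALL_SFAM` columns. It does not reproduce `bench.fdr.noselfhit.awk` and makes no assumption about what `norm` does. All input layouts here are hypothetical assumptions: the lookup file has a domain ID in column 1 and a SCOP sccs string such as `b.1.1.1` in column 2; the hits file has query, target and score in columns 1 to 3, sorted best-first (as `sort -rgk3` does); every domain in the lookup is used as a query; and every non-self hit outside the query's superfamily counts as a false positive. The function name `std_prec_recall` is also hypothetical.

```shell
# Hypothetical sketch: standard precision/recall at the superfamily level,
# pooled over all queries, along a hit list sorted best-first.
std_prec_recall() {
  # $1 = lookup file (domain_id sccs), $2 = hits file (query target score)
  awk '
    NR == FNR {
      # First file: map each domain to its superfamily (first three sccs fields)
      split($2, p, ".")
      sfam[$1] = p[1] "." p[2] "." p[3]
      size[sfam[$1]]++
      next
    }
    FNR == 1 {
      # P = ordered pairs of distinct domains sharing a superfamily
      for (s in size) total += size[s] * (size[s] - 1)
    }
    $1 == $2 { next }                        # drop self-hits
    !($1 in sfam) || !($2 in sfam) { next }  # skip domains absent from the lookup
    {
      if (sfam[$1] == sfam[$2]) tp++; else fp++
      # precision = TP / (TP + FP), recall = TP / P
      printf "%s %.4f %.4f\n", $3, tp / (tp + fp), (total > 0 ? tp / total : 0)
    }
  ' "$1" "$2"
}
```

Each output line is `score precision recall` after that hit. Pooling all queries into one ranked list mirrors what sorting the whole hit file with `sort -rgk3` implies; if `scop_lookup.fix.tsv` has a different layout, only the field indices in the first awk block need to change.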


Unable to reproduce Precision-Recall plot Supp. Fig. 5 #5