ANIclustermap v2.0.1
I was using ANIclustermap to compare some genomes and found discrepancies between the genomes that the visualization clusters as most similar and the genomes that are most similar according to the ANI matrix.
After some digging, I found that the cause is the following line, where the linkage is calculated:

`linkage = hc.linkage(ani_matrix_df, method="average")`
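According to the SciPy documentation, when `linkage` is given a 2-D matrix it treats it as a set of observation vectors and calculates the Euclidean distances between the rows, instead of using the ANI values themselves as the measure of similarity. I think a better approach is to convert the ANI matrix to distances (100 - ANI) and pass them as a 1-D condensed distance matrix, which `linkage` uses directly. The `squareform` function from `scipy.spatial.distance` does this conversion: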
`linkage = hc.linkage(squareform(100 - ani_matrix_df), method="average")`

ANI matrix using the example files and skani mode:
| | GCF_000009725.1_ASM972v1_genomic | GCF_000270265.1_ASM27026v1_genomic | GCF_000478825.2_ASM47882v2_genomic | GCF_018282115.1_ASM1828211v1_genomic | GCF_018845095.1_ASM1884509v1_genomic | GCF_019924095.1_ASM1992409v1_genomic |
|---|---|---|---|---|---|---|
| GCF_000009725.1_ASM972v1_genomic | 100 | 100 | 86.41 | 88.12 | 99.99 | 88.09 |
| GCF_000270265.1_ASM27026v1_genomic | 100 | 100 | 86.46 | 88.57 | 100 | 88.58 |
| GCF_000478825.2_ASM47882v2_genomic | 86.41 | 86.46 | 100 | 84.83 | 86.41 | 85.33 |
| GCF_018282115.1_ASM1828211v1_genomic | 88.12 | 88.57 | 84.83 | 100 | 88.13 | 99.46 |
| GCF_018845095.1_ASM1884509v1_genomic | 99.99 | 100 | 86.41 | 88.13 | 100 | 88.09 |
| GCF_019924095.1_ASM1992409v1_genomic | 88.09 | 88.58 | 85.33 | 99.46 | 88.09 | 100 |
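
The difference between the two calls can be reproduced outside ANIclustermap with a minimal sketch. It assumes the matrix above is typed in as a plain NumPy array (ANIclustermap itself uses a DataFrame), and the names `ani`, `Z_current`, and `Z_condensed` are only illustrative:

```python
import numpy as np
import scipy.cluster.hierarchy as hc
from scipy.spatial.distance import squareform

# ANI values copied from the table above (same row/column order).
ani = np.array([
    [100.00, 100.00, 86.41, 88.12, 99.99, 88.09],
    [100.00, 100.00, 86.46, 88.57, 100.00, 88.58],
    [86.41, 86.46, 100.00, 84.83, 86.41, 85.33],
    [88.12, 88.57, 84.83, 100.00, 88.13, 99.46],
    [99.99, 100.00, 86.41, 88.13, 100.00, 88.09],
    [88.09, 88.58, 85.33, 99.46, 88.09, 100.00],
])

# Current behaviour: a 2-D input is treated as observation vectors,
# so linkage clusters on Euclidean distances between the rows.
Z_current = hc.linkage(ani, method="average")

# Proposed behaviour: convert ANI to a distance (100 - ANI) and pass it
# in condensed 1-D form so linkage uses these distances directly.
Z_condensed = hc.linkage(squareform(100 - ani), method="average")

print("2-D input (Euclidean between rows):")
print(Z_current)
print("Condensed 100 - ANI distances:")
print(Z_condensed)
```

Each row of a linkage matrix lists the two clusters merged and the distance at which they merge, so the third column of `Z_condensed` is on the same scale as 100 - ANI, while the third column of `Z_current` is a Euclidean distance between rows and is not directly interpretable as ANI.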
Heatmap with current behaviour:

Heatmap with 1-D condensed distance matrix:

GCF_000478825.2_ASM47882v2_genomic consistently has the lowest ANI to all the other genomes, and with the 1-D condensed distance matrix this is now also reflected in the heatmap.
I will make a PR implementing this.