-
Notifications
You must be signed in to change notification settings - Fork 22
Description
Description
I didn't do anything per se, I rather implemented this algorithm in Java and I have some useful tweaks.
What I Observed
The implemented algorithm uses the numpy.argpartition() method.
As the documentation of this method states :
The k-th element will be in its final sorted position and all smaller elements will be moved before it
and all larger elements behind it. The order all elements in the partitions is undefined.
Concretely, we sort the distance_profile and we get the indices using
indices = np.argpartition(tmp, k)
However in the for loop, we go through every index, so it could be that the first k indices are trivial matches or get excluded in the subsequent iterations and the "motifs" that get picked as nearest neighbors are motifs that are in indices[k:] and thus the order of them is undefined i.e. unsorted. We thus end up with neighbors that are not really neighbors.
One fix for this is to just loop over indices[0:k] and notify the user that less that k neighbors were found if some are excluded.
Moreover, I think one should apply the exclusion zone after one adds an index to the found indices, so putting the code inside of the if condition.
Let's assume we have
exclusion_zone = 2
k = 5
indices = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 ,15]
distance_profile = [Inf, Inf, Inf, x, x, x, ...]
The for loop will go through the indices array, see that index 1 is excluded, and then exclude in again. All good until now.
However, once the loop gets to index 2, as the exclusion zone gets applied on either side of the index, index 4 gets excluded.
Arriving at index 3, it is excluded, and thus is excludes it again and index 5 is then put into the exclusion zone.
This goes on until all indices get consumed and the found indices remains empty, whereas it should have returned at least the index 4 as a neighbor.
I know this is an artificial example, I didn't test it using your library. I didn't read the calling code, but this could be a non-issue if the query is already excluded from the distance profile (I think, but I am not sure). However, the easiest way would be to put the exclusion zone inside of the if statement.