Hi there! I am working on a PyTorch implementation of SLIDE and am currently trying to compare its performance against this repository's C++ implementation. I have run into a few doubts/issues while evaluating SLIDE and would appreciate clarification on the following:
- I'm unable to replicate the accuracy-vs-iteration plot for the Delicious-200K dataset using the parameters mentioned in the paper (Simhash, K=9, L=50); my plot is attached. I also observe that SLIDE's accuracy worsens beyond a certain point in training. What could be the reasons for these?

- I observe a few inconsistencies in the implementations of the WTA and DWTA hashes.
  - In `HashingDeepLearning/SLIDE/LSH.cpp`, line 82 (commit 3cebe6f), the hashes are combined as

    ```cpp
    index += h << ((_K - 1 - j) * (int)floor(log(binsize)));
    ```

    But if the hashes are meant to simply be concatenated, shouldn't it instead be `index += h << ((_K - 1 - j) * (int)ceil(log2(binsize)));`? (Note that `log` here is the natural logarithm.) However, for `binsize = 8`, I also observe that shifting by `floor(log(binsize)) = 2` bits gives better convergence than shifting by `ceil(log2(binsize)) = 3` bits. Is this intentional? Why is this the case? (A small demonstration of the difference is sketched after this list.)
  - There also appears to be a bug in the WTA hash. In `HashingDeepLearning/SLIDE/WtaHash.cpp`, line 57 (commit 3cebe6f):

    ```cpp
    values[i] = data[i*binsize+j];
    ```

    Here `data` is indexed directly in coordinate order rather than through a random permutation, which does not match my understanding of the WTA construction (see the sketch after this list).
- What is the reason behind using Simhash for Delicious-200K and the DWTA hash for Amazon-670K?
- The paper mentioned extending SLIDE to convolutional layers as a future direction. Has there been any progress along this line?
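To make the first point concrete, below is a minimal, self-contained sketch comparing the two combination schemes. The values of `K`, `binsize`, and the per-bin hashes `h[j]` are made up for illustration and are not taken from the repo:

```cpp
#include <cmath>
#include <cstdio>

int main() {
    const int K = 3;        // number of per-bin hashes combined into one table index
    const int binsize = 8;  // each per-bin hash h[j] lies in [0, binsize)
    int h[K] = {5, 6, 7};   // illustrative hash values

    int shiftFloor = (int)floor(log(binsize));   // natural log: floor(2.079...) = 2
    int shiftCeil  = (int)ceil(log2(binsize));   // log base 2: exactly 3

    int idxFloor = 0, idxCeil = 0;
    for (int j = 0; j < K; j++) {
        // As in LSH.cpp line 82: 2-bit shifts, so the top bit of each
        // 3-bit hash spills into the neighbouring slot and addition carries.
        idxFloor += h[j] << ((K - 1 - j) * shiftFloor);
        // Proposed alternative: 3-bit shifts give a lossless concatenation.
        idxCeil  += h[j] << ((K - 1 - j) * shiftCeil);
    }
    printf("shift by floor(log(binsize)) = %d bits -> index %d\n", shiftFloor, idxFloor);
    printf("shift by ceil(log2(binsize)) = %d bits -> index %d\n", shiftCeil, idxCeil);
    return 0;
}
```

With 2-bit shifts, distinct hash tuples can map to the same index, so I would have expected the 3-bit version to work at least as well; hence my surprise at the observed convergence.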
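For the WTA point, here is a minimal sketch of my understanding of the standard WTA construction (Yagnik et al.), where each hash is the argmax position among `binsize` coordinates selected by a random permutation. The function name and signature are mine, not the repo's API, and it assumes `data.size() >= binsize`:

```cpp
#include <algorithm>
#include <numeric>
#include <random>
#include <vector>

// Sketch of a standard WTA hash: for each hash, draw a random permutation
// of the coordinates and output the argmax position among the first
// `binsize` permuted coordinates. Assumes data.size() >= binsize.
std::vector<int> computeWtaHashes(const std::vector<double> &data,
                                  int numHashes, int binsize,
                                  unsigned seed = 42) {
    std::mt19937 rng(seed);
    std::vector<size_t> perm(data.size());
    std::iota(perm.begin(), perm.end(), size_t{0});

    std::vector<int> hashes(numHashes);
    for (int i = 0; i < numHashes; i++) {
        std::shuffle(perm.begin(), perm.end(), rng);  // fresh permutation per hash
        int maxind = 0;
        double maxval = data[perm[0]];
        for (int j = 1; j < binsize; j++) {
            double v = data[perm[j]];  // indexed through the permutation
            if (v > maxval) {
                maxval = v;
                maxind = j;
            }
        }
        hashes[i] = maxind;  // hash value in [0, binsize)
    }
    return hashes;
}
```

Line 57's `data[i*binsize+j]` reads raw coordinates in order instead, which is why I suspect a bug; please correct me if I am misreading the code.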