Issues with Implementation and Replicating Paper Results #36

@its-sandy

Description

Hi there! I am working on a PyTorch implementation of SLIDE, and I'm currently trying to compare its performance against the original SLIDE implementation. I've run into a few doubts/issues while evaluating SLIDE and would appreciate clarification on the following.

  1. I'm unable to replicate the accuracy-vs-iteration plot for the Delicious-200K dataset using the parameters mentioned in the paper (Simhash, K=9, L=50); plot attached. I also observe that SLIDE's accuracy seems to worsen beyond a certain point. What could be the reasons for these?
  2. I observe a few inconsistencies in the implementations of WTA and DWTA hashes.
  • index += h<<((_K-1-j)*(int)floor(log(binsize)));

    The hashes are combined as index += h<<((_K-1-j)*(int)floor(log(binsize)));. But if the hashes are simply to be concatenated, shouldn't it instead be index += h<<((_K-1-j)*(int)ceil(log2(binsize)));? (Note that log here is the natural logarithm.) However, for binsize = 8, I also observe that shifting by floor(log(binsize)) = 2 bits gives better convergence than shifting by ceil(log2(binsize)) = 3 bits. Is this intentional? Why is this the case?

  • There appears to be a bug in the WTA hash:

    values[i] = data[i*binsize+j];

  3. What is the reason behind using Simhash for Delicious-200K and the DWTA hash for Amazon-670K?
  4. The paper mentioned extending SLIDE to convolutions as a future direction. Has there been any progress along this line?
