HuggingFace script and Sentence Transformers script giving different results #44

@vladkvit

I copy-pasted the two scripts [0][1] into a notebook without any changes. They produce different embeddings and different similarity scores.
The HuggingFace script gives:
Cosine similarity between "I'm searching for a planet not too far from Earth." and "Neptune is the eight..." is: 0.622
Cosine similarity between "I'm searching for a planet not too far from Earth." and "TRAPPIST-1d, also de..." is: 0.490
Cosine similarity between "I'm searching for a planet not too far from Earth." and "A harsh desert world..." is: 0.433

The Sentence Transformers script gives:
Cosine similarity between "I'm searching for a planet not too far from Earth." and "Neptune is the eight..." is: 0.480
Cosine similarity between "I'm searching for a planet not too far from Earth." and "TRAPPIST-1d, also de..." is: 0.370
Cosine similarity between "I'm searching for a planet not too far from Earth." and "A harsh desert world..." is: 0.369

I checked the embeddings: both the document and the query embeddings differ between the two scripts. I also tried running on GPU (by adding `.cuda()` in the relevant places) and got the same results as above.
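For anyone who wants to quantify the mismatch rather than eyeball it, here is a minimal sketch of the kind of check I mean. It assumes the embeddings from each script have been converted to NumPy arrays of shape `(n, dim)`. The names `compare_embeddings`, `hf_emb`, and `st_emb` are hypothetical, and the values are made-up placeholders, not the actual outputs of either script.

```python
import numpy as np


def compare_embeddings(a, b, atol=1e-4):
    """Summarize how two embedding matrices of shape (n, dim) differ."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch: {a.shape} vs {b.shape}")

    norms_a = np.linalg.norm(a, axis=1)
    norms_b = np.linalg.norm(b, axis=1)
    # Cosine similarity between corresponding rows: values near 1.0 mean
    # the vectors point the same way and differ only in scale.
    rowwise_cosine = np.sum(a * b, axis=1) / (norms_a * norms_b)

    return {
        "allclose": bool(np.allclose(a, b, atol=atol)),
        "max_abs_diff": float(np.max(np.abs(a - b))),
        "rowwise_cosine": rowwise_cosine,
        "norms_a": norms_a,
        "norms_b": norms_b,
    }


# Placeholder data only (NOT real model outputs): the second matrix is a
# scaled copy of the first, so the rows differ in norm but not direction.
hf_emb = np.array([[1.0, 2.0, 3.0], [0.5, 0.0, -0.5]])
st_emb = 2.0 * hf_emb

report = compare_embeddings(hf_emb, st_emb)
for key, value in report.items():
    print(key, value)
```

If the row-wise cosine is close to 1 while the norms differ, the two sets of vectors differ only in scale; a row-wise cosine well below 1 means the vectors themselves point in different directions.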

If it helps, I can dump the embedding vectors or the full code in the comments.

It would be nice to have the expected output in the README as well.

[0] https://github.com/Muennighoff/sgpt#asymmetric-semantic-search-be
[1] https://github.com/Muennighoff/sgpt#asymmetric-semantic-search-be-st
