Description
I copy-pasted the two README scripts [0][1] into a notebook without any changes. They produce different embeddings and therefore different similarity scores.
The Hugging Face script [0] gives:
Cosine similarity between "I'm searching for a planet not too far from Earth." and "Neptune is the eight..." is: 0.622
Cosine similarity between "I'm searching for a planet not too far from Earth." and "TRAPPIST-1d, also de..." is: 0.490
Cosine similarity between "I'm searching for a planet not too far from Earth." and "A harsh desert world..." is: 0.433
The Sentence Transformers script [1] gives:
Cosine similarity between "I'm searching for a planet not too far from Earth." and "Neptune is the eight..." is: 0.480
Cosine similarity between "I'm searching for a planet not too far from Earth." and "TRAPPIST-1d, also de..." is: 0.370
Cosine similarity between "I'm searching for a planet not too far from Earth." and "A harsh desert world..." is: 0.369
I compared the embeddings directly: both the document and the query embeddings differ between the two scripts. I also tried running on GPU (by adding .cuda() in the relevant places), with the same results as above.
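A row-by-row check of this kind can be sketched roughly as below. This is a minimal illustration, not code from either script. It assumes both sets of embeddings have been converted to plain lists of floats, and the names `cosine`, `compare_embeddings`, and the tolerance `atol` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def compare_embeddings(emb_a, emb_b, atol=1e-4):
    """Compare two lists of embedding vectors row by row.

    emb_a, emb_b: lists of equal-length float lists (hypothetical inputs,
    e.g. one list per script). atol is an assumed tolerance, not a value
    taken from the repository.
    """
    if len(emb_a) != len(emb_b):
        raise ValueError("different number of embeddings")
    report = []
    for i, (u, v) in enumerate(zip(emb_a, emb_b)):
        if len(u) != len(v):
            raise ValueError(f"row {i}: dimension mismatch")
        # Largest element-wise difference between the two vectors
        max_abs_diff = max(abs(x - y) for x, y in zip(u, v))
        report.append({
            "row": i,
            "max_abs_diff": max_abs_diff,
            "cosine": cosine(u, v),
            "match": max_abs_diff <= atol,
        })
    return report
```

With PyTorch tensors, something like `tensor.detach().cpu().tolist()` should give the list form assumed here. A cosine close to 1 with a large `max_abs_diff` would suggest the vectors differ mainly in scale, while a low cosine would suggest the two scripts compute genuinely different representations.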
If it helps, I can dump the embedding vectors or the full code in the comments.
It would also be nice to have the expected output of each script in the README.
[0] https://github.com/Muennighoff/sgpt#asymmetric-semantic-search-be
[1] https://github.com/Muennighoff/sgpt#asymmetric-semantic-search-be-st