@JingyaHuang
Collaborator

@JingyaHuang JingyaHuang commented Oct 24, 2025

What does this PR do?

Fixes #975

  • Add encode() and similarity()
  • Add tests
  • Add new doc for sentence transformers

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@JingyaHuang JingyaHuang marked this pull request as ready for review October 26, 2025 20:20
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

return sum([len(t) for t in text]) # Sum of length of individual strings

@property
def similarity_fn_name(self) -> Literal["cosine", "dot", "euclidean", "manhattan"]:
Member

Suggested change
- def similarity_fn_name(self) -> Literal["cosine", "dot", "euclidean", "manhattan"]:
+ def similarity_fn_name(self) -> SimilarityFunction:

Collaborator Author

Why? I think in the setter, self._similarity will always be a str.

Member

If self._similarity_fn_name is None, then you return SimilarityFunction.COSINE.
But that's a nit I guess; it's equivalent, right?

Collaborator Author

Yeah, in that case it returns SimilarityFunction.COSINE, which is just the str "cosine", i.e. not a function but its name:
https://github.com/huggingface/sentence-transformers/blob/85ec64559f4414aa536eca4bf53538291e0a333f/sentence_transformers/similarity_functions.py#L31C14-L31C22
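
To make the point above concrete, here is a minimal sketch of the pattern under discussion. It is not the code from this PR or from sentence-transformers: `ToyModel`, `SimilarityName`, and the local `SimilarityFunction` enum are hypothetical stand-ins, and the sketch assumes the setter converts enum members to their string value. Under that assumption the getter always hands back a plain str, so the `Literal[...]` annotation describes the return value accurately.

```python
from enum import Enum
from typing import Literal, Optional, Union

# Hypothetical alias for the annotation used in the diff above.
SimilarityName = Literal["cosine", "dot", "euclidean", "manhattan"]


class SimilarityFunction(Enum):
    """Local stand-in for illustration; not imported from sentence-transformers."""

    COSINE = "cosine"
    DOT = "dot"
    EUCLIDEAN = "euclidean"
    MANHATTAN = "manhattan"


class ToyModel:
    """Hypothetical class showing a property whose setter normalizes to str."""

    def __init__(self) -> None:
        self._similarity_fn_name: Optional[str] = None

    @property
    def similarity_fn_name(self) -> SimilarityName:
        # Fall back to cosine when nothing was configured (assumed behaviour).
        if self._similarity_fn_name is None:
            self.similarity_fn_name = SimilarityFunction.COSINE
        return self._similarity_fn_name

    @similarity_fn_name.setter
    def similarity_fn_name(self, value: Union[str, SimilarityFunction]) -> None:
        # Enum members are reduced to their string value, so only str is stored.
        if isinstance(value, SimilarityFunction):
            value = value.value
        self._similarity_fn_name = value


model = ToyModel()
default_name = model.similarity_fn_name  # triggers the cosine fallback
```

With this shape, annotating the property as returning the enum would only be accurate if the setter stored the enum member itself rather than its value.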

@JingyaHuang JingyaHuang merged commit 99ff466 into main Oct 31, 2025
8 checks passed
@JingyaHuang JingyaHuang deleted the improve-sentence-trfrs branch October 31, 2025 13:29
Development

Successfully merging this pull request may close these issues.

Add support for encode and similarity for sentence transformer models
