Description
Not sure it would be possible to live-embed within the Sentence Transformers framework, as SignCLIP isn't on HuggingFace and has some arcane dependencies and runtime requirements. BUT... what if we embedded the videos first, loaded those precomputed video embeddings, and then used Sentence Transformers only on the text queries?
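A rough sketch of that "embed first, query later" flow, using only NumPy. The names `cosine_scores`, `search`, and `encode_text` are hypothetical: `encode_text` stands in for whatever turns text queries into vectors (for example a Sentence Transformers model's `encode`). It also assumes the text encoder and the video embeddings share the same embedding space and dimension, which would still need to be verified for SignCLIP.

```python
import numpy as np

def cosine_scores(query_embs, video_embs, eps=1e-12):
    """Cosine similarity between every query row and every video row."""
    q = query_embs / (np.linalg.norm(query_embs, axis=1, keepdims=True) + eps)
    v = video_embs / (np.linalg.norm(video_embs, axis=1, keepdims=True) + eps)
    return q @ v.T  # shape: (num_queries, num_videos)

def search(queries, encode_text, video_embs, top_k=5):
    """Rank precomputed video embeddings against text queries.

    encode_text is a hypothetical callable mapping a list of strings
    to an array of shape (len(queries), dim).
    """
    query_embs = np.asarray(encode_text(queries), dtype=np.float32)
    scores = cosine_scores(query_embs, video_embs)
    # Indices of the top_k highest-scoring videos per query, best first.
    top_idx = np.argsort(-scores, axis=1)[:, :top_k]
    top_scores = np.take_along_axis(scores, top_idx, axis=1)
    return top_idx, top_scores
```

The video side never touches the text model here, so the SignCLIP embedding step could run offline in its own environment.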
Characteristics of Sentence Transformer (a.k.a. bi-encoder) models:
- Calculates a fixed-size vector representation (embedding) given texts or images.
- Embedding calculation is often efficient, embedding similarity calculation is very fast.
- Applicable for a wide range of tasks, such as semantic textual similarity, semantic search, clustering, classification, paraphrase mining, and more.
- Often used as a first step in a two-step retrieval process, where a Cross-Encoder (a.k.a. reranker) model is used to re-rank the top-k results from the bi-encoder.
Links:
- https://github.com/UKPLab/sentence-transformers/blob/04ae2e064b0f92d6a80fe6d0cd58827b987e67ee/examples/sentence_transformer/applications/image-search/example.py#L25 example for image search, uses cos_scores = util.cos_sim(img_emb, text_emb)
- https://sbert.net/examples/sentence_transformer/applications/image-search/README.html guide for image search
- training example https://github.com/UKPLab/sentence-transformers/blob/04ae2e064b0f92d6a80fe6d0cd58827b987e67ee/examples/sentence_transformer/training/clip/train_clip.ipynb#L7
- CLIPModel wrapper source code: https://github.com/UKPLab/sentence-transformers/blob/04ae2e064b0f92d6a80fe6d0cd58827b987e67ee/sentence_transformers/models/CLIPModel.py#L15
- https://sbert.net/docs/package_reference/sentence_transformer/SentenceTransformer.html#sentence_transformers.SentenceTransformer.encode documentation for the encode() method
- https://sbert.net/docs/package_reference/sentence_transformer/SentenceTransformer.html#sentence_transformers.SentenceTransformer.similarity similarity function
- Image and Text models: https://sbert.net/docs/sentence_transformer/pretrained_models.html#image-text-models
- LOTS of examples here: https://github.com/UKPLab/sentence-transformers/tree/04ae2e064b0f92d6a80fe6d0cd58827b987e67ee/examples/sentence_transformer/applications/image-search
In this one they actually use precomputed embeddings
https://colab.research.google.com/drive/16OdADinjAg3w3ceZy3-cOR9A-5ZW9BYr?usp=sharing
When I print out the shape and type of the embeddings it's
Embeddings: (24996, 512) <class 'numpy.ndarray'>
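If we go the precomputed route, a small check like the following might be worth running when loading the video embeddings. This is only a sketch: `load_embeddings` is a hypothetical helper, it assumes the embeddings were saved as a `.npy` file with `np.save`, and the default of 512 dimensions is just taken from the shape printed above.

```python
import numpy as np

def load_embeddings(path, expected_dim=512):
    """Load a 2-D embedding matrix from a .npy file and sanity-check it."""
    embs = np.load(path)
    if embs.ndim != 2:
        raise ValueError(f"expected a 2-D array, got shape {embs.shape}")
    if embs.shape[1] != expected_dim:
        raise ValueError(
            f"expected embedding dim {expected_dim}, got {embs.shape[1]}"
        )
    # float32 keeps memory down and matches typical model output.
    return embs.astype(np.float32, copy=False)
```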
Multilingual
While we're at it, here are the multilingual models supported by Sentence Transformers: https://sbert.net/docs/sentence_transformer/pretrained_models.html#multilingual-models
And here are all the Sentence Transformers models on HF: https://huggingface.co/models?library=sentence-transformers