Apply Sentence Transformers on SignCLIP embeddings? #30

@cleong110

Description

Not sure it would be possible to live-embed within the Sentence Transformers framework, since SignCLIP isn't on Hugging Face and has some arcane dependencies and runtime requirements. BUT... what if we embedded the videos first, loaded those video embeddings, and then applied Sentence Transformers to the text queries?

Characteristics of Sentence Transformer (a.k.a. bi-encoder) models:

  1. Calculate a fixed-size vector representation (embedding) given texts or images.
  2. Embedding calculation is often efficient, and embedding similarity calculation is very fast.
  3. Applicable to a wide range of tasks, such as semantic textual similarity, semantic search, clustering, classification, paraphrase mining, and more.
  4. Often used as the first step in a two-step retrieval process, where a Cross-Encoder (a.k.a. reranker) model re-ranks the top-k results from the bi-encoder.
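Once embeddings exist, the first-stage bi-encoder step reduces to cosine similarity (a dot product over L2-normalized vectors) followed by a top-k cut for the reranker. A minimal numpy sketch — the random arrays here are stand-ins for what a real `model.encode(...)` call would produce:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for embeddings a bi-encoder would produce via model.encode(...)
corpus_emb = rng.standard_normal((1000, 512)).astype(np.float32)
query_emb = rng.standard_normal((1, 512)).astype(np.float32)

def normalize(x):
    # Cosine similarity = dot product of L2-normalized vectors
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

scores = normalize(query_emb) @ normalize(corpus_emb).T  # shape (1, 1000)

# Top-k candidates that a Cross-Encoder could then re-rank
k = 5
top_k = np.argsort(-scores[0])[:k]
print(top_k, scores[0, top_k])
```

The same math is what `sentence_transformers.util.cos_sim` computes; the point is that it only needs arrays, not a live model.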

Links:

In this Colab notebook they actually use precomputed embeddings:
https://colab.research.google.com/drive/16OdADinjAg3w3ceZy3-cOR9A-5ZW9BYr?usp=sharing
When I print the shape and type of the embeddings, I get:

Embeddings: (24996, 512) <class 'numpy.ndarray'>
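The embed-first idea — load precomputed video embeddings, then score text queries against them — could look like the sketch below. Everything here is hypothetical: the file name is made up, the arrays are random stand-ins for the real (24996, 512) matrix, and in practice the text query embedding would have to live in SignCLIP's joint 512-d space (i.e. come from SignCLIP's own text encoder, or via a learned projection), not from an arbitrary sentence-transformers model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical precomputed SignCLIP video embeddings, shape (24996, 512):
# video_emb = np.load("signclip_video_embeddings.npy")
video_emb = rng.standard_normal((24996, 512)).astype(np.float32)  # stand-in

def normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

video_emb = normalize(video_emb)

# Stand-in for a text query embedding; a real one must share the videos'
# embedding space for the similarities to be meaningful.
text_emb = normalize(rng.standard_normal((1, 512)).astype(np.float32))

scores = text_emb @ video_emb.T  # (1, 24996) cosine similarities
best = int(np.argmax(scores))
print(best, float(scores[0, best]))
```

If the spaces do line up, the retrieval side is just this matrix multiply, which is exactly the cheap part Sentence Transformers' semantic-search utilities optimize.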
Multilingual

While we're at it, here are the multilingual models supported by Sentence Transformers: https://sbert.net/docs/sentence_transformer/pretrained_models.html#multilingual-models

And here are all the sentence-transformers models on HF: https://huggingface.co/models?library=sentence-transformers
