
Image Retrieval with Cosine Similarity #184

@Matagi1996

Description


How would I best do pure cosine-similarity retrieval with precalculated image and sentence vectors (say, stored in a vector database)?
(The example notebooks don't really show that.)
In my current case I am able to retrieve similar images given other image embeddings, but word => image retrieval does not work at all.
(Is it even supposed to work before applying the logit scale and bias?)

That is, this is the part of the Transformers implementation I follow to extract the vectors:

# normalized features
image_embeds = image_embeds / image_embeds.norm(p=2, dim=-1, keepdim=True)
text_embeds = text_embeds / text_embeds.norm(p=2, dim=-1, keepdim=True)

# cosine similarity as logits
logits_per_text = torch.matmul(text_embeds, image_embeds.t().to(text_embeds.device))

logit_scale, logit_bias = self.logit_scale.to(text_embeds.device), self.logit_bias.to(text_embeds.device)
logits_per_text = logits_per_text * logit_scale.exp() + logit_bias

At the moment I save just the "image_embeds" and "text_embeds" and try to do retrieval with them, but the scores are quite low, especially compared to a CLIP baseline.
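For what it's worth, here is a minimal sketch of what cosine-similarity retrieval over precomputed embeddings looks like, assuming the vectors were L2-normalized as in the snippet above. The matrices and names below are stand-ins, not part of the original code; the note at the end addresses the scale/bias question: for a fixed query, `scores * logit_scale.exp() + logit_bias` is a strictly increasing transform, so it cannot change the ranking, only the calibration of the sigmoid probabilities.

```python
# Sketch of cosine-similarity retrieval over precomputed, L2-normalized
# embeddings. The random matrices stand in for saved image_embeds and a
# query text embedding; shapes and names here are illustrative.
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a bank of precomputed image embeddings (1000 vectors, dim 768).
image_embeds = rng.normal(size=(1000, 768)).astype(np.float32)
image_embeds /= np.linalg.norm(image_embeds, axis=-1, keepdims=True)

# Stand-in for one precomputed text (query) embedding.
query = rng.normal(size=(768,)).astype(np.float32)
query /= np.linalg.norm(query)

# After normalization, cosine similarity is just a dot product.
scores = image_embeds @ query          # shape (1000,)
top_k = np.argsort(-scores)[:5]        # indices of the 5 best matches

# Applying logit_scale.exp() * scores + logit_bias (scale > 0) is a
# monotonic transform per query, so the top-k ranking is identical;
# it only matters when you need calibrated sigmoid probabilities.
```

If retrieval quality is still poor with this setup, the scale/bias step is unlikely to be the cause; it is worth double-checking that both modalities are embedded by the same checkpoint and normalized consistently.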
