How would I best do pure cosine-similarity retrieval with precomputed image and sentence vectors (say, stored in a vector database)? The example notebooks don't really show that.

In my current case I am able to retrieve similar images given other image embeddings, but text => image retrieval does not work at all. (Is it even supposed to work before applying the logit scale and bias?)

Looking at the Transformers implementation, this is the part I follow to extract the vectors:
```python
# normalized features
image_embeds = image_embeds / image_embeds.norm(p=2, dim=-1, keepdim=True)
text_embeds = text_embeds / text_embeds.norm(p=2, dim=-1, keepdim=True)

# cosine similarity as logits
logits_per_text = torch.matmul(text_embeds, image_embeds.t().to(text_embeds.device))
logit_scale, logit_bias = self.logit_scale.to(text_embeds.device), self.logit_bias.to(text_embeds.device)
logits_per_text = logits_per_text * logit_scale.exp() + logit_bias
```
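For reference, this is roughly how I obtain the vectors I store, as a minimal sketch. The checkpoint name and image path are just illustrative placeholders for my setup:

```python
import torch
from PIL import Image
from transformers import AutoProcessor, SiglipModel

ckpt = "google/siglip-base-patch16-224"  # assumed checkpoint; substitute your own
model = SiglipModel.from_pretrained(ckpt)
processor = AutoProcessor.from_pretrained(ckpt)

image = Image.open("example.jpg")  # illustrative path
# The HF SigLIP docs recommend padding="max_length" for text inputs,
# since that is how the model was trained.
text_inputs = processor(text=["a photo of a cat"], padding="max_length", return_tensors="pt")
image_inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    image_embeds = model.get_image_features(**image_inputs)  # (1, dim)
    text_embeds = model.get_text_features(**text_inputs)     # (1, dim)

# L2-normalize before writing to the vector database
image_embeds = image_embeds / image_embeds.norm(p=2, dim=-1, keepdim=True)
text_embeds = text_embeds / text_embeds.norm(p=2, dim=-1, keepdim=True)
```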
At the moment I save just the `image_embeds` and `text_embeds` and run retrieval over them, but the scores are quite low, especially compared to a CLIP baseline.
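As far as I can tell, ranking by the raw dot product of the normalized vectors should match ranking by the logits, since `logit_scale.exp()` is positive and `logit_bias` is a constant (a monotonic affine map). For completeness, a minimal sketch of the retrieval I do (in-memory here; the vector database performs the equivalent top-k):

```python
import torch

def topk_images(text_embed: torch.Tensor, image_embeds: torch.Tensor, k: int = 5):
    """Rank images by cosine similarity to a single text embedding.

    Both inputs are assumed L2-normalized, so the dot product equals
    cosine similarity: image_embeds is (num_images, dim), text_embed is (dim,).
    """
    sims = image_embeds @ text_embed  # (num_images,) cosine similarities
    scores, indices = sims.topk(min(k, sims.numel()))
    return scores, indices

# e.g. scores, idx = topk_images(text_embeds[0], image_embeds, k=10)
```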