Hello, I tried fine-tuning the Image-Text Vectorizer CLIP model using the above approach, but I get stuck with the error -
Link to full code - Colab
What I need is something that gives the cosine similarity between an image and a text. Should I fine-tune with the triplet variant or with the cosine similarity variant? If it is the cosine similarity variant, how do I obtain the cosine similarity values?
The triplet variant takes a text and an image and gives one normalised vector. I am a bit confused, because I thought it would give a cosine similarity.
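To make the quantity in question concrete, here is a minimal sketch of cosine similarity between two embedding vectors in plain Python. The function names `cosine_similarity` and `l2_normalize` are hypothetical helpers, and the two vectors are made-up placeholders rather than real outputs of the CLIP model; how the fine-tuned model exposes separate image and text embeddings is not shown here and would need to be confirmed.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (hypothetical helper)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    a_unit = l2_normalize(a)
    b_unit = l2_normalize(b)
    # For unit-length vectors the dot product equals the cosine similarity.
    return sum(x * y for x, y in zip(a_unit, b_unit))


# Placeholder embeddings (assumed values, not produced by any model).
image_vec = [0.6, 0.8, 0.0]
text_vec = [0.8, 0.6, 0.0]

similarity = cosine_similarity(image_vec, text_vec)
print(f"cosine similarity: {similarity:.2f}")
```

If a model already returns normalised vectors, the normalisation step is redundant and a plain dot product of the image vector and the text vector gives the same value.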
