There was a website that did exactly that, but it didn't display a score; it displayed the three images that best matched the keyword (and its CLIP embedding). Wish I could find it (and its code).
Let's say I have a set of images and I want to ask CLIP how much of a "dog" is in there, and it should give me a number. How would I go about realising this? Of course, a CLIP model needs to be loaded and accessed in Python. I know the basics about embeddings and the sampling process, but I am not a very experienced Python coder, and putting this puzzle piece together is really difficult for me.
I got an aesthetic scorer going, but that was only possible because I was able to reuse parts of someone else's code.
Does anyone know of a project that does this, that I could modify? Or where do I even begin to code this from scratch?
Edit: Solved. For anyone interested, this does the trick and more:
https://github.com/shonenkov/CLIP-ODS
https://openai.com/blog/clip/
(sorry for using an openai link here :D)
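For anyone landing here with the same question: the "number" asked for above is usually the cosine similarity between CLIP's image embedding and the embedding of a text prompt such as "a photo of a dog". Below is a minimal sketch of that idea, not the code from the linked repo. It assumes the Hugging Face `transformers` package (plus `torch` and `Pillow`) and the `openai/clip-vit-base-patch32` checkpoint; the helper names `cosine_similarity`, `rank_by_score`, and `clip_embeddings` are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (lists of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_score(names, image_embeds, text_embed):
    """Return (name, score) pairs, best match for the text prompt first."""
    scored = [
        (name, cosine_similarity(embed, text_embed))
        for name, embed in zip(names, image_embeds)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def clip_embeddings(image_paths, prompt, model_name="openai/clip-vit-base-patch32"):
    """Embed images and one text prompt with CLIP (assumed setup, see lead-in).

    The heavy imports live inside the function so the scoring helpers above
    can be used and tested without torch/transformers installed.
    """
    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained(model_name)
    processor = CLIPProcessor.from_pretrained(model_name)

    images = [Image.open(path).convert("RGB") for path in image_paths]
    inputs = processor(text=[prompt], images=images, return_tensors="pt", padding=True)

    with torch.no_grad():
        outputs = model(**inputs)

    # image_embeds: one vector per image; text_embeds: one vector per prompt.
    return outputs.image_embeds.tolist(), outputs.text_embeds[0].tolist()


# Example usage (downloads the model on first run; file names are placeholders):
# paths = ["a.jpg", "b.jpg"]
# image_embeds, text_embed = clip_embeddings(paths, "a photo of a dog")
# for name, score in rank_by_score(paths, image_embeds, text_embed):
#     print(name, round(score, 3))
```

Raw CLIP similarities tend to sit in a fairly narrow range, so the score is more useful for comparing images against each other (or against a second prompt like "a photo of a cat") than as an absolute "percent dog" value. Taking the top three entries of `rank_by_score` gives the "three best matching images" behaviour of the website mentioned above.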