Hi all,
I saw an interesting paper claiming that the original CLIP model has two issues, opposite visualization and noisy activations, which hurt performance on various downstream tasks, and proposing an interesting training-free fix:
https://github.com/xmed-lab/CLIP_Surgery
Basically, CLIP recognizes the target object by looking at the background instead of the foreground, which indicates wrong relations in the self-attention. I played with their demo, and I think that is indeed the case. I also tested two open_clip models to check it further. Here are the results.
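If I read the repo right, the training-free "surgery" keeps the original CLIP path and adds a parallel path in which the q-k attention of the last transformer blocks is replaced by a value-value self-attention (with the FFN skipped). Below is a minimal sketch of just the v-v attention step, assuming a standard multi-head layout; the function name and shapes are mine, not the repo's API:

```python
import torch

def vv_attention(v: torch.Tensor, num_heads: int) -> torch.Tensor:
    """Consistent self-attention: score tokens with v against v instead of q, k.

    v: (batch, tokens, dim) value projection of the token sequence.
    """
    b, n, d = v.shape
    head_dim = d // num_heads
    # Split into heads: (batch, heads, tokens, head_dim).
    vh = v.view(b, n, num_heads, head_dim).transpose(1, 2)
    # Each token attends to tokens with similar values, which keeps
    # foreground tokens attending to the foreground rather than the background.
    attn = (vh @ vh.transpose(-2, -1)) * head_dim ** -0.5
    attn = attn.softmax(dim=-1)
    # Merge heads back: (batch, tokens, dim).
    return (attn @ vh).transpose(1, 2).reshape(b, n, d)
```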
I used an image and looked for ['window', 'wall', 'piano', 'cat']; a sketch of how such maps can be produced follows the results.

[Images: similarity maps for the four labels from three checkpoints]
CLIP ViT-B/16, official checkpoint
OPEN_CLIP ViT-B/16, laion2b_s34b_b88k
OPEN_CLIP ViT-L/14, commonpool_xl_clip_s13b_b90k
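For reference, here is roughly how such per-patch similarity maps can be pulled out of open_clip (this is not the CLIP_Surgery code, just a plain forward pass). Reaching the pre-pool patch tokens means touching the model's internals, so the hook target, the token ordering, and the ln_post/proj handling below are all assumptions that vary across open_clip versions; the image filename is a placeholder:

```python
import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms(
    'ViT-B-16', pretrained='laion2b_s34b_b88k')
tokenizer = open_clip.get_tokenizer('ViT-B-16')
model.eval()

# Capture the pre-pool token sequence with a forward hook. The module
# path is an assumption; inspect model.visual for your open_clip version.
captured = {}
hook = model.visual.transformer.register_forward_hook(
    lambda mod, inp, out: captured.update(x=out))

image = preprocess(Image.open('example.jpg')).unsqueeze(0)  # placeholder file
labels = ['window', 'wall', 'piano', 'cat']
with torch.no_grad():
    model.encode_image(image)
    text = model.encode_text(tokenizer(labels))
    hook.remove()

    x = captured['x']
    if x.shape[0] != 1:  # some versions return (seq, batch, dim)
        x = x.permute(1, 0, 2)
    # Project patch tokens into the shared embedding space; applying
    # ln_post and proj per token is also an assumption on the internals.
    patches = model.visual.ln_post(x[:, 1:, :]) @ model.visual.proj
    patches = patches / patches.norm(dim=-1, keepdim=True)
    text = text / text.norm(dim=-1, keepdim=True)

    sim = patches @ text.T                          # (1, 196, 4)
    heatmaps = sim.reshape(1, 14, 14, len(labels))  # one 14x14 map per label
```

Upsampling each 14x14 map to the input resolution and overlaying it on the image gives the kind of visualization shown above.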
Replies: 1 comment

Cool!