
fix: Error in siglip output conversion#4205

Merged
KennethEnevoldsen merged 13 commits into main from fix-siglip
Mar 11, 2026


@KennethEnevoldsen
Contributor

Discovered when running for #4193

@KennethEnevoldsen KennethEnevoldsen enabled auto-merge (squash) March 6, 2026 09:45
@Samoed Samoed disabled auto-merge March 6, 2026 09:55
Member

@Samoed Samoed left a comment


Not sure why my comments are not displayed in here, only in file changes.

It seems that SigLIP uses last-token pooling, not mean pooling: https://github.com/huggingface/transformers/blob/aad13b87ed59f2afcfaebc985f403301887a35fc/src/transformers/models/siglip/modeling_siglip.py#L523-L525

>>> torch.mean(text_outputs["last_hidden_state"], dim=1)
tensor([[-0.3865, -0.5087,  0.1831,  ...,  0.8009,  0.1759, -0.5749],
        [-0.3301, -0.8056,  0.2536,  ...,  0.7427, -0.2328, -0.5802],
        [-0.2553, -1.1079,  0.4744,  ...,  0.0874, -0.0792, -0.6855],
        ...,
        [-0.4886, -0.6445,  0.0385,  ...,  0.3665, -0.1208, -0.5695],
        [ 0.0374, -0.6736,  0.1148,  ...,  0.1854,  0.2513, -0.7220],
        [-0.0087,  0.3741, -0.0740,  ...,  0.1965, -0.2676, -0.6060]])
>>> text_outputs.pooler_output
tensor([[ 1.0248,  0.2559,  0.1031,  ...,  0.1529, -0.5528,  0.2910],
        [ 0.7734, -0.6904, -0.7753,  ..., -0.5422,  0.6960, -0.1439],
        [ 0.4678, -0.2966, -0.2353,  ...,  0.1854, -0.6198,  0.6368],
        ...,
        [ 0.0607,  0.5256, -0.2899,  ..., -0.9362,  1.1828, -0.0575],
        [ 0.0698, -0.8170,  0.1015,  ..., -0.1641,  0.2055,  0.2031],
        [ 0.5874, -0.5110, -0.3075,  ...,  0.4028,  0.3083, -0.3046]])
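To make the difference between the two outputs above concrete, here is a minimal sketch of mean pooling versus last-token pooling over a `[batch, seq_len, hidden]` array. It uses plain Python lists rather than torch so that it runs without extra dependencies. The helper names `mean_pool` and `last_token_pool` are illustrative only and are not part of mteb or transformers, the values are made up, and the real `pooler_output` may apply further learned layers after selecting the last token.

```python
# Toy hidden states shaped [batch][seq_len][hidden]; the values are invented.
hidden_states = [
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    [[0.0, 0.0], [2.0, 2.0], [4.0, -4.0]],
]


def mean_pool(batch):
    """Average over the sequence axis, analogous to torch.mean(x, dim=1)."""
    pooled = []
    for seq in batch:
        hidden = len(seq[0])
        pooled.append([sum(tok[i] for tok in seq) / len(seq) for i in range(hidden)])
    return pooled


def last_token_pool(batch):
    """Keep only the final token of each sequence, analogous to x[:, -1, :]."""
    return [list(seq[-1]) for seq in batch]


mean_embeddings = mean_pool(hidden_states)
last_embeddings = last_token_pool(hidden_states)

print(mean_embeddings)
print(last_embeddings)
```

The two strategies give different vectors for the same input, which is consistent with the mismatch between the two tensors printed above.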

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
@KennethEnevoldsen
Contributor Author

Good catch - let me run some comparisons to check

@KennethEnevoldsen
Contributor Author

Yeah so clearly better, but still bad:

Vidore3FinanceEnRetrieval.v2.json
current: 0.03747,
mean pooling: 0.00119,

Though this model isn't really designed for this task, so I'm very unsure whether we would expect it to be any better.

@Samoed
Member

Samoed commented Mar 6, 2026

I don't think the results will be better

@Samoed
Member

Samoed commented Mar 6, 2026

You can also try to reproduce the scores from MIEB

@isaac-chung isaac-chung changed the title fix: Error in siglib output conversion fix: Error in siglip output conversion Mar 7, 2026
@isaac-chung
Collaborator

I can't seem to find what the error is. How can I reproduce it?

@Samoed
Member

Samoed commented Mar 7, 2026

I tried on main: mteb run -m google/siglip-base-patch16-384 -t VidoreArxivQARetrieval

  File "/mteb/mteb/models/model_implementations/siglip_models.py", line 71, in get_text_embeddings
    all_text_embeddings.append(text_outputs.cpu())
                               ^^^^^^^^^^^^^^^^
AttributeError: 'BaseModelOutputWithPooling' object has no attribute 'cpu'
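The traceback indicates that `text_outputs` is a structured model output rather than a tensor, so `.cpu()` is not available on it. Below is a minimal sketch of one way to guard against that, assuming the embedding of interest lives in a `pooler_output` attribute. `FakeModelOutput`, `FakeTensor`, and `unwrap_embeddings` are hypothetical stand-ins written for this illustration; they are not the transformers classes and not necessarily the fix applied in this PR.

```python
from dataclasses import dataclass
from typing import Any, Optional


class FakeTensor:
    """Hypothetical stand-in for a torch.Tensor that only exposes .cpu()."""

    def __init__(self, values):
        self.values = values

    def cpu(self):
        return self


@dataclass
class FakeModelOutput:
    """Hypothetical stand-in for a structured output object (no .cpu())."""

    last_hidden_state: Any
    pooler_output: Optional[Any] = None


def unwrap_embeddings(output):
    """Return the pooled tensor if `output` is structured, else `output` itself."""
    pooled = getattr(output, "pooler_output", None)
    return pooled if pooled is not None else output


structured = FakeModelOutput(
    last_hidden_state=FakeTensor([[0.0]]),
    pooler_output=FakeTensor([1.0, 2.0]),
)
plain = FakeTensor([3.0, 4.0])

all_text_embeddings = []
for out in (structured, plain):
    # Unwrap first, so .cpu() is always called on a tensor-like object.
    all_text_embeddings.append(unwrap_embeddings(out).cpu())

print([e.values for e in all_text_embeddings])
```

Handling both cases in one helper keeps the calling code the same whether the underlying call returns a bare tensor or a structured output.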

@isaac-chung
Collaborator

When I ran that, I got a ModuleNotFoundError for sentencepiece, and then for protobuf

@isaac-chung
Collaborator

isaac-chung commented Mar 7, 2026

Ok, I got the same AttributeError when I run

uv run --no-sync --extra siglip mteb run -t MSCOCOT2IRetrieval -m google/siglip-so400m-patch14-224

but when I run with MSCOCOI2TRetrieval, I get

ValueError: Found None in batch for key 'text'

@Samoed
Member

Samoed commented Mar 7, 2026

I think this error is not related to SigLIP. We just handle this task incorrectly

@KennethEnevoldsen
Contributor Author

I ran it on CIFAR10

>>> res[0].get_score()
np.float64(0.9693000000000002)

That is notably better than the 83.79 reported in the paper. I then also checked STS12VisualSTS, for which the paper reports 61.90, which matches my score:

>>> res[0].get_score()
np.float64(0.618997279216088)

So I think the implementation is correct, and that we might have had some issues in earlier versions either of the task or the model.
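The comparison above mixes two scales: `get_score()` returns a fraction, while the paper figures are quoted as percentages. A small sketch of the arithmetic, assuming both refer to the same main metric; `compare_to_paper` and its `tol` parameter are hypothetical helpers for illustration, not part of mteb.

```python
def compare_to_paper(score_fraction, paper_percent, tol=0.01):
    """Convert a fractional score to a percentage and compare it to a reported value.

    Returns (measured_percent, difference, matches_within_tol).
    """
    measured = round(score_fraction * 100, 2)
    diff = round(measured - paper_percent, 2)
    return measured, diff, abs(diff) <= tol


# Values copied from the comment above.
cifar10 = compare_to_paper(0.9693000000000002, 83.79)
sts12 = compare_to_paper(0.618997279216088, 61.90)

print(cifar10)  # (96.93, 13.14, False)
print(sts12)    # (61.9, 0.0, True)
```

On these numbers, CIFAR10 comes out about 13 points above the reported value, while STS12VisualSTS agrees to two decimal places.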

@isaac-chung also resolved the issue with missing dependencies.

@isaac-chung
Collaborator

Looks good! Only thing left is likely the dependency conflicts then

@KennethEnevoldsen
Contributor Author

> Looks good! Only thing left is likely the dependency conflicts then

Added a fix.


I also found another problem when running it on Caltech101ZeroShot - added a fix for that as well

@KennethEnevoldsen
Contributor Author

KennethEnevoldsen commented Mar 11, 2026

I think this is good to merge with the fixes

@KennethEnevoldsen KennethEnevoldsen enabled auto-merge (squash) March 11, 2026 09:39
@KennethEnevoldsen KennethEnevoldsen merged commit ec20d1e into main Mar 11, 2026
13 checks passed
@KennethEnevoldsen KennethEnevoldsen deleted the fix-siglip branch March 11, 2026 09:50

3 participants