
Conversion of Donut (vision-encoder-decoder type of model) fails if we don't pass the --task argument to the conversion script #1453

@felladrin

System Info

Transformers.js v3.7.6 running on Linux.
The issue is related to the conversion script only, which runs on Python 3.12.

Environment/Platform

  • Website/web-app
  • Browser extension
  • Server-side (e.g., Node.js, Deno, Bun)
  • Desktop app (e.g., Electron)
  • Other (e.g., VSCode extension)

Description

As discussed in this thread on huggingface.co/spaces/onnx-community/convert-to-onnx, the conversion of Norm/nougat-latex-base works fine when passing --task image-to-text.

But if we run the script without the --task argument, the conversion fails with the following message:

```
Conversion failed: Config of the encoder: <class 'transformers.models.donut.modeling_donut_swin.DonutSwinModel'> is overwritten by shared encoder config: DonutSwinConfig {
  "attention_probs_dropout_prob": 0.0,
  "depths": [2, 2, 14, 2],
  "drop_path_rate": 0.1,
  "embed_dim": 128,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.0,
  "hidden_size": 1024,
  "image_size": [224, 560],
  "initializer_range": 0.02,
  "layer_norm_eps": 1e-05,
  "mlp_ratio": 4.0,
  "model_type": "donut-swin",
  "num_channels": 3,
  "num_heads": [4, 8, 16, 32],
  "num_layers": 4,
  "patch_size": 4,
  "qkv_bias": true,
  "torch_dtype": "float32",
  "transformers_version": "4.49.0",
  "use_absolute_embeddings": false,
  "window_size": 7
}

Config of the decoder: <class 'transformers.models.mbart.modeling_mbart.MBartForCausalLM'> is overwritten by shared decoder config: MBartConfig {
  "activation_dropout": 0.0,
  "activation_function": "gelu",
  "add_cross_attention": true,
  "add_final_layer_norm": true,
  "attention_dropout": 0.0,
  "bos_token_id": 0,
  "classifier_dropout": 0.0,
  "d_model": 1024,
  "decoder_attention_heads": 16,
  "decoder_ffn_dim": 4096,
  "decoder_layerdrop": 0.0,
  "decoder_layers": 10,
  "dropout": 0.1,
  "encoder_attention_heads": 16,
  "encoder_ffn_dim": 4096,
  "encoder_layerdrop": 0.0,
  "encoder_layers": 12,
  "eos_token_id": 2,
  "forced_eos_token_id": 2,
  "init_std": 0.02,
  "is_decoder": true,
  "is_encoder_decoder": false,
  "max_length": 800,
  "max_position_embeddings": 4096,
  "model_type": "mbart",
  "num_hidden_layers": 12,
  "pad_token_id": 1,
  "scale_embedding": true,
  "tie_word_embeddings": false,
  "torch_dtype": "float32",
  "transformers_version": "4.49.0",
  "use_cache": true,
  "vocab_size": 50000
}

Using a slow image processor as use_fast is unset and a slow processor was saved with this model. use_fast=True will be the default behavior in v4.48, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with use_fast=False.
/usr/local/lib/python3.12/site-packages/transformers/models/donut/modeling_donut_swin.py:265: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if width % self.patch_size[1] != 0:
/usr/local/lib/python3.12/site-packages/transformers/models/donut/modeling_donut_swin.py:268: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if height % self.patch_size[0] != 0:
/usr/local/lib/python3.12/site-packages/transformers/models/donut/modeling_donut_swin.py:575: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if min(input_resolution) <= self.window_size:
/usr/local/lib/python3.12/site-packages/transformers/models/donut/modeling_donut_swin.py:669: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  was_padded = pad_values[3] > 0 or pad_values[5] > 0
/usr/local/lib/python3.12/site-packages/transformers/models/donut/modeling_donut_swin.py:670: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if was_padded:
/usr/local/lib/python3.12/site-packages/transformers/models/donut/modeling_donut_swin.py:307: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  should_pad = (height % 2 == 1) or (width % 2 == 1)
/usr/local/lib/python3.12/site-packages/transformers/models/donut/modeling_donut_swin.py:308: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if should_pad:
/usr/local/lib/python3.12/site-packages/transformers/models/donut/modeling_donut_swin.py:579: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  torch.min(torch.tensor(input_resolution)) if torch.jit.is_tracing() else min(input_resolution)
/usr/local/lib/python3.12/site-packages/transformers/modeling_attn_mask_utils.py:88: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if input_shape[-1] > 1 or self.sliding_window is not None:
/usr/local/lib/python3.12/site-packages/transformers/modeling_attn_mask_utils.py:164: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if past_key_values_length > 0:
/usr/local/lib/python3.12/site-packages/transformers/models/mbart/modeling_mbart.py:503: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_output.size() != (bsz, self.num_heads, tgt_len, self.head_dim):
/usr/local/lib/python3.12/site-packages/transformers/models/mbart/modeling_mbart.py:490: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  is_causal = True if self.is_causal and attention_mask is None and tgt_len > 1 else False
/usr/local/lib/python3.12/site-packages/transformers/models/mbart/modeling_mbart.py:455: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  and past_key_value[0].shape[2] == key_value_states.shape[1]
```

Reproduction

  1. Access https://huggingface.co/spaces/onnx-community/convert-to-onnx
  2. Insert "Norm/nougat-latex-base" as the model name to convert and hit Enter
  3. Fill in your Hugging Face write token so the converted model can be uploaded to your account; otherwise the "Proceed [with conversion]" button won't be displayed. Don't change any other option.
  4. Click the "Proceed" button.
  5. Wait a few minutes and the conversion error will be displayed.

Note: The same error should happen if you run the conversion script manually.
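For reference, a manual run would look roughly like this. This is a sketch only: it assumes the Transformers.js repository's conversion script entry point (scripts/convert.py, run from the repo root) and its --model_id/--task flags; check the script's --help for the exact options in your checkout.

```shell
# Sketch, assuming a transformers.js checkout with its Python
# conversion dependencies installed. Run from the repo root.

# Works: the task is declared explicitly.
python -m scripts.convert --model_id Norm/nougat-latex-base --task image-to-text

# Fails with the shared encoder/decoder config error shown above,
# because the --task argument is omitted and has to be inferred.
python -m scripts.convert --model_id Norm/nougat-latex-base
```

The same --task value is what the convert-to-onnx Space would need to pass through for vision-encoder-decoder models like this one.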

Labels: bug (Something isn't working)