Transformer Sequence Length Error and Error Handling #13797
Unanswered
esbh25 asked this question in Help: Coding & Implementations
Replies: 0
I'm encountering a very rare error where a document will fail during `nlp.pipe()` with the message:

`RuntimeError: The size of tensor a (514) must match the size of tensor b (512) at non-singleton dimension 1`
This happens because tensor a is longer than the transformer model's maximum sequence length (512), which I thought the `transformer.model.get_spans` span getter was supposed to handle. Am I mistaken?
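For reference, I believe the span getter that is supposed to split long texts into model-sized chunks is configured in the pipeline config roughly like this (the window/stride values shown are the spacy-transformers defaults; my pipeline's actual values may differ):

```ini
[components.transformer.model.get_spans]
@span_getters = "spacy-transformers.strided_spans.v1"
window = 128
stride = 96
```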
I've only seen this issue on one sequence of Russian text, running through a Russian-language pipeline. Running the same text through a multilingual (xx) pipeline does not produce the error.
The two pipelines use different transformers: https://huggingface.co/ai-forever/ruBert-base for Russian vs. XLM-RoBERTa-base for the xx pipeline.
If anyone knows what the underlying issue could be, that would be great. However, I'm also willing to just skip the document when this occurs, since it has happened only once in thousands of documents processed.

The only problem is that `nlp.pipe()` doesn't seem to offer a way to skip individual documents if they fail during the `for doc in nlp.pipe(...):` loop. If the underlying issue can't be fixed, does anyone know of a way to do error handling so that if this error occurs on one document out of 100, the other 99 still get processed?

Thanks in advance
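For what it's worth, the kind of workaround I have in mind is a small wrapper that processes texts in batches and, if a batch raises a `RuntimeError`, falls back to processing the remaining texts in that batch one at a time so that only the offending document is dropped. This is just a sketch (the pipeline name and batch size are placeholders, and I haven't confirmed it behaves well with the transformer's internal batching):

```python
import spacy

def pipe_skip_errors(nlp, texts, batch_size=100):
    """Yield Docs for all texts, skipping any text that raises a RuntimeError."""
    batch = []
    for text in texts:
        batch.append(text)
        if len(batch) == batch_size:
            yield from _process_batch(nlp, batch)
            batch = []
    if batch:
        yield from _process_batch(nlp, batch)

def _process_batch(nlp, batch):
    done = 0
    try:
        for doc in nlp.pipe(batch):
            yield doc
            done += 1
    except RuntimeError:
        # Retry only the texts that were not yielded yet, one at a time,
        # skipping whichever document triggers the tensor size mismatch.
        for text in batch[done:]:
            try:
                yield nlp(text)
            except RuntimeError:
                continue

# Usage (pipeline name is a placeholder for whichever Russian pipeline is loaded):
# nlp = spacy.load("ru_core_news_sm")
# for doc in pipe_skip_errors(nlp, texts):
#     ...
```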