Transformer Sequence Length Error and Error Handling #13797
Unanswered
esbh25 asked this question in Help: Coding & Implementations
Replies: 0
I'm encountering a very rare error where a document will fail during `nlp.pipe()` with the message:

`RuntimeError: The size of tensor a (514) must match the size of tensor b (512) at non-singleton dimension 1`
This happens because tensor a is longer than the transformer model's maximum sequence length (512), which I thought the `transformer.model.get_spans` span getter was supposed to handle. Am I mistaken?
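For reference, I believe the span getter that is supposed to split long texts into model-sized chunks is configured in the pipeline config roughly like this (the window/stride values shown are the spacy-transformers defaults; my pipeline's actual values may differ):

```ini
[components.transformer.model.get_spans]
@span_getters = "spacy-transformers.strided_spans.v1"
window = 128
stride = 96
```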
I've only seen this issue on one sequence of Russian text, running through a Russian-language pipeline. Running the same text through a multilingual (xx) pipeline does not produce the error.
The two pipelines use different transformers: https://huggingface.co/ai-forever/ruBert-base for Russian vs. XLM-RoBERTa-base for the xx pipeline.
If anyone knows what the underlying issue could be, that would be great. However, I'm also willing to just skip the document when this occurs, since it has happened only once in thousands of documents processed.

The only problem is that `nlp.pipe()` doesn't seem to offer a way to skip individual documents if they fail during the `for doc in nlp.pipe(...):` loop. If the underlying issue can't be fixed, does anyone know of a way to do error handling so that if this error occurs on one document out of 100, the other 99 still get processed?

Thanks in advance
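For what it's worth, the kind of workaround I have in mind is a small wrapper that processes texts in batches and, if a batch raises a `RuntimeError`, falls back to processing the remaining texts in that batch one at a time so that only the offending document is dropped. This is just a sketch (the pipeline name and batch size are placeholders, and I haven't confirmed it behaves well with the transformer's internal batching):

```python
import spacy

def pipe_skip_errors(nlp, texts, batch_size=100):
    """Yield Docs for all texts, skipping any text that raises a RuntimeError."""
    batch = []
    for text in texts:
        batch.append(text)
        if len(batch) == batch_size:
            yield from _process_batch(nlp, batch)
            batch = []
    if batch:
        yield from _process_batch(nlp, batch)

def _process_batch(nlp, batch):
    done = 0
    try:
        for doc in nlp.pipe(batch):
            yield doc
            done += 1
    except RuntimeError:
        # Retry only the texts that were not yielded yet, one at a time,
        # skipping whichever document triggers the tensor size mismatch.
        for text in batch[done:]:
            try:
                yield nlp(text)
            except RuntimeError:
                continue

# Usage (pipeline name is a placeholder for whichever Russian pipeline is loaded):
# nlp = spacy.load("ru_core_news_sm")
# for doc in pipe_skip_errors(nlp, texts):
#     ...
```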