How does spacy-transformers deal with textcat_multilabel for long documents? #12574
-
I've had a lot of trouble finding this information. How does spacy-transformers handle text classification tasks for long documents? I know that by default it uses overlapping strided spans, but this seems better suited to NER than to text classification tasks. Thanks!
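For context, the strided-span behavior is controlled by the transformer component's span getter in the pipeline config. A sketch of the relevant section (the window/stride values below are illustrative, not necessarily your defaults):

```ini
# Excerpt from a spaCy pipeline config; adjust window/stride to your setup.
[components.transformer.model.get_spans]
@span_getters = "spacy-transformers.strided_spans.v1"
window = 128
stride = 96
```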
-
Hi @vsocrates!
Document length doesn't influence the textcat component much here.
Why do you think that overlapping strided spans don't make much sense for text classification? Overlapping spans provide more context to the transformer model, which in turn results in richer word representations for downstream components.
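To make the windowing concrete, here is a hypothetical sketch of how overlapping strided spans can be built over a tokenized document. The `window`/`stride` names mirror the strided-spans settings in spacy-transformers, but this is illustrative code, not the library's implementation:

```python
# Hypothetical sketch: split tokens into overlapping windows so that most
# tokens are seen with context on both sides (not spacy-transformers' code).

def strided_spans(tokens, window=8, stride=6):
    """Each span starts `stride` tokens after the previous one and covers
    up to `window` tokens, so consecutive spans overlap by window - stride."""
    spans = []
    start = 0
    while start < len(tokens):
        spans.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break
        start += stride
    return spans

tokens = [f"t{i}" for i in range(20)]
spans = strided_spans(tokens, window=8, stride=6)
# With window=8 and stride=6, consecutive spans share 2 tokens, so the
# transformer sees each of those tokens in two different contexts.
```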
-
For instance, if our task is to classify document topics (e.g. science, politics, etc.) and only a couple of sentences or words indicate a politics topic, then with overlapping strided spans there may be a text chunk that carries the politics label but contains no text indicating that topic. This is the situation I'm concerned about. Would NER make more sense for this setting, then?
Since the textcat layer applies attention over all the hidden representations in the document, there is no theoretical limit on the input/doc size. Of course there are practical limits (e.g., you'll want to adjust the batch size to the amount of GPU memory available if you run on the GPU).
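To illustrate why attention pooling imposes no length limit, here is a minimal pure-Python sketch: the pooled document vector is a softmax-weighted average of token vectors, so its dimensionality is fixed no matter how many tokens the document has. The attention weights `w` would be learned in a real model; here they are fixed for illustration:

```python
import math

def attention_pool(hidden, w):
    """Pool a list of token vectors into one fixed-size document vector
    using softmax-normalized attention scores (illustrative sketch)."""
    # Score each token vector by its dot product with the attention vector.
    scores = [sum(h_i * w_i for h_i, w_i in zip(h, w)) for h in hidden]
    # Numerically stable softmax over the scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    # Weighted average of the token vectors: a convex combination,
    # so the result has the same dimensionality as one token vector.
    dim = len(hidden[0])
    return [sum(a * h[d] for a, h in zip(alphas, hidden)) for d in range(dim)]

short_doc = [[1.0, 0.0], [0.0, 1.0]]
long_doc = short_doc * 50  # 100 tokens
w = [0.5, -0.5]
# The output size is 2 in both cases, regardless of document length.
assert len(attention_pool(short_doc, w)) == len(attention_pool(long_doc, w)) == 2
```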