How does spacy-transformers deal with textcat_multilabel for long documents? #12574
-
I've had a lot of trouble finding this information. How does spacy-transformers handle text classification tasks for long documents? I know that by default it uses overlapping strided spans, but this seems better suited to NER than to text classification tasks. Thanks!
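For context, the strided-span behavior is controlled by the transformer component's span getter in the pipeline config. A sketch of the relevant section (the window/stride values below are illustrative, not necessarily your defaults):

```ini
# Excerpt from a spaCy pipeline config; adjust window/stride to your setup.
[components.transformer.model.get_spans]
@span_getters = "spacy-transformers.strided_spans.v1"
window = 128
stride = 96
```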
-
Hi @vsocrates!
Document length doesn't influence the textcat component much here.
Why do you think that overlapping strided spans don't make much sense for text classification? Overlapping spans provide more context to the transformer model, which in turn results in richer word representations for downstream components.
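To make the windowing concrete, here is a hypothetical sketch of how overlapping strided spans can be built over a tokenized document. The `window`/`stride` names mirror the strided-spans settings in spacy-transformers, but this is illustrative code, not the library's implementation:

```python
# Hypothetical sketch: split tokens into overlapping windows so that most
# tokens are seen with context on both sides (not spacy-transformers' code).

def strided_spans(tokens, window=8, stride=6):
    """Each span starts `stride` tokens after the previous one and covers
    up to `window` tokens, so consecutive spans overlap by window - stride."""
    spans = []
    start = 0
    while start < len(tokens):
        spans.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break
        start += stride
    return spans

tokens = [f"t{i}" for i in range(20)]
spans = strided_spans(tokens, window=8, stride=6)
# With window=8 and stride=6, consecutive spans share 2 tokens, so the
# transformer sees each of those tokens in two different contexts.
```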
-
For instance, if our task is to classify document topics (e.g. science, politics, etc.) and only a couple of sentences or words indicate a politics topic, then with overlapping strided spans there may be a text chunk that carries the politics label but contains no text indicating that topic. This is the situation I'm concerned about. Would NER make more sense for this setting, then?
Since the textcat layer applies attention over all the hidden representations in the document, there is no theoretical limit on the input/doc size. Of course there are practical limits (e.g., you'll want to adjust the batch size to the amount of GPU memory available if you run on the GPU).
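To illustrate why attention pooling imposes no length limit, here is a minimal pure-Python sketch: the pooled document vector is a softmax-weighted average of token vectors, so its dimensionality is fixed no matter how many tokens the document has. The attention weights `w` would be learned in a real model; here they are fixed for illustration:

```python
import math

def attention_pool(hidden, w):
    """Pool a list of token vectors into one fixed-size document vector
    using softmax-normalized attention scores (illustrative sketch)."""
    # Score each token vector by its dot product with the attention vector.
    scores = [sum(h_i * w_i for h_i, w_i in zip(h, w)) for h in hidden]
    # Numerically stable softmax over the scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    # Weighted average of the token vectors: a convex combination,
    # so the result has the same dimensionality as one token vector.
    dim = len(hidden[0])
    return [sum(a * h[d] for a, h in zip(alphas, hidden)) for d in range(dim)]

short_doc = [[1.0, 0.0], [0.0, 1.0]]
long_doc = short_doc * 50  # 100 tokens
w = [0.5, -0.5]
# The output size is 2 in both cases, regardless of document length.
assert len(attention_pool(short_doc, w)) == len(attention_pool(long_doc, w)) == 2
```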