
Hey there,

Thanks for the question! Let me first clarify one thing: the

```
pooling = {"@layers":"reduce_mean.v1"}
```

in the config refers to pooling the word-piece representations of the transformer together so that they align with the tokens produced by the Tokenizer.
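For context, in a transformer pipeline this pooling layer typically sits in the listener block of the downstream component's config. The exact section names depend on your pipeline, but it looks roughly like this:

```ini
[components.textcat.model.tok2vec]
@architectures = "spacy-transformers.TransformerListener.v1"
grad_factor = 1.0

[components.textcat.model.tok2vec.pooling]
@layers = "reduce_mean.v1"
```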

The pooling within the textcat uses reduce_sum, and it happens right here: https://github.com/explosion/spaCy/blob/master/spacy/ml/models/textcat.py#L123. But you are right: for each Doc, the textcat does pool the whole document into a single vector before feeding it to the output layer.
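To make that concrete, here is a minimal Thinc sketch of what reduce_sum does to a ragged batch of token vectors (the shapes and values are made up for illustration):

```python
import numpy
from thinc.api import Ragged, reduce_sum

# A ragged batch: 5 token vectors of width 4, split into two Docs
# of 2 and 3 tokens respectively.
data = Ragged(
    numpy.ones((5, 4), dtype="f"),
    numpy.array([2, 3], dtype="i"),
)

# reduce_sum collapses each Doc's token vectors into a single vector
# by summing over the token axis.
pooling = reduce_sum()
pooled, backprop = pooling(data, is_train=False)
print(pooled.shape)  # (2, 4): one pooled vector per Doc
```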

You are also correct that you can retrieve the components from the pipeline. I tend to use nlp.get_pipe("textcat") instead of nlp.components[i].
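For example (assuming a trained pipeline that actually has a textcat component; the path below is a placeholder):

```python
import spacy

# Hypothetical path: substitute your own trained pipeline.
nlp = spacy.load("./my_textcat_model")
textcat = nlp.get_pipe("textcat")

doc = nlp("This is a great library!")
print(textcat.labels)  # the labels the component was trained with
print(doc.cats)        # the scores textcat assigned to this Doc
```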
