spacy.TextCatBOW.v1 - further documentation #7948

hp52 · 2021-04-29T11:49:31Z

Hi,
I'm reading the documentation for spacy.TextCatEnsemble.v2. It says that it's a stacked ensemble of a linear bag-of-words model and a neural network model. In the example, the linear model is a TextCatBOW, which is described as an n-gram “bag-of-words” model. Since there is no further documentation about it, I'm wondering what kind of “bag-of-words” model it is.
Could you give an explanation?
Thanks for your help.

Answered by svlandeg

May 3, 2021

svlandeg · 2021-05-03T21:32:42Z

Hi! The TextCatBOW architecture extracts n-grams from the text. With an n-gram size of 3, it fetches sequences of up to 3 consecutive tokens (single tokens, pairs and triples). A true "bag of words" model is obtained when you set n to 1: it then extracts each word separately. These n-gram features are fed into a linear layer. The final output layer depends on whether the classes of your textcat are mutually exclusive: if they are, the output layer is a softmax activation; otherwise it's a sigmoid activation layer (called Logistic in some of our code). You can find the implementation here: https://github.com/explosion/spaCy/blob/master/spacy/ml/models/textcat.py
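
To make the pipeline above concrete (bag of n-grams, then a linear layer, then softmax or sigmoid), here is a minimal plain-Python sketch. It is not spaCy's implementation: spaCy hashes token attributes and uses a sparse linear layer from Thinc, whereas the zlib.crc32 hashing and the feature_row/linear_scores/bow_textcat helpers below are hypothetical stand-ins. It assumes, as spaCy's extract_ngrams helper appears to do, that all n-grams of length 1 up to ngram_size are collected and weighted by their counts.

```python
import math
import random
import zlib
from collections import Counter


def extract_ngrams(tokens, ngram_size):
    """Count every n-gram of length 1..ngram_size: the 'bag of n-grams'."""
    counts = Counter()
    for n in range(1, ngram_size + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def feature_row(ngram, n_rows):
    """Hypothetical hashing scheme: map an n-gram to one of n_rows weight rows."""
    return zlib.crc32(" ".join(ngram).encode("utf8")) % n_rows


def linear_scores(counts, weights, bias):
    """Linear layer over sparse features: add the weight row of each active
    n-gram, scaled by how often it occurs, to the per-class bias."""
    scores = list(bias)
    for ngram, count in counts.items():
        row = weights[feature_row(ngram, len(weights))]
        for c in range(len(scores)):
            scores[c] += count * row[c]
    return scores


def softmax(scores):
    """Exclusive classes: probabilities compete and sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(scores):
    """Non-exclusive classes: each label gets an independent 0..1 score."""
    return [1.0 / (1.0 + math.exp(-s)) for s in scores]


def bow_textcat(tokens, weights, bias, ngram_size=1, exclusive_classes=True):
    counts = extract_ngrams(tokens, ngram_size)
    scores = linear_scores(counts, weights, bias)
    # Output activation depends on whether the classes are mutually exclusive.
    return softmax(scores) if exclusive_classes else sigmoid(scores)


if __name__ == "__main__":
    random.seed(0)
    n_rows, n_classes = 64, 2
    # Untrained, randomly initialised weights: the scores are only illustrative.
    weights = [[random.uniform(-1, 1) for _ in range(n_classes)] for _ in range(n_rows)]
    bias = [0.0] * n_classes
    tokens = "this is not good at all".split()
    print(sorted(extract_ngrams(tokens, 2)))
    print(bow_textcat(tokens, weights, bias, ngram_size=2))
    print(bow_textcat(tokens, weights, bias, ngram_size=2, exclusive_classes=False))
```

With ngram_size=2, a phrase like "not good" becomes a feature of its own with its own weights, which a pure unigram model cannot represent; the price is a much larger feature space, which is why hashing into a fixed number of rows is used.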

If you run init config, you can see the default values for this architecture, e.g.:

spacy init config -p textcat textcat.cfg -o accuracy

would give you

...
[components.textcat.model]
@architectures = "spacy.TextCatEnsemble.v2"
nO = null

[components.textcat.model.linear_model]
@architectures = "spacy.TextCatBOW.v1"
exclusive_classes = true
ngram_size = 1
no_output_layer = false
nO = null

[components.textcat.model.tok2vec]
@architectures = "spacy.Tok2VecListener.v1"
width = ${components.tok2vec.model.encode.width}
upstream = "*"
...

Here you can see that by default, ngram_size is set to 1 (single tokens rather than longer n-grams).
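
If you want to check these defaults programmatically, the linear_model section of the generated file can be read with Python's standard configparser. This is a rough sketch under assumptions: CFG_TEXT is a hand-trimmed copy of the section shown above (the "..." parts are left out), and read_linear_model_settings is a hypothetical helper, not part of spaCy.

```python
import configparser

# Hand-trimmed copy of the [components.textcat.model.linear_model] section
# from the generated textcat.cfg (other sections omitted).
CFG_TEXT = """
[components.textcat.model.linear_model]
@architectures = "spacy.TextCatBOW.v1"
exclusive_classes = true
ngram_size = 1
no_output_layer = false
nO = null
"""


def read_linear_model_settings(cfg_text):
    # interpolation=None: leave spaCy-style ${...} references untouched
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep case-sensitive keys such as "nO"
    parser.read_string(cfg_text)
    section = parser["components.textcat.model.linear_model"]
    return {
        "architecture": section["@architectures"].strip('"'),
        "ngram_size": section.getint("ngram_size"),
        "exclusive_classes": section.getboolean("exclusive_classes"),
    }


if __name__ == "__main__":
    print(read_linear_model_settings(CFG_TEXT))
```

If spaCy is installed, spacy.util.load_config("textcat.cfg") is the more faithful option, since it understands spaCy's value syntax and ${...} variables. To try bigrams, you can edit ngram_size in textcat.cfg, or override it on the command line when training, e.g. python -m spacy train textcat.cfg --components.textcat.model.linear_model.ngram_size 2 (together with the usual --paths.train and --paths.dev arguments).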

4 replies

Joshmantova Dec 27, 2021

polm Dec 27, 2021

Joshmantova Dec 27, 2021

polm Dec 27, 2021