How to train a model with a textcat or textcat_multilabel but with conditions? #11197
-
Hello,
The output should be split into True or False values for each label (ignoring the float scores). The purpose is to have several output labels while ensuring that a label and its opposite are never predicted together (e.g. Label_1: True and Label_2: True at the same time). I looked into this and found that Keras can handle multiple outputs and multiple losses, but I am not sure whether spaCy can do the same. I could use spaCy's textcat_multilabel since the labels are non-mutually exclusive, but then I would most likely end up with a label and its opposite. The last option is to train a separate model for each pair of labels, but that would take a long time. Thank you in advance for your help.
-
There's no way to add constraints like that to a textcat model. There are a couple of different ways you can model it instead.
To be clear, are these opposed labels strictly binary, in that a document is always one or the other? Or can a document be neither of them?
Assuming that they are strictly binary, you could train a non-exclusive multilabel model with only the positive variants, and then the probability of the negative label would be `1 - positive prob`.
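For example, a minimal sketch of deriving the negative scores after prediction; the model path and the label names (`POSITIVE_LABELS`, the `NOT_` prefix) are placeholders for illustration:

```python
import spacy

# Hypothetical label names: the model was trained on the positive variants only.
POSITIVE_LABELS = ["URGENT", "TECHNICAL"]

nlp = spacy.load("my_textcat_model")  # placeholder path to your trained pipeline
doc = nlp("Some text to classify.")

# Derive each negative label's probability as 1 - positive probability.
scores = {}
for label in POSITIVE_LABELS:
    pos = doc.cats[label]
    scores[label] = pos
    scores[f"NOT_{label}"] = 1.0 - pos

# Apply a 0.5 threshold to get True/False outputs per label.
predictions = {label: score >= 0.5 for label, score in scores.items()}
print(predictions)
```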
You could also train a textcat_multilabel model with all the labels, and instead of using a simple threshold, take the higher value from each opposed pair as positive and the other as negative. That could be implemented as a simple component. I would also recommend you try the approach of training a textcat for each pair of labels, as it might end up being an easier task that way.
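A minimal sketch of such a component, assuming the opposed labels are paired up in a hypothetical `OPPOSED_PAIRS` list; it overwrites `doc.cats` so that exactly one label per pair wins:

```python
import spacy
from spacy.language import Language
from spacy.tokens import Doc

# Hypothetical opposed label pairs; adjust to your own label scheme.
OPPOSED_PAIRS = [("POSITIVE", "NEGATIVE"), ("FORMAL", "INFORMAL")]

@Language.component("resolve_opposed_labels")
def resolve_opposed_labels(doc: Doc) -> Doc:
    # For each opposed pair, force the higher-scoring label to 1.0
    # and the other to 0.0, so both can never be predicted together.
    for a, b in OPPOSED_PAIRS:
        if doc.cats.get(a, 0.0) >= doc.cats.get(b, 0.0):
            doc.cats[a], doc.cats[b] = 1.0, 0.0
        else:
            doc.cats[a], doc.cats[b] = 0.0, 1.0
    return doc

# Add it after the trained classifier so it post-processes the scores:
# nlp.add_pipe("resolve_opposed_labels", after="textcat_multilabel")
```

Because it only compares the scores the classifier already produced, this component needs no training of its own; it just enforces the mutual-exclusion constraint at inference time.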