Is it possible to train textcat with rules/matcher? #9636
I want to use the power of the Matcher (or PhraseMatcher), which can match on tokens and lemmas, to build a textcat training dataset, as opposed to a Python regex that always needs an exact word match.
Is this the correct way? Or is there a better way in spaCy to augment a textcat dataset?
Replies: 1 comment
It is definitely possible to use the Matcher to create training data for a textcat model. That's a form of "weak supervision", where you train a statistical model using the output of a rule-based model.
The code you have works. I assume it's example code, but just in case, I will note that "names" is kind of a weird category for a document. Also note you can use entities from existing pipelines if you actually need to match on names.
We recently released a weak supervision tutorial project that you might find useful.
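For readers without the original code at hand, here is a minimal sketch of the weak supervision idea described above: Matcher rules label raw texts, and the labelled docs are collected in a `DocBin`, the format `spacy train` reads. The `SPORTS` and `OTHER` labels, the patterns, and the sample texts are hypothetical and chosen only for illustration. The sketch uses a blank English pipeline, so it matches on `LOWER`; matching on `LEMMA` would need a pipeline that includes a lemmatizer.

```python
import spacy
from spacy.matcher import Matcher
from spacy.tokens import DocBin

# Hypothetical labels and patterns, for illustration only.
POSITIVE = "SPORTS"
NEGATIVE = "OTHER"
PATTERNS = [
    [{"LOWER": {"IN": ["football", "soccer", "tennis"]}}],
    [{"LOWER": "world"}, {"LOWER": "cup"}],
]


def weak_label(texts, nlp, matcher):
    """Label each text with the rule-based matcher and return (data, DocBin)."""
    data = []
    doc_bin = DocBin()
    for doc in nlp.pipe(texts):
        # A document counts as positive if any pattern matches at least once.
        matched = len(matcher(doc)) > 0
        cats = {
            POSITIVE: 1.0 if matched else 0.0,
            NEGATIVE: 0.0 if matched else 1.0,
        }
        doc.cats = cats
        doc_bin.add(doc)
        data.append((doc.text, {"cats": cats}))
    return data, doc_bin


# Tokenizer-only pipeline: enough for LOWER patterns, no trained model needed.
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)
matcher.add(POSITIVE, PATTERNS)

texts = [
    "The World Cup final was last night.",
    "I need to buy groceries.",
    "She plays tennis every weekend.",
]
data, doc_bin = weak_label(texts, nlp, matcher)

for text, annotations in data:
    print(annotations["cats"], text)

# To train with `spacy train`, the DocBin could be saved, e.g.:
# doc_bin.to_disk("train.spacy")
```

Labels produced this way are only as good as the rules, so it is worth checking a sample by hand before training on them.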