Training NER and TextCategorizer together #12991
Hello, I have a dataset of raw bank transactions. My task - out of a raw bank transaction description like
I have already trained from scratch a model with What I don't get now is - provided I do have the raw annotations (let's say in a .csv file) - how I should go about training
The output I imagine is a custom spacy model that allows me to do the following:
In general, I couldn't find examples of a pipeline where you or others do exactly this, so I wonder what the best-practice way to approach this problem is. Thanks a lot!
Replies: 1 comment 6 replies
If you have both the entity annotations and the text category in the same data (which you seem to have), then it is best to treat this as a single training task. This allows both the named entity recognizer and the text categorizer to share the same underlying contextual vectors (the tok2vec representations).
Training two separate pipelines and merging them is more of a last resort for when the data sets are disjoint, for example if you had one document collection with text categories and a completely different document collection with named entity annotations.
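As a rough illustration of the single-task approach, here is a minimal sketch of converting annotated rows into one training corpus where every `Doc` carries both `doc.ents` and `doc.cats`. The CSV layout (columns `text`, `category`, `entities`, with entities encoded as `start:end:LABEL` separated by `;`), the label names, and the helper names `parse_rows` and `to_docbin` are all assumptions made for this example, not something taken from the original question.

```python
import csv
import io

# Hypothetical sample data: the column names and the "start:end:LABEL"
# entity encoding are assumptions for this sketch.
SAMPLE_CSV = """text,category,entities
CARD PAYMENT STARBUCKS LONDON 12.50,food_and_drink,13:22:MERCHANT;23:29:CITY
"""

# Hypothetical category inventory for the text categorizer.
CATEGORIES = ["food_and_drink", "transport"]


def parse_rows(csv_text, all_categories):
    """Parse CSV text into (text, spans, cats) triples using only the stdlib."""
    examples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        spans = []
        # Skip empty pieces so rows without entities are handled too.
        for item in filter(None, row["entities"].split(";")):
            start, end, label = item.split(":")
            spans.append((int(start), int(end), label))
        # Mutually exclusive categories: 1.0 for the gold label, 0.0 otherwise.
        cats = {c: 1.0 if c == row["category"] else 0.0 for c in all_categories}
        examples.append((row["text"], spans, cats))
    return examples


def to_docbin(examples, output_path):
    """Write Docs that carry both entity and category annotations to disk."""
    # Imported here so the parsing step above works without spaCy installed.
    import spacy
    from spacy.tokens import DocBin

    nlp = spacy.blank("en")  # assumes English tokenization suits the data
    doc_bin = DocBin()
    for text, spans, cats in examples:
        doc = nlp.make_doc(text)
        ents = []
        for start, end, label in spans:
            # char_span returns None if the offsets cannot be aligned to tokens.
            span = doc.char_span(start, end, label=label, alignment_mode="contract")
            if span is not None:
                ents.append(span)
        doc.ents = ents
        doc.cats = cats
        doc_bin.add(doc)
    doc_bin.to_disk(output_path)


# Example call (requires spaCy):
# to_docbin(parse_rows(SAMPLE_CSV, CATEGORIES), "train.spacy")
```

With train and dev files written this way, a starter config containing both components can be generated with `python -m spacy init config config.cfg --pipeline ner,textcat`, and training then runs with `python -m spacy train config.cfg --paths.train train.spacy --paths.dev dev.spacy`. Whether the generated config shares a single `tok2vec` component between `ner` and `textcat` is worth checking in the `[components]` section before training.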