If you have both the entity annotations and the text categories on the same data (which you seem to have), then it is best to treat this as a single training task. This allows both the named entity recognizer and the text categorizer to use the same underlying contextual vectors (the tok2vec representations).
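As a rough illustration of what "a single training task" means on the data side, here is a minimal sketch that puts the entity spans and the category of each text into one annotation dict. The records, the label names and the helper `to_joint_annotations` are made up for this example; only the `"entities"` and `"cats"` keys follow the dict format that spaCy's `Example.from_dict` accepts.

```python
# Hypothetical records: each text carries BOTH entity spans and a category.
records = [
    {
        "text": "Apple is opening a new office in Berlin.",
        "entities": [(0, 5, "ORG"), (33, 39, "GPE")],  # (start_char, end_char, label)
        "category": "BUSINESS",
    },
    {
        "text": "Bayern Munich won the match in Madrid.",
        "entities": [(0, 13, "ORG"), (31, 37, "GPE")],
        "category": "SPORTS",
    },
]

# Assumed label set for the text categorizer (illustrative only).
ALL_CATEGORIES = ["BUSINESS", "SPORTS"]


def to_joint_annotations(record, all_categories):
    """Return (text, annotations) with NER and textcat labels side by side."""
    cats = {
        label: 1.0 if label == record["category"] else 0.0
        for label in all_categories
    }
    annotations = {"entities": list(record["entities"]), "cats": cats}
    return record["text"], annotations


training_data = [to_joint_annotations(r, ALL_CATEGORIES) for r in records]

for text, annotations in training_data:
    print(text, annotations)
```

Each `(text, annotations)` pair could then be turned into one training example with `Example.from_dict(nlp.make_doc(text), annotations)`, so that a pipeline containing both `ner` and `textcat` (ideally listening to a shared `tok2vec` component) learns from the same documents in one run.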

Training two separate pipelines and merging them is more of a last resort for when the data sets are disjoint, e.g. if you had one document collection with text categories and a completely different document collection with named entity annotations.

Answer selected by darioprencipe
Labels
usage General spaCy usage feat / ner Feature: Named Entity Recognizer feat / textcat Feature: Text Classifier feat / training Feature: Training utils, Example, Corpus and converters
3 participants