What you should probably start with:

  • Select "accuracy" in the quickstart so the config uses the static word vectors from en_core_web_lg, with no further changes to the config.
  • You do want a separate tok2vec for a textcat component. Don't use the tok2vec from en_core_web_lg.

After training, you could try out spacy report (https://spacy.io/universe/project/spacy-report) to experiment with the threshold. The threshold is only used for scoring; it doesn't affect the training process itself or the annotations saved to doc.cats, which always contain the scores for all categories.
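
Since doc.cats always holds the score for every category, picking a threshold is really just a post-processing step you can try out yourself. A rough sketch in plain Python, where the scores dict is made up to stand in for doc.cats, the helper name is hypothetical, and treating "score >= threshold" as a positive prediction is an assumption of this sketch:

```python
from typing import Dict, List


def labels_above_threshold(cats: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Return the labels whose score is at least `threshold`, highest score first.

    `cats` is expected to look like doc.cats: a mapping from label to score.
    """
    ranked = sorted(cats.items(), key=lambda item: item[1], reverse=True)
    return [label for label, score in ranked if score >= threshold]


# Hypothetical scores standing in for doc.cats (not real model output).
example_cats = {"POSITIVE": 0.81, "NEGATIVE": 0.12, "NEUTRAL": 0.45}

# The scores themselves never change; only the accepted labels do.
for threshold in (0.3, 0.5, 0.9):
    print(threshold, labels_above_threshold(example_cats, threshold))
```

With a trained pipeline you would pass doc.cats instead of example_cats and compare the accepted labels against your gold annotations at each threshold.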

There's a bit of confusing duplication in the settings, so to modify the threshold after training if you want to use spacy evaluate w…

Answer selected by Jarathael
Labels
feat / textcat Feature: Text Classifier