I checked, and using prodigy train --base-model en_core_web_sm or --base-model en_core_web_trf doesn't affect the generated textcat config, which is a bag-of-words (BOW) model by default. (As a side note, if the performance is good enough for your use case, then I would recommend just using BOW. It's simple and fast.)
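
For reference, a BOW textcat block in a spaCy training config looks roughly like the sketch below. This is an illustration, not the exact output of prodigy train: the architecture version (v2 here) and the parameter values are assumptions and can differ between spaCy and Prodigy releases, so check the config that your own run generates.

```ini
# Sketch of a bag-of-words text classifier section (values are assumptions)
[components.textcat]
factory = "textcat"

[components.textcat.model]
@architectures = "spacy.TextCatBOW.v2"
# true for mutually exclusive labels, false for multi-label setups
exclusive_classes = true
# 1 = unigram features only
ngram_size = 1
no_output_layer = false
```

Note that this block has no tok2vec or transformer listener, so the model does not read any embedding layer from the base pipeline.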

--base-model might make sense if you're fine-tuning an existing component already in the base model and there isn't a shared tok2vec component, but not in this case with textcat and en_core_web_*. If you fine-tune an existing shared tok2vec or transformer for your textcat component, it's going to degrade the performance for the other components like tagger and parser.

If you want to ex…

Answer selected by cbjrobertson

This discussion was converted from issue #11823 on November 22, 2022 09:16.