Replicate v2.3 textcat in 3.3 config #11198
Unanswered
python3Berg
asked this question in
Help: Coding & Implementations
Replies: 1 comment, 7 replies
-
And just to double check, the labels are exclusive, right? One doc has exactly one label in your gold standard?
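One way to answer that question from the data itself is sketched below. It assumes the training examples are available as `(text, cats)` pairs, where `cats` maps each label to a score, and it treats a score of 0.5 or more as positive. The function names and the toy data are hypothetical and do not come from the original pipeline. This matters because in spaCy v3 the `textcat` component assumes mutually exclusive labels, while `textcat_multilabel` does not.

```python
def count_positive_labels(cats):
    """Count labels marked positive (score >= 0.5) in one cats dict."""
    return sum(1 for score in cats.values() if score >= 0.5)


def find_non_exclusive(examples):
    """Return indices of examples that do not have exactly one positive label."""
    return [
        i
        for i, (_text, cats) in enumerate(examples)
        if count_positive_labels(cats) != 1
    ]


# Hypothetical toy data, for illustration only.
examples = [
    ("invoice for services", {"BILLING": 1.0, "LEGAL": 0.0}),
    ("contract and invoice", {"BILLING": 1.0, "LEGAL": 1.0}),
    ("blank page", {"BILLING": 0.0, "LEGAL": 0.0}),
]

print(find_non_exclusive(examples))
```

An empty result would suggest the gold standard is single-label. Any indices returned point at documents with zero labels or more than one.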
-
Hello,
Summer's the time for some long-overdue maintenance, but I am having a puzzling issue with the most basic of my pipelines. Using spaCy 2.3, I can train a classifier to a score of about 93% over the course of 40-50 epochs. It quickly reaches about 70% after 2-3 epochs and then improves slowly. In spaCy 3.3, I cannot achieve even 50%, regardless of the number of epochs, architecture, or learning rate. My data has about 150 labels across about 8000 documents, many of them large. It is normalized to provide a training and testing split at the classification level and then randomized. Lots of stuff below, but I'm looking for ideas here. I'm eager to take advantage of the new 3.3 capabilities but must first reestablish a baseline. Has anyone experienced the same?
The spaCy 2.3 pipeline code is almost a cut and paste from the old docs...
I am using a very similar normalization to generate the spacy binary files required by 3.3...
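For readers comparing notes, here is a minimal sketch of how such binary files are commonly produced for single-label classification. It is not the code from this post. It assumes the data is a list of `(text, gold_label)` pairs and that every document carries exactly one label. The helper names `make_cats` and `write_docbin`, the `"en"` language code, and the output path are all hypothetical. Each `doc.cats` dict lists every label explicitly, with 1.0 for the gold label and 0.0 for the rest.

```python
def make_cats(gold_label, all_labels):
    """Build a cats dict with 1.0 for the gold label and 0.0 for all others."""
    if gold_label not in all_labels:
        raise ValueError(f"unknown label: {gold_label!r}")
    return {label: (1.0 if label == gold_label else 0.0) for label in all_labels}


def write_docbin(pairs, all_labels, path, lang="en"):
    """Serialize (text, gold_label) pairs to a .spacy file (requires spaCy v3)."""
    # Imported here so make_cats stays usable without spaCy installed.
    import spacy
    from spacy.tokens import DocBin

    nlp = spacy.blank(lang)  # tokenizer only; no trained components needed
    doc_bin = DocBin()
    for text, gold_label in pairs:
        doc = nlp.make_doc(text)
        doc.cats = make_cats(gold_label, all_labels)
        doc_bin.add(doc)
    doc_bin.to_disk(path)


# Hypothetical usage:
# write_docbin([("some text", "LABEL_A")], ["LABEL_A", "LABEL_B"], "train.spacy")
```

If the v2.3 and v3.3 data are built from the same pairs, comparing the `cats` dicts on a handful of documents is a cheap way to rule out a conversion difference before looking at the config.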
I've tried the fast/simple and the slow/accurate configs generated from docs. I settled on the textcat config in project/templates/textcat_demo. Using the ensemble model with embeddings seems closest to 2.3 defaults.
Working config...