Does textcat work, or work better, without balancing classes? #10089
Hello. I would like to ask three questions about "textcat":
I created two "textcat" models: 1) using all the examples, and 2) undersampling by randomly removing examples from the majority classes.
I then split the data 70/30 into training and test sets. Out of curiosity, I compared the evaluation metrics in the "meta.json" file with the metrics given by sklearn's "classification_report" function (using spaCy predictions on y_test). In case 1) the results are identical; in case 2) they differ.
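For reference, the setup described above can be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the actual training script: the helper names (`undersample`, `split_70_30`, `top_label`, `per_class_prf`) are hypothetical, the data is a toy stand-in, and it assumes mutually exclusive classes, where the predicted label is the top-scoring entry of a `doc.cats`-style dictionary.

```python
import random
from collections import defaultdict


def undersample(examples, seed=0):
    """Randomly drop examples so every class is as large as the rarest one."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))
    n_min = min(len(items) for items in by_label.values())
    balanced = []
    for items in by_label.values():
        balanced.extend(rng.sample(items, n_min))
    rng.shuffle(balanced)
    return balanced


def split_70_30(examples, seed=0):
    """Shuffle a copy of the data and split it 70% train / 30% test."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * 0.7)
    return shuffled[:cut], shuffled[cut:]


def top_label(cats):
    """Pick the highest-scoring label from a {label: score} dictionary."""
    return max(cats, key=cats.get)


def per_class_prf(y_true, y_pred):
    """Compute (precision, recall, F1) for each label from two label lists."""
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[label] = (precision, recall, f1)
    return report


# Toy data standing in for the real corpus (case 2: undersample, then split).
data = [(f"pos {i}", "POS") for i in range(10)] + [(f"neg {i}", "NEG") for i in range(4)]
train, test = split_70_30(undersample(data))
```

With a real pipeline, `y_pred` would be built by applying `top_label` to each test document's `cats`, and the resulting per-class numbers could then be set side by side with the ones in "meta.json".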
Thanks a lot!
Replies: 1 comment
Hi @astrouhiu,