What size should my corpus be? #10974
Answered
by
polm
moonman239
asked this question in
Help: Coding & Implementations
I'm thinking of building a corpus based on social media posts that are written in English. How large should the corpus be so that I can get a good resulting spaCy model?
Answered by
polm
Jun 17, 2022
Replies: 1 comment
It depends, but bigger is (almost always) better. There is no magic number or way to answer this perfectly in advance, but at a minimum you would need a few hundred examples, and if possible it's better to get thousands. This flowchart may be helpful.

For more perspective, what kind of model are you trying to train, and what are you trying to label with it specifically?
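Since there is no way to know the right size in advance, one common way to check empirically is a learning curve: train on growing slices of whatever you have annotated so far and see whether the score on a fixed held-out set is still climbing. The sketch below is only an illustration of that idea, not something spaCy provides under these names. `learning_curve`, `still_improving`, the `train_and_score` callback, and the `min_gain` threshold are all hypothetical; in practice `train_and_score` would wrap your own training run and evaluation.

```python
import random
from typing import Callable, List, Sequence, Tuple


def learning_curve(
    examples: Sequence,
    train_and_score: Callable[[Sequence], float],
    fractions: Sequence[float] = (0.25, 0.5, 0.75, 1.0),
    seed: int = 0,
) -> List[Tuple[int, float]]:
    """Return (subset size, score) pairs for growing subsets of the data.

    train_and_score is a hypothetical callback: it should train a model on
    the given subset and return its score on a fixed held-out set.
    """
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed so runs are repeatable
    curve = []
    for fraction in fractions:
        n = max(1, int(len(shuffled) * fraction))
        curve.append((n, train_and_score(shuffled[:n])))
    return curve


def still_improving(curve: List[Tuple[int, float]], min_gain: float = 0.01) -> bool:
    """True if the last step raised the score by at least min_gain.

    min_gain is an arbitrary illustrative threshold, not a recommended value.
    """
    if len(curve) < 2:
        return True  # nothing to compare yet, so assume more data may help
    return curve[-1][1] - curve[-2][1] >= min_gain


if __name__ == "__main__":
    # Toy stand-in for a real training run: the score rises with subset size.
    def toy_scorer(subset: Sequence) -> float:
        return 1 - 1 / (1 + len(subset) / 100)

    curve = learning_curve(list(range(400)), toy_scorer)
    for n, score in curve:
        print(f"{n:>4} examples: {score:.3f}")
    print("more data likely to help:", still_improving(curve))
```

If the score is still rising noticeably at the largest slice, annotating more data will probably pay off; if it has flattened, more of the same data is less likely to help.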
Answer selected by
polm