Obtain the size of training data used to build a model #1066
adam-ra started this conversation in New Features & Project Ideas
Replies: 2 comments
-
Hm. It's more about the relationship between the step size and the L2-norm of the weights, though. I agree that this sort of diagnostic would be easy and useful to dump from the training process into the meta. I'll keep this in mind, thanks.
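As a rough illustration of what such a diagnostic dump could look like, here is a minimal sketch. The training_stats key and the examples_seen / weight_l2_norm fields are hypothetical and not part of spaCy's actual meta.json schema.

```python
import json
import numpy

def dump_training_stats(meta_path, examples_seen, weights):
    """Append hypothetical training diagnostics to a model's meta.json.
    The field names used here are illustrative only."""
    with open(meta_path, encoding="utf8") as f:
        meta = json.load(f)
    stats = meta.setdefault("training_stats", {})
    stats["examples_seen"] = int(examples_seen)
    # L2-norm of the flattened weight vector, as mentioned in the reply above.
    stats["weight_l2_norm"] = float(numpy.linalg.norm(numpy.asarray(weights).ravel()))
    with open(meta_path, "w", encoding="utf8") as f:
        json.dump(meta, f, indent=2)
```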
-
This feature is available via Prodigy's train-curve recipe. For more information on how to get training estimates with spaCy, see discussion #5639.
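For readers who only want the idea behind a train-curve, a minimal sketch follows. This is not Prodigy's implementation; train_model and evaluate are hypothetical stand-ins for your own training and scoring functions.

```python
import random

def train_curve(examples, train_model, evaluate, fractions=(0.25, 0.5, 0.75, 1.0)):
    """Train on growing fractions of the data and report the score for each,
    to estimate whether collecting more annotations is likely to help."""
    examples = list(examples)
    random.shuffle(examples)
    scores = {}
    for frac in fractions:
        subset = examples[: max(1, int(len(examples) * frac))]
        model = train_model(subset)
        scores[frac] = evaluate(model)
    return scores
```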
-
It could be quite useful to be able to get the size (accurate or estimated, whatever) of the training data that was used to train a tagging and parsing model (I guess the same holds for NER). This could for instance be available as nlp.tagger.model.examples_seen or something similar (perhaps a meta dict with more statistics, if available).
This would be useful to guesstimate the number of examples needed to post-train a tagger (as in #1015). Making post-training work as expected is obviously more complex than repeating the same few training examples FRACTION * ORIGINAL_CORPUS_SIZE times, but it's still better than hardcoding an out-of-the-blue absolute number. A sketch of how this might be used follows below.