Training, dev and test data #10176
NixBiks
started this conversation in
Help: Best practices
Replies: 1 comment
-
Hello, |
-
It is pretty common to have three datasets: one for training, a dev set for parameter tuning, and finally a test set for the final evaluation.
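To make the three-way split concrete, here is a minimal sketch in plain Python. The function name `three_way_split` and the default fractions are hypothetical and only for illustration; nothing here is spaCy or Prodigy API. The fixed seed is what keeps the test set identical between runs, so new models are scored on the same unseen examples as older ones.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def three_way_split(
    examples: Sequence[T],
    dev_frac: float = 0.1,
    test_frac: float = 0.1,
    seed: int = 0,
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle once with a fixed seed and cut into train/dev/test lists.

    Hypothetical helper for illustration only.
    """
    if dev_frac < 0 or test_frac < 0 or dev_frac + test_frac >= 1:
        raise ValueError("dev_frac and test_frac must be >= 0 and sum to < 1")
    shuffled = list(examples)
    # A dedicated Random instance keeps the split reproducible
    # without touching the global random state.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_frac)
    n_dev = round(len(shuffled) * dev_frac)
    test = shuffled[:n_test]
    dev = shuffled[n_test:n_test + n_dev]
    train = shuffled[n_test + n_dev:]
    return train, dev, test


if __name__ == "__main__":
    train, dev, test = three_way_split(range(100), dev_frac=0.2, test_frac=0.1, seed=42)
    print(len(train), len(dev), len(test))
```

In practice the test portion would be written out once and never re-split, rather than regenerated for every experiment.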
I'm a little confused about the best-practice split for spaCy. The docs usually have `train.spacy` and `dev.spacy`, which I assume are the training and dev sets in the sense above? But I don't really see any docs on how to integrate test data, so that I can see how new models perform on the same unseen data as previous models.
Currently I have split some Prodigy datasets into training and evaluation sets. Then I run `prodigy train` on the training data with, for example, a 20% eval split. Is that reasonable?
When I use `prodigy train`, my config is auto-generated as a good starting point, but I'd like to integrate the test data as well, i.e. generate a classification report based on the test data.
Any pointers would be much appreciated.
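To illustrate the kind of classification report meant here, below is a rough sketch in plain Python that computes per-label precision, recall, and F1 from gold and predicted labels. It assumes exactly one label per example, and the function name `classification_report` is hypothetical; in a real setup the predictions would come from running the trained pipeline over the held-out test examples. spaCy also ships a `spacy evaluate` command that scores a trained pipeline against a `.spacy` file, which may already cover part of this.

```python
from collections import Counter
from typing import Dict, Sequence


def classification_report(
    gold: Sequence[str], pred: Sequence[str]
) -> Dict[str, Dict[str, float]]:
    """Per-label precision, recall, F1 and support.

    Hypothetical helper; assumes a single label per example.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp: Counter = Counter()
    fp: Counter = Counter()
    fn: Counter = Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label was wrong
            fn[g] += 1  # gold label was missed
    report: Dict[str, Dict[str, float]] = {}
    for label in sorted(set(gold) | set(pred)):
        predicted = tp[label] + fp[label]
        support = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "support": support,
        }
    return report


if __name__ == "__main__":
    # Toy labels, purely illustrative.
    gold = ["A", "A", "B", "B"]
    pred = ["A", "B", "B", "B"]
    for label, scores in classification_report(gold, pred).items():
        print(label, scores)
```

Running the same report on the same fixed test set after each training run is what makes the numbers comparable between model versions.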