
Hi @mandar-avhad, in general, spaCy expects the standard train / dev / test split for training and evaluation. This means that you have to split your data yourself beforehand, preferably saving each split in the serialized DocBin format. You can check a sample conversion script here.
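As a rough illustration of that workflow, here is a minimal sketch that shuffles the data, cuts it into three parts, and writes each part as a `.spacy` file. It assumes your annotations are `(text, [(start, end, label), ...])` tuples with character offsets; the helper names `split_examples` and `to_docbin`, the split fractions, and the output filenames are all made up for this example and are not part of spaCy.

```python
import random


def split_examples(examples, dev_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle examples and cut them into train / dev / test lists."""
    items = list(examples)
    random.Random(seed).shuffle(items)  # seeded so the split is reproducible
    n_dev = int(len(items) * dev_frac)
    n_test = int(len(items) * test_frac)
    dev = items[:n_dev]
    test = items[n_dev:n_dev + n_test]
    train = items[n_dev + n_test:]
    return train, dev, test


def to_docbin(examples, path, lang="en"):
    """Serialize (text, [(start, end, label), ...]) pairs to a .spacy file."""
    # Imported lazily so split_examples can be used without spaCy installed.
    import spacy
    from spacy.tokens import DocBin

    nlp = spacy.blank(lang)
    db = DocBin()
    for text, spans in examples:
        doc = nlp.make_doc(text)
        ents = []
        for start, end, label in spans:
            span = doc.char_span(start, end, label=label)
            if span is not None:  # None means the offsets don't align with tokens
                ents.append(span)
        doc.ents = ents
        db.add(doc)
    db.to_disk(path)
```

You would then call something like `to_docbin(train, "train.spacy")` for each split and point `paths.train` and `paths.dev` in your config at the resulting files, keeping the test file aside for `spacy evaluate`.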

If you want to do stratified splits, you can implement a custom Corpus that gives you the correct batch during training. You can also check out an example project that does cross-validation (note: it may not be the most efficient solution).
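To give an idea of what stratification could look like before the data ever reaches spaCy, here is a minimal pure-Python sketch that builds stratified folds and then yields train / dev pairs for cross-validation. It is not the code from the example project, and it is not a custom Corpus. The `key` function that assigns each example to a single stratum is an assumption: for NER you would have to decide yourself what that is (for instance the most frequent entity label in the text). The names `stratified_folds` and `cross_validation_splits` are hypothetical.

```python
import random
from collections import defaultdict


def stratified_folds(examples, key, k=5, seed=0):
    """Distribute examples over k folds, spreading each stratum evenly."""
    by_stratum = defaultdict(list)
    for ex in examples:
        by_stratum[key(ex)].append(ex)

    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    offset = 0  # continue the round-robin across strata to keep fold sizes balanced
    for stratum in sorted(by_stratum, key=str):
        group = by_stratum[stratum]
        rng.shuffle(group)
        for i, ex in enumerate(group):
            folds[(i + offset) % k].append(ex)
        offset += len(group)
    return folds


def cross_validation_splits(folds):
    """Yield (train, dev) pairs, holding out one fold at a time."""
    for i, dev in enumerate(folds):
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        yield train, dev
```

Each `(train, dev)` pair can be serialized to its own pair of `.spacy` files and trained separately, which is simple but means one full training run per fold.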

Answer selected by adrianeboyd
Labels: training (Training and updating models), feat / ner (Feature: Named Entity Recognizer)