A corpus is a dataset with gold-standard annotations that is used to train (and evaluate) an ML model. Typically, it is loaded through a registered Corpus function, which seems to be the case in your config, judging from the error message.

Your config probably contains a section like this:

[corpora.dev]
@readers = "spacy.Corpus.v1"
path = ${paths.dev}
max_length = 0
...
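
The `${paths.dev}` reference above is resolved against a `[paths]` section in the same config file. As a minimal sketch, assuming the usual layout where both paths are left unset in the file and supplied later, that section might look like this (your actual config may define more variables here):

```ini
[paths]
# Left at null here; meant to be overridden on the command line
train = null
dev = null
```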

This means that you'll need to define the variable dev in the [paths] section. You can keep it at null in the config and override it on the CLI, exactly as you've done for the variable train. So you'll need something like this:

python -m spacy train config.cfg --paths.train NER_dataset_train.spacy --paths.dev NER_dataset_dev.spacy --ou…
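
To see which path variables a config expects before training, a small check like the following can help. This is only a sketch: `missing_path_vars` is a hypothetical helper, not part of spaCy, and it uses simple text matching rather than spaCy's own config parser, so it only catches `${paths.<name>}` references that have no entry at all in `[paths]`.

```python
import re


def missing_path_vars(config_text):
    """Return names referenced as ${paths.<name>} but not listed in [paths].

    Hypothetical helper for illustration; it does plain text matching and
    does not replicate spaCy's real config parsing or interpolation.
    """
    defined = set()
    in_paths = False
    for line in config_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("["):
            # Track whether we are currently inside the [paths] section
            in_paths = stripped == "[paths]"
        elif in_paths and "=" in stripped and not stripped.startswith("#"):
            # Record the variable name on the left-hand side of "="
            defined.add(stripped.split("=", 1)[0].strip())
    # Collect every ${paths.<name>} reference anywhere in the config
    referenced = set(re.findall(r"\$\{paths\.(\w+)\}", config_text))
    return referenced - defined


# Example config text (an assumption for illustration): dev is referenced
# by [corpora.dev] but never declared in [paths].
example = """
[paths]
train = null

[corpora.dev]
@readers = "spacy.Corpus.v1"
path = ${paths.dev}

[corpora.train]
@readers = "spacy.Corpus.v1"
path = ${paths.train}
"""

print(missing_path_vars(example))
```

Any name the helper reports needs an entry in `[paths]`. A variable that is listed but left at null still needs a value from the CLI override shown above.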

Answer selected by svlandeg
This discussion was converted from issue #6472 on December 13, 2020 01:25.