spaCy debug data warning message interpretation #10647
I have a Thai dataset consisting of
I run
Could you please clarify the meaning of the warning message above? I have checked multiple times and found no duplicates in the dataset, so I was wondering whether it implies that the train and eval docs are identical. Also, if 14 000 sentences is on the low side, how many examples would be enough?
Replies: 1 comment 1 reply
Hi @kannaricci, can you double-check the command you ran? It's possible that the paths to the train and dev datasets are incorrect, which would explain why `debug data` showed that report. One giveaway is that the number of dev documents you mentioned (147) differs from the number reported (1173). You can configure this on the command line by passing values to the `--paths.train` and `--paths.dev` parameters, or by updating the config file directly.
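
For illustration, here is a minimal sketch of passing both paths explicitly when calling `debug data`. The file names (`config.cfg`, `corpus/train.spacy`, `corpus/dev.spacy`) and the helper `build_debug_data_command` are hypothetical, so substitute your own config and corpora:

```python
import sys
from typing import List


def build_debug_data_command(config_path: str, train_path: str, dev_path: str) -> List[str]:
    """Build the argument list for `spacy debug data` with explicit corpus paths.

    Hypothetical helper for illustration only; it is not part of spaCy.
    """
    return [
        sys.executable, "-m", "spacy", "debug", "data", config_path,
        "--paths.train", train_path,  # overrides [paths] train in the config
        "--paths.dev", dev_path,      # overrides [paths] dev in the config
    ]


# Placeholder file names (assumptions, not taken from the original post).
cmd = build_debug_data_command("config.cfg", "corpus/train.spacy", "corpus/dev.spacy")

# Print the equivalent shell command without the interpreter path.
print("python " + " ".join(cmd[1:]))

# To actually run it (requires spaCy to be installed):
# import subprocess
# subprocess.run(cmd, check=True)
```

If the train and dev paths point to different files and the document counts in the report match what you expect, the paths are probably not the cause.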