TextCat Training Error on Custom Preprocessed Dataset #12357

daffahilmyf · 2023-03-02T18:10:50Z

Hi

I'm new to machine learning and have been working with a dataset that I annotated using Prodigy. I trained a model using the CLI model training from Prodigy and everything ran smoothly.

However, I recently attempted to preprocess the dataset by applying some additional steps that altered the data. While there were no issues saving the preprocessed data to the Prodigy database, I encountered errors when trying to train the model using the following command:

python -m prodigy train ./training/spancat/test --spancat test --eval-split 0.25

The error message I received was:

⚠ Aborting and saving the final best model. Encountered exception:
ValueError('all sequence lengths must be >= 0')

I've attached links to the annotated and preprocessed dataset samples for reference. I'm hoping to get some advice on how to resolve these errors and improve the performance of my model with the preprocessed data.

Any insights or guidance would be greatly appreciated. Thanks in advance for your help!

Annotated data: https://gist.github.com/daffahilmyf/77cbd546f28070ca27048a7f0d88d1ed
Preprocessed data: https://gist.github.com/daffahilmyf/ef43e7bd79083f6a017a66ba7da4b8be
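
Since the preprocessing altered the data, one quick sanity check is whether any example ended up with empty text or with span offsets that no longer fit the text. The following is only a minimal sketch, assuming the dataset has been exported to JSONL in Prodigy's usual task format, with a "text" field and a "spans" list of "start"/"end" character offsets. The function name `find_suspect_examples` is hypothetical and is not part of Prodigy or spaCy.

```python
import json


def find_suspect_examples(lines):
    """Yield (line_number, reason) for examples that look broken.

    `lines` is any iterable of JSONL strings, e.g. an open file.
    Assumes each record has a "text" string and an optional "spans" list
    whose items carry "start" and "end" character offsets.
    """
    for i, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines in the export
        eg = json.loads(line)
        text = eg.get("text") or ""
        if not text.strip():
            # Preprocessing may have stripped the text down to nothing.
            yield i, "empty text"
            continue
        for span in eg.get("spans") or []:
            start, end = span.get("start"), span.get("end")
            in_range = (
                start is not None
                and end is not None
                and 0 <= start < end <= len(text)
            )
            if not in_range:
                yield i, f"span offsets out of range: {start}-{end}"
```

A possible way to use it: export the dataset with `python -m prodigy db-out test > test.jsonl` (the file name here is only an example), then run `for n, reason in find_suspect_examples(open("test.jsonl", encoding="utf8")): print(n, reason)`. An empty result does not rule out other problems; it only covers these two checks.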

Answered by adrianeboyd

adrianeboyd · 2023-03-03T09:24:34Z

We've seen this error in the past when there was a bug related to docs without any suggestions, but this should be fixed in spaCy v3.3.2 and v3.4.4. Can you double-check which version of spaCy you are using (spacy info)?
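
As a rough illustration of the version check, the sketch below reads the installed spaCy version from package metadata and compares it with the two fixed releases named above. The helper names (`parse_version`, `has_fix`, `installed_spacy_version`) are hypothetical. Treating release lines after 3.4 as fixed, and lines before 3.3 as not fixed, is an assumption rather than something stated in this thread.

```python
from importlib import metadata

# Patch releases named in the answer as containing the fix,
# keyed by (major, minor) release line.
FIXED_IN = {(3, 3): 2, (3, 4): 4}


def parse_version(version):
    """Parse 'X.Y.Z' into a tuple of three ints, ignoring suffixes like 'rc1'."""
    nums = []
    for part in version.split(".")[:3]:
        digits = ""
        for ch in part:
            if not ch.isdigit():
                break
            digits += ch
        nums.append(int(digits or 0))
    while len(nums) < 3:
        nums.append(0)
    return tuple(nums)


def has_fix(version):
    """Return True if this spaCy version should include the fix."""
    major, minor, patch = parse_version(version)
    if (major, minor) in FIXED_IN:
        return patch >= FIXED_IN[(major, minor)]
    # Assumption: later release lines carry the fix, earlier ones do not.
    return (major, minor) > (3, 4)


def installed_spacy_version():
    """Return the installed spaCy version string, or None if not installed."""
    try:
        return metadata.version("spacy")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_spacy_version()
    if found is None:
        print("spaCy is not installed in this environment")
    else:
        print(f"spaCy {found}: fix expected = {has_fix(found)}")
```

Running `python -m spacy info` in the same environment prints the same version along with the installed pipelines, which is usually the more useful output to paste into a report.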

2 replies

daffahilmyf Mar 3, 2023
Author

Thanks for your response. I really appreciate it! @adrianeboyd

Thanks to your help, I was able to get my code working on v3.3.2, but unfortunately not on v3.4.4. After downgrading the version, I'm now encountering a new error like this:

[screenshot not preserved]

or

[screenshot not preserved]

I was wondering if you have a solution to this problem?

adrianeboyd Mar 6, 2023

We can't figure much out from these screenshots. Let's keep this discussion in one place: https://support.prodi.gy/t/textcat-training-error-on-custom-preprocessed-dataset/6405
