It looks like you're running into issues with the default corpus handling. By default, all training data is read into memory so it can be shuffled. You can instead stream your data by setting `max_epochs` to `-1`, as sketched below. If that doesn't fix things, let us know.
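For reference, here's a minimal sketch of what that looks like in a spaCy v3 training config; your own config will have many more settings in the `[training]` block, this just shows the one value to change:

```ini
[training]
# -1 streams the training corpus instead of loading it all into memory for shuffling
max_epochs = -1
```

I believe you can also pass this as an override on the command line instead of editing the config, e.g. `python -m spacy train config.cfg --training.max_epochs -1`.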

A couple of other things to check / be aware of:

You are using Python 3.6, which has reached end of life and is no longer supported. I don't think it's related to this issue at all, but you should upgrade if possible.

Reporting the size of your training data is helpful, but you should also check how long your longest document is: for out-of-memory errors, the length of the single longest document matters more than the average (a quick way to check this is sketched below).
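If your training data is in spaCy's binary `.spacy` (DocBin) format, something like the following sketch would report the longest document; the path and language code are placeholders for illustration:

```python
import spacy
from spacy.tokens import DocBin

# Placeholder: a blank pipeline in your corpus language, just to provide a vocab
nlp = spacy.blank("en")

# Placeholder path: your training data in DocBin (.spacy) format
doc_bin = DocBin().from_disk("./corpus/train.spacy")

# Out-of-memory problems usually come from the single longest document,
# not the average length, so report the maximum token count
longest = max(len(doc) for doc in doc_bin.get_docs(nlp.vocab))
print(f"Longest document: {longest} tokens")
```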

Answer selected by adrianeboyd
This discussion was converted from issue #10623 on April 06, 2022 05:11.