As I understand spacy.batch_by_words.v1, the batch size increases over time. Is it correctly understood that documents are batched together into batches with a total size between 100 and 1000 words (unless a document is longer than the current batch size, in which case it becomes its own batch)? So assuming all my documents were longer than 1000 words, I'd have a batch size of 1?

That is correct.
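To make the behavior described above concrete, here is a simplified sketch of word-count batching with a compounding size schedule. This is not spaCy's actual implementation; the `compounding` and `batch_by_words` functions below are illustrative stand-ins that mirror the described behavior (batches capped by a growing word budget, with oversized documents emitted as singleton batches):

```python
# Simplified sketch of spacy.batch_by_words.v1-style batching.
# NOT spaCy's implementation; names and logic are illustrative only.

def compounding(start, stop, compound):
    """Yield an infinite schedule: start, start*compound, ... capped at stop."""
    size = start
    while True:
        yield min(size, stop)
        size *= compound

def batch_by_words(doc_lengths, size_schedule):
    """Group docs (given as word counts) into batches whose total word
    count stays within the current target. A doc at or above the target
    becomes its own batch."""
    target = next(size_schedule)
    batch, batch_words = [], 0
    for n_words in doc_lengths:
        if n_words >= target:
            # Oversized document: flush the pending batch, then emit the doc alone.
            if batch:
                yield batch
                batch, batch_words = [], 0
                target = next(size_schedule)
            yield [n_words]
            target = next(size_schedule)
        elif batch_words + n_words > target:
            yield batch
            batch, batch_words = [n_words], n_words
            target = next(size_schedule)
        else:
            batch.append(n_words)
            batch_words += n_words
    if batch:
        yield batch

# Example: when every doc exceeds the maximum size, each batch holds 1 doc.
sizes = compounding(100, 1000, 1.001)
batches = list(batch_by_words([1200, 1500, 2000], sizes))
```

With all documents over 1000 words, `batches` comes out as three singleton batches, matching the question above.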

Now I started wondering about this because I saw that at step 16800 I reached the next epoch, which leaves me with an average batch size of 24404 / 16800 = ~1.45. Is that right?

I'm not sure what you mean. If your units are "whole passes over the training data", then batch size is 1 by definition.
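For reference, the arithmetic behind the question above: if one epoch covers 24404 documents in 16800 optimizer steps, the implied average batch size is the ratio of the two (the figure 24404 is taken from the question; whether it counts documents is an assumption here):

```python
# Average documents per optimizer step, using the numbers quoted above.
n_docs, n_steps = 24404, 16800
avg_batch_size = n_docs / n_steps  # roughly 1.45 docs per batch
```

An average only slightly above 1 is consistent with most documents exceeding the maximum batch size and being emitted as singleton batches.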


Answer selected by adrianeboyd
Labels: training (Training and updating models), feat / textcat (Feature: Text Classifier)