Understanding training output for textcat_multilabel - steps vs epochs #10343
Question originally asked on the Prodigy support forum. I'm trying to understand the training output for `textcat_multilabel`. As I understand it, an epoch means one full pass over the training data; in my case that's 24,404 documents. The training part of my config looks like this:
Now I started wondering about this because I saw that at step 16800 I reached the next epoch, which leaves me with an average batch size of 24404 / 16800 ≈ 1.45 documents. Is that right? In general my documents are pretty big, but performance is good, so I don't need to chop them into smaller docs and average over them. But maybe I could benefit from fiddling with the batching strategy. Any comments on that?
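As a quick sanity check of the arithmetic in the question (the document and step counts are taken from the post above):

```python
# Average docs per optimizer step, using the numbers from the question:
# 24,404 training docs, and the next epoch starting at step 16,800.
n_docs = 24_404
steps_per_epoch = 16_800

avg_batch_size = n_docs / steps_per_epoch
print(f"{avg_batch_size:.2f} docs per step")
```

So an average of roughly 1.45 documents per step is indeed what those numbers imply.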
Replies: 1 comment
That is correct.
I'm not sure what you mean. If your unit is "whole passes over the training data", then the batch size is 1 by definition. Note that if you have a batch size of 1000 literal docs, you'll see output like this:
If you interpret this as meaning that the first epoch was actually 900 iterations, that would be wrong. It might help to think of the iterations column as "iterations finished".
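The "iterations finished" idea can be sketched with a toy model (the doc and batch counts here are made up for illustration, not taken from the thread): the epoch column reports how many complete passes over the data have finished so far, not which pass the current batch belongs to.

```python
# Toy model of the relationship between the epoch (E) and step (#)
# columns in the training output: the epoch number only ticks over
# once a full pass has *finished*.

def epoch_at_step(n_docs: int, batch_size: int, step: int) -> int:
    """Number of full passes over the data finished after `step` batches."""
    return (step * batch_size) // n_docs

# With 1000 docs and a fixed batch size of 100 docs, epoch 1 is
# reached after step 10, not at step 1.
for step in (1, 9, 10, 11, 20):
    print(f"step {step:2d} -> epochs finished: "
          f"{epoch_at_step(n_docs=1000, batch_size=100, step=step)}")
```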
The main reason to adjust batch size is the tradeoff between training speed and memory use. There's a lot of research into which batch sizes are better, but in most cases I would expect minimal effects on accuracy from changing the batch size. You can always try and see, though.
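If you do want to experiment with the batching strategy, it lives in the `[training.batcher]` block of the config. For reference, spaCy v3's quickstart default batches by word count with a compounding size schedule, roughly like this (the exact default values are an assumption on my part, so check them against your generated config):

```ini
[training.batcher]
@batchers = "spacy.batch_by_words.v1"
discard_oversize = false
tolerance = 0.2

[training.batcher.size]
@schedules = "compounding.v1"
start = 100
stop = 1000
compound = 1.001
```

With very long documents, a word-based batcher like this will often fit only one or two docs per batch, which is consistent with the ~1.45 docs-per-step average in the question. Switching to `spacy.batch_by_sequence.v1` would batch by a fixed number of docs instead, at the cost of more variable memory use per batch.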