'en_core_web_trf' optimal optimizer's learning rate and number of training epochs? #7066
traceymai started this conversation in Language Support
-
Hi guys, I've been experimenting with spaCy v3.0's pretrained models en_core_web_trf and en_core_web_lg (in particular, I was using the textcat pipeline for sentiment analysis). From experimenting with learning-rate settings of 0.01 and 0.001 (my optimizer is Adam v1), my transformer model's accuracy increased from 76% to 80% over 10 training epochs, which in my opinion is pretty good. However, when I bumped the number of training epochs up to 15 (the loss kept decreasing throughout all 15 epochs, which I'd think is good?), the accuracy surprisingly dropped back down to 78%. I was just wondering whether, in your experience, there's any way to determine (or whether there even exists) an optimal learning rate and an optimal number of training epochs I should train this trf model for. Any idea is greatly welcome!
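For reference, spaCy's stock transformer configs don't use a fixed learning rate at all but a warmup-then-linear-decay schedule built with thinc (the library spaCy uses under the hood). Below is a minimal sketch of that pattern in Python; the concrete numbers mirror what the quickstart transformer configs generate and are starting points to tune, not recommendations.

```python
from thinc.api import Adam, warmup_linear

# Ramp the rate up from 0 over the first 250 steps, then decay it
# linearly toward 0 by step 20000. Transformer fine-tuning is usually
# more stable with a schedule like this than with a constant 0.01/0.001.
learn_rate = warmup_linear(initial_rate=5e-5, warmup_steps=250, total_steps=20000)
optimizer = Adam(learn_rate)
```

In a config-driven run the same thing is expressed under `[training.optimizer.learn_rate]` with `@schedules = "warmup_linear.v1"`.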
-
Also, the reason I had to change the default learning rate for this model is that if I just get my optimizer from …
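The reply above is cut off, so the following is only a hedged guess at the pattern it refers to: in a hand-rolled spaCy v3 training loop you can bypass the optimizer built from the config defaults by constructing your own Adam with an explicit rate and passing it to nlp.update(). The 0.001 rate is the value the question reports, not a general recommendation.

```python
# Hypothetical sketch; assumes batches of spacy.training.Example
# objects are available from your own data pipeline.
import spacy
from thinc.api import Adam

nlp = spacy.load("en_core_web_trf")
optimizer = Adam(0.001)  # explicit rate instead of the config default

losses = {}
# for batch in batches_of_examples:  # assumption: your own batching
#     nlp.update(batch, sgd=optimizer, losses=losses)
```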