Multiple optimizers but only one loss #5813
-
Hey! I have a question regarding this library. I really like how it forces me to structure my code better. I encountered one problem I could not solve based on the documentation. Let's say I have two optimizers for two parts of the network, e.g. one for my encoder and one for my decoder, returned from `configure_optimizers`.
Now in `training_step` I forward pass through the encoder, then the decoder, and compute my loss based on the output.
Since I have two optimizers, I have to respect that `training_step` is called twice, each time with a different `optimizer_idx`.

What have you tried?

I tried something like this, but it leads to an error since no `loss` key is present in the returned output.
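For concreteness, a minimal sketch of this kind of setup, assuming an encoder/decoder `LightningModule` and the Lightning API in which `training_step` receives an `optimizer_idx` when multiple optimizers are configured (the module names, loss, and learning rates here are assumptions, not the original code):

```python
import torch
import torch.nn.functional as F
import pytorch_lightning as pl


class EncoderDecoder(pl.LightningModule):
    def __init__(self, encoder: torch.nn.Module, decoder: torch.nn.Module):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder

    def configure_optimizers(self):
        # Two optimizers, one per sub-network, each with its own learning rate.
        opt_enc = torch.optim.Adam(self.encoder.parameters(), lr=1e-3)
        opt_dec = torch.optim.Adam(self.decoder.parameters(), lr=1e-4)
        return opt_enc, opt_dec

    def training_step(self, batch, batch_idx, optimizer_idx):
        # With two optimizers, Lightning calls this once per optimizer,
        # passing optimizer_idx = 0 or 1; each call returns a loss dict.
        x, y = batch
        y_hat = self.decoder(self.encoder(x))
        loss = F.mse_loss(y_hat, y)
        return {"loss": loss}
```

The error mentioned above presumably comes from one of those two calls returning a dict without a `loss` key.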
Replies: 5 comments
-
In that case, just pass in both sets of params to a single optimizer.
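As a rough illustration of that suggestion (the `encoder`/`decoder` modules here are just stand-ins for the two parts of the network from the question):

```python
import itertools

import torch
import torch.nn as nn

# Stand-ins for the two sub-networks from the question.
encoder = nn.Linear(16, 8)
decoder = nn.Linear(8, 1)

# A single optimizer that updates the parameters of both sub-networks.
optimizer = torch.optim.Adam(
    itertools.chain(encoder.parameters(), decoder.parameters()), lr=1e-3
)
```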
-
But I explicitly want two different learning rates for different parts of the network. That is not really possible with a single optimizer AFAIK. One possibility could be to scale the gradients of the weights for which I want a lower learning rate before running the optimizer, but that is really not a clean solution.
-
It is possible with parameter groups using a single optimizer. Your use-case is actually the example in the docs: https://pytorch.org/docs/stable/optim.html#per-parameter-options
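A short sketch of the per-parameter options from that docs page, using a single optimizer with one parameter group per sub-network (the module names and learning rates are assumptions):

```python
import torch
import torch.nn as nn

# Stand-ins for the two sub-networks from the question.
encoder = nn.Linear(16, 8)
decoder = nn.Linear(8, 1)

# One optimizer, but each parameter group carries its own learning rate.
optimizer = torch.optim.Adam(
    [
        {"params": encoder.parameters(), "lr": 1e-3},
        {"params": decoder.parameters(), "lr": 1e-4},
    ]
)
```

In a `LightningModule`, this single optimizer would be returned from `configure_optimizers`, so `training_step` runs once per batch and returns a single loss.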
-
That’s nice. Thank you for the hint!
-
Is it possible #14728?