self.global_step understanding #8007
Replies: 3 comments 3 replies
-
global_step is incremented after each training step has been fully processed: the optimizer has stepped and the logger has logged.
After 2 epochs of 200 batches, it will be 200 * 2 = 400, which means batches 0 to 399 have been processed (400 in total) and the next batch will have index 400.
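To see this counting end to end, here is a minimal runnable sketch (the ToyModel and dataset are invented to mirror the numbers above: batch size 32, 200 batches per epoch; assumes a recent PyTorch Lightning):

```python
import torch
import pytorch_lightning as pl
from torch.utils.data import DataLoader, TensorDataset


class ToyModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(8, 1)

    def training_step(self, batch, batch_idx):
        (x,) = batch
        # self.global_step counts completed optimizer steps, not samples:
        # on the first batch of the third epoch it would read 400 here.
        return self.layer(x).pow(2).mean()

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)


# Batch size 32 and 200 batches per epoch, mirroring the numbers above.
loader = DataLoader(TensorDataset(torch.randn(200 * 32, 8)), batch_size=32)
trainer = pl.Trainer(max_epochs=2, logger=False, enable_checkpointing=False)
trainer.fit(ToyModel(), loader)
assert trainer.global_step == 400  # one optimizer step per batch, 200 * 2
```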
-
Does global_step mean the number of mini-batches, or the number of weight-update operations? What changes when we set accumulate_grad_batches larger than one?
-
The value of global_step is the number of optimizer steps taken (source: https://lightning.ai/docs/pytorch/stable/common/lightning_module.html#global-step). With automatic optimization (the default), there is one optimizer step after each training step; with multiple optimizers, every optimizer's step counts, and during validation the counter does not advance. If you want to adjust the learning rate or checkpoint based on another metric, such as the total number of batches or samples, it might be easiest to log that metric yourself and monitor it in the learning rate scheduler or checkpoint callback. The trainer already keeps track of the total number of batches. You could log it and access it in the lr scheduler config as shown below.
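A minimal sketch of that idea (the metric name total_batches and the manual counter are assumptions; any logged key works as long as the logging call and the monitor agree on it):

```python
import torch
import pytorch_lightning as pl


class MyModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(8, 1)
        self.total_batches = 0  # manual counter that never resets across epochs

    def training_step(self, batch, batch_idx):
        (x,) = batch
        self.total_batches += 1
        # Log as an epoch-level metric so a scheduler or checkpoint
        # callback can monitor it at the end of each epoch.
        self.log("total_batches", float(self.total_batches),
                 on_step=False, on_epoch=True)
        return self.layer(x).pow(2).mean()

    def configure_optimizers(self):
        optimizer = torch.optim.SGD(self.parameters(), lr=0.1)
        scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer)
        return {
            "optimizer": optimizer,
            "lr_scheduler": {
                "scheduler": scheduler,
                # Lightning passes the logged "total_batches" value to
                # scheduler.step(metric) for metric-based schedulers.
                "monitor": "total_batches",
            },
        }
```

The same key can be monitored from a checkpoint callback, e.g. ModelCheckpoint(monitor="total_batches").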

-
I had a doubt about the keyword self.global_step. Does it represent the total number of batches seen so far, or the total number of samples seen so far?
For example, if I have a batch size of 32 and each epoch has 200 batches (a total of 6400 samples), what should the value of self.global_step be after 2 epochs?
Now I intend to use this for scheduling my LR with LambdaLR, decaying it every 20000 (decay_step) steps. So is my lr_lbmd inside configure_optimizers right? Or should I just use int(self.global_step / self.hparams["optimizer.decay_step"])?
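For reference, a step-decay lr_lbmd of the kind described might look like the sketch below. The decay_rate value is a made-up hyperparameter; with interval="step", Lightning steps the scheduler once per optimizer step, so the counter passed to the lambda tracks self.global_step and there is no need to read global_step inside it:

```python
import torch


# Sketch of the LightningModule hook, assuming a hypothetical decay_rate
# and the 20000-step decay_step from the question.
def configure_optimizers(self):
    optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)
    decay_step = self.hparams["optimizer.decay_step"]  # e.g. 20000
    decay_rate = 0.5  # hypothetical decay factor applied every decay_step steps

    def lr_lbmd(step):
        # `step` is the scheduler's own counter; with interval="step" below,
        # it advances once per optimizer step, matching self.global_step.
        return decay_rate ** (step // decay_step)

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_lbmd)
    return {
        "optimizer": optimizer,
        "lr_scheduler": {"scheduler": scheduler, "interval": "step"},
    }
```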