
Confusion in training_step_end() API #9617

The docs mention that this configuration is only needed for DP or DDP2, but in your code you are using DDP. Under DDP, each device computes its own loss and runs its own backward call, and gradient synchronization happens automatically during backward, so `training_step_end()` only ever sees a single loss item and no manual reduction of losses across devices is required.
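To illustrate the difference, here is a minimal pure-Python sketch (the function names and the dict layout are illustrative assumptions, not Lightning's actual internals): under DP, `training_step` runs once per device on a slice of the batch, so the hook receives one loss per device and must reduce them; under DDP, each process already holds a single loss.

```python
def training_step_end_dp(step_outputs):
    """DP case: step_outputs carries one loss per device
    (illustrative structure), so reduce them to one scalar
    before the backward call."""
    losses = step_outputs["loss"]          # e.g. one entry per GPU
    return sum(losses) / len(losses)       # mean across devices

def training_step_end_ddp(step_output):
    """DDP case: each process computed its own loss and gradients
    sync automatically in backward, so there is nothing to reduce."""
    return step_output["loss"]             # already a single scalar

# Two DP sub-batch losses are averaged; the DDP loss passes through.
print(training_step_end_dp({"loss": [1.0, 3.0]}))  # → 2.0
print(training_step_end_ddp({"loss": 2.0}))        # → 2.0
```

This is why the hook is a no-op in the DDP setup from the question: the reduction it would perform has already happened implicitly.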

Answer selected by akihironitta