Hello, and thanks for your code. It is elegant and clear, and it has helped me a lot.
I have run into a problem: the training loss starts out very well, around 0.001, at the beginning of training.
The default end epoch is set to 10000, but after 2000+ epochs the loss explodes to a surprising value ("Training Loss : 325440.0592"). I am curious: have you encountered this issue before?
The training batch size is 96 on 4 GPUs with PyTorch DDP. Since the full training set contains only about 4000 images, the 4 GPUs need only about 10 iterations to finish an epoch. Do you think this could be the cause?
Thanks again for your code.
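As a quick sanity check of the iteration count above, here is a minimal sketch of the arithmetic. It assumes the stated batch size of 96 is the per-GPU batch (the usual DDP convention); if 96 is instead the global batch split across 4 GPUs, the numbers change accordingly.

```python
import math

# Setup described in this issue (assumption: 96 is the per-GPU batch size)
dataset_size = 4000
per_gpu_batch = 96
num_gpus = 4

# Under DDP, each of the 4 processes consumes its own batch per step,
# so one optimizer step covers per_gpu_batch * num_gpus samples.
global_batch = per_gpu_batch * num_gpus
iters_per_epoch = math.ceil(dataset_size / global_batch)

print(global_batch)      # 384 samples per optimizer step
print(iters_per_epoch)   # about 10-11 steps per epoch, matching the report
```

With only ~10 optimizer steps per epoch, 2000 epochs is still only ~20000 steps, so a very small epoch does not by itself explain the blow-up, though it can interact with a per-epoch learning-rate schedule.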