It's not an issue, I just want to ask: why do you use `running_loss = running_loss*0.9 + loss.item()*0.1` to monitor the loss during training?
Is there a special reason for this?
Isn't it more conventional to monitor the average loss after each epoch (or, in this case, after each iteration)?
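
To make sure I'm comparing the right things, here is a minimal sketch of the two schemes as I understand them; the `batch_losses` values and variable names are just illustrative, not from your code:

```python
# Made-up per-batch values standing in for loss.item() over one epoch.
batch_losses = [2.3, 1.9, 1.6, 1.4, 1.3, 1.25, 1.2, 1.18]

# Exponential moving average (EMA), as in the line I'm asking about:
# each new batch contributes 10%, so recent batches dominate and the
# reported value stays smooth while still tracking current behavior.
running_loss = batch_losses[0]
for loss_value in batch_losses[1:]:
    running_loss = running_loss * 0.9 + loss_value * 0.1
print(f"EMA loss:  {running_loss:.4f}")

# Conventional per-epoch average: every batch weighs equally, so
# early (typically higher) losses pull the reported mean up.
epoch_loss = sum(batch_losses) / len(batch_losses)
print(f"Mean loss: {epoch_loss:.4f}")
```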