Training progress management #7176
              
Nomination-NRB asked this question in Q&A
How should I manage training progress so that an interrupted run can be continued? I have a lot of images, so training for 10 epochs takes a long time, and if the run is interrupted I currently can't resume from where the last run stopped. Should training progress be tracked by epoch or by step?
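One common approach, sketched here under stated assumptions rather than as any particular library's API, is to track progress in optimizer steps. Save a checkpoint every N steps, and on restart derive both the epoch and the position inside it from the saved step count. With about 10,445 steps per epoch in this run, epoch-only checkpoints could lose up to a full epoch of work on an interruption, while step-based checkpoints bound the loss to N steps. The functions `save_checkpoint`, `load_latest_checkpoint`, and `train`, the `checkpoint-<step>.json` file layout, and the `seen` counter standing in for model and optimizer state are all hypothetical placeholders. The sketch assumes a batch size of 1 and no gradient accumulation.

```python
import json
import os
import tempfile


def save_checkpoint(ckpt_dir, global_step, state):
    """Write state to <ckpt_dir>/checkpoint-<global_step>.json (hypothetical layout)."""
    os.makedirs(ckpt_dir, exist_ok=True)
    path = os.path.join(ckpt_dir, f"checkpoint-{global_step}.json")
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump({"global_step": global_step, "state": state}, f)
    os.replace(tmp_path, path)  # atomic rename: a crash never leaves a half-written checkpoint
    return path


def load_latest_checkpoint(ckpt_dir):
    """Return (global_step, state) of the newest checkpoint, or (0, None) if there is none."""
    if not os.path.isdir(ckpt_dir):
        return 0, None
    steps = [
        int(name[len("checkpoint-"):-len(".json")])
        for name in os.listdir(ckpt_dir)
        if name.startswith("checkpoint-") and name.endswith(".json")
    ]
    if not steps:
        return 0, None
    with open(os.path.join(ckpt_dir, f"checkpoint-{max(steps)}.json")) as f:
        data = json.load(f)
    return data["global_step"], data["state"]


def train(ckpt_dir, num_examples, num_epochs, checkpointing_steps, stop_after=None):
    """Toy loop: batch size 1, no gradient accumulation, so one step per example."""
    steps_per_epoch = num_examples
    global_step, state = load_latest_checkpoint(ckpt_dir)
    if state is None:
        # Placeholder: a real run would also restore model, optimizer, LR scheduler and RNG state.
        state = {"seen": 0}
    first_epoch = global_step // steps_per_epoch
    resume_step = global_step % steps_per_epoch
    for epoch in range(first_epoch, num_epochs):
        for step in range(steps_per_epoch):
            if epoch == first_epoch and step < resume_step:
                continue  # this batch was already trained on before the interruption
            state["seen"] += 1  # stand-in for forward, backward and optimizer.step()
            global_step += 1
            if global_step % checkpointing_steps == 0:
                save_checkpoint(ckpt_dir, global_step, state)
            if stop_after is not None and global_step >= stop_after:
                return global_step, state  # simulate an interruption
    return global_step, state


if __name__ == "__main__":
    ckpt_dir = tempfile.mkdtemp()
    # First run is "interrupted" after 10 of 21 steps; the last checkpoint is at step 8.
    print(train(ckpt_dir, num_examples=7, num_epochs=3, checkpointing_steps=4, stop_after=10))
    # Second run resumes from checkpoint-8 (epoch index 1, skipping 1 batch) and finishes.
    print(train(ckpt_dir, num_examples=7, num_epochs=3, checkpointing_steps=4))
```

Steps 9 and 10 are redone after the restart because they came after the last checkpoint; that is the trade-off controlled by `checkpointing_steps`. In a real PyTorch run the saved state must include the model, optimizer, LR scheduler, and RNG states. If the training script is built on Hugging Face Accelerate (the log format suggests this, but the post does not say so), `Accelerator.save_state` and `Accelerator.load_state` cover most of that, and `skip_first_batches` avoids iterating through already-processed batches the way the `continue` above does.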
***** Running training *****
Num examples = 10445
Num batches each epoch = 10445
Num Epochs = 10
Instantaneous batch size per device = 1
Total train batch size (w. parallel, distributed & accumulation) = 1
Gradient Accumulation steps = 1
Total optimization steps = 104450
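The step count in this log follows from the other lines. With a batch size of 1 and no gradient accumulation, each epoch is 10445 optimizer steps, so 10 epochs give 104450. The sketch below reproduces that arithmetic and maps a saved global step back to an epoch and a number of batches to skip, which is the bookkeeping a step-based resume needs. The function names are hypothetical. `batch_size` is assumed to be the per-device batch size times the number of processes, and a final partial batch or accumulation group is assumed to count as one step (rounded up).

```python
import math


def update_steps_per_epoch(num_examples: int, batch_size: int, grad_accum: int) -> int:
    """Optimizer steps per epoch; a final partial batch or accumulation group counts as one step."""
    batches_per_epoch = math.ceil(num_examples / batch_size)
    return math.ceil(batches_per_epoch / grad_accum)


def total_steps(num_examples: int, batch_size: int, grad_accum: int, num_epochs: int) -> int:
    """Total optimizer steps for the whole run."""
    return num_epochs * update_steps_per_epoch(num_examples, batch_size, grad_accum)


def resume_position(global_step: int, num_examples: int, batch_size: int, grad_accum: int):
    """Return (0-based epoch to resume in, number of batches of that epoch to skip)."""
    per_epoch = update_steps_per_epoch(num_examples, batch_size, grad_accum)
    epoch = global_step // per_epoch
    batches_to_skip = (global_step % per_epoch) * grad_accum
    return epoch, batches_to_skip


# Values from the log: 10445 examples, batch size 1, accumulation 1, 10 epochs.
print(update_steps_per_epoch(10445, 1, 1))  # 10445
print(total_steps(10445, 1, 1, 10))  # 104450
# A checkpoint at step 50000 resumes in epoch index 4 (the fifth epoch),
# skipping the first 8220 batches of that epoch.
print(resume_position(50000, 10445, 1, 1))  # (4, 8220)
```

With gradient accumulation, one optimizer step consumes `grad_accum` batches, which is why the skip count is multiplied by it.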