Trying to start from a saved state makes it start from zero again #336

@AdAstra-93

Hi, I'll describe what happened. I enabled saving states before starting to train a Flux LoRA. Halfway through training (in this case at epoch 8/16) I had to stop. When I came back later and tried to resume, I got the Accelerate error KeyError: 'step', which I solved by following a couple of past issues on this repo that recommend downgrading Accelerate in the SD_Scripts folder to 0.31.

Now the resume does start, but the terminal shows these lines:

INFO     Could not load random states                checkpointing.py:254
INFO     Loading in 0 custom states                   accelerator.py:3135

The checkpointing.py file shows this block of code:

# Random states
try:
    states = torch.load(input_dir.joinpath(f"{RNG_STATE_NAME}_{process_index}.pkl"))
    random.setstate(states["random_state"])
    np.random.set_state(states["numpy_random_seed"])
    torch.set_rng_state(states["torch_manual_seed"])
    if is_xpu_available():
        torch.xpu.set_rng_state_all(states["torch_xpu_manual_seed"])
    else:
        torch.cuda.set_rng_state_all(states["torch_cuda_manual_seed"])
    if is_torch_xla_available():
        xm.set_rng_state(states["xm_seed"])
    logger.info("All random states loaded successfully")
except Exception:
    logger.info("Could not load random states")

And sure enough, the step count starts from 0% again, at 0/1936 instead of 968/1936. The countdown also shows the initial estimate (4 hours, although training had already run for 2 hours when I stopped midway). Why doesn't it start from the 50% epoch/step mark? And why couldn't it load the random states file?
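One way to narrow down the second question: the `except Exception` in the excerpt hides the real error, so a first step is to check whether the file it tries to read is in the saved state folder at all. Below is a minimal sketch using only the standard library. The function name `check_rng_state_file` is made up for illustration, and the value `"random_states"` for `RNG_STATE_NAME` is an assumption about Accelerate's constant, not something confirmed by the log above.

```python
from pathlib import Path

# Assumption: Accelerate's RNG_STATE_NAME constant is "random_states", so the
# file for process 0 would be "random_states_0.pkl". Adjust if your installed
# version uses a different name.
RNG_STATE_NAME = "random_states"


def check_rng_state_file(state_dir, process_index=0):
    """Check for the RNG state file that the loader excerpt tries to open.

    Returns (expected_path, exists, present_files), where present_files is a
    sorted list of the names found in state_dir (empty if it is not a folder).
    """
    state_dir = Path(state_dir)
    expected = state_dir / f"{RNG_STATE_NAME}_{process_index}.pkl"
    if state_dir.is_dir():
        present = sorted(p.name for p in state_dir.iterdir())
    else:
        present = []
    return expected, expected.is_file(), present


if __name__ == "__main__":
    import sys

    # Pass the saved state folder (the one given to the resume option) as the
    # first argument; defaults to the current directory.
    folder = sys.argv[1] if len(sys.argv) > 1 else "."
    expected, exists, present = check_rng_state_file(folder)
    print(f"expected file: {expected}")
    print(f"exists: {exists}")
    print(f"files in state folder: {present}")
```

If the file turns out to be missing, the problem is on the saving side. If it is there, calling `torch.load` on it in a Python shell, outside the `try`/`except`, should show the actual exception that the log line hides.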
