
Checkpointing crashes with ZeRO optimizer #96

@norabelrose

Describe the bug
Checkpointing crashes when --zero is set. The error "RuntimeError: Tensors must be CUDA and dense" is raised inside the method consolidate_state_dict().
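For context, here is a minimal sketch of the save path that fails. It assumes (not confirmed by this issue) that --zero wraps the optimizer in PyTorch's torch.distributed.optim.ZeroRedundancyOptimizer. In that API, consolidate_state_dict() is a collective that every rank must call before state_dict() can be read on the target rank. The helper save_zero_checkpoint is hypothetical. The sketch runs a single CPU process on the gloo backend, so it shows the intended call sequence rather than reproducing the crash. The message "Tensors must be CUDA and dense" is typically raised by the NCCL backend when a collective receives a non-CUDA tensor.

```python
import os
import tempfile

import torch
import torch.distributed as dist
from torch.distributed.optim import ZeroRedundancyOptimizer


def save_zero_checkpoint(model, optimizer, path):
    """Hypothetical helper: save model and ZeRO optimizer state from rank 0."""
    # consolidate_state_dict() is a collective: every rank must call it, or
    # the ranks that do will block waiting for the others.
    optimizer.consolidate_state_dict(to=0)
    if dist.get_rank() == 0:
        # state_dict() is only valid on the rank that received the shards.
        torch.save(
            {"model": model.state_dict(), "optimizer": optimizer.state_dict()},
            path,
        )
    # Keep ranks in lockstep so none proceeds until rank 0 has written the file.
    dist.barrier()


def main():
    # Illustration only: one CPU process on the gloo backend, rendezvous via a file.
    workdir = tempfile.mkdtemp()
    dist.init_process_group(
        "gloo",
        init_method=f"file://{os.path.join(workdir, 'store')}",
        rank=0,
        world_size=1,
    )
    model = torch.nn.Linear(4, 2)
    optimizer = ZeroRedundancyOptimizer(
        model.parameters(), optimizer_class=torch.optim.Adam, lr=1e-3
    )
    model(torch.randn(8, 4)).sum().backward()
    optimizer.step()  # populates Adam state so there is something to consolidate

    path = os.path.join(workdir, "ckpt.pt")
    save_zero_checkpoint(model, optimizer, path)
    dist.destroy_process_group()
    return path


if __name__ == "__main__":
    print(main())
```

In a real multi-GPU run, each rank would call the helper at the same point in the training loop. The reported error occurs inside consolidate_state_dict() during a --zero run, and this single-process gloo sketch does not exercise that path.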

Expected behavior
Checkpointing should complete without crashing when --zero is set.

Screenshots
Screenshot 2023-05-14 at 12:01:03 PM

Labels: bug (Something isn't working)