
Checkpointing is not compatible with .grad(), please use .backward() if possible #2758

@chengmengli06

Description


When training a DeepSeek-V3 model with CUDA graphs enabled, the run fails with the following error:
Checkpointing is not compatible with .grad(), please use .backward() if possible

The run uses the following arguments:
--recompute-granularity selective
--recompute-modules mla_up_proj mlp

--cuda-graph-impl transformer_engine
--cuda-graph-scope full
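For context, the same class of error can be reproduced in plain PyTorch, independent of Megatron-LM and Transformer Engine. Reentrant activation checkpointing (use_reentrant=True) rejects a backward pass driven by torch.autograd.grad, while the non-reentrant implementation accepts it. The sketch below assumes a recent PyTorch release (the exact error wording differs between versions), and `block` and `grad_through_checkpoint` are hypothetical names used only for illustration. Our assumption, not confirmed in this issue, is that CUDA graph capture of the backward pass goes through torch.autograd.grad, which would trip the same check when selective recompute wraps modules such as mla_up_proj or mlp in a reentrant-style checkpoint.

```python
import torch
from torch.utils.checkpoint import checkpoint


def block(x: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    # Hypothetical stand-in for a recomputed module (e.g. an MLP).
    return torch.relu(x @ w)


def grad_through_checkpoint(use_reentrant: bool) -> str:
    """Run a checkpointed block, then take gradients with torch.autograd.grad."""
    torch.manual_seed(0)
    x = torch.randn(4, 8, requires_grad=True)
    w = torch.randn(8, 8, requires_grad=True)
    # Activations inside `block` are discarded and recomputed during backward.
    y = checkpoint(block, x, w, use_reentrant=use_reentrant).sum()
    try:
        # torch.autograd.grad (not .backward()) is what the error complains about.
        torch.autograd.grad(y, (x, w))
        return "ok"
    except RuntimeError as err:
        return f"error: {err}"


print("reentrant:    ", grad_through_checkpoint(use_reentrant=True))
print("non-reentrant:", grad_through_checkpoint(use_reentrant=False))
```

If this assumption holds, the conflict lies between reentrant recomputation and a grad-based backward, rather than with the specific modules listed in --recompute-modules.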
