nn.Graph Tutorial of ZeRO #450

@strint

Reference test case

https://github.com/Oneflow-Inc/oneflow/blob/master/python/oneflow/test/graph/test_graph_zero.py

API

Interface 1:
Corresponds to ZeRO stage 1.

https://oneflow.readthedocs.io/en/master/graph.html#oneflow.nn.graph.graph_config.GraphConfig.set_zero_redundancy_optimizer_mode

Interface 2:
Corresponds to ZeRO stage 2.

In addition to Interface 1, you also need to enable flow.boxing.nccl.enable_use_compute_stream(True).
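A minimal sketch of how the two interfaces above might be wired into an `nn.Graph` subclass. The helper name `apply_zero_config` and its parameters are hypothetical. The `"distributed_split"` mode string is an assumption taken from the referenced test case and should be checked against your OneFlow version. The helper takes the graph config and the `oneflow` module as arguments, so the logic can be read and exercised without OneFlow installed.

```python
def apply_zero_config(graph_config, flow_module, stage):
    """Hypothetical helper: enable ZeRO stage 1 or 2 on an nn.Graph config.

    graph_config: the `self.config` object of an `oneflow.nn.Graph` subclass.
    flow_module:  the imported `oneflow` module.
    """
    if stage not in (1, 2):
        raise ValueError("this sketch only covers ZeRO stage 1 and 2")

    # Interface 1 (stage 1): partition optimizer states across ranks.
    # The mode string is an assumption taken from the referenced test case.
    graph_config.set_zero_redundancy_optimizer_mode("distributed_split")

    if stage == 2:
        # Interface 2 (stage 2): additionally let NCCL boxing use the compute stream.
        flow_module.boxing.nccl.enable_use_compute_stream(True)


# Intended use, assuming OneFlow is installed (build() omitted for brevity):
#
#   import oneflow as flow
#
#   class TrainGraph(flow.nn.Graph):
#       def __init__(self, model, optimizer):
#           super().__init__()
#           self.model = model
#           self.add_optimizer(optimizer)
#           apply_zero_config(self.config, flow, stage=2)
```

Calling the helper with `stage=1` only sets the optimizer mode; `stage=2` sets the mode and then flips the NCCL compute-stream switch.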

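Interface 3:
set_zero_redundancy_optimizer_min_size_after_split is optional. It sets the minimum size of a parameter shard. The parameters in the test case are small, so the test sets it only to guarantee that they are actually split.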
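To illustrate why a test with tiny parameters would lower this value, here is a rough sketch of a minimum-shard-size rule. It is an assumption, not OneFlow's actual implementation: `should_shard` is a hypothetical function, the threshold of 1024 is arbitrary, and the sketch assumes a parameter is split only when each rank's shard would hold at least `min_size_after_split` elements.

```python
def should_shard(num_elements, world_size, min_size_after_split):
    """Hypothetical rule: shard only if each shard meets the minimum size."""
    shard_size = num_elements // world_size
    return shard_size >= min_size_after_split


# A small test-sized parameter (8 elements) on 2 ranks gives shards of 4 elements.
small_param = 8
print(should_shard(small_param, world_size=2, min_size_after_split=1024))  # False
print(should_shard(small_param, world_size=2, min_size_after_split=1))     # True
```

Under this assumed rule, a minimum of 1 means even very small parameters are always split, which matches the stated purpose of the setting in the test case.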

Usage scenarios

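ZeRO currently must be combined with data parallelism, meaning every parameter has the Broadcast SBP. It can also be combined with pipeline parallelism, again provided that every parameter is Broadcast. It cannot be used with model parallelism, where parameters have the Split (S) SBP.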
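The constraint above can be expressed as a simple pre-flight check. This is a sketch under assumptions: `zero_is_applicable` is a hypothetical helper, and the plain-string SBP labels (`"B"` for Broadcast, `"S(0)"` for Split on axis 0) are for illustration only. Real code would presumably inspect each global parameter's `sbp` attribute instead.

```python
def zero_is_applicable(param_sbp):
    """Return True if every parameter is Broadcast on every placement axis.

    param_sbp maps a parameter name to a tuple of SBP labels, one label per
    placement hierarchy dimension. The labels are plain strings for illustration.
    """
    return all(
        all(label == "B" for label in labels)
        for labels in param_sbp.values()
    )


# Data parallel: all parameters are Broadcast, so ZeRO can be used.
data_parallel = {"linear.weight": ("B",), "linear.bias": ("B",)}
print(zero_is_applicable(data_parallel))   # True

# Model parallel: the weight is Split on axis 0, so ZeRO cannot be used.
model_parallel = {"linear.weight": ("S(0)",), "linear.bias": ("B",)}
print(zero_is_applicable(model_parallel))  # False
```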
