[QUESTION] Why is expert parallelism not supported during fp16 training? #810

@yutian-mt

Description

assert not args.model_parallel.fp16, \
    "Expert parallelism is not supported with fp16 training."


Compared to the case where ep=1, the only difference when ep>1 is that it introduces an additional all-to-all communication operation. I'm a bit confused about why this setup does not support fp16 training.
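To make the question concrete, here is a minimal single-process sketch of the extra step expert parallelism adds. All names (`all_to_all`, `dispatch`, `expert_of`) are illustrative placeholders, not from the actual codebase, and the simulated `all_to_all` only mimics the data movement of `torch.distributed.all_to_all`, with no real communication:

```python
def all_to_all(send_buffers):
    """Simulate an all-to-all exchange: rank r receives the chunk
    addressed to r from every rank (no real communication here)."""
    world_size = len(send_buffers)
    return [[send_buffers[src][dst] for src in range(world_size)]
            for dst in range(world_size)]

def dispatch(tokens_per_rank, expert_of, world_size):
    """Each rank buckets its tokens by destination expert rank,
    then exchanges the buckets via all-to-all."""
    send = []
    for tokens in tokens_per_rank:
        buckets = [[] for _ in range(world_size)]
        for t in tokens:
            buckets[expert_of(t)].append(t)
        send.append(buckets)
    return all_to_all(send)

# With ep == 1 every token stays local; with ep == 2 tokens cross ranks.
# Hypothetical routing: token t goes to expert rank t % 2.
tokens = [[0, 1, 2, 3], [4, 5, 6, 7]]
received = dispatch(tokens, expert_of=lambda t: t % 2, world_size=2)
# rank 0 now holds the even tokens, rank 1 the odd ones:
# received == [[[0, 2], [4, 6]], [[1, 3], [5, 7]]]
```

Whether the tensors flowing through this exchange are fp16 or fp32 does not change the routing itself, which is what the question is getting at.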
