
ulysses: num_attention_head divisibility issue #5

@xs1997zju

Description

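Hello. From the paper and the code description, sequence parallelism uses DeepSpeed-Ulysses, with the parallel degree set to the world size. However, DeepSpeed-Ulysses requires num_attn_heads to be divisible by the parallel degree. On a single node with 8 GPUs, Qwen2.5 has num_attn_heads = 28, which is not divisible by 8. Can this still run?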
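
To make the arithmetic behind the question concrete, here is a minimal sketch of the divisibility check. It only illustrates the constraint as stated above and is not taken from the repository's code. The helper name `valid_sp_degrees` is hypothetical, and requiring the degree to also divide the world size is an assumption about how the sequence-parallel groups would be formed.

```python
def valid_sp_degrees(num_heads: int, world_size: int) -> list[int]:
    """Hypothetical helper: list sequence-parallel degrees that divide
    both the attention head count and the world size (assumption)."""
    return [
        d
        for d in range(1, world_size + 1)
        if num_heads % d == 0 and world_size % d == 0
    ]


num_attn_heads = 28  # value quoted in the question for Qwen2.5
world_size = 8       # single node, 8 GPUs

# 28 % 8 == 4, so the head count is not divisible by the world size.
print(num_attn_heads % world_size == 0)              # False
print(valid_sp_degrees(num_attn_heads, world_size))  # [1, 2, 4]
```

Under these assumptions, a degree of 8 fails the check, while 1, 2, and 4 would pass.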
