[AutoParallel] fix GPT embedding placements #10834


Open
wants to merge 1 commit into
base: develop

Conversation

waliwali777
Contributor

Before submitting

  • Lint code. If there are lint issues, please format the code first.
# Install and register `pre-commit` in the project folder
pip install pre-commit && pre-commit install

# Run pre-commit separately on existing code files
pre-commit run --file XXXX.py
  • Add test cases into the tests folder. If there are codecov issues, please add test cases first.

PR types

Others

PR changes

Others

Description

Under GPT dynamic semi-auto parallelism, adjust the placements to improve performance:
1. Change the placements of the embedding and lmhead layer weights to column-sharded (see the sketch after this list).
2. Change the placements of the GPTEmbeddingsAuto layer output to replicate.
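
As a reference for change 1, a minimal sketch of a column-sharded weight placement with Paddle's semi-auto parallel API (paddle.distributed.shard_tensor). The mesh layout, tensor shapes, and shard axis below are illustrative assumptions, not the actual modeling code touched by this PR.

# Illustrative only: assumes 8 ranks launched via `python -m paddle.distributed.launch`
import paddle
import paddle.distributed as dist

# Hypothetical 2 (dp) x 4 (mp) mesh over a single 8-GPU node
mesh = dist.ProcessMesh([[0, 1, 2, 3], [4, 5, 6, 7]], dim_names=["dp", "mp"])

# Embedding weight [vocab_size, hidden_size]; "column-sharded" here means the
# second weight axis is split across the mp mesh dimension, i.e. Shard(1)
embedding_weight = dist.shard_tensor(
    paddle.randn([50304, 1024]), mesh, [dist.Replicate(), dist.Shard(1)]
)

# The lmhead weight gets the analogous column-sharded placement
lmhead_weight = dist.shard_tensor(
    paddle.randn([1024, 50304]), mesh, [dist.Replicate(), dist.Shard(1)]
)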

  • Before:
    GPTEmbeddingsAuto outputs [Replicate(), Shard(2)], so the input to every encoder layer is also [Replicate(), Shard(2)]. The residual + dropout computation then needs a [Replicate(), Shard(2)] -> [Replicate(), Replicate()] conversion, which introduces allgather communication in both the forward and backward passes.
    GPT throughput on a single node with 8 H-series GPUs, best configuration, 20 layers (tokens/card/s): 7902.79

  • After:
    GPTEmbeddingsAuto outputs [Replicate(), Replicate()], so the encoder layer input is also replicated. The residual + dropout computation can then run directly on replicated tensors, with no extra communication.
    GPT throughput on a single node with 8 H-series GPUs, best configuration, 20 layers (tokens/card/s): 8305.75 (+402.96, +5.1%)
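
To make the before/after placements concrete, here is a hedged sketch (hypothetical shapes, same assumed mesh as above) using paddle.distributed.reshard; it shows why a replicated GPTEmbeddingsAuto output lets residual + dropout run without the per-layer allgather that the [Replicate(), Shard(2)] -> [Replicate(), Replicate()] conversion implies.

import paddle
import paddle.distributed as dist

mesh = dist.ProcessMesh([[0, 1, 2, 3], [4, 5, 6, 7]], dim_names=["dp", "mp"])
hidden = paddle.randn([8, 1024, 2048])  # [batch, seq, hidden], illustrative shape

# Before: the output is sharded on tensor axis 2 over the mp mesh dimension, so
# reaching a replicated tensor for residual + dropout requires a reshard
# (an allgather) in every encoder layer, forward and backward
out_before = dist.shard_tensor(hidden, mesh, [dist.Replicate(), dist.Shard(2)])
out_before = dist.reshard(out_before, mesh, [dist.Replicate(), dist.Replicate()])

# After: the embedding output is already replicated, so the element-wise
# residual + dropout needs no extra communication
out_after = dist.shard_tensor(hidden, mesh, [dist.Replicate(), dist.Replicate()])
residual = out_after
out_after = paddle.nn.functional.dropout(out_after, p=0.1) + residual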


paddle-bot bot commented Jul 10, 2025

Thanks for your contribution!

Contributor

@liym27 liym27 left a comment


LGTM
