hushenwei2000

No description provided.

paddle-bot bot commented Sep 1, 2025

Thanks for your contribution!

$LAUNCH_CMD \
--run_mode=collective \
${script:-run_pretrain.py} \
$@

Collaborator

Please write a document explaining dataset preparation, model download (e.g., FP8-to-BF16 conversion), and how to run training.

RowParallelQuantizationLinear,
)

try:

Collaborator

No need for a try here; just keep the original logic.

# if pp_first, the order = ["dp", "pp", "moe_sharding", "sharding", "sep", "ep", "mp"]
# if sharding_first, the order is ["dp", "moe_sharding", "sharding", "pp", "sep", "ep", "mp"]
order.insert(sd_idx, "moe_sharding")
if not os.getenv("DSV3_FAST_PRETRAIN", "False"):

Collaborator

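This handling is too hacky. Is there a more reasonable way to pass and name this parameter? Please don't use an environment variable.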
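
A note on why the environment-variable check is fragile, sketched under assumptions: `os.getenv("DSV3_FAST_PRETRAIN", "False")` returns a string, and any non-empty string is truthy in Python, so `not os.getenv(...)` only holds when the variable is set to an empty string. The `env_flag` helper below is hypothetical and not part of the PR; it only illustrates what explicit boolean parsing could look like if an environment variable were kept at all.

```python
import os


def env_flag(name: str, default: bool = False) -> bool:
    """Hypothetical helper: parse an environment variable as a boolean."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


# Make sure the variable is unset for this demonstration.
os.environ.pop("DSV3_FAST_PRETRAIN", None)

# The default "False" is a non-empty string, so it evaluates as truthy.
naive = bool(os.getenv("DSV3_FAST_PRETRAIN", "False"))

# Explicit parsing falls back to the boolean default instead.
parsed = env_flag("DSV3_FAST_PRETRAIN")

print(naive, parsed)
```

An explicit, feature-named argument avoids the string parsing step altogether.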

fleet.init(is_collective=True, strategy=strategy)
logger.info(strategy)

if os.getenv("DSV3_FAST_PRETRAIN", "False"):

Collaborator

Same as above.

# limitations under the License.


import json

Collaborator

This function is customized for DSV3; I suggest moving it into the example directory.

attention_dropout=0.0,
speculate_model_type=False,
using_flex_token=False,
use_dualpipev=False,

Collaborator

I suggest also writing a DeepseekV2FastConfig(DeepseekV2Config) for the config, to keep the original class simple.
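
A minimal sketch of the subclassing idea, under stated assumptions: the `DeepseekV2Config` below is a stand-in defined only for this example (it is not the real PaddleFormers class), and which of the quoted fields are actually new in this PR is an assumption. Here `using_flex_token` and `use_dualpipev` are treated as the fast-path-only options.

```python
class DeepseekV2Config:
    """Stand-in base config for illustration; not the real class."""

    def __init__(self, attention_dropout=0.0, speculate_model_type=False, **kwargs):
        self.attention_dropout = attention_dropout
        self.speculate_model_type = speculate_model_type
        # In this sketch, unknown keys are simply stored as attributes.
        for key, value in kwargs.items():
            setattr(self, key, value)


class DeepseekV2FastConfig(DeepseekV2Config):
    """Holds the options assumed to be specific to the fast pretraining path."""

    def __init__(self, using_flex_token=False, use_dualpipev=False, **kwargs):
        super().__init__(**kwargs)
        self.using_flex_token = using_flex_token
        self.use_dualpipev = use_dualpipev


base = DeepseekV2Config()
fast = DeepseekV2FastConfig(use_dualpipev=True, attention_dropout=0.1)
print(hasattr(base, "use_dualpipev"), fast.use_dualpipev, fast.attention_dropout)
```

With this split, the base class signature stays unchanged and code that only needs the generic options can keep accepting the base type.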


import contextlib
import math
import os

Collaborator

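Put all non-generic changes in the modeling_fast directory. As it stands, this will also make it harder to migrate Zhonghui's SFT model network code and for us to refactor the shared modeling modules later.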

import paddle.nn as nn
from paddle.distributed.fleet.meta_parallel import (
LayerDesc,
LocalSharedLayerDesc,

Collaborator

Don't touch the original modeling_pp.py; write a separate modeling_pp_fast.py.

async_finish=False,
allocate_on_comm_stream=False,
recv_x, recv_token_probs, states, event = fused_dispatch_forward_func(
x, token_indices, token_probs, num_experts, group, previous_event

Collaborator

Will removing all of these affect other modules that use them?

group_scores = (
scores_for_choice.reshape([bsz_seq_len, self.n_group, -1]).topk(2, axis=-1)[0].sum(axis=-1)
) # fmt:skip [n, n_group]
reshape_tmp_rst = scores_for_choice.reshape([bsz_seq_len, self.n_group, -1])

Collaborator

Do these changes have any impact?
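
One way to answer a question like this is to compare the refactored expression against a small reference implementation. The sketch below is a pure-Python reference for what the quoted `reshape(...).topk(2, axis=-1)[0].sum(axis=-1)` expression computes: each row is split into `n_group` equal groups and the two largest scores in each group are summed. It is an illustration only, not Paddle code, and the function name and sample values are made up for the example.

```python
import heapq


def group_scores_reference(scores_for_choice, n_group):
    """Sum the top-2 scores of each expert group, row by row."""
    result = []
    for row in scores_for_choice:
        group_size = len(row) // n_group
        groups = [row[g * group_size:(g + 1) * group_size] for g in range(n_group)]
        result.append([sum(heapq.nlargest(2, group)) for group in groups])
    return result


# One token row with 8 expert scores, split into 2 groups of 4.
scores = [[1.0, 9.0, 3.0, 2.0, 8.0, 7.0, 0.5, 6.0]]
reference = group_scores_reference(scores, n_group=2)
print(reference)
```

Running both the original and the modified tensor code on the same inputs and comparing against such a reference would show whether the change alters the numerical result.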

Collaborator

lugimzzz commented Sep 4, 2025

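DeepseekV3 pretraining currently seems to introduce many new components into PaddleFormers, and they are not yet standardized. I suggest keeping the model network code and the new components in example for now, and merging them into PaddleFormers once the functionality is ready and polished.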

@@ -0,0 +1,1678 @@
# Copyright (c) 2025 PaddlePaddle Authors. All Rights Reserved.

Contributor

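A separate fast file should no longer be needed here. The original fast file existed so that non-shared modules could be modified separately outside modeling.py; now everything can simply go in modeling.py.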

export FLAGS_large_pool_pre_alloc_in_mb=61440
export FLAGS_deep_ep_comm_prealloc_in_mb=1000

export DSV3_FAST_PRETRAIN=true

Contributor

Delete this.


# mpirun sh script/kill_process.sh
# mpirun rm -rf output
nohup bash script/train_gpu.sh ./config/pretrain_argument.json --dsv3_fast_pretrain=True > run.log 2>&1 &

Contributor

Put dsv3_fast_pretrain in the config instead.

Author

DSV3_FAST_PRETRAIN was originally used where TrainingArguments is created in training_args.py. That creation happens before config.json is read, so the config contents are not yet available at that point.
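
One possible way around this ordering problem, sketched under assumptions: if the training-argument JSON passed on the command line is what populates the arguments object, the flag can live there and be available at creation time. `TrainingArgsSketch` and `build_training_args` are hypothetical names for this illustration and do not reflect the real `TrainingArguments` API; `max_steps` is just a sample key that the sketch ignores.

```python
import json
import tempfile
from dataclasses import dataclass, fields
from pathlib import Path
from typing import Optional


@dataclass
class TrainingArgsSketch:
    """Hypothetical stand-in for TrainingArguments; only the flag is modeled."""

    dsv3_fast_pretrain: bool = False


def build_training_args(argument_json: str, overrides: Optional[dict] = None) -> TrainingArgsSketch:
    """Read the argument file first, then apply command-line style overrides."""
    values = json.loads(Path(argument_json).read_text(encoding="utf-8"))
    values.update(overrides or {})
    known = {f.name for f in fields(TrainingArgsSketch)}
    return TrainingArgsSketch(**{k: v for k, v in values.items() if k in known})


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "pretrain_argument.json"
    path.write_text(json.dumps({"dsv3_fast_pretrain": True, "max_steps": 10}), encoding="utf-8")
    args = build_training_args(str(path))
    overridden = build_training_args(str(path), {"dsv3_fast_pretrain": False})

print(args.dsv3_fast_pretrain, overridden.dsv3_fast_pretrain)
```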

)

tokenizer = AutoTokenizer.from_pretrained(model_args.tokenizer_name_or_path, download_hub="huggingface")
config = DeepseekV2FastConfig.from_pretrained("./config/config.json")

Contributor

Please don't hard-code the path here; put it in the config as before.

# Flags for best performance
export FLAGS_share_tensor_for_grad_tensor_holder=1
export FLAGS_use_default_stream=false
export =false

Contributor

Delete this.

# if pp_first, the order = ["dp", "pp", "moe_sharding", "sharding", "sep", "ep", "mp"]
# if sharding_first, the order is ["dp", "moe_sharding", "sharding", "pp", "sep", "ep", "mp"]
order.insert(sd_idx, "moe_sharding")
if not self.dsv3_fast_pretrain:

Collaborator

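What is the reason for the self.dsv3_fast_pretrain code change here? I suggest not using a specific model name as the switch. For example, if it is dualpipe that needs this switch turned on, name it something like apply_dual_pipe.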
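
A minimal sketch of a feature-named switch, under stated assumptions: the diff does not show what the `if not self.dsv3_fast_pretrain:` branch actually guards, so the flag `use_moe_sharding` and the function `build_hybrid_order` are hypothetical. The base orders are inferred from the two code comments quoted above, with `moe_sharding` inserted directly before `sharding`.

```python
def build_hybrid_order(pp_first: bool, use_moe_sharding: bool = True) -> list:
    """Build the parallel-group order; the flag names the feature, not a model."""
    if pp_first:
        order = ["dp", "pp", "sharding", "sep", "ep", "mp"]
    else:
        order = ["dp", "sharding", "pp", "sep", "ep", "mp"]
    if use_moe_sharding:
        # Insert "moe_sharding" immediately before "sharding".
        sd_idx = order.index("sharding")
        order.insert(sd_idx, "moe_sharding")
    return order


pp_first_order = build_hybrid_order(pp_first=True)
sharding_first_order = build_hybrid_order(pp_first=False)
print(pp_first_order)
print(sharding_first_order)
```

Naming the switch after the behavior it toggles keeps the call site readable when a second model later needs the same behavior.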

@@ -0,0 +1,100 @@
# Copyright (c) 2025 PaddlePaddle Authors. All Rights Reserved.

Collaborator

#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software

Collaborator

If pretraining needs it, it can be added back to the paddleformers/data directory.

Collaborator

lugimzzz commented Sep 11, 2025

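We need a document covering the required hardware resources, CUDA and NCCL version requirements, how to install dependencies, pretraining dataset preparation, how to launch training, merging model parameters (if needed), and how to convert the model back into weights usable for inference.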
