feat(dsv3):Runnable N1C8 configs #2525
base: develop
Conversation
Thanks for your contribution!
Force-pushed from a1c5bb4 to 1b6c1f4
$LAUNCH_CMD \
    --run_mode=collective \
    ${script:-run_pretrain.py} \
    $@
Please write a document covering dataset preparation, model download (e.g. converting FP8 weights to BF16), and how to launch training.
paddleformers/trainer/trainer.py
Outdated
    RowParallelQuantizationLinear,
)

try:
No need for a try here; just keep the original logic.
# if pp_first, the order = ["dp", "pp", "moe_sharding", "sharding", "sep", "ep", "mp"]
# if sharding_first, the order is ["dp", "moe_sharding", "sharding", "pp", "sep", "ep", "mp"]
order.insert(sd_idx, "moe_sharding")
if not os.getenv("DSV3_FAST_PRETRAIN", "False"):
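The two comment lines in the diff above can be checked with a small self-contained sketch; the starting order and the value of `sd_idx` are assumptions reconstructed from those comments, not code from the PR:

```python
# Reconstruct the pp_first case described in the diff's comments.
# Starting order and sd_idx are assumptions based on those comments.
order = ["dp", "pp", "sharding", "sep", "ep", "mp"]
sd_idx = order.index("sharding")

# Inserting "moe_sharding" just before "sharding" yields the documented
# pp_first order.
order.insert(sd_idx, "moe_sharding")
print(order)  # ['dp', 'pp', 'moe_sharding', 'sharding', 'sep', 'ep', 'mp']
```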
This handling is too hacky. Is there a more sensible way to pass this in, with a proper parameter name, instead of an environment variable?
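Beyond style, there is a concrete hazard in the exact line shown in the diff: `os.getenv` returns a string, and the default `"False"` is a non-empty string, so truthiness checks on it do not behave like a boolean flag. A minimal demonstration:

```python
import os

# os.getenv returns the string "False" when the variable is unset;
# any non-empty string is truthy, so `not os.getenv(...)` is always False.
os.environ.pop("DSV3_FAST_PRETRAIN", None)
flag = os.getenv("DSV3_FAST_PRETRAIN", "False")
print(bool(flag))  # True  -- the string "False" is truthy
print(not flag)    # False -- a branch guarded by `not flag` never runs

# Setting the variable to "False" explicitly changes nothing:
os.environ["DSV3_FAST_PRETRAIN"] = "False"
print(bool(os.getenv("DSV3_FAST_PRETRAIN", "False")))  # still True
```

An explicit boolean training argument avoids both the hidden global state and the string-truthiness trap.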
fleet.init(is_collective=True, strategy=strategy)
logger.info(strategy)

if os.getenv("DSV3_FAST_PRETRAIN", "False"):
Same as above.
# limitations under the License.


import json
This function is a DSV3-specific customization; suggest moving it into the example directory.
attention_dropout=0.0,
speculate_model_type=False,
using_flex_token=False,
use_dualpipev=False,
For the config, suggest also writing a DeepseekV2FastConfig(DeepseekV2Config) subclass, to keep the original class clean.
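A minimal sketch of the suggested subclass structure; the class and field names follow the diff, but the bodies here are illustrative stand-ins, not PaddleFormers code:

```python
# Illustrative only: the real DeepseekV2Config lives in PaddleFormers.
class DeepseekV2Config:
    def __init__(self, attention_dropout=0.0, **kwargs):
        self.attention_dropout = attention_dropout

# Fast-pretrain-only fields move into a subclass, keeping the base clean.
class DeepseekV2FastConfig(DeepseekV2Config):
    def __init__(self, using_flex_token=False, use_dualpipev=False, **kwargs):
        super().__init__(**kwargs)
        self.using_flex_token = using_flex_token
        self.use_dualpipev = use_dualpipev

cfg = DeepseekV2FastConfig(use_dualpipev=True, attention_dropout=0.1)
print(cfg.use_dualpipev, cfg.attention_dropout)  # True 0.1
```

Code that only needs the base fields keeps accepting `DeepseekV2Config`, and the fast pretrain path opts into the subclass.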
import contextlib
import math
import os
Put all non-generic modifications into the modeling_fast directory. Otherwise this will make it difficult for Zhonghui to migrate the SFT model architecture in, and will complicate our later refactoring of the shared modeling components.
import paddle.nn as nn
from paddle.distributed.fleet.meta_parallel import (
    LayerDesc,
    LocalSharedLayerDesc,
Don't modify the original modeling_pp.py; write a modeling_pp_fast.py instead.
async_finish=False,
allocate_on_comm_stream=False,
recv_x, recv_token_probs, states, event = fused_dispatch_forward_func(
    x, token_indices, token_probs, num_experts, group, previous_event
If all of these are dropped, won't that affect other modules that use them?
group_scores = (
    scores_for_choice.reshape([bsz_seq_len, self.n_group, -1]).topk(2, axis=-1)[0].sum(axis=-1)
)  # fmt:skip [n, n_group]
reshape_tmp_rst = scores_for_choice.reshape([bsz_seq_len, self.n_group, -1])
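To check whether splitting the reshape into a temporary changes semantics, the grouped top-2 scoring can be reproduced in NumPy (the PR uses paddle tensors; `topk(2, axis=-1)[0].sum(axis=-1)` is a sum of the two largest values per group, and the shapes here are illustrative):

```python
import numpy as np

# Illustrative shapes; the PR operates on paddle tensors instead.
bsz_seq_len, n_group, experts_per_group = 2, 4, 3
rng = np.random.default_rng(0)
scores_for_choice = rng.random((bsz_seq_len, n_group * experts_per_group))

# Original one-liner: reshape -> top-2 values per group -> sum.
reshaped = scores_for_choice.reshape(bsz_seq_len, n_group, -1)
group_scores = np.sort(reshaped, axis=-1)[..., -2:].sum(axis=-1)

# Refactored version: the reshape result is stored in a temporary first.
reshape_tmp_rst = scores_for_choice.reshape(bsz_seq_len, n_group, -1)
group_scores_2 = np.sort(reshape_tmp_rst, axis=-1)[..., -2:].sum(axis=-1)

print(np.allclose(group_scores, group_scores_2))  # True
print(group_scores.shape)  # (2, 4), i.e. [n, n_group]
```

Storing the reshape in a temporary is value-equivalent; any difference would come from framework-level behavior (e.g. recomputation or memory reuse), not the math.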
Do these modifications have any side effects?
DeepseekV3 pretraining currently introduces many new components into PaddleFormers, and they are not standardized enough. Suggest keeping the model architecture and new components in example for now, and merging them into PaddleFormers once the functionality is complete.
…SV3_USE_ATTEN_RECOMPUTE DSV3_USE_FP8_DISPATCH USE_DS_GEMM into config.json
…3) move load_hf_ckpt
Force-pushed from a8b9ba6 to d0f203f
@@ -0,0 +1,1678 @@
# Copyright (c) 2025 PaddlePaddle Authors. All Rights Reserved.
The fast file should no longer be needed here. The original fast file existed to hold non-shared modifications separately from modeling.py; now everything can live in modeling.py directly.
export FLAGS_large_pool_pre_alloc_in_mb=61440
export FLAGS_deep_ep_comm_prealloc_in_mb=1000

export DSV3_FAST_PRETRAIN=true
Delete this.
# mpirun sh script/kill_process.sh
# mpirun rm -rf output
nohup bash script/train_gpu.sh ./config/pretrain_argument.json --dsv3_fast_pretrain=True > run.log 2>&1 &
Put dsv3_fast_pretrain in the config instead.
The places that originally used DSV3_FAST_PRETRAIN are in the TrainingArguments construction in training_args.py, which happens before config.json is read, so the config contents are not yet available at that point.
)

tokenizer = AutoTokenizer.from_pretrained(model_args.tokenizer_name_or_path, download_hub="huggingface")
config = DeepseekV2FastConfig.from_pretrained("./config/config.json")
Don't hard-code the path here; put it in the config as before.
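A sketch of the suggested direction, assuming a hypothetical field on the existing model_args; the field name `config_name_or_path` is illustrative, not an existing PaddleFormers argument:

```python
from dataclasses import dataclass

# Hypothetical argument class standing in for the PR's model_args.
@dataclass
class ModelArguments:
    tokenizer_name_or_path: str = "deepseek-ai/DeepSeek-V3"
    config_name_or_path: str = "./config/config.json"  # hypothetical field

model_args = ModelArguments(config_name_or_path="./my_run/config.json")

# The call site reads the path from the arguments instead of a literal:
config_path = model_args.config_name_or_path
print(config_path)  # ./my_run/config.json
```

This keeps per-run paths out of the training script and in the launch configuration where the other paths already live.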
# Flags for best performance
export FLAGS_share_tensor_for_grad_tensor_holder=1
export FLAGS_use_default_stream=false
export =false
Delete this.
# if pp_first, the order = ["dp", "pp", "moe_sharding", "sharding", "sep", "ep", "mp"]
# if sharding_first, the order is ["dp", "moe_sharding", "sharding", "pp", "sep", "ep", "mp"]
order.insert(sd_idx, "moe_sharding")
if not self.dsv3_fast_pretrain:
What change motivated self.dsv3_fast_pretrain here? Suggest not using a specific model name as a switch. For example, if the switch is needed for dualpipe, name it something like apply_dual_pipe.
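A sketch of the renaming suggestion, assuming the switch really does gate dualpipe behavior; both field names here are illustrative:

```python
from dataclasses import dataclass

# Instead of a switch tied to one model's name...
@dataclass
class TrainingArgumentsBefore:
    dsv3_fast_pretrain: bool = False  # what it enables is not obvious

# ...name the flag after the behavior it enables, per the review comment.
@dataclass
class TrainingArgumentsAfter:
    apply_dual_pipe: bool = False  # hypothetical name from the suggestion

args = TrainingArgumentsAfter(apply_dual_pipe=True)
print(args.apply_dual_pipe)  # True
```

Behavior-named flags also stay reusable if a second model later needs the same scheduling change.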
@@ -0,0 +1,100 @@
# Copyright (c) 2025 PaddlePaddle Authors. All Rights Reserved.
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
If it's needed for pretraining, this can be added back under the paddleformers/data directory.
We need a document covering the required hardware resources, CUDA/NCCL version requirements, how to install dependencies, pretraining dataset preparation, how to launch training, model weight merging (if needed), and how to convert the model back into inference-ready weights.
…ove dsv3 code into new directory
No description provided.