Commit b72aa7e

[GPT-3/MOE] Adapt recompute for latest paddle (#3191)
1 parent 9729a47 commit b72aa7e

3 files changed: +3 -5 lines changed

examples/language_model/gpt-3/dygraph/modeling.py

Lines changed: 1 addition & 3 deletions
@@ -1178,6 +1178,4 @@ def _logits_helper(embedding, output):
             loss_fn=GPTPretrainingCriterionPipe(),
             topology=topology,
             seg_method="layer:TransformerDecoderLayer",
-            recompute_interval=1 if use_recompute else 0,
-            recompute_partition=False,
-            recompute_offload=False)
+            recompute_interval=1 if use_recompute else 0)

examples/language_model/gpt-3/dygraph/run.sh

Lines changed: 1 addition & 1 deletion
@@ -23,5 +23,5 @@ python -m paddle.distributed.launch --log_dir $log_dir --gpus "0,1,2,3,4,5,6,7"
     --sharding_degree 1\
     --use_pure_fp16 True\
     --use_recompute False\
-    --sharding_stage 2\
+    --sharding_stage 1\
     --sharding_offload False

examples/language_model/moe/dygraph/modeling.py

Lines changed: 1 addition & 1 deletion
@@ -456,7 +456,7 @@ def __init__(self,
         }
         self.moe_mlp = MoeLayer(d_model=d_model,
                                 experts=experts_list,
-                                gate_config=gate_config,
+                                gate=gate_config,
                                 moe_group=moe_group,
                                 mp_group=mp_group,
                                 recompute_interval=self.recompute_interval)
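
Note: this commit adapts call sites by hand, dropping `recompute_partition` and `recompute_offload` and renaming `gate_config` to `gate`. For scripts that may need to run against more than one library version, one possible alternative is to pass only the keyword arguments the installed constructor accepts. The sketch below is an illustration of that idea and is not part of this commit; `filter_supported_kwargs`, `old_layer`, and `new_layer` are hypothetical names and are not Paddle APIs.

```python
import inspect


def filter_supported_kwargs(func, **kwargs):
    """Return only the keyword arguments that `func` accepts by name."""
    params = inspect.signature(func).parameters
    # A callable that declares **kwargs accepts any keyword, so keep everything.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    return {k: v for k, v in kwargs.items() if k in params}


# Hypothetical stand-ins for an older and a newer constructor signature.
def old_layer(recompute_interval=0, recompute_partition=False, recompute_offload=False):
    return ("old", recompute_interval, recompute_partition, recompute_offload)


def new_layer(recompute_interval=0):
    return ("new", recompute_interval)


requested = dict(recompute_interval=1,
                 recompute_partition=False,
                 recompute_offload=False)

# The older signature receives all three arguments; the newer one only the first.
old_result = old_layer(**filter_supported_kwargs(old_layer, **requested))
new_result = new_layer(**filter_supported_kwargs(new_layer, **requested))
```

Silently dropping arguments can hide a real behavior change, so this approach is only reasonable when the dropped arguments were already at their default values, as `recompute_partition=False` and `recompute_offload=False` appear to be here. A rename such as `gate_config` to `gate` cannot be handled by filtering and still needs an explicit mapping.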
