
Commit 16d3c49

[Auto Parallel] Support dynamic semi-auto training in Llama2 model (PaddlePaddle#7851)
* update
* add ci

Co-authored-by: liuzhenhai93 <[email protected]>
1 parent 915fa43 commit 16d3c49

File tree

9 files changed: +3036 -17 lines changed


llm/llama/auto_parallel/run_pretrain_3D_auto.py

Lines changed: 723 additions & 0 deletions
Large diffs are not rendered by default.
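The new entry point targets Paddle's dynamic semi-auto mode, where tensors are marked with placements on a process mesh and the framework derives the parallel communication. As a rough illustration of that programming style, here is a minimal sketch using Paddle's public ProcessMesh / shard_tensor API; the mesh shape, tensor size, and variable names are illustrative, and this is not the actual contents of run_pretrain_3D_auto.py:

import paddle
import paddle.distributed as dist

# Illustrative only: a 2D process mesh over 4 ranks,
# "dp" (data parallel) x "mp" (tensor parallel).
mesh = dist.ProcessMesh([[0, 1], [2, 3]], dim_names=["dp", "mp"])

# Mark a weight as replicated across "dp" and column-sharded across "mp";
# the semi-auto engine derives the needed communication from the placements.
w = paddle.randn([1024, 1024])
w_dist = dist.shard_tensor(w, mesh, [dist.Replicate(), dist.Shard(1)])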
Lines changed: 78 additions & 0 deletions
@@ -0,0 +1,78 @@
# Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# just for debugging

set -x
unset CUDA_VISIBLE_DEVICES

export FLAGS_call_stack_level=3
export FLAGS_use_cuda_managed_memory=true

task_name="llama_auto_dp2mp2pp2"
rm -rf output/$task_name/
rm -rf "output/$task_name""_log"

export SOT_LOG_LEVEL=4
export PYTHONPATH=../../../:$PYTHONPATH
#ulimit -c unlimited
#export GLOG_v=10

rm -rf log_auto

# Deterministic kernels and TF32 disabled, for reproducible runs.
export FLAGS_embedding_deterministic=1
export FLAGS_cudnn_deterministic=1
export NVIDIA_TF32_OVERRIDE=0

# Launch 3D auto-parallel pretraining (dp2 x mp2 x pp2) on 8 GPUs.
python3.8 -u -m paddle.distributed.launch \
    --gpus "0,1,2,3,4,5,6,7" \
    --log_dir "auto_3d" \
    run_pretrain_3D_auto.py \
    --model_type "llama" \
    --model_name_or_path "facebook/llama-7b" \
    --tokenizer_name_or_path "facebook/llama-7b" \
    --input_dir "./data" \
    --output_dir "output/$task_name" \
    --split 949,50,1 \
    --max_seq_length 2048 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 2 \
    --gradient_accumulation_steps 2 \
    --use_flash_attention 0 \
    --use_fused_rms_norm 1 \
    --fp16 0 \
    --fp16_opt_level "O2" \
    --scale_loss 1024 \
    --pipeline_parallel_degree 2 \
    --tensor_parallel_degree 2 \
    --sharding_parallel_degree 1 \
    --learning_rate 0.0001 \
    --min_learning_rate 0.00001 \
    --max_steps 20000 \
    --save_steps 5000000 \
    --weight_decay 0.01 \
    --warmup_ratio 0.01 \
    --logging_steps 1 \
    --dataloader_num_workers 1 \
    --sharding "" \
    --eval_steps 1000000 \
    --disable_tqdm true \
    --continue_training 0 \
    --recompute 0 \
    --do_train \
    --do_eval \
    --device "gpu" \
    --data_impl "mmap" \
    --parallel_mode "auto" \
    --max_grad_norm 1.0
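A note on the degrees above: with 8 GPUs and tensor_parallel_degree 2 x pipeline_parallel_degree 2 (sharding degree 1), the remaining factor of 8 / (2 x 2) = 2 falls to data parallelism, matching the dp2mp2pp2 suffix in task_name.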

0 commit comments
