Commit 3b90f68

Fix DeepSeek-V2 sequence packing sft (#444)
* Enhance DeepSeek-V2 236B mcore to hf conversion
* Fix DeepSeek-V2 sequence packing sft

---------

Co-authored-by: 同润 <jerry.lp@alibaba-inc.com>
1 parent 2ccaab2 commit 3b90f68

1 file changed: +1 -1 lines changed

examples/deepseek_v2/pretrain_deepseek.py

Lines changed: 1 addition & 1 deletion
@@ -102,7 +102,7 @@ def get_batch(data_iterator):
     if args.train_mode == "pretrain":
         batch = get_batch_on_this_tp_rank(data_iterator)
     else:
-        batch = get_batch_on_this_tp_rank_idxmap_sft(data_iterator)
+        batch = get_batch_on_this_tp_rank_idxmap_sft(data_iterator, per_seq_average=True)
 
     packed_seq_params = None
     if args.reset_position_ids:
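
The diff does not show what `per_seq_average=True` does inside `get_batch_on_this_tp_rank_idxmap_sft`. Going only by the flag name and the commit title, one plausible reading is that the loss for a packed row is averaged per sequence instead of per token, so long sequences do not dominate short ones. The sketch below illustrates that distinction in plain Python. The helper names `token_average` and `per_sequence_average` and the `seq_lens` argument are hypothetical and are not part of the repository.

```python
# Illustrative only: these helpers are assumptions, not the repository's code.

def token_average(losses, mask):
    """Mean loss over all unmasked tokens in one packed row."""
    total = sum(l * m for l, m in zip(losses, mask))
    count = sum(mask)
    return total / count if count else 0.0


def per_sequence_average(losses, mask, seq_lens):
    """Mean of the per-sequence mean losses.

    seq_lens lists the length of each sequence packed into the row,
    in order, so the row can be split back into its segments.
    """
    assert sum(seq_lens) == len(losses) == len(mask)
    means = []
    start = 0
    for n in seq_lens:
        seg_losses = losses[start:start + n]
        seg_mask = mask[start:start + n]
        seg_count = sum(seg_mask)
        if seg_count:  # skip sequences with no supervised tokens
            seg_total = sum(l * m for l, m in zip(seg_losses, seg_mask))
            means.append(seg_total / seg_count)
        start += n
    return sum(means) / len(means) if means else 0.0


# One packed row holding a 2-token sequence and a 6-token sequence.
losses = [1.0, 1.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0]
mask = [1, 1, 1, 1, 1, 1, 1, 1]

print(token_average(losses, mask))                 # 3.25, dominated by the long sequence
print(per_sequence_average(losses, mask, [2, 6]))  # 2.5, each sequence weighted equally
```

Under this reading, the two averages agree only when all packed sequences have the same number of supervised tokens, which would explain why the flag matters specifically for sequence-packed SFT.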
