Commit fa21d3c

fix: Added sequence packing keys to SFT and GRPO recipes (#805)

Signed-off-by: Ahmad Kiswani <kiswani.ahmad@gmail.com>
1 parent: 84abe2c

13 files changed: +47 −0 lines changed

examples/configs/recipes/llm/grpo-gemma3-1b-it-1n8g-fsdp2tp1.yaml (4 additions, 0 deletions)

@@ -49,6 +49,10 @@ policy:
     sequence_length_round: 64
   sequence_packing:
     enabled: false
+    train_mb_tokens: ${mul:${policy.max_total_sequence_length}, ${policy.train_micro_batch_size}}
+    logprob_mb_tokens: ${mul:${policy.max_total_sequence_length}, ${policy.logprob_batch_size}}
+    algorithm: "modified_first_fit_decreasing"
+    sequence_length_round: 64
   make_sequence_length_divisible_by: 1
   max_grad_norm: 1
   optimizer:
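The two `${mul:...}` entries above are config interpolations that multiply two existing `policy` values into a per-micro-batch token budget for the packer. A minimal sketch of that arithmetic, using hypothetical input values (each recipe defines its own `max_total_sequence_length` and batch sizes):

```python
# Hypothetical inputs -- the real values come from each recipe's policy section.
max_total_sequence_length = 4096
train_micro_batch_size = 2
logprob_batch_size = 4

# ${mul:a, b} multiplies its two arguments, so the new sequence_packing keys
# resolve to a "tokens per packed micro-batch" budget:
train_mb_tokens = max_total_sequence_length * train_micro_batch_size
logprob_mb_tokens = max_total_sequence_length * logprob_batch_size

print(train_mb_tokens, logprob_mb_tokens)  # 8192 16384
```

Deriving the budgets from existing keys keeps packed and unpacked runs equivalent in per-step token count, so a recipe only needs its batch sizes edited in one place.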

examples/configs/recipes/llm/grpo-gemma3-27b-it-16n8g-fsdp2tp8sp-actckpt-long.yaml (4 additions, 0 deletions)

@@ -50,6 +50,10 @@ policy:
     sequence_length_round: 64
   sequence_packing:
     enabled: false
+    train_mb_tokens: ${mul:${policy.max_total_sequence_length}, ${policy.train_micro_batch_size}}
+    logprob_mb_tokens: ${mul:${policy.max_total_sequence_length}, ${policy.logprob_batch_size}}
+    algorithm: "modified_first_fit_decreasing"
+    sequence_length_round: 64
   make_sequence_length_divisible_by: 8
   max_grad_norm: 1
   optimizer:

examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-4n8g-fsdp2tp1-long.v3.yaml (4 additions, 0 deletions)

@@ -49,6 +49,10 @@ policy:
     sequence_length_round: 64
   sequence_packing:
     enabled: false
+    train_mb_tokens: ${mul:${policy.max_total_sequence_length}, ${policy.train_micro_batch_size}}
+    logprob_mb_tokens: ${mul:${policy.max_total_sequence_length}, ${policy.logprob_batch_size}}
+    algorithm: "modified_first_fit_decreasing"
+    sequence_length_round: 64
   make_sequence_length_divisible_by: 1
   max_grad_norm: 1
   optimizer:

examples/configs/recipes/llm/grpo-llama3.2-1b-instruct-1n8g-fsdp2tp1.v3.yaml (4 additions, 0 deletions)

@@ -49,6 +49,10 @@ policy:
     sequence_length_round: 64
   sequence_packing:
     enabled: false
+    train_mb_tokens: ${mul:${policy.max_total_sequence_length}, ${policy.train_micro_batch_size}}
+    logprob_mb_tokens: ${mul:${policy.max_total_sequence_length}, ${policy.logprob_batch_size}}
+    algorithm: "modified_first_fit_decreasing"
+    sequence_length_round: 64
   make_sequence_length_divisible_by: 1
   max_grad_norm: 1
   optimizer:

examples/configs/recipes/llm/grpo-qwen2.5-32b-32n8g-fsdp2tp8sp-actckpt-long.v3.yaml (4 additions, 0 deletions)

@@ -49,6 +49,10 @@ policy:
     sequence_length_round: 64
   sequence_packing:
     enabled: false
+    train_mb_tokens: ${mul:${policy.max_total_sequence_length}, ${policy.train_micro_batch_size}}
+    logprob_mb_tokens: ${mul:${policy.max_total_sequence_length}, ${policy.logprob_batch_size}}
+    algorithm: "modified_first_fit_decreasing"
+    sequence_length_round: 64
   make_sequence_length_divisible_by: 8
   max_grad_norm: 1
   optimizer:

examples/configs/recipes/llm/grpo-qwen2.5-32b-32n8g-fsdp2tp8sp-actckpt.v3.yaml (4 additions, 0 deletions)

@@ -49,6 +49,10 @@ policy:
     sequence_length_round: 64
   sequence_packing:
     enabled: false
+    train_mb_tokens: ${mul:${policy.max_total_sequence_length}, ${policy.train_micro_batch_size}}
+    logprob_mb_tokens: ${mul:${policy.max_total_sequence_length}, ${policy.logprob_batch_size}}
+    algorithm: "modified_first_fit_decreasing"
+    sequence_length_round: 64
   make_sequence_length_divisible_by: 8
   max_grad_norm: 1
   optimizer:

examples/configs/recipes/llm/grpo-qwen2.5-7b-instruct-4n8g-fsdp2tp4sp.v3.yaml (4 additions, 0 deletions)

@@ -49,6 +49,10 @@ policy:
     sequence_length_round: 64
   sequence_packing:
     enabled: false
+    train_mb_tokens: ${mul:${policy.max_total_sequence_length}, ${policy.train_micro_batch_size}}
+    logprob_mb_tokens: ${mul:${policy.max_total_sequence_length}, ${policy.logprob_batch_size}}
+    algorithm: "modified_first_fit_decreasing"
+    sequence_length_round: 64
   make_sequence_length_divisible_by: 4
   max_grad_norm: 1
   optimizer:

examples/configs/recipes/llm/grpo-qwen2.5-math-1.5b-instruct-1n8g-fsdp2tp1.v3.yaml (4 additions, 0 deletions)

@@ -49,6 +49,10 @@ policy:
     sequence_length_round: 64
   sequence_packing:
     enabled: false
+    train_mb_tokens: ${mul:${policy.max_total_sequence_length}, ${policy.train_micro_batch_size}}
+    logprob_mb_tokens: ${mul:${policy.max_total_sequence_length}, ${policy.logprob_batch_size}}
+    algorithm: "modified_first_fit_decreasing"
+    sequence_length_round: 64
   make_sequence_length_divisible_by: 1
   max_grad_norm: 1
   optimizer:

examples/configs/recipes/llm/sft-llama3.1-8b-instruct-1n8g-fsdp2tp1-long.v2.yaml (3 additions, 0 deletions)

@@ -35,6 +35,9 @@ policy:
     enabled: false
   sequence_packing:
     enabled: false
+    train_mb_tokens: ${mul:${policy.max_total_sequence_length}, ${policy.train_micro_batch_size}}
+    algorithm: "modified_first_fit_decreasing"
+    sequence_length_round: 64
   make_sequence_length_divisible_by: 1
   max_grad_norm: 1
   optimizer:

examples/configs/recipes/llm/sft-llama3.1-8b-instruct-1n8g-fsdp2tp2sp.v2.yaml (3 additions, 0 deletions)

@@ -35,6 +35,9 @@ policy:
     enabled: false
   sequence_packing:
     enabled: false
+    train_mb_tokens: ${mul:${policy.max_total_sequence_length}, ${policy.train_micro_batch_size}}
+    algorithm: "modified_first_fit_decreasing"
+    sequence_length_round: 64
   make_sequence_length_divisible_by: 2
   max_grad_norm: 1
   optimizer:
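Every recipe selects `algorithm: "modified_first_fit_decreasing"`. As an illustration of the first-fit-decreasing idea behind that name (a sketch only, not the project's packing implementation; "modified" FFD variants differ in their placement details): sequences are sorted by length, longest first, and each is placed into the first open bin that still has room under the token budget, opening a new bin when none fits.

```python
def pack_first_fit_decreasing(seq_lens, bin_capacity):
    """Pack sequence lengths into bins holding at most bin_capacity tokens."""
    bins = []  # each bin is a list of sequence lengths
    for length in sorted(seq_lens, reverse=True):
        for b in bins:
            if sum(b) + length <= bin_capacity:
                b.append(length)  # first bin with room wins
                break
        else:
            bins.append([length])  # no bin fits; open a new one
    return bins

# With a train_mb_tokens budget of 8192, these six sequences pack into
# two micro-batches instead of six.
bins = pack_first_fit_decreasing([4000, 3000, 2500, 2000, 1500, 1000], 8192)
print(bins)  # [[4000, 3000, 1000], [2500, 2000, 1500]]
```

Packing decreasing-by-length keeps the large sequences from stranding capacity at the end, which is why the token budgets above are expressed per packed micro-batch rather than per sequence.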
