Commit 14bbc70

optimize gpu memory
Signed-off-by: Huamin Chen <[email protected]>
1 parent a00717b commit 14bbc70

1 file changed (+3 −3 lines)
src/training/training_lora/mmlu_pro_solver_lora/ft_qwen3_mmlu_solver_lora_no_leakage.py

Lines changed: 3 additions & 3 deletions
```diff
@@ -146,10 +146,10 @@
     ], # NVIDIA's high-quality math with detailed CoT
     "description": "Advanced math problem-solving with chain-of-thought reasoning",
     "target_mmlu_categories": ["math", "physics", "engineering"],
-    "max_length": 3584, # Optimized for multi-GPU with batch_size=2 + BF16
+    "max_length": 3584, # Optimized for multi-GPU with batch_size=1 + BF16
     "max_new_tokens": 1536, # Matching shorter CoT for consistency
-    "batch_size": 2, # Multi-GPU with BF16 - 2 samples per GPU
-    "gradient_accumulation_steps": 8, # Effective batch = 2 × 8 × 4 GPUs = 64
+    "batch_size": 1, # Reduced from 2 to avoid OOM with 3-4B models and long sequences
+    "gradient_accumulation_steps": 16, # Effective batch = 1 × 16 × 4 GPUs = 64 (same effective batch)
     "filter_long_sequences": True, # Filter out samples > max_length to avoid truncated CoT
     "max_cot_char_length": 12000, # Pre-filter dataset to shorter CoT samples (~3000 tokens)
     "max_samples_multiplier": 20, # Load 20x more to compensate for char length filtering
```
