Reduce eagle example test memory usage from 28 to 1 GB #299
Conversation
Signed-off-by: Keval Morabia <[email protected]>
Walkthrough

Adds TensorBoard reporting to the speculative decoding launch script. Updates a speculative decoding test to use a tiny EAGLE config passed via file and adjusts training sequence length and output path. Removes a skip guard in a Megatron export test so eagle-path tests run regardless of optional dependency presence.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant T as PyTest
    participant M as Unified Export Test
    participant D as Optional Dep (megatron.core.post_training)
    participant C as Conversion Logic
    rect rgba(200,200,255,0.2)
        note right of T: Old flow (before)
        T->>M: Start eagle-path test
        M->>D: Attempt import
        alt ImportError
            M-->>T: skip test
        else Import OK
            M->>C: Run conversion
            C-->>M: Result
            M-->>T: Assert outcomes
        end
    end
    rect rgba(200,255,200,0.2)
        note right of T: New flow (now)
        T->>M: Start eagle-path test
        M->>C: Run conversion (no pre-check)
        alt Dependency missing
            C-->>M: Error raised
            M-->>T: Test fails (error)
        else OK
            C-->>M: Result
            M-->>T: Assert outcomes
        end
    end
```
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Actionable comments posted: 1
🧹 Nitpick comments (3)
examples/speculative_decoding/launch.sh (1)

`92-95`: Guard division-by-zero when no GPUs are visible

If `torch.cuda.device_count()` returns 0, `192 / GPU_COUNT` errors. Add a safe fallback.

```diff
-GPU_COUNT=$(python -c "import torch; print(torch.cuda.device_count())")
-DEFAULT_SAVE_STEPS=$((192 / GPU_COUNT))
+GPU_COUNT=$(python -c "import torch; print(torch.cuda.device_count())")
+if [[ "$GPU_COUNT" -gt 0 ]]; then
+  DEFAULT_SAVE_STEPS=$((192 / GPU_COUNT))
+else
+  DEFAULT_SAVE_STEPS=192
+fi
```

tests/examples/speculative_decoding/test_eagle.py (2)
`33-36`: Make the config file easier to debug

Pretty-print JSON for quick inspection on failures.

```diff
-    with open(config_file, "w") as f:
-        json.dump(tiny_eagle_config, f)
+    with open(config_file, "w") as f:
+        json.dump(tiny_eagle_config, f, indent=2, sort_keys=True)
```
`45-50`: Stabilize test memory and launch mode across environments

- Force single-GPU to keep memory predictable (CI hosts with >1 GPU currently trigger `--multi_gpu` via launch.sh).
- Lower train batch size to 1 to further cap memory.

```diff
-        "--num_gpu", str(num_gpus),
+        "--num_gpu", "1",
         "--mode", "eagle3",
         "--eagle_config", str(config_file),
-        "--output_dir", tmp_path / "eagle-tinyllama",
-        "--training_seq_len", "128",  # Match max_position_embeddings
+        "--output_dir", tmp_path / "eagle-tinyllama",
+        "--training_seq_len", "128",  # Match max_position_embeddings
+        "--train_bs", "1",
```

If you want to keep multi-GPU testing, consider clamping to 1 only in CI: detect via env (e.g., CI=true) and pass 1 there.
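A CI-only clamp as floated above could look like the sketch below. The `effective_num_gpus` helper is hypothetical, and the `CI=true` convention is an assumption (GitHub Actions sets it; other runners may differ).

```python
import os

def effective_num_gpus(num_gpus: int) -> int:
    """Return the GPU count to pass to the launcher.

    Clamps to 1 on CI hosts so test memory stays predictable; elsewhere
    the detected count is used unchanged.
    """
    # Many CI providers export CI=true in the job environment (an assumption).
    if os.environ.get("CI", "").lower() == "true":
        return 1
    return num_gpus
```

The test would then pass `str(effective_num_gpus(num_gpus))` instead of `str(num_gpus)`, keeping local multi-GPU runs intact.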
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)

- examples/speculative_decoding/launch.sh (1 hunks)
- tests/examples/speculative_decoding/test_eagle.py (2 hunks)
- tests/gpu/torch/export/test_unified_export_megatron.py (0 hunks)
💤 Files with no reviewable changes (1)
- tests/gpu/torch/export/test_unified_export_megatron.py
🧰 Additional context used
🧬 Code graph analysis (1)
tests/examples/speculative_decoding/test_eagle.py (3)

- tests/_test_utils/examples/run_command.py (1): run_example_command (35-37)
- tests/examples/conftest.py (2): tiny_llama_path (33-41), num_gpus (23-24)
- tests/examples/speculative_decoding/conftest.py (1): tiny_daring_anteater_path (23-36)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: wait-checks / wait
- GitHub Check: code-quality
- GitHub Check: build-docs
🔇 Additional comments (2)
tests/examples/speculative_decoding/test_eagle.py (2)
`16-16`: Import is correct and scoped to the new JSON config usage

`22-31`: Tiny EAGLE config aligns with goal to cut memory footprint

Good call on 128 seq len and single-layer/low-width knobs to reduce activation and KV cache sizes.

Please confirm these keys match what the `--eagle_config` loader expects; if additional required fields emerge upstream, the test may start failing.
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

```
@@           Coverage Diff           @@
##             main     #299   +/-   ##
=======================================
  Coverage   73.93%   73.93%
=======================================
  Files         172      172
  Lines       17408    17408
=======================================
  Hits        12870    12870
  Misses       4538     4538
```

☔ View full report in Codecov by Sentry.
What does this PR do?
Type of change: Test speedup
Overview: The Eagle example tests were using the default config, which has large dimensions, so they consumed ~28 GB per GPU and caused OOM on A5000 servers. Optimized to use just ~1 GB now!
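A back-of-envelope sketch of why shrinking the config helps so much: transformer block parameters scale roughly with the square of the hidden size, so a tiny width and a single layer collapse the footprint. The dimensions below are hypothetical illustrations, not the actual default or test config values.

```python
# Rough rule of thumb: one transformer block holds about 12 * h^2 parameters
# (4*h^2 for the q/k/v/o attention projections + 8*h^2 for a 4x-wide MLP).
def approx_block_params(hidden_size: int) -> int:
    return 12 * hidden_size * hidden_size

# Hypothetical "large default" vs. tiny test config (illustrative numbers only).
default_params = 32 * approx_block_params(4096)  # 32 layers, width 4096
tiny_params = 1 * approx_block_params(128)       # 1 layer, width 128

print(f"default ≈ {default_params / 1e9:.1f}B params, tiny ≈ {tiny_params / 1e3:.0f}K params")
```

Even before counting activations and optimizer state, the quadratic scaling alone explains a multi-order-of-magnitude memory drop.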
Testing
Summary by CodeRabbit