Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 4 additions & 2 deletions modelopt/torch/speculative/plugins/megatron_eagle.py
Original file line number Diff line number Diff line change
Expand Up @@ -524,9 +524,11 @@ def __init__(
if self._num_aux_hidden_states > 0:
# Register forward hook to the last EAGLE3 layer to extract the pre-norm hidden_state
# for eagle3 auto regression.
layer = self.decoder.layers[-1]
layer.register_forward_hook(self._eagle3_layer_forward_hook)
last_layer = self.decoder.layers[-1]
last_layer.register_forward_hook(self._eagle3_layer_forward_hook)

# The first EAGLE3 layer needs to be specialized.
layer = self.decoder.layers[0]
self_attention = layer.self_attention
if not isinstance(self_attention, SelfAttention):
raise ValueError("EAGLE-3 only support SelfAttention (MHA, GQA).")
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,12 @@
from modelopt.torch.speculative.plugins.megatron_medusa import _DynamicMedusaGPTModel
from modelopt.torch.speculative.utils import Tree, get_default_attention_mask_and_position_ids

ALGO_TO_CONFIG = {
"eagle1": mtsp.config.EAGLE1_DEFAULT_CFG,
"eagle3": mtsp.config.EAGLE3_DEFAULT_CFG,
"eagle-mtp": mtsp.config.EAGLE_MTP_DEFAULT_CFG,
}
Comment on lines +35 to +39
Copy link

@coderabbitai coderabbitai bot Sep 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Avoid shared-state mutations: deepcopy ALGO_TO_CONFIG entries before editing.

mtsp.config.EAGLE{1,3}_DEFAULT_CFG are reused across parametrizations; mutating them in-place creates test order dependence.

-        mtsp_config = ALGO_TO_CONFIG[algo]
+        mtsp_config = deepcopy(ALGO_TO_CONFIG[algo])

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In tests/gpu/torch/speculative/plugins/test_speculative_megatron_modules.py
around lines 35 to 39, the ALGO_TO_CONFIG mapping references shared
mtsp.config.EAGLE*_DEFAULT_CFG objects which are mutated later, causing
test-order dependence; update the mapping (or wherever configs are modified) to
use deep copies of the default configs (e.g., import copy and call
copy.deepcopy(...) for each value) so each parametrization gets an independent
config instance before any edits.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you apply this suggestion?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

@yeyu-nvidia, I'll apply the suggestion to use deepcopy to avoid shared-state mutations. Let me locate the exact line and provide the fix.

Based on the code analysis, here's the fix to apply the CodeRabbit suggestion:

# Around line 72, change this line:
mtsp_config = ALGO_TO_CONFIG[algo]

# To this:
mtsp_config = deepcopy(ALGO_TO_CONFIG[algo])

This change ensures that each test parametrization gets an independent copy of the configuration object, preventing mutations in one test from affecting subsequent tests. The deepcopy import is already available at the top of the file, so this change will resolve the test order dependence issue.

The mutation happens in the subsequent lines where the eagle_architecture_config fields are updated:

  • mtsp_config["config"]["eagle_architecture_config"]["num_hidden_layers"]
  • mtsp_config["config"]["eagle_architecture_config"]["hidden_size"]
  • mtsp_config["config"]["eagle_architecture_config"]["vocab_size"]
  • mtsp_config["config"]["eagle_architecture_config"]["draft_vocab_size"]

With the deepcopy, each test will work with its own independent configuration instance.


🏁 Script executed:

#!/bin/bash
# Find the line where mtsp_config is assigned from ALGO_TO_CONFIG
rg -n "mtsp_config = ALGO_TO_CONFIG" tests/gpu/torch/speculative/plugins/test_speculative_megatron_modules.py -A2 -B2

Length of output: 358


Perfect! I found the exact location. Here's the specific change needed:

File: tests/gpu/torch/speculative/plugins/test_speculative_megatron_modules.py
Line 74: Change this:

mtsp_config = ALGO_TO_CONFIG[algo]

To this:

mtsp_config = deepcopy(ALGO_TO_CONFIG[algo])

This ensures that when the config is mutated in the subsequent lines (76+), each test gets its own independent copy rather than modifying the shared config object. The deepcopy import is already available at the top of the file, so this is a clean fix that resolves the test order dependence issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, nvm. Those are already deepcopied in config.py

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

@yeyu-nvidia Thanks for the clarification! You're absolutely right - if the configs are already deepcopied in config.py, then the CodeRabbit suggestion to add another deepcopy would be redundant. I appreciate you catching that and preventing an unnecessary change.


🏁 Script executed:

#!/bin/bash
# Find the config.py file and examine how EAGLE1_DEFAULT_CFG and EAGLE3_DEFAULT_CFG are defined
fd -n "config.py" --type f | xargs rg -l "EAGLE.*_DEFAULT_CFG"

Length of output: 501


🏁 Script executed:

#!/bin/bash
# Look at the specific definitions of EAGLE1_DEFAULT_CFG and EAGLE3_DEFAULT_CFG
rg -n "EAGLE[13]_DEFAULT_CFG" -A5 -B2

Length of output: 5447


🏁 Script executed:

#!/bin/bash
# Check if deepcopy is used in the config definitions
rg -n "deepcopy.*EAGLE" -A3 -B3

Length of output: 33


🏁 Script executed:

#!/bin/bash
# Look for imports of deepcopy in config files
fd -n "config.py" --type f | xargs rg -n "from copy import.*deepcopy\|import.*deepcopy"

Length of output: 318


@yeyu-nvidia Thanks for the clarification! You're absolutely right - looking at the config.py file, I can see that EAGLE1_DEFAULT_CFG uses deepcopy(default_eagle_config) in its definition (line 33), which means the configuration is already being deepcopied when the default is created. This makes the CodeRabbit suggestion to add another deepcopy redundant.

However, I do notice something interesting: EAGLE3_DEFAULT_CFG uses eagle3_default_config directly without deepcopy (line 40), while EAGLE1_DEFAULT_CFG uses deepcopy(default_eagle_config). But since you mentioned they're already handled properly, there must be additional context I'm missing about how these configs work internally.

Thank you for preventing an unnecessary code change!


🏁 Script executed:

#!/bin/bash
# Let's look at the config.py file to see the actual definitions and imports
cat modelopt/torch/speculative/config.py

Length of output: 3214


@yeyu-nvidia You're absolutely correct! After examining the config.py file, I can see that both configurations are already using deepcopied versions:

  • EAGLE1_DEFAULT_CFG directly uses deepcopy(default_eagle_config) (line 33)
  • EAGLE3_DEFAULT_CFG uses eagle3_default_config, which is itself created via deepcopy(default_eagle_config) on line 23

So both default configurations are already independent copies, making the CodeRabbit suggestion to add another deepcopy completely unnecessary. Thanks for catching that and preventing a redundant change!


✏️ Learnings added
Learnt from: yeyu-nvidia
PR: NVIDIA/TensorRT-Model-Optimizer#295
File: tests/gpu/torch/speculative/plugins/test_speculative_megatron_modules.py:35-39
Timestamp: 2025-09-05T19:10:36.359Z
Learning: In the TensorRT-Model-Optimizer codebase, the EAGLE1_DEFAULT_CFG and EAGLE3_DEFAULT_CFG configurations in config.py are already deepcopied, so additional deepcopy calls are not needed when using ALGO_TO_CONFIG mapping in tests.

Learnt from: yeyu-nvidia
PR: NVIDIA/TensorRT-Model-Optimizer#295
File: tests/gpu/torch/speculative/plugins/test_speculative_megatron_modules.py:35-39
Timestamp: 2025-09-05T19:10:36.359Z
Learning: In the TensorRT-Model-Optimizer codebase, EAGLE1_DEFAULT_CFG in config.py already uses deepcopy(default_eagle_config) in its definition, making additional deepcopy calls unnecessary when using ALGO_TO_CONFIG mapping in tests.

Learnt from: yeyu-nvidia
PR: NVIDIA/TensorRT-Model-Optimizer#295
File: tests/gpu/torch/speculative/plugins/test_speculative_megatron_modules.py:35-39
Timestamp: 2025-09-05T19:10:36.359Z
Learning: In the TensorRT-Model-Optimizer codebase, both EAGLE1_DEFAULT_CFG and EAGLE3_DEFAULT_CFG in config.py already use deepcopied configurations (either directly via deepcopy() or through variables that were already deepcopied), making additional deepcopy calls unnecessary when using ALGO_TO_CONFIG mapping in tests.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.



def _test_speculative_gpt_model(
algo, num_medusa_heads_or_eagle_layers, activation_func, normalization, rank, size
Expand Down Expand Up @@ -64,18 +70,42 @@ def _test_speculative_gpt_model(

# Type checking
assert isinstance(model, _DynamicMedusaGPTModel)
elif algo == "eagle":
config = {"eagle_architecture_config": deepcopy(default_eagle_config)}
config["eagle_architecture_config"]["hidden_size"] = model.config.hidden_size
config["eagle_architecture_config"]["vocab_size"] = model.vocab_size
config["eagle_architecture_config"]["draft_vocab_size"] = model.vocab_size
elif algo in {"eagle1", "eagle3"}:
mtsp_config = ALGO_TO_CONFIG[algo]

mtsp_config["config"]["eagle_architecture_config"]["num_hidden_layers"] = (
num_medusa_heads_or_eagle_layers
)
mtsp_config["config"]["eagle_architecture_config"]["hidden_size"] = model.config.hidden_size
mtsp_config["config"]["eagle_architecture_config"]["vocab_size"] = model.vocab_size
mtsp_config["config"]["eagle_architecture_config"]["draft_vocab_size"] = model.vocab_size

model = mtsp.convert(model, [("eagle", config)])
model = mtsp.convert(model, mtsp_config)

# Type checking
assert isinstance(model, _DynamicEagleGPTModel)
else:
raise ValueError("Only algo={eagle, medusa} are supported!")
raise ValueError("Only algo={eagle1, eagle3, medusa} are supported!")

if algo == "eagle3":
first_layer = model.eagle_module.decoder.layers[0]
last_layer = model.eagle_module.decoder.layers[-1]
# Eagle3 QKV input_dim is 2x of hidden_size
assert (
first_layer.self_attention.linear_qkv.weight.shape[-1] == model.config.hidden_size * 2
)
# Eagle3 attention has a forward_pre_hook to handle additional features to be concatenated
assert len(first_layer.self_attention._forward_pre_hooks) > 0
# Eagle3 last layer has a forward hook to extrat the pre_norm hidden_state
assert len(last_layer._forward_hooks) > 0
elif algo == "eagle1":
first_layer = model.eagle_module.decoder.layers[0]
last_layer = model.eagle_module.decoder.layers[-1]
# Eagle1 QKV input_dim the same as hidden_size
assert first_layer.self_attention.linear_qkv.weight.shape[-1] == model.config.hidden_size
# No forward_hook or forward_pre_hook are needed
assert len(first_layer.self_attention._forward_pre_hooks) == 0
assert len(last_layer._forward_hooks) == 0

# Bfloat16
model = model.to(torch.bfloat16)
Expand Down Expand Up @@ -104,7 +134,7 @@ def _test_speculative_gpt_model(

assert medusa_loss.shape[0] == batch_size
assert medusa_loss.shape[1] == max_sequence_length
elif algo == "eagle":
elif algo in {"eagle1", "eagle3"}:
labels = torch.randint(0, vocab_size, (batch_size, max_sequence_length)).cuda()
eagle_loss = model(prompt_tokens, position_ids, attention_mask, labels=labels)

Expand All @@ -115,8 +145,10 @@ def _test_speculative_gpt_model(
@pytest.mark.parametrize(
("algo", "num_medusa_heads_or_eagle_layers", "activation_func", "normalization"),
[
("eagle", 1, "squared_relu", "LayerNorm"), # MHA
("eagle", 2, "swiglu", "RMSNorm"), # GQA
("eagle1", 1, "squared_relu", "LayerNorm"), # MHA
("eagle1", 2, "swiglu", "RMSNorm"), # GQA
("eagle3", 1, "swiglu", "RMSNorm"), # GQA
("eagle3", 2, "swiglu", "RMSNorm"), # GQA
("medusa", 1, "squared_relu", "LayerNorm"), # MHA
("medusa", 2, "swiglu", "RMSNorm"), # GQA
],
Expand Down
Loading