
Conversation

@i-riyad i-riyad (Contributor) commented Aug 29, 2025

What does this PR do?

Type of change: Bug fix

Overview: Training-mode removal from an ONNX node was incomplete; this PR also prunes the node's now-unused extra training outputs.

Usage

from modelopt.onnx.utils import remove_node_training_mode

# Strips the training_mode attribute from matching nodes and prunes their
# now-unused extra outputs.
result_model = remove_node_training_mode(model, "BatchNormalization")
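
A fuller end-to-end sketch (the file paths and the save step here are illustrative, not part of this PR):

import onnx

from modelopt.onnx.utils import remove_node_training_mode

model = onnx.load("model.onnx")  # hypothetical input path
result_model = remove_node_training_mode(model, "BatchNormalization")
onnx.save(result_model, "model_inference.onnx")  # hypothetical output path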

Testing

python -m pytest tests/unit/torch/deploy/utils/test_torch_onnx_utils.py::test_remove_node_training_mode_attribute -v
python -m pytest tests/unit/torch/deploy/utils/test_torch_onnx_utils.py::test_remove_node_extra_training_outputs -v

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes
  • Did you write any new necessary tests?: Yes
  • Did you add or update any necessary documentation?: No
  • Did you update Changelog?: No

Additional Information

@i-riyad i-riyad requested review from a team as code owners August 29, 2025 19:05
@i-riyad i-riyad requested review from cjluo-nv and ajrasane August 29, 2025 19:05

copy-pr-bot bot commented Aug 29, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@i-riyad i-riyad force-pushed the rislam/training-mode-remove branch from 0a3fd29 to d0f071c on August 29, 2025 20:54
break

# If node has extra training outputs, keep only the first
if len(node.output) > 1:
Contributor

Is there a way we can check that the output we remove is actually the training output?

Contributor Author

Actually, the number of training outputs is operator-specific, and removing them individually is a bit convoluted. I have instead implemented the same functionality by removing all of a node's unused outputs when training_mode is on. Please review.
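
For reference, a minimal sketch of that approach (illustrative only, not the PR's code; the helper name and the keep-first-output rule are assumptions):

import onnx

def _prune_unused_outputs(graph: onnx.GraphProto, node: onnx.NodeProto) -> None:
    # An output is "used" if another node consumes it or it is a graph output.
    used = {inp for n in graph.node for inp in n.input}
    used |= {out.name for out in graph.output}
    # Keep the first output unconditionally: every ONNX op needs at least one.
    kept = [o for i, o in enumerate(node.output) if i == 0 or o in used]
    del node.output[:]
    node.output.extend(kept)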

return make_tensor(name, onnx.TensorProto.FLOAT, shape, data.flatten())


def _make_batchnorm_model(bn_node, extra_value_infos=None):
Contributor

Could you add a comment to show how this model will look after creation?
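
For illustration (an assumed shape, since the helper's body is not shown in this thread), the created model would look roughly like:

  X ──► BatchNormalization(scale, B, input_mean, input_var) ──► Y

with any extra_value_infos presumably attached as additional value_info entries on the graph.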

assert "training_mode" not in attr_names


def test_remove_node_extra_training_outputs():
Contributor

Could you also add a test to check that, if a node has multiple outputs that are not training outputs, they are not removed?

@i-riyad i-riyad (Contributor, Author) commented Aug 30, 2025

Modified the existing test to cover this.
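
For readers following this thread, a self-contained sketch of the pruning behavior under test (assuming the signature from the Usage section above; the PR's actual test differs):

import onnx
from onnx import TensorProto, helper

from modelopt.onnx.utils import remove_node_training_mode

# BatchNormalization in training mode emits extra outputs (running_mean,
# running_var); nothing consumes them here, so they should be pruned.
bn = helper.make_node(
    "BatchNormalization",
    ["x", "scale", "bias", "mean", "var"],
    ["y", "running_mean", "running_var"],
    training_mode=1,
)
graph = helper.make_graph(
    [bn],
    "bn_graph",
    [helper.make_tensor_value_info("x", TensorProto.FLOAT, [1, 3, 2, 2])],
    [helper.make_tensor_value_info("y", TensorProto.FLOAT, [1, 3, 2, 2])],
    initializer=[
        helper.make_tensor(name, TensorProto.FLOAT, [3], [1.0, 1.0, 1.0])
        for name in ("scale", "bias", "mean", "var")
    ],
)
model = helper.make_model(graph)

result = remove_node_training_mode(model, "BatchNormalization")
node = result.graph.node[0]
assert "training_mode" not in {a.name for a in node.attribute}
assert list(node.output) == ["y"]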


codecov bot commented Aug 30, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.97%. Comparing base (dba0b37) to head (ff5f6df).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #277      +/-   ##
==========================================
+ Coverage   73.83%   73.97%   +0.14%     
==========================================
  Files         173      172       -1     
  Lines       17402    17351      -51     
==========================================
- Hits        12849    12836      -13     
+ Misses       4553     4515      -38     


@i-riyad i-riyad force-pushed the rislam/training-mode-remove branch 4 times, most recently from 1b85154 to 626d99d on August 30, 2025 21:30
@i-riyad i-riyad requested a review from ajrasane August 30, 2025 21:32
Signed-off-by: Riyad Islam <[email protected]>
Signed-off-by: Riyad Islam <[email protected]>
@i-riyad i-riyad force-pushed the rislam/training-mode-remove branch from 626d99d to 062224a on August 30, 2025 21:37
@i-riyad i-riyad self-assigned this Aug 30, 2025
@kevalmorabia97 kevalmorabia97 removed the request for review from cjluo-nv September 3, 2025 05:39
@i-riyad i-riyad merged commit d0372f4 into main Sep 4, 2025
19 checks passed
@i-riyad i-riyad deleted the rislam/training-mode-remove branch September 4, 2025 18:27
@i-riyad i-riyad mentioned this pull request Sep 4, 2025