
Conversation

Y-T-G
Contributor

@Y-T-G Y-T-G commented Sep 29, 2025

What does this PR do?

Type of change: Bug fix

Overview: Fixes FLOPs calculations. Closes #387

Usage

import torch
import torchvision

from modelopt.torch.nas.utils import inference_flops

model = torchvision.models.get_model("resnet18")
inference_flops(model, torch.randn(1, 3, 224, 224))
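In essence, the fix scales the profiler's MAC count by two. A minimal standalone sketch of the corrected behavior (profile_macs is stubbed here so the snippet runs without torchprofile installed; the stub's MAC count is illustrative, not a real measurement):

```python
# Sketch of the corrected behavior: FLOPs reported as 2 x MACs.
# profile_macs_stub stands in for torchprofile.profile_macs so this
# runs standalone; the MAC count below is illustrative.

def profile_macs_stub(model, inputs):
    # Pretend the profiler counted ~40.55M multiply-accumulates.
    return 40_550_000

def inference_flops_sketch(model, inputs):
    # Each MAC is one multiply plus one add: 2 FLOPs per MAC.
    return 2 * profile_macs_stub(model, inputs)

print(inference_flops_sketch(None, None))  # 81100000, i.e. ~81.10M FLOPs
```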

Testing

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: No. Changes FLOPs output.
  • Did you write any new necessary tests?: No
  • Did you add or update any necessary documentation?: No
  • Did you update Changelog?: No

Additional Information

Summary by CodeRabbit

  • Bug Fixes
    • FLOPs reporting now returns doubled values; displayed/returned FLOPs and derived thresholds reflect the new scale.
  • Documentation
    • Guides updated to show doubled FLOPs figures and revised FLOPs constraints in examples, tables, and evaluation outputs.
  • Examples
    • Pruning example adjusted to higher FLOPs ceilings and updated reported metrics and narrative to match the new scale.
  • Tests
    • Unit tests updated to expect doubled FLOPs calculations and related assertions.

Signed-off-by: Mohammed Yasin <[email protected]>
@Y-T-G Y-T-G requested a review from a team as a code owner September 29, 2025 14:37

copy-pr-bot bot commented Sep 29, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.


coderabbitai bot commented Sep 29, 2025

Walkthrough

inference_flops in modelopt/torch/nas/utils.py now returns FLOPs = 2 × MACs. Documentation, an example notebook, and unit tests were updated to reflect doubled FLOPs; no input validation, model wrapping, or control-flow changes.

Changes

  • FLOPs computation (modelopt/torch/nas/utils.py):
    inference_flops now returns 2 * profile.profile_macs(...), so FLOPs = 2 × MACs; no other logic or API changes.
  • Documentation updates (docs/source/guides/3_pruning.rst, docs/source/guides/7_nas.rst):
    Reported/profiled FLOPs values and constraint numbers updated to reflect doubled FLOPs (examples, profiling tables, and constraint upper bounds).
  • Example notebook (examples/pruning/cifar_resnet.ipynb):
    Pruning FLOPs constraints, displayed metrics, narrative, and reported subnet FLOPs updated (~30M → ~60M). Also added imports: modelopt.torch.opt as mto and modelopt.torch.prune as mtp.
  • Tests, FLOPs expectations (tests/unit/torch/nas/test_nas_utils.py, tests/unit/torch/nas/test_evaluate_constraints.py):
    Tests adjusted to expect doubled FLOPs (assertions now multiply expected MACs by 2).

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Caller as user code
  participant Utils as inference_flops
  participant Profiler as profile.profile_macs

  Caller->>Utils: request FLOPs(model, inputs)
  Utils->>Profiler: profile_macs(model, inputs)
  Profiler-->>Utils: macs_count
  note right of Utils #E6F0FF: New step multiplies MACs by 2
  Utils-->>Caller: return 2 * macs_count

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

I nibble at MACs beneath the log,
two hops per multiply, twice the jog.
FLOPs now stretch, a carrot twice as bright,
I tally, I hop, through day and night. 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve coverage.
✅ Passed checks (4 passed)
  • Description Check ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: The title "Fix FLOPs calculation" succinctly and accurately describes the primary change, correcting the FLOPs computation, without extraneous detail or vague terminology.
  • Linked Issues Check ✅ Passed: The changes address issue #387 by updating inference_flops to multiply MACs by two and propagating the correction through documentation, the example notebook, and unit test expectations, aligning all artifacts with the repository's FLOPs definition.
  • Out of Scope Changes Check ✅ Passed: All modifications focus on correcting the FLOPs calculation across code, documentation, examples, and tests; no unrelated changes were introduced.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ba0f0ff and 0589c61.

📒 Files selected for processing (5)
  • docs/source/guides/3_pruning.rst (2 hunks)
  • docs/source/guides/7_nas.rst (4 hunks)
  • examples/pruning/cifar_resnet.ipynb (7 hunks)
  • tests/unit/torch/nas/test_evaluate_constraints.py (2 hunks)
  • tests/unit/torch/nas/test_nas_utils.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • docs/source/guides/7_nas.rst
  • docs/source/guides/3_pruning.rst
🔇 Additional comments (9)
tests/unit/torch/nas/test_nas_utils.py (1)

39-40: LGTM! FLOPs expectation correctly doubled.

The test now expects FLOPs = 2 × MACs, which aligns with the standard definition where each MAC involves both a multiplication and an addition. The formula correctly computes the expected FLOPs for Conv2d operations.
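The Conv2d expectation the reviewer refers to can be written out analytically. A sketch of the standard formula under the FLOPs = 2 × MACs convention (dimensions here are illustrative, not taken from the repo's test files):

```python
# Analytic Conv2d cost under the FLOPs = 2 x MACs convention.
# Dimensions below are illustrative examples only.

def conv2d_flops(c_in, c_out, kernel, h_out, w_out, groups=1, bias=False):
    # One MAC per (input channel x kernel element) per output element.
    macs = c_out * h_out * w_out * (c_in // groups) * kernel * kernel
    flops = 2 * macs  # one multiply plus one add per MAC
    if bias:
        flops += c_out * h_out * w_out  # one extra add per output element
    return flops

# 3x3 conv, 16 -> 32 channels, 32x32 output feature map:
print(conv2d_flops(16, 32, 3, 32, 32))  # 9437184
```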

tests/unit/torch/nas/test_evaluate_constraints.py (2)

50-50: LGTM! Expected FLOPs correctly doubled.

The expected FLOPs calculation now uses 2 * profile_macs(model, args), consistent with the FLOPs = 2 × MACs definition.


86-86: LGTM! Max FLOPs baseline correctly doubled.

The max_flops baseline for percentage-based constraints is now correctly calculated as 2 * profile_macs(model, args), ensuring percentage limits are computed against the corrected FLOPs definition.
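Doubling both the candidate's FLOPs and the max_flops baseline leaves percentage-based constraints unchanged, which is why only absolute numbers needed edits elsewhere. A quick illustration (the MAC counts are made up, loosely echoing the notebook's ~59.28M/81.10M FLOPs figures):

```python
# Percentage constraints are invariant under the 2x rescaling, since
# both the subnet cost and the baseline double. Illustrative numbers only.
subnet_macs, max_macs = 29_640_000, 40_550_000

old_ratio = subnet_macs / max_macs              # MACs-based ratio
new_ratio = (2 * subnet_macs) / (2 * max_macs)  # FLOPs-based ratio

print(f"{new_ratio:.1%}")  # 73.1%
```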

examples/pruning/cifar_resnet.ipynb (6)

492-492: LGTM! Documentation constraint updated to 60M FLOPs.

The narrative correctly reflects the doubled FLOPs constraint, matching the code change on line 659.


532-542: LGTM! Profiling tables regenerated with corrected FLOPs.

The profiling output shows:

  • Max FLOPs: 81.10M (doubled from original)
  • Constraint upper bound: 60.00M (matches code on line 659)

Tables are consistent with the corrected FLOPs calculation.


621-621: LGTM! Best subnet FLOPs correctly updated.

The search result shows flops: '59.28M', which is approximately double the original value and satisfies the 60M constraint.


659-659: LGTM! FLOPs constraint correctly doubled to 60M.

The constraint parameter is now 60e6, consistent with the corrected FLOPs = 2 × MACs calculation and matching all documentation and output references.


679-679: LGTM! Narrative text correctly updated.

The text correctly states "59.3M FLOPs" for the best subnet and references the "60M FLOPs" constraint, consistent with the corrected calculations throughout the notebook.


798-799: LGTM! Final comparison table updated with corrected FLOPs.

The comparison table correctly shows:

  • ResNet20: 81.2M FLOPs (doubled)
  • FastNAS subnet: 59.2M FLOPs (doubled)

These values are consistent with the corrected FLOPs calculation and all other references in the notebook.



@kevalmorabia97
Collaborator

@Y-T-G can you also update the code block outputs where flops are calculated?

Make sure to run the code blocks and copy the updated outputs

@kevalmorabia97
Collaborator

kevalmorabia97 commented Sep 29, 2025

This notebook needs to be updated as well:
https://github.com/Y-T-G/TensorRT-Model-Optimizer/blob/patch-1/examples/pruning/cifar_resnet.ipynb

constraints={"flops": 30e6} will change to constraints={"flops": 60e6}, and the cell's output will also need to be fixed


@Y-T-G
Contributor Author

Y-T-G commented Sep 29, 2025

@kevalmorabia97 Sorry, and thanks for going through it. Updated. But I just doubled the values; I didn't run the notebook myself.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
docs/source/guides/3_pruning.rst (1)

88-90: Typo in dummy input shape (244) — use 224 for square ImageNet inputs

224×244 is likely unintended and can skew FLOPs vs. other examples that use 224×224.

-    dummy_input = torch.randn(1, 3, 224, 244)
+    dummy_input = torch.randn(1, 3, 224, 224)
examples/pruning/cifar_resnet.ipynb (1)

659-666: Fix checkpoint filename typo (“seaarch” → “search”)

This breaks resume/re-run behavior.

-        "        \"checkpoint\": \"modelopt_seaarch_checkpoint_fastnas.pth\",\n",
+        "        \"checkpoint\": \"modelopt_search_checkpoint_fastnas.pth\",\n",
tests/unit/torch/nas/test_nas_utils.py (1)

39-41: Update remaining tests using the new FLOPs definition
In tests/unit/torch/nas/test_evaluate_constraints.py (lines 50, 86), replace profile_macs calls with inference_flops and assert 2×MACs for FLOPs. Add a bias=True test and a stride>1 scenario to cover output-shape math.

🧹 Nitpick comments (3)
docs/source/guides/7_nas.rst (1)

112-114: Constraint doubled to 4 GFLOPs — add a brief definition note nearby

Recommend inserting an admonition clarifying that FLOPs are now reported as 2×MACs to preempt confusion versus older docs and external tools.

     import torch
-    # Looking for a subnet with at most 4 GFLOPs
+    # Looking for a subnet with at most 4 GFLOPs
+    #
+    # .. important::
+    #    ModelOpt reports FLOPs as 2 × MACs (multiply + add). Older docs/releases
+    #    and some profilers may show MACs; adjust comparisons accordingly.
docs/source/guides/3_pruning.rst (1)

193-196: Tables updated to doubled FLOPs — good; add definition note once near first FLOPs mention

The updated numbers and unchanged ratios look correct. Suggest adding a single note in this guide (e.g., right before “Profiling the search space…”) stating FLOPs = 2×MACs to align expectations across guides.

Also applies to: 202-203

examples/pruning/cifar_resnet.ipynb (1)

60-61: Prefer CPU fallback for device selection in notebooks

Improves out-of-the-box execution on CPU-only environments/CI.

-        "device = torch.device(\"cuda\")"
+        "device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")"
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 81826ed and 71f6e78.

📒 Files selected for processing (4)
  • docs/source/guides/3_pruning.rst (2 hunks)
  • docs/source/guides/7_nas.rst (4 hunks)
  • examples/pruning/cifar_resnet.ipynb (7 hunks)
  • tests/unit/torch/nas/test_nas_utils.py (1 hunks)
🔇 Additional comments (5)
docs/source/guides/7_nas.rst (2)

245-247: Search constraints doubled — good

The example constraint now matches the 4 GFLOPs narrative.


128-135: No stale FLOPs references — grep shows only 4.00G at line 141 (no 2.00G matches).

examples/pruning/cifar_resnet.ipynb (2)

796-800: Conclusion table reflects doubled FLOPs — good

Values for ResNet20 and the FastNAS subnet align with the new counting.


621-623: No stale FLOPs references found
Verified no '30M' or '2 GFLOPs' remnants in examples/pruning/cifar_resnet.ipynb; outputs are up to date under the new FLOPs definition.

docs/source/guides/3_pruning.rst (1)

1-10: No stale FLOPs references detected
Verified absence of “2.75G” or “2 GFLOP” in docs/source/guides/3_pruning.rst.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
docs/source/guides/3_pruning.rst (1)

87-89: Typo in dummy_input shape.

Use 224×224 (not 224×244) to match standard ImageNet shapes and your other guides.

-    dummy_input = torch.randn(1, 3, 224, 244)
+    dummy_input = torch.randn(1, 3, 224, 224)
examples/pruning/cifar_resnet.ipynb (1)

659-666: Constraint raised to 60e6—good. Minor typo in checkpoint filename.

-        "checkpoint": "modelopt_seaarch_checkpoint_fastnas.pth",
+        "checkpoint": "modelopt_search_checkpoint_fastnas.pth",
🧹 Nitpick comments (4)
docs/source/guides/3_pruning.rst (2)

55-56: Confusing absolute-number example (“3e-6”).

For FLOPs, an absolute upper bound like 3e9 (3 GFLOPs) is the typical unit scale. Replace 3e-6.

-   specifying an upper bound in terms of absolute number (``3e-6``) or a percentage (``"60%"``).
+   specifying an upper bound in terms of absolute number (``3e9``) or a percentage (``"60%"``).

61-63: Add a one-line note clarifying FLOPs semantics (MACs×2).

To prevent future confusion, state the definition once here.

 #. Please see the API reference of :meth:`mtp.prune() <modelopt.torch.prune.pruning.prune>` for more details.
+
+.. note::
+
+   In ModelOpt docs and APIs, FLOPs are reported as 2 × MACs for conv/linear ops.
docs/source/guides/7_nas.rst (1)

100-108: Document FLOPs=2×MACs once in NAS guide too.

 To better understand the performance and the range of the resulting search space, you can profile
 the search space together with your deployment constraints using
 :meth:`mtn.profile() <modelopt.torch.nas.algorithms.profile>`:
+
+.. note::
+
+   Throughout ModelOpt, FLOPs are reported as 2 × MACs for conv/linear operations.
examples/pruning/cifar_resnet.ipynb (1)

60-60: Optional: add CPU fallback for wider reproducibility (Colab without GPU).

-device = torch.device("cuda")
+device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 71f6e78 and ba0f0ff.

📒 Files selected for processing (4)
  • docs/source/guides/3_pruning.rst (2 hunks)
  • docs/source/guides/7_nas.rst (4 hunks)
  • examples/pruning/cifar_resnet.ipynb (7 hunks)
  • tests/unit/torch/nas/test_nas_utils.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/unit/torch/nas/test_nas_utils.py
🔇 Additional comments (9)
docs/source/guides/7_nas.rst (4)

141-142: Upper-bound row: verify via executed output.

Same ask as above—paste the actual “Constraints Evaluation” printout after running.


245-247: Search constraint raised to 4 GFLOPs—consistent.

LGTM once verified outputs are regenerated to match.


132-134: Profile table values need live execution
Run the “Profiling a search space” block locally (with torch installed) and paste the exact flops/params output to replace the current table.


112-114: Constraint updated to 4 GFLOPs — ensure no remaining 2 GFLOPs references.
Repository search for '2e9'/'2 GFLOPs' returned no matches. Run: rg -nP '\b2(?:.0)?e9\b|2 GFLOPs' -C2 and update any occurrences.

examples/pruning/cifar_resnet.ipynb (4)

621-623: best_subnet_constraints must be produced by the search, not edited.

Re-run and keep whatever the actual best subnet prints (values may vary slightly by seed/hardware).


678-682: Narrative matches 60M bound; ensure it aligns with executed results.

Reword if actual best FLOPs differ after running.


798-800: Summary table: verify FLOPs after execution.

Numbers should reflect the run artifacts (rounding OK).


532-535: Profiling table and constraints block should be captured from an executed run.

Author noted they didn’t execute the notebook—please “Restart & Run All” and commit the fresh outputs to prevent divergence.

You can automate locally:

Also applies to: 541-543

docs/source/guides/3_pruning.rst (1)

193-195: Regenerate FLOPs/params table from code output
Please rerun the snippet below in an environment with torch installed and paste the exact printed rows for lines 193–195 and 202–203 to ensure the documentation matches the real output.

python - <<'PY'
import torch
import modelopt.torch.prune as mtp
from torchvision.models import resnet50
model = resnet50()
dummy_input = torch.randn(1,3,224,224)
def score_func(m): return 0.0
_ = mtp.prune(
    model=model, mode="fastnas",
    constraints={"flops":"60%"},
    dummy_input=dummy_input,
    config={"data_loader": [], "score_func": score_func, "checkpoint": "/tmp/_chk.pth"},
)
PY

@kevalmorabia97
Collaborator

kevalmorabia97 commented Sep 29, 2025

@Y-T-G You need to sign your commits with an SSH key. Please take a look at https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/CONTRIBUTING.md#%EF%B8%8F-signing-your-work and amend your commits


Signed-off-by: Y-T-G <[email protected]>
@Y-T-G
Contributor Author

Y-T-G commented Sep 29, 2025

@kevalmorabia97 Fixed

@kevalmorabia97
Collaborator

/ok to test 0589c61


codecov bot commented Sep 30, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.86%. Comparing base (c9db0ce) to head (0589c61).
⚠️ Report is 4 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #388   +/-   ##
=======================================
  Coverage   73.86%   73.86%           
=======================================
  Files         171      171           
  Lines       17629    17629           
=======================================
  Hits        13021    13021           
  Misses       4608     4608           

☔ View full report in Codecov by Sentry.

@kevalmorabia97 kevalmorabia97 enabled auto-merge (squash) September 30, 2025 06:00
@kevalmorabia97 kevalmorabia97 merged commit 70abfb4 into NVIDIA:main Sep 30, 2025
27 checks passed
@Y-T-G Y-T-G deleted the patch-1 branch September 30, 2025 19:33
Development

Successfully merging this pull request may close these issues:

  • Is the FLOPs calculation correct?
2 participants