Fix FLOPs calculation #388
Conversation
Signed-off-by: Mohammed Yasin <[email protected]>
Walkthrough
`inference_flops` now returns 2 × MACs instead of the raw MAC count; the affected docs, example notebook, and unit tests are updated to match.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant Caller as user code
    participant Utils as inference_flops
    participant Profiler as profile.profile_macs
    Caller->>Utils: request FLOPs(model, inputs)
    Utils->>Profiler: profile_macs(model, inputs)
    Profiler-->>Utils: macs_count
    note right of Utils #E6F0FF: New step multiplies MACs by 2
    Utils-->>Caller: return 2 * macs_count
```
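In code terms, the change the diagram describes reduces to a single multiplication. Below is a minimal sketch of the fixed utility, assuming a torchprofile-style `profile_macs` helper; the names follow the diagram, and the actual signature in modelopt may differ:

```python
# Sketch of the fixed utility per the diagram above. `profile_macs` here is
# assumed to behave like torchprofile's; the real helper lives in modelopt.
import torch
from torchprofile import profile_macs


def inference_flops(model: torch.nn.Module, dummy_input: torch.Tensor) -> int:
    """Return FLOPs, counting each multiply-accumulate (MAC) as 2 FLOPs."""
    macs = profile_macs(model, dummy_input)  # raw MAC count from the profiler
    return 2 * macs  # the fix: FLOPs = 2 * MACs (one multiply + one add)
```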
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
📜 Recent review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
🚧 Files skipped from review as they are similar to previous changes (2)
🔇 Additional comments (9)
@Y-T-G Can you also update the code block outputs where FLOPs are calculated? Make sure to run the code blocks and copy the updated outputs.
This notebook needs to be updated as well:
Unit tests need to be fixed as well: https://github.com/NVIDIA/TensorRT-Model-Optimizer/actions/runs/18100710495/job/51505702432?pr=388
@kevalmorabia97 Sorry, and thanks for going through it. Updated. But I just doubled the values; I didn't run the notebook myself.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
docs/source/guides/3_pruning.rst (1)
88-90
: Typo in dummy input shape (244): use 224 for square ImageNet inputs.
224×244 is likely unintended and can skew FLOPs vs. other examples that use 224×224.
```diff
- dummy_input = torch.randn(1, 3, 224, 244)
+ dummy_input = torch.randn(1, 3, 224, 224)
```

examples/pruning/cifar_resnet.ipynb (1)
659-666
: Fix checkpoint filename typo (“seaarch” → “search”).
This breaks resume/re-run behavior.
- " \"checkpoint\": \"modelopt_seaarch_checkpoint_fastnas.pth\",\n", + " \"checkpoint\": \"modelopt_search_checkpoint_fastnas.pth\",\n",tests/unit/torch/nas/test_nas_utils.py (1)
39-41
: Update remaining tests using new FLOPs definition
In tests/unit/torch/nas/test_evaluate_constraints.py (lines 50, 86), replace profile_macs calls with inference_flops and assert 2×MACs for FLOPs. Add a bias=True test and a stride>1 scenario to cover output‐shape math.
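For reference, a test along the lines this comment suggests might look like the sketch below; the import path for `inference_flops` and the use of torchprofile are assumptions, not verified against the repo:

```python
# Hypothetical updated test. Import paths are assumptions based on the file
# names in this review; adjust to the actual modelopt module layout.
import torch
from torchprofile import profile_macs

from modelopt.torch.nas.utils import inference_flops  # assumed location


def test_inference_flops_is_twice_macs():
    # bias=True and stride > 1, per the coverage suggestions above
    model = torch.nn.Conv2d(3, 8, kernel_size=3, stride=2, bias=True)
    dummy_input = torch.randn(1, 3, 32, 32)
    macs = profile_macs(model, dummy_input)
    assert inference_flops(model, dummy_input) == 2 * macs
```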
🧹 Nitpick comments (3)
docs/source/guides/7_nas.rst (1)
112-114
: Constraint doubled to 4 GFLOPs: add a brief definition note nearby.
Recommend inserting an admonition clarifying that FLOPs are now reported as 2×MACs to preempt confusion versus older docs and external tools.
```diff
 import torch
-# Looking for a subnet with at most 4 GFLOPs
+# Looking for a subnet with at most 4 GFLOPs
+#
+# .. important::
+#    ModelOpt reports FLOPs as 2 × MACs (multiply + add). Older docs/releases
+#    and some profilers may show MACs; adjust comparisons accordingly.
```

docs/source/guides/3_pruning.rst (1)
193-196
: Tables updated to doubled FLOPs: good; add a definition note once near the first FLOPs mention.
The updated numbers and unchanged ratios look correct. Suggest adding a single note in this guide (e.g., right before “Profiling the search space…”) stating FLOPs = 2×MACs to align expectations across guides.
Also applies to: 202-203
examples/pruning/cifar_resnet.ipynb (1)
60-61
: Prefer CPU fallback for device selection in notebooks.
Improves out-of-the-box execution on CPU-only environments/CI.
- "device = torch.device(\"cuda\")" + "device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")"
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
docs/source/guides/3_pruning.rst (2 hunks)
docs/source/guides/7_nas.rst (4 hunks)
examples/pruning/cifar_resnet.ipynb (7 hunks)
tests/unit/torch/nas/test_nas_utils.py (1 hunks)
🔇 Additional comments (5)
docs/source/guides/7_nas.rst (2)
245-247
: Search constraints doubled: good.
The example constraint now matches the 4 GFLOPs narrative.
128-135
: No stale FLOPs references: grep shows only 4.00G at line 141 (no 2.00G matches).
examples/pruning/cifar_resnet.ipynb (2)
796-800
: Conclusion table reflects doubled FLOPs: good.
Values for ResNet20 and the FastNAS subnet align with the new counting.
621-623
: No stale FLOPs references found
Verified no '30M' or '2 GFLOPs' remnants in examples/pruning/cifar_resnet.ipynb; outputs are up to date under the new FLOPs definition.docs/source/guides/3_pruning.rst (1)
1-10
: No stale FLOPs references detected
Verified absence of “2.75G” or “2 GFLOP” in docs/source/guides/3_pruning.rst.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
docs/source/guides/3_pruning.rst (1)
87-89
: Typo in dummy_input shape.
Use 224×224 (not 224×244) to match standard ImageNet shapes and your other guides.
```diff
- dummy_input = torch.randn(1, 3, 224, 244)
+ dummy_input = torch.randn(1, 3, 224, 224)
```

examples/pruning/cifar_resnet.ipynb (1)
659-666
: Constraint raised to 60e6: good. Minor typo in checkpoint filename.

```diff
- "checkpoint": "modelopt_seaarch_checkpoint_fastnas.pth",
+ "checkpoint": "modelopt_search_checkpoint_fastnas.pth",
```
🧹 Nitpick comments (4)
docs/source/guides/3_pruning.rst (2)
55-56
: Confusing absolute-number example (“3e-6”).
For FLOPs, an absolute upper bound like 3e9 (3 GFLOPs) is the typical unit scale. Replace 3e-6.
```diff
- specifying an upper bound in terms of absolute number (``3e-6``) or a percentage (``"60%"``).
+ specifying an upper bound in terms of absolute number (``3e9``) or a percentage (``"60%"``).
```
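As a usage illustration, both constraint forms are passed the same way; this mirrors the `mtp.prune` invocation quoted in full later in this review:

```python
# Both forms of the FLOPs constraint, under the new FLOPs = 2 * MACs definition.
constraints_absolute = {"flops": 3e9}    # absolute bound: at most 3 GFLOPs
constraints_relative = {"flops": "60%"}  # relative bound: 60% of the base model
```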
61-63
: Add a one-line note clarifying FLOPs semantics (MACs×2).
To prevent future confusion, state the definition once here.
```diff
 #. Please see the API reference of :meth:`mtp.prune() <modelopt.torch.prune.pruning.prune>` for more details.
+
+.. note::
+
+   In ModelOpt docs and APIs, FLOPs are reported as 2 × MACs for conv/linear ops.
```

docs/source/guides/7_nas.rst (1)
100-108
: Document FLOPs=2×MACs once in NAS guide too.

```diff
 To better understand the performance and the range of the resulting search space, you can profile the search space together with your deployment constraints using :meth:`mtn.profile() <modelopt.torch.nas.algorithms.profile>`:
+
+.. note::
+
+   Throughout ModelOpt, FLOPs are reported as 2 × MACs for conv/linear operations.
```
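For illustration, such a profiling call might look like the sketch below; the signature of `mtn.profile` and the conversion step are assumptions inferred from the doc reference above, not verified API:

```python
# Hypothetical usage sketch: argument names for mtn.profile are assumptions;
# only the dotted path modelopt.torch.nas comes from the doc reference above.
import torch
import modelopt.torch.nas as mtn
from torchvision.models import resnet50

model = mtn.convert(resnet50(), mode="fastnas")  # assumed conversion step
dummy_input = torch.randn(1, 3, 224, 224)
mtn.profile(model, dummy_input, constraints={"flops": 4e9})  # 4 GFLOPs cap
```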
examples/pruning/cifar_resnet.ipynb (1)
60-60
: Optional: add CPU fallback for wider reproducibility (Colab without GPU).

```diff
-device = torch.device("cuda")
+device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
docs/source/guides/3_pruning.rst (2 hunks)
docs/source/guides/7_nas.rst (4 hunks)
examples/pruning/cifar_resnet.ipynb (7 hunks)
tests/unit/torch/nas/test_nas_utils.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/unit/torch/nas/test_nas_utils.py
🔇 Additional comments (9)
docs/source/guides/7_nas.rst (4)
141-142
: Upper-bound row: verify via executed output.
Same ask as above: paste the actual “Constraints Evaluation” printout after running.
245-247
: Search constraint raised to 4 GFLOPs: consistent.
LGTM once verified outputs are regenerated to match.
132-134
: Profile table values need live execution
Run the “Profiling a search space” block locally (with torch installed) and paste the exact flops/params output to replace the current table.
112-114
: Constraint updated to 4 GFLOPs — ensure no remaining 2 GFLOPs references.
Repository search for '2e9'/'2 GFLOPs' returned no matches. Run `rg -nP '\b2(?:\.0)?e9\b|2 GFLOPs' -C2` and update any occurrences.
examples/pruning/cifar_resnet.ipynb (4)
621-623
: best_subnet_constraints must be produced by the search, not edited.
Re-run and keep whatever the actual best subnet prints (values may vary slightly by seed/hardware).
678-682
: Narrative matches 60M bound; ensure it aligns with executed results.
Reword if the actual best FLOPs differ after running.
798-800
: Summary table: verify FLOPs after execution.
Numbers should reflect the run artifacts (rounding OK).
532-535
: Profiling table and constraints block should be captured from an executed run.
Author noted they didn’t execute the notebook; please “Restart & Run All” and commit the fresh outputs to prevent divergence.
You can automate locally:
Also applies to: 541-543
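The command that followed “You can automate locally:” wasn’t captured above. One possible approach, sketched here with nbconvert’s Python API (my assumption, not the reviewer’s original command):

```python
# Execute the notebook end-to-end and persist fresh outputs in place.
# Paths and timeout are illustrative; this is not the reviewer's elided command.
import nbformat
from nbconvert.preprocessors import ExecutePreprocessor

nb_path = "examples/pruning/cifar_resnet.ipynb"
nb = nbformat.read(nb_path, as_version=4)
ExecutePreprocessor(timeout=3600, kernel_name="python3").preprocess(
    nb, {"metadata": {"path": "examples/pruning"}}
)
nbformat.write(nb, nb_path)  # commit the refreshed outputs afterwards
```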
docs/source/guides/3_pruning.rst (1)
193-195
: Regenerate FLOPs/params table from code output
Please rerun the snippet below in an environment with torch installed and paste the exact printed rows for lines 193–195 and 202–203 to ensure the documentation matches the real output.

```bash
python - <<'PY'
import torch
import modelopt.torch.prune as mtp
from torchvision.models import resnet50

model = resnet50()
dummy_input = torch.randn(1, 3, 224, 224)

def score_func(m):
    return 0.0

_ = mtp.prune(
    model=model,
    mode="fastnas",
    constraints={"flops": "60%"},
    dummy_input=dummy_input,
    config={
        "data_loader": [],
        "score_func": score_func,
        "checkpoint": "/tmp/_chk.pth",
    },
)
PY
```
@Y-T-G You need to sign your commits with an SSH key. Please take a look at https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/CONTRIBUTING.md#%EF%B8%8F-signing-your-work and amend your commits.
Signed-off-by: Y-T-G <[email protected]>
Signed-off-by: Y-T-G <[email protected]>
@kevalmorabia97 Fixed.
/ok to test 0589c61
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files

```
@@           Coverage Diff           @@
##             main     #388   +/-   ##
=======================================
  Coverage   73.86%   73.86%
=======================================
  Files         171      171
  Lines       17629    17629
=======================================
  Hits        13021    13021
  Misses       4608     4608
```

☔ View full report in Codecov by Sentry.
What does this PR do?
Type of change: Bug fix
Overview: Fixes the FLOPs calculation to report 2 × MACs instead of raw MACs. Closes #387
Usage
Testing
Before your PR is "Ready for review"
Additional Information
Summary by CodeRabbit