
Conversation

@kevalmorabia97 kevalmorabia97 commented Dec 2, 2025

Summary by CodeRabbit

  • New Features

    • Added --trust_calibration_data CLI flag for secure ONNX quantization with pickle data files.
  • Improvements

    • Enhanced security validation for generated quantization code.
    • Simplified data loading by removing pickle-based caching—data is now always loaded fresh.
    • Added security guidance throughout model state loading operations.
  • Documentation

    • Updated guides with security best practices for model state handling.


@kevalmorabia97 kevalmorabia97 force-pushed the kmorabia/security-concerns branch from 45b288f to 8a6c466 Compare December 2, 2025 12:10
codecov bot commented Dec 2, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.64%. Comparing base (d0b0c0f) to head (8a6c466).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #626   +/-   ##
=======================================
  Coverage   74.64%   74.64%           
=======================================
  Files         183      183           
  Lines       18542    18542           
=======================================
  Hits        13840    13840           
  Misses       4702     4702           


@kevalmorabia97

@coderabbitai review

coderabbitai bot commented Dec 2, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai bot commented Dec 2, 2025

Walkthrough

This pull request adds security documentation and validation across the ModelOpt codebase. Changes include security comments clarifying safe deserialization of ModelOpt-generated states, removal of pickle-based caching from data loading, addition of a --trust_calibration_data CLI flag for ONNX quantization with corresponding load-time validation, refactoring of code generation security checks in the quantization plugin, and minor documentation updates.

Changes

Cohort / File(s) / Summary

  • Security documentation comments
    Files: docs/source/guides/2_save_load.rst, examples/llm_qat/export.py, modelopt/torch/export/distribute.py, modelopt/torch/opt/conversion.py, modelopt/torch/opt/plugins/huggingface.py, modelopt/torch/opt/plugins/mcore_dist_checkpointing.py, modelopt/torch/opt/plugins/peft.py, modelopt/torch/opt/searcher.py, modelopt/torch/quantization/plugins/transformers_trainer.py, modelopt/torch/utils/distributed.py
    Summary: Added inline security notes preceding deserialization calls with weights_only=False, clarifying that ModelOpt-generated state dictionaries are trusted and not untrusted user input.
  • Data loading and caching
    Files: examples/llm_sparsity/finetune.py, examples/llm_sparsity/README.md
    Summary: Removed pickle-based data caching in SupervisedDataset.__init__ so data is always loaded fresh; updated the README to drop the pickle serialization and duration notes.
  • ONNX calibration data handling
    Files: modelopt/onnx/quantization/__main__.py
    Summary: Added the --trust_calibration_data CLI flag; reworked calibration loading to enable pickle deserialization (allow_pickle) only when the flag is set; introduced security-aware error handling that guides users toward trusted deserialization.
  • Code generation security validation
    Files: modelopt/torch/quantization/plugins/attention.py
    Summary: Removed the model_module parameter from _create_quantized_class_from_ast; added explicit type hints; introduced a security validation step that inspects generated AST code for suspicious patterns (__import__, eval, exec, compile, open, os.system); added explanatory comments on safety considerations.
  • Minor documentation update
    Files: modelopt/torch/opt/plugins/megatron.py
    Summary: Changed a documentation link in a comment from a GitHub commits URL to a blob URL.
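The conditional pickle handling described for modelopt/onnx/quantization/__main__.py can be sketched in miniature as follows. This is a hedged illustration, not the actual implementation: the real code gates numpy's allow_pickle when loading .npy/.npz files, while load_calibration, its signature, and the error text below are hypothetical stand-ins for that gating pattern.

```python
import pickle

def load_calibration(raw: bytes, trust_calibration_data: bool = False):
    """Hypothetical sketch: deserialize pickled calibration payloads only when
    the caller has explicitly opted in via --trust_calibration_data."""
    if not trust_calibration_data:
        raise ValueError(
            "Calibration data requires pickle deserialization, which can execute "
            "arbitrary code. Re-run with --trust_calibration_data only if the "
            "file comes from a trusted source."
        )
    # Safe only because the caller has vouched for the data's origin.
    return pickle.loads(raw)
```

The key design point mirrors the PR: deserialization of attacker-controllable formats defaults to off, and the error message tells the user exactly which flag to set and when it is safe to do so.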

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Attention area: modelopt/torch/quantization/plugins/attention.py — security validation logic for generated code requires careful inspection of pattern matching and AST manipulation; verify trusted case handling for torch.compile is sound.
  • Attention area: modelopt/onnx/quantization/__main__.py — new calibration data loading flow with conditional pickle handling and custom error messaging needs verification that error messages accurately guide users and exception handling is correct.
  • Attention area: examples/llm_sparsity/finetune.py — verify that removing pickle caching does not impact performance expectations or downstream dependencies on cached data behavior.
  • Repetitive changes: Multiple files contain similar security comment patterns; spot-check a few for consistency and accuracy of context.

Poem

🐰 With whiskers twitching, we secure the way,
No pickle tricks or untrusted play!
AST guards dance, and calibration flows trust,
Generated code checked—security's a must!
Hopping forward with safer ModelOpt today! 🔒

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 70.59%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve coverage.
  • Title check (❓ Inconclusive): The title 'Address security concerns in code' is vague and generic and does not convey the specific nature of the changes. Replace it with a more specific title, such as 'Add security notes for safe pickle deserialization' or 'Document safe weights_only=False usage in ModelOpt state loading'.
✅ Passed checks (1 passed)
  • Description Check (✅ Passed): Check skipped because CodeRabbit's high-level summary is enabled.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
modelopt/onnx/quantization/__main__.py (1)

50-59: Fix mutual-exclusion: --trust_calibration_data cannot be used with --calibration_data_path

Placing --trust_calibration_data in the same mutually exclusive group as --calibration_data_path and --calibration_cache_path prevents users from enabling trusted pickle deserialization for a provided calibration file. The loading code at lines 271–287 explicitly expects both flags to be combinable, but argparse will reject --calibration_data_path ... --trust_calibration_data due to the mutual exclusion constraint.

The loading logic itself is sound—defaulting to allow_pickle=False, converting .npz to a dict, and emitting a clear error when pickles are needed without the trust flag. The issue is purely the argparse wiring.

Move --trust_calibration_data out of the mutually exclusive group:

-    group.add_argument(
-        "--trust_calibration_data",
-        action="store_true",
-        help="If True, trust the calibration data and allow pickle deserialization.",
-    )
+    parser.add_argument(
+        "--trust_calibration_data",
+        action="store_true",
+        help=(
+            "If True, trust the calibration data and allow pickle deserialization when "
+            "using --calibration_data_path."
+        ),
+    )

The same issue applies to lines 271–287, where the code attempts to use both flags together.
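The argparse wiring issue above can be demonstrated with a minimal standalone parser. The flag names match those in the PR, but the parser itself is a sketch, not the real __main__.py: the fix is simply that the trust flag is registered on the parser rather than inside the mutually exclusive group, so it can be combined with --calibration_data_path.

```python
import argparse

parser = argparse.ArgumentParser()
group = parser.add_mutually_exclusive_group()
group.add_argument("--calibration_data_path")
group.add_argument("--calibration_cache_path")
# Registered on the parser, not the exclusive group, so it composes with
# either calibration source argument instead of being rejected by argparse.
parser.add_argument("--trust_calibration_data", action="store_true")

args = parser.parse_args(
    ["--calibration_data_path", "calib.npz", "--trust_calibration_data"]
)
```

Had --trust_calibration_data been added via group.add_argument instead, the same parse_args call would exit with "not allowed with argument --calibration_data_path", which is exactly the failure mode the review describes.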

🧹 Nitpick comments (1)
modelopt/torch/quantization/plugins/attention.py (1)

210-221: AST-based security validation is reasonable but may be overly strict

The refactor to _create_quantized_class_from_ast (typed signature, optional temp_file_name, and the explicit note around compile() on an internally generated AST) looks good, and the compile-based path avoids exec on generated code.

The new string-based scan for suspicious patterns is conservative, but it can raise ValueError for otherwise safe classes whose source or docstrings merely mention tokens like eval, compile, or os.system. That would fail registration at import time.

If you start seeing false positives in practice, consider tightening this by:

  • Inspecting the AST for actual ast.Call nodes to these symbols instead of scanning the unparsed source string, or
  • Downgrading to a warning and skipping registration for that class (returning False from register_attention_for_kv_quant) rather than raising.

As-is, this is acceptable from a security standpoint; the above would just make it more robust against benign occurrences of these substrings.

Also applies to: 238-249, 271-275
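The first alternative suggested above, inspecting actual ast.Call nodes instead of scanning the unparsed source string, could look roughly like this. The function name has_suspicious_calls and its exact pattern set are illustrative, not code from the PR:

```python
import ast

# Names whose direct invocation should block registration of generated code.
SUSPICIOUS_NAMES = {"__import__", "eval", "exec", "compile", "open"}

def has_suspicious_calls(source: str) -> bool:
    """Flag only genuine calls to dangerous builtins or os.system, so a
    docstring that merely mentions 'eval' does not trip the check."""
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Name) and func.id in SUSPICIOUS_NAMES:
            return True
        if (
            isinstance(func, ast.Attribute)
            and func.attr == "system"
            and isinstance(func.value, ast.Name)
            and func.value.id == "os"
        ):
            return True
    return False
```

Because the walk only considers ast.Call nodes, benign occurrences of these substrings in comments, docstrings, or identifiers pass cleanly, which addresses the false-positive concern raised in the nitpick.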

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d0b0c0f and 8a6c466.

📒 Files selected for processing (15)
  • docs/source/guides/2_save_load.rst (1 hunks)
  • examples/llm_qat/export.py (1 hunks)
  • examples/llm_sparsity/README.md (1 hunks)
  • examples/llm_sparsity/finetune.py (1 hunks)
  • modelopt/onnx/quantization/__main__.py (2 hunks)
  • modelopt/torch/export/distribute.py (1 hunks)
  • modelopt/torch/opt/conversion.py (1 hunks)
  • modelopt/torch/opt/plugins/huggingface.py (1 hunks)
  • modelopt/torch/opt/plugins/mcore_dist_checkpointing.py (1 hunks)
  • modelopt/torch/opt/plugins/megatron.py (1 hunks)
  • modelopt/torch/opt/plugins/peft.py (2 hunks)
  • modelopt/torch/opt/searcher.py (1 hunks)
  • modelopt/torch/quantization/plugins/attention.py (3 hunks)
  • modelopt/torch/quantization/plugins/transformers_trainer.py (1 hunks)
  • modelopt/torch/utils/distributed.py (1 hunks)
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-09-15T20:46:29.252Z
Learnt from: realAsma
Repo: NVIDIA/TensorRT-Model-Optimizer PR: 318
File: modelopt/torch/quantization/plugins/transformers_trainer.py:170-189
Timestamp: 2025-09-15T20:46:29.252Z
Learning: In modelopt/torch/quantization/plugins/transformers_trainer.py, the restore_from_modelopt_state function can accept modelopt_state["modelopt_state_dict"] directly without needing to wrap it in a full dict structure or include modelopt_version.

Applied to files:

  • docs/source/guides/2_save_load.rst
  • modelopt/torch/opt/plugins/mcore_dist_checkpointing.py
  • modelopt/torch/quantization/plugins/transformers_trainer.py
  • modelopt/torch/opt/conversion.py
  • examples/llm_qat/export.py
  • modelopt/torch/opt/plugins/peft.py
  • modelopt/torch/opt/plugins/huggingface.py
📚 Learning: 2025-09-16T21:46:46.344Z
Learnt from: realAsma
Repo: NVIDIA/TensorRT-Model-Optimizer PR: 318
File: modelopt/torch/quantization/plugins/transformers_trainer.py:206-212
Timestamp: 2025-09-16T21:46:46.344Z
Learning: In modelopt/torch/quantization/plugins/transformers_trainer.py, the mtq.quantize function calls the forward_loop under a no_grad context, so wrapping the forward_loop in inference_mode or no_grad is not needed.

Applied to files:

  • modelopt/torch/quantization/plugins/transformers_trainer.py
📚 Learning: 2025-09-16T20:14:34.768Z
Learnt from: realAsma
Repo: NVIDIA/TensorRT-Model-Optimizer PR: 318
File: modelopt/torch/quantization/plugins/transformers_trainer.py:191-191
Timestamp: 2025-09-16T20:14:34.768Z
Learning: The TensorRT-Model-Optimizer project only supports PyTorch >= 2.6, so using the `weights_only` parameter in torch.load calls is acceptable and doesn't require backward compatibility handling.

Applied to files:

  • modelopt/torch/quantization/plugins/transformers_trainer.py
  • modelopt/torch/opt/conversion.py
📚 Learning: 2025-09-15T16:40:12.799Z
Learnt from: realAsma
Repo: NVIDIA/TensorRT-Model-Optimizer PR: 318
File: modelopt/torch/quantization/plugins/transformers_trainer.py:206-208
Timestamp: 2025-09-15T16:40:12.799Z
Learning: In modelopt/torch/quantization/plugins/transformers_trainer.py, the forward_loop function in _quantize_model should use self.model(**batch) rather than the model parameter passed to forward_loop. The model parameter should not be used for the forward pass.

Applied to files:

  • modelopt/torch/quantization/plugins/transformers_trainer.py
🧬 Code graph analysis (1)
examples/llm_sparsity/finetune.py (1)
modelopt/torch/utils/logging.py (1)
  • print_rank_0 (92-95)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: example-tests-pr (llm_ptq)
  • GitHub Check: gpu-tests-pr
🔇 Additional comments (13)
modelopt/torch/opt/searcher.py (1)

252-253: LGTM! Security note clarifies safe deserialization context.

The security comment appropriately documents that weights_only=False is used for ModelOpt-generated checkpoints, not untrusted input.

modelopt/torch/quantization/plugins/transformers_trainer.py (1)

191-192: LGTM! Security note clarifies safe deserialization context.

The security comment appropriately documents that weights_only=False is used for ModelOpt-generated state_dict, not untrusted input.

modelopt/torch/utils/distributed.py (1)

90-91: LGTM! Security note clarifies safe deserialization context.

The security comment appropriately documents that weights_only=False is used for internally-generated buffers within the distributed process group, not untrusted input.

examples/llm_sparsity/finetune.py (1)

235-244: LGTM! Pickle caching removal improves security posture.

Removing pickle-based caching eliminates a potential security risk by ensuring data is always loaded fresh from the trusted source. This aligns with the PR's security objectives.
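As a rough illustration of the simplified pattern, a cache-free dataset just parses the trusted source file on every construction. The class body, file layout, and field names below are a hypothetical sketch, not the actual SupervisedDataset code from examples/llm_sparsity/finetune.py:

```python
import json
import pathlib
import tempfile

class SupervisedDataset:
    """Sketch of a cache-free dataset: no pickle.dump/pickle.load round trip;
    the trusted source file is parsed fresh each time the dataset is built."""

    def __init__(self, data_path: str):
        self.examples = json.loads(pathlib.Path(data_path).read_text())

# Usage with a throwaway file (hypothetical data layout):
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump([{"prompt": "hi", "response": "hello"}], f)
ds = SupervisedDataset(f.name)
```

The trade-off is re-tokenization cost on each run in exchange for removing a deserialization path that an attacker-supplied cache file could abuse, which is the balance the PR opts for.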

modelopt/torch/opt/plugins/megatron.py (1)

79-79: Note: Pickle replacement is pending per PR objectives.

This file only updates a comment URL, but the PR description explicitly states that pickle usage replacement in this file is still pending. The file continues to use pickle.dumps (line 59) and pickle.loads (line 80) with # nosec suppressions.

Consider either:

  1. Completing the pickle replacement in this PR following the TransformerEngine approach (referenced in PR description), or
  2. Removing this file from the current PR and tracking the pickle replacement separately.

Including incomplete work may cause confusion about the PR's security improvements.

modelopt/torch/opt/plugins/huggingface.py (1)

82-83: LGTM! Security note clarifies safe deserialization context.

The security comment appropriately documents that weights_only=False is used for ModelOpt-generated state_dict, not untrusted input.

docs/source/guides/2_save_load.rst (1)

132-133: LGTM! Documentation update educates users on safe deserialization.

The security comment in the documentation example helps users understand the appropriate context for using weights_only=False with ModelOpt-generated state_dicts.

modelopt/torch/opt/conversion.py (1)

529-532: LGTM! Docstring update educates users on safe deserialization.

The security comment in the function docstring example helps users understand the appropriate context for using weights_only=False with ModelOpt-generated state_dicts.

modelopt/torch/opt/plugins/mcore_dist_checkpointing.py (1)

244-248: Clarifying security note looks good

The added comment correctly scopes the weights_only=False usage to NVIDIA-generated checkpoints and doesn’t affect behavior.

examples/llm_qat/export.py (1)

52-56: Security note around modelopt_state load is consistent

The comment clearly documents that weights_only=False is only used on ModelOpt-generated state in a trusted path; no code changes are introduced.

Please ensure the minimum supported PyTorch version in this repo accepts the weights_only argument on torch.load.

modelopt/torch/opt/plugins/peft.py (1)

71-78: Documented trust boundary for PEFT state loads is appropriate

Both comments correctly explain that weights_only=False is used only for ModelOpt-generated adapter and quantizer state, not arbitrary user files; this aligns with the PR’s TAVA guidance.

Please confirm the project’s PyTorch version supports the weights_only kwarg everywhere it is now passed to torch.load.

Also applies to: 86-92

modelopt/torch/export/distribute.py (1)

90-96: NFSWorkspace security comment is clear and non-intrusive

The note accurately states that weights_only=False is used only on internally produced ModelOpt checkpoints in the shared NFS directory; runtime behavior is preserved.

As with other call sites, please ensure supported PyTorch versions accept weights_only in torch.load.

examples/llm_sparsity/README.md (1)

85-88: README update aligns with removal of pickle-based caching

Simplifying the SAT data-prep note to just describe tokenization length avoids mentioning the removed pickle cache behavior and keeps docs accurate.
