Update inference code to reflect Megatron-LM API changes #2263
santhnm2 wants to merge 1 commit into NVIDIA-NeMo:main
Conversation
Signed-off-by: Keshav Santhanam <ksanthanam@nvidia.com>
📝 Walkthrough
This PR refactors the VLM inference modules to track Megatron-LM API changes, removing the `InferenceWrapperConfig`-based setup and threading `SamplingParams` through the engine and `generate` paths.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 2 failed (2 warnings)
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/megatron/bridge/inference/vlm/vlm_engine.py (1)
50-51: ⚠️ Potential issue | 🟡 Minor
Seed guard is always true given `random_seed or 1234`.
`self.random_seed` is always truthy (either the user's value or 1234), so this `if` never skips seeding. If the `or 1234` is intentional, this guard is dead code; if it's not, see the prior comment about the default.
🤖 Fix all issues with AI agents
In `@src/megatron/bridge/inference/vlm/qwenvl_inference_wrapper.py`:
- Around line 34-35: The __init__ method in QwenVLInferenceWrapper currently
accepts an untyped model parameter; add the declared type hint to match the
documented type by changing the signature of __init__ to accept model:
"Qwen2VLModel" and leave the super().__init__(model) call unchanged so tooling
and linters recognize the model type (refer to the __init__ method and the model
parameter in QwenVLInferenceWrapper).
In `@src/megatron/bridge/inference/vlm/vlm_engine.py`:
- Around line 29-38: The __init__ currently uses "self.random_seed = random_seed
or 1234", which coerces None and 0 to 1234 and causes deterministic seeding;
change it to preserve None and allow 0 by assigning "self.random_seed =
random_seed" (no fallback) and update any checks (e.g., in generate or other
methods that use self.random_seed) from "if self.random_seed:" to "if
self.random_seed is not None:" so seeding only occurs when an explicit seed is
provided and 0 is treated as a valid seed.
🧹 Nitpick comments (3)
src/megatron/bridge/inference/vlm/base.py (1)
165-165: Consider using `SamplingParams | None` instead of `Optional[SamplingParams]`.
The coding guidelines prefer `T | None` over `Optional[T]` for nullable types (Python 3.10+). This applies to line 165 and the function signature. As per coding guidelines, "Use 'T | None' for nullable types instead of 'Optional[T]'".
♻️ Proposed fix

```diff
 def generate(
     wrapped_model: AbstractModelInferenceWrapper,
     tokenizer,
     image_processor,
     prompts: List[str],
     images: List[Union[Image, List[Image]]],
     processor=None,
     max_batch_size: int = 4,
     random_seed: Optional[int] = None,
-    sampling_params: Optional[SamplingParams] = None,
+    sampling_params: SamplingParams | None = None,
 ) -> dict:
```

Also applies to: 202-203
tests/unit_tests/inference/vlm/test_vlm_engine.py (1)
33-36: Consider adding a test case that passes `sampling_params` to `generate`.
The current test calls `engine.generate(["prompt"], ["image"])` without `sampling_params`. Since the `sampling_params` pathway is the primary change in this PR, a test verifying it is forwarded to `scheduler.add_request` would strengthen coverage; a hedged sketch follows after the next nitpick.
tests/unit_tests/inference/vlm/test_base.py (1)
346-346: Consider moving the `SamplingParams` import to the top of the file.
The inline import on line 346 works but is inconsistent with the rest of the file, where imports are at the module level. Since `SamplingParams` is already used by production code imported in this test's scope, a top-level import is cleaner.
♻️ Proposed fix
Add to the top-level imports:

```python
from megatron.core.inference.sampling_params import SamplingParams
```

Then remove the inline import at line 346.
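For the test suggested above, a hedged sketch; the `sampling_params` keyword on `generate`, the `engine` fixture, and the scheduler's `have_requests_pending` loop guard are assumptions inferred from the review rather than verified API:

```python
from unittest.mock import MagicMock

from megatron.core.inference.sampling_params import SamplingParams

def test_generate_forwards_sampling_params(engine):
    params = SamplingParams(temperature=0.7)
    engine.scheduler = MagicMock()
    # Assumed loop guard so generate() returns without real scheduling.
    engine.scheduler.have_requests_pending.return_value = False
    engine.generate(["prompt"], ["image"], sampling_params=params)
    # Assumes generate forwards sampling_params as a keyword argument.
    _, kwargs = engine.scheduler.add_request.call_args
    assert kwargs.get("sampling_params") is params
```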
```python
def __init__(self, model):
    super().__init__(model)
```
🧩 Analysis chain
🏁 Script executed:
# First, let's examine the actual file and verify the content at lines 34-35
cat -n src/megatron/bridge/inference/vlm/qwenvl_inference_wrapper.py | head -50
Repository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 2079
🏁 Script executed:
# Check the parent class AbstractModelInferenceWrapper to verify __init__ signature
find . -name "*.py" -type f | xargs rg -l "class AbstractModelInferenceWrapper" | head -3
Repository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 53
🏁 Script executed:
# Look for InferenceWrapperConfig usage in the file to understand the changes
rg -n "InferenceWrapperConfig" src/megatron/bridge/inference/vlm/qwenvl_inference_wrapper.pyRepository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 53
🏁 Script executed:
# Find the AbstractModelInferenceWrapper class definition
rg -n "class AbstractModelInferenceWrapper" --type pyRepository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 53
🏁 Script executed:
# Get the signature of AbstractModelInferenceWrapper.__init__
rg -A 5 "def __init__" megatron/core/inference/model_inference_wrappers/abstract_model_inference_wrapper.py 2>/dev/null | head -20Repository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 53
🏁 Script executed:
# Let's search for this parent class in the repository
find . -path "*/megatron/core/inference*" -name "*.py" -type f | head -10Repository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 53
🏁 Script executed:
# Check if there's a type hint missing for model parameter in __init__
rg -B 2 -A 2 "def __init__\(self, model\)" src/megatron/bridge/inference/vlm/qwenvl_inference_wrapper.pyRepository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 143
🏁 Script executed:
# Look for other inference wrapper classes to check type hint patterns
find . -path "*/inference*" -name "*wrapper.py" -type f | head -10Repository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 200
🏁 Script executed:
# Check other similar wrapper classes for __init__ type hints
rg -B 2 -A 2 "class.*InferenceWrapper.*:" src/megatron/bridge/ -A 8 | grep -A 8 "def __init__"Repository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 53
🏁 Script executed:
# Check the exact parent class definition to understand the interface
fd -e py | xargs rg -l "class AbstractModelInferenceWrapper" 2>/dev/null
Repository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 55
🏁 Script executed:
# Let's look at the imports to understand the typing - check if there's type hints elsewhere
rg -n "from typing import" src/megatron/bridge/inference/vlm/qwenvl_inference_wrapper.py -A 2Repository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 128
🏁 Script executed:
# Check if the model parameter should have a type hint by looking at similar __init__ methods
rg "def __init__\(self, model.*\)" src/megatron/bridge/ -A 1 | head -30Repository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 893
🏁 Script executed:
# Check if there are other __init__ methods in the codebase with type hints to establish the pattern
rg "def __init__\(self, model.*:" src/megatron/bridge/inference/ -A 1 | head -20Repository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 257
🏁 Script executed:
# Look at the parent class to understand if it has type hints
rg -B 5 -A 10 "class AbstractModelInferenceWrapper" . --type py
Repository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 53
🏁 Script executed:
# Check test file to understand expected usage
cat -n tests/unit_tests/inference/vlm/test_qwenvl_inference_wrapper.py | head -50
Repository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 2467
🏁 Script executed:
# Verify if model parameter type hint is required by checking similar files
find src/megatron/bridge/inference -name "*.py" -exec grep -l "def __init__" {} \; | xargs grep -A 1 "def __init__" | head -40
Repository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 1222
Add type hint for model parameter.
The constructor simplification is sound, but the model parameter lacks a type hint. Per coding guidelines, add the type hint matching the documented type:
```python
def __init__(self, model: "Qwen2VLModel"):
    super().__init__(model)
```

🤖 Prompt for AI Agents
In `@src/megatron/bridge/inference/vlm/qwenvl_inference_wrapper.py` around lines
34-35: the __init__ method in QwenVLInferenceWrapper currently accepts an
untyped model parameter; add the declared type hint to match the documented type
by changing the signature of __init__ to accept model: "Qwen2VLModel" and leave
the super().__init__(model) call unchanged so tooling and linters recognize the
model type (refer to the __init__ method and the model parameter in
QwenVLInferenceWrapper).
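For reference, a minimal sketch of the typed signature; the `Qwen2VLModel` import path below is hypothetical and should be replaced with wherever the class is actually defined:

```python
from typing import TYPE_CHECKING

from megatron.core.inference.model_inference_wrappers.abstract_model_inference_wrapper import (
    AbstractModelInferenceWrapper,
)

if TYPE_CHECKING:
    # Hypothetical import path for the forward-referenced annotation.
    from megatron.core.models.qwenvl import Qwen2VLModel

class QwenVLInferenceWrapper(AbstractModelInferenceWrapper):
    def __init__(self, model: "Qwen2VLModel"):
        super().__init__(model)
```

The string annotation keeps the heavyweight model import out of runtime while still giving type checkers the declared type.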
```python
def __init__(
    self, text_generation_controller: TextGenerationController,
    max_batch_size: Optional[int] = None,
    random_seed: Optional[int] = None,
):
    self.controller = text_generation_controller
    self.inference_wrapped_model = self.controller.inference_wrapped_model
    self.config = self.inference_wrapped_model.config
    self.random_seed = random_seed or 1234
    self.scheduler = Scheduler(max_batch_size=max_batch_size)
```
`random_seed or 1234` silently coerces None and 0 to 1234, making inference always deterministically seeded.
Line 37: `self.random_seed = random_seed or 1234` means:
- Passing `random_seed=None` (the default from `base.py`) → seeds with 1234 instead of not seeding.
- Passing `random_seed=0` → seeds with 1234 instead of 0.
Combined with line 50 (`if self.random_seed:` — always truthy), every call to `generate` will set a deterministic seed. This is inconsistent with `base.py` line 164, where `random_seed` defaults to None, implying "no fixed seed."
Proposed fix

```diff
- self.random_seed = random_seed or 1234
+ self.random_seed = random_seed
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
def __init__(
    self, text_generation_controller: TextGenerationController,
    max_batch_size: Optional[int] = None,
    random_seed: Optional[int] = None,
):
    self.controller = text_generation_controller
    self.inference_wrapped_model = self.controller.inference_wrapped_model
    self.config = self.inference_wrapped_model.config
    self.random_seed = random_seed
    self.scheduler = Scheduler(max_batch_size=max_batch_size)
```
🤖 Prompt for AI Agents
In `@src/megatron/bridge/inference/vlm/vlm_engine.py` around lines 29-38: the
__init__ currently uses "self.random_seed = random_seed or 1234", which coerces
None and 0 to 1234 and causes deterministic seeding; change it to preserve None
and allow 0 by assigning "self.random_seed = random_seed" (no fallback) and
update any checks (e.g., in generate or other methods that use self.random_seed)
from "if self.random_seed:" to "if self.random_seed is not None:" so seeding
only occurs when an explicit seed is provided and 0 is treated as a valid seed.
What does this PR do?
Updates the inference code to reflect Megatron-LM API changes (in particular, the removal of `InferenceWrapperConfig`).
Changelog
GitHub Actions CI
See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.
Before your PR is "Ready for review"
Pre checks:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Additional Information
Summary by CodeRabbit
Release Notes
- Refactor: VLM inference modules updated to follow Megatron-LM API changes, including the removal of InferenceWrapperConfig.
- Tests: Unit tests updated to cover the new inference interfaces.