-
Notifications
You must be signed in to change notification settings - Fork 139
Introduced vLLM_multimodal model to save multimodal outputs #1136
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
📝 WalkthroughWalkthroughThe PR extends the inference system to support audio responses. It propagates Changes
Sequence Diagram(s)sequenceDiagram
participant gen as generate.py
participant model as BaseModel
participant mm as VLLMMultimodalModel
participant api as vLLM API
participant fs as File System
gen->>gen: Extract data_dir, output_dir<br/>from eval_config
gen->>mm: Instantiate with data_dir,<br/>output_dir
mm->>mm: Create output_dir/audio
mm->>api: Get chat completion
api-->>mm: Return response with audio
mm->>mm: _parse_chat_completion_response()
mm->>mm: Call base _parse_chat_completion_response()
mm->>mm: Check for audio in response
alt Audio present
mm->>mm: _process_audio_response()
mm->>mm: Decode base64 audio
mm->>fs: Save as WAV file
fs-->>mm: File saved
mm->>mm: Return audio metadata
else No audio or no output_dir
mm->>mm: Include base64 in metadata
end
mm-->>gen: Return parsed response<br/>with audio dict
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
nemo_skills/inference/model/vllm_multimodal.py (1)
52-82: Consider using dynamic file extension based on format.The filename is hardcoded as
{response_id}.wav(line 67), but the format is extracted fromaudio_data.format(line 55) which may not always be "wav". This creates a mismatch between the file extension and the actual audio format.🔎 Proposed fix
if self.output_audio_dir: try: audio_bytes = base64.b64decode(audio_base64) - filename = f"{response_id}.wav" + audio_format = audio_info["format"] + filename = f"{response_id}.{audio_format}" filepath = os.path.join(self.output_audio_dir, filename)nemo_skills/inference/generate.py (1)
402-406: Simplify the None check.The condition on line 403 uses
isinstance(self.cfg.eval_config.get("data_dir"), type(None)), which is verbose and unconventional. Consider using a more idiomatic approach.🔎 Proposed refactor
self.data_dir = None - if "data_dir" in self.cfg.eval_config and not isinstance(self.cfg.eval_config.get("data_dir"), type(None)): + if self.cfg.eval_config.get("data_dir") is not None: self.data_dir = self.cfg.eval_config["data_dir"]
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
nemo_skills/inference/generate.py(2 hunks)nemo_skills/inference/model/__init__.py(2 hunks)nemo_skills/inference/model/base.py(1 hunks)nemo_skills/inference/model/vllm_multimodal.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
nemo_skills/inference/generate.py (1)
nemo_skills/inference/model/__init__.py (2)
get_code_execution_model(85-91)get_model(62-82)
nemo_skills/inference/model/vllm_multimodal.py (3)
nemo_skills/utils.py (1)
get_logger_name(39-43)nemo_skills/inference/model/vllm.py (1)
VLLMModel(27-148)nemo_skills/inference/model/base.py (1)
_parse_chat_completion_response(356-391)
nemo_skills/inference/model/__init__.py (1)
nemo_skills/inference/model/vllm_multimodal.py (1)
VLLMMultimodalModel(26-82)
🪛 Ruff (0.14.8)
nemo_skills/inference/model/vllm_multimodal.py
76-76: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: pre-commit
- GitHub Check: unit-tests
🔇 Additional comments (6)
nemo_skills/inference/model/vllm_multimodal.py (3)
34-40: LGTM!The initialization logic correctly creates the audio output directory when
output_diris provided. The use ofexist_ok=Trueensures safe directory creation.
76-78: Broad exception handling is appropriate here.While static analysis flags the broad
Exceptioncatch, this is intentional to ensure graceful degradation. The code logs a warning and falls back to including base64 data, which preserves functionality when file operations fail. This pattern is acceptable for non-critical operations with a safe fallback.
42-50: Add defensive programming forresponse.idaccess.The code assumes
response.idexists without checking. While vLLM is OpenAI-compatible and should provide this field as standard, defensive programming would prevent failures if the field is missing in edge cases or non-standard configurations. Consider usinggetattr(response, 'id', fallback_value)or add error handling to gracefully handle cases whereresponse.idis unavailable.nemo_skills/inference/model/base.py (1)
78-85: LGTM!The addition of
data_dirandoutput_dirparameters is clean and follows Python conventions. The defaults ensure backward compatibility, and the type hints are clear.nemo_skills/inference/model/__init__.py (1)
42-42: LGTM!The import and registration of
VLLMMultimodalModelfollow the established pattern for other model types. This enables instantiation viaserver_type="vllm_multimodal"through the standard model loading pathway.Also applies to: 55-55
nemo_skills/inference/generate.py (1)
408-430: LGTM!The propagation of
data_dirandoutput_dirto all model creation paths is consistent and correct. The patternself.data_dir or ""ensures the default empty string is used whendata_diris None, aligning with theBaseModelsignature.
Signed-off-by: Valentin Mendelev <[email protected]>
246fa01 to
7bd17f2
Compare
karpnv
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…12_vllm_multimodal
Greptile SummaryAdded a new Key changes:
Confidence Score: 4/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant User
participant GenerationTask
participant VLLMMultimodalModel
participant OpenAI_API
participant FileSystem
User->>GenerationTask: Initialize with config
GenerationTask->>GenerationTask: setup_llm()
GenerationTask->>GenerationTask: Extract data_dir from eval_config
GenerationTask->>GenerationTask: Determine output_dir from output_file
GenerationTask->>VLLMMultimodalModel: get_model(data_dir, output_dir, ...)
VLLMMultimodalModel->>FileSystem: Create audio output directory
User->>VLLMMultimodalModel: generate_async(prompt)
VLLMMultimodalModel->>OpenAI_API: Send request
OpenAI_API-->>VLLMMultimodalModel: Response with audio + debug_info
VLLMMultimodalModel->>VLLMMultimodalModel: _parse_chat_completion_response()
VLLMMultimodalModel->>VLLMMultimodalModel: Extract debug_info from content
VLLMMultimodalModel->>VLLMMultimodalModel: _process_audio_response()
VLLMMultimodalModel->>VLLMMultimodalModel: Base64 decode audio
VLLMMultimodalModel->>FileSystem: Save audio as WAV file
VLLMMultimodalModel->>VLLMMultimodalModel: Strip audio data from serialized_output
VLLMMultimodalModel->>VLLMMultimodalModel: Strip debug_info from content
VLLMMultimodalModel-->>User: Return result with audio path
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
4 files reviewed, 2 comments
| self.sandbox = get_sandbox(**self.cfg.sandbox) if self.cfg.sandbox is not None else None | ||
|
|
||
| self.data_dir = None | ||
| if "data_dir" in self.cfg.eval_config and not isinstance(self.cfg.eval_config.get("data_dir"), type(None)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: Use is not None instead of isinstance(x, type(None)) for null checks
| if "data_dir" in self.cfg.eval_config and not isinstance(self.cfg.eval_config.get("data_dir"), type(None)): | |
| if "data_dir" in self.cfg.eval_config and self.cfg.eval_config.get("data_dir") is not None: |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
| try: | ||
| audio_bytes = base64.b64decode(audio_base64) | ||
| filename = f"{response_id}.wav" | ||
| filepath = os.path.join(self.output_audio_dir, filename) | ||
|
|
||
| with open(filepath, "wb") as f: | ||
| f.write(audio_bytes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: Base64 decode wrapped in try/catch at line 94, but decoded data isn't validated before writing to file - malformed audio data could be written
…eMo#1136) Signed-off-by: Valentin Mendelev <[email protected]> Co-authored-by: Nikolay Karpov <[email protected]> Signed-off-by: dlord <[email protected]>
Signed-off-by: Valentin Mendelev <[email protected]> Co-authored-by: Nikolay Karpov <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]>
This model class is needed to enable parsing of the OpenAI API server output to get non-text (audio in this case) data and saving them into separate files in order to keep output.jsonl small.
Summary by CodeRabbit
Release Notes
New Features
Improvements
✏️ Tip: You can customize this high-level summary in your review settings.