feat: add llava_onevision1_5 #825

Conversation
Caution: Review failed. The pull request is closed.

Walkthrough
Adds a new Llava_OneVision1_5 simple model wrapper (multi-modal generation, image/video handling, generate_until), registers it in the model registry, and adds an example Accelerate launch script that exports HF_HOME.

Changes
Sequence Diagram(s)

sequenceDiagram
autonumber
actor User
participant Shell as examples/.../llava_onevision1_5.sh
participant Accelerate
participant CLI as lmms_eval CLI
participant Registry as Model Registry
participant Model as Llava_OneVision1_5
participant HF as HuggingFace Hub
participant Proc as AutoProcessor/Tokenizer
User->>Shell: run script
Shell->>Accelerate: launch (8 procs, port=12399)
Accelerate->>CLI: invoke lmms_eval with model & tasks
CLI->>Registry: resolve "llava_onevision1_5"
Registry-->>CLI: Llava_OneVision1_5
CLI->>Model: construct with model_args
Model->>HF: from_pretrained (weights/config & processor/tokenizer)
HF-->>Model: model + processor/tokenizer assets
CLI->>Model: generate_until(requests)
Model->>Proc: preprocess messages + visuals
Model->>Model: model.generate(...)
Model-->>CLI: decoded outputs
CLI-->>User: evaluation results
sequenceDiagram
autonumber
participant Evaluator
participant Model as Llava_OneVision1_5
participant Collator
participant Vision
participant Proc as Processor
participant HFModel as CausalLM
Evaluator->>Model: generate_until(requests)
Model->>Collator: group/sort by token length
loop per batch
Model->>Vision: resolve images/videos & sample frames
Model->>Proc: build inputs (messages + visuals)
Model->>HFModel: generate (max_new_tokens, sampling/greedy)
HFModel-->>Model: token ids
Model->>Proc: decode + trim until-stop
Model-->>Evaluator: batch outputs
end
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs
Suggested reviewers
Poem

Pre-merge checks and finishing touches
❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)

📜 Recent review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
class Llava_OneVision1_5(lmms):
    """
    Llava_OneVision1_5 Model
    "https://huggingface.co/Deep-VLM/LLaVA-One-Vision-1.5-8B-Instruct"
Broken link: this HuggingFace URL is incorrect.
    def __init__(
        self,
        pretrained: str = "Deep-VLM/LLaVA-One-Vision-1.5-8B-Instruct",
same as above
from transformers import (
    AutoProcessor,
    AutoTokenizer,
    Qwen2_5_VLForConditionalGeneration,
This class is unused here; drop the Qwen2_5_VLForConditionalGeneration import.
accelerate launch --num_processes=8 --main_process_port 12399 -m lmms_eval \
    --model=llava_onevision1_5 \
    --model_args=pretrained=Deep-VLM/LLaVA-One-Vision-1.5-8B-Instruct,,attn_implementation=flash_attention_2,interleave_visuals=False,max_pixels=3240000 \
interleave_visuals already defaults to False, so there is no need to set it again on this line.
Actionable comments posted: 4
🧹 Nitpick comments (8)
lmms_eval/models/simple/llava_onevision1_5.py (7)
238-243: Avoid decoding the first video frame just to read its size.
This allocates unnecessarily and isn't used. Remove it to reduce I/O and memory pressure.
-                    vr = decord.VideoReader(visual)
-                    first_frame = vr[0].asnumpy()
-                    height, width = first_frame.shape[:2]
-                    # max_pixels = height * width
                     processed_visuals.append({"type": "video", "video": visual, "max_pixels": self.max_pixels, "min_pixels": self.min_pixels})
210-214: Raise TypeError for an invalid 'until' type and trim the message (Ruff TRY004/TRY003).

-        elif not isinstance(until, list):
-            raise ValueError(f"Expected `gen_kwargs['until']` to be of type Union[str, list], but got {type(until)}")
+        elif not isinstance(until, list):
+            raise TypeError("gen_kwargs['until'] must be str or list[str]")
251-257: Prefer iterable unpacking over concatenation (RUF005).

-                    "content": processed_visuals + [{"type": "text", "text": context}],
+                    "content": [*processed_visuals, {"type": "text", "text": context}],
221-229: Remove duplicate `<image>` placeholder stripping.
You strip "<image>" twice. Keep the per-item cleanup, drop the pre-loop.

-        for i in range(len(contexts)):
-            if "<image>" in contexts[i]:
-                contexts[i] = contexts[i].replace("<image>", "")
-
         batched_messages = []
         for i, context in enumerate(contexts):
             if "<image>" in context:
                 context = context.replace("<image>", "")
231-234: Safer reasoning prompt join.
Preserve whitespace with an explicit newline.

-            if self.reasoning_prompt:
-                context = context.strip() + self.reasoning_prompt
-                contexts[i] = context
+            if self.reasoning_prompt:
+                context = f"{context.strip()}\n{self.reasoning_prompt}"
+                contexts[i] = context
4-4: Type hints for public APIs (Pyright).
Add the Any import; type the flatten param/return and the requests type for multi-round.

-from typing import List, Optional, Tuple, Union
+from typing import Any, List, Optional, Tuple, Union
@@
-    def flatten(self, input):
+    def flatten(self, input: List[List[Any]]) -> List[Any]:
@@
-    def generate_until_multi_round(self, requests) -> List[str]:
+    def generate_until_multi_round(self, requests: List[Instance]) -> List[str]:

Also consider short docstrings for public methods per guidelines.
Also applies to: 174-180, 346-347
61-65: Exception text length (TRY003).
Messages are long; consider shorter text or custom exception types if you want verbose context from their __str__.
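For illustration, a short sketch of the custom-exception option (the class name here is hypothetical, not something from this PR):

# Hypothetical helper: move the long message into the exception class so the
# call site stays short and Ruff TRY003 no longer flags it.
class InvalidAttnImplementationError(ValueError):
    """Raised when an unsupported attn_implementation is requested."""

    def __init__(self, value: object, valid: tuple) -> None:
        super().__init__(f"attn_implementation must be one of {valid}, got {value!r}")


# The call site then becomes a one-liner:
# raise InvalidAttnImplementationError(attn_implementation, valid_attn_implementations)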
examples/models/llava_onevision1_5.sh (1)
1-1: Add a shebang and fix HF_HOME (SC2148, SC2088).

+#!/usr/bin/env bash
+set -euo pipefail
-export HF_HOME="~/.cache/huggingface"
+export HF_HOME="$HOME/.cache/huggingface"
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
examples/models/llava_onevision1_5.sh (1 hunks)
lmms_eval/models/__init__.py (1 hunks)
lmms_eval/models/simple/llava_onevision1_5.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Type hints are required for all Python code
Public APIs must have docstrings
Maximum line length is 88 characters
Use PEP 8 naming: snake_case for functions/variables
Class names must use PascalCase
Constants should be in UPPER_SNAKE_CASE
Use f-strings for string formatting
Use early returns to avoid nested conditions
Use descriptive names; prefix handler functions with 'handle'
Prefer constants over functions where possible
Prefer functional, immutable approaches when not verbose
Define composing (higher-level) functions before their components
Mark issues in existing code with TODO: prefix in comments
Use functional and stateless approaches where they improve clarity
Use Ruff to enforce: import sorting (I001) and no unused imports
For long strings, wrap using parentheses rather than backslashes
Format long function calls over multiple lines with proper indentation
Split long import lists across multiple lines
Use Pyright type checking: add explicit None checks for Optional values
Use Pyright type narrowing for strings where applicable
Use Ruff (via pre-commit) to format and lint Python files
Document public APIs and test thoroughly
Files:
lmms_eval/models/__init__.py
lmms_eval/models/simple/llava_onevision1_5.py
🧬 Code graph analysis (1)
lmms_eval/models/simple/llava_onevision1_5.py (4)
lmms_eval/api/instance.py (2)
  Instance (6-29), args (25-29)
lmms_eval/api/model.py (1)
  lmms (26-331)
lmms_eval/api/registry.py (1)
  register_model (11-24)
lmms_eval/utils.py (3)
  Collator (865-1020), chunks (136-171), get_batched (891-913)
🪛 Shellcheck (0.10.0)
examples/models/llava_onevision1_5.sh
[error] 1-1: Tips depend on target shell and yours is unknown. Add a shebang or a 'shell' directive.
(SC2148)
[warning] 1-1: Tilde does not expand in quotes. Use $HOME.
(SC2088)
🪛 Ruff (0.12.2)
lmms_eval/models/simple/llava_onevision1_5.py
64-64: Avoid specifying long messages outside the exception class
(TRY003)
72-72: Avoid specifying long messages outside the exception class
(TRY003)
213-213: Prefer TypeError exception for invalid type
(TRY004)
213-213: Avoid specifying long messages outside the exception class
(TRY003)
255-255: Consider iterable unpacking instead of concatenation
Replace with iterable unpacking
(RUF005)
🔇 Additional comments (2)
lmms_eval/models/simple/llava_onevision1_5.py (1)
93-101: Accelerate/device-map usage looks sound.
Model init, device selection, and distributed prepare flows are consistent. Good use of unwrap in the property.
If you plan to support CPU fallback, add a guard for device="cpu" when device_map="auto".
Also applies to: 102-106, 112-121
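A minimal sketch of such a guard, assuming the device/device_map constructor arguments and eval_logger used in this PR (not the author's implementation):

# Sketch only: fall back to CPU when CUDA is unavailable instead of failing later.
import torch

if not torch.cuda.is_available():
    if str(device).startswith("cuda") or device_map == "auto":
        eval_logger.warning("CUDA not available; falling back to CPU.")
        device = "cpu"
        device_map = "cpu"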
lmms_eval/models/__init__.py (1)
39-40: Approve: registry mapping verified.
Found @register_model("llava_onevision1_5") and class Llava_OneVision1_5 in lmms_eval/models/simple/llava_onevision1_5.py, and lmms_eval/models/__init__.py contains "llava_onevision1_5": "Llava_OneVision1_5". No changes required.
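For readers new to the pattern, a simplified, hypothetical illustration of decorator-based registration of the kind register_model provides (not the actual lmms_eval code):

# Hypothetical minimal registry; the real one lives in lmms_eval/api/registry.py.
MODEL_REGISTRY: dict = {}


def register_model(*names):
    def decorate(cls):
        for name in names:
            # e.g. "llava_onevision1_5" -> Llava_OneVision1_5
            MODEL_REGISTRY[name] = cls
        return cls

    return decorate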
accelerate launch --num_processes=8 --main_process_port 12399 -m lmms_eval \
    --model=llava_onevision1_5 \
    --model_args=pretrained=Deep-VLM/LLaVA-One-Vision-1.5-8B-Instruct,,attn_implementation=flash_attention_2,interleave_visuals=False,max_pixels=3240000 \
    --tasks=mmerealworld,mmerealworld_cn,chartqa,docvqa_val,infovqa_val,mmstar,ocrbench \
    --batch_size=1
Fix double comma and quote model_args to prevent shell splitting.
There’s an extra comma after pretrained that will break parsing.
- --model_args=pretrained=Deep-VLM/LLaVA-One-Vision-1.5-8B-Instruct,,attn_implementation=flash_attention_2,interleave_visuals=False,max_pixels=3240000 \
+  --model_args="pretrained=Deep-VLM/LLaVA-One-Vision-1.5-8B-Instruct,attn_implementation=flash_attention_2,interleave_visuals=False,max_pixels=3240000" \

Optional: make tasks a var and pass via --tasks "$TASKS" for readability.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
-accelerate launch --num_processes=8 --main_process_port 12399 -m lmms_eval \
-    --model=llava_onevision1_5 \
-    --model_args=pretrained=Deep-VLM/LLaVA-One-Vision-1.5-8B-Instruct,,attn_implementation=flash_attention_2,interleave_visuals=False,max_pixels=3240000 \
-    --tasks=mmerealworld,mmerealworld_cn,chartqa,docvqa_val,infovqa_val,mmstar,ocrbench \
-    --batch_size=1
+accelerate launch --num_processes=8 --main_process_port 12399 -m lmms_eval \
+    --model=llava_onevision1_5 \
+    --model_args="pretrained=Deep-VLM/LLaVA-One-Vision-1.5-8B-Instruct,attn_implementation=flash_attention_2,interleave_visuals=False,max_pixels=3240000" \
+    --tasks=mmerealworld,mmerealworld_cn,chartqa,docvqa_val,infovqa_val,mmstar,ocrbench \
+    --batch_size=1
🤖 Prompt for AI Agents
In examples/models/llava_onevision1_5.sh around lines 6 to 10, the --model_args
string contains a double comma after pretrained which will break parsing and the
whole argument should be quoted to avoid shell splitting; remove the extra comma
so the list is valid (e.g., change
"pretrained=Deep-VLM/...,,attn_implementation" to a single comma) and wrap the
entire --model_args value in quotes, and optionally define
TASKS="mmerealworld,mmerealworld_cn,chartqa,docvqa_val,infovqa_val,mmstar,ocrbench"
and pass it as --tasks "$TASKS" for readability and safe expansion.
    Qwen2_5_VLForConditionalGeneration,
    AutoModelForCausalLM
)
Fix Qwen-specific leftovers and vision pipeline mismatch (will break LLaVA).
- Qwen class import is unused, and qwen_vl_utils is optional but unguarded at callsite.
- Error message/comments reference Qwen2.5 VL.
- More importantly, process_vision_info and the base64 data-URL image schema are Qwen-specific. LLaVA-One-Vision expects raw PIL images (and its own chat/processor semantics). As-is, this will likely fail at runtime or silently degrade.
Action:
- Remove Qwen import, guard qwen_vl_utils usage, and provide a processor-only fallback collecting PIL images. Update messages/comments to LLaVA.
Apply focused fixes:
@@
-from transformers import (
- AutoProcessor,
- AutoTokenizer,
- Qwen2_5_VLForConditionalGeneration,
- AutoModelForCausalLM
-)
+from transformers import (
+ AutoProcessor,
+ AutoTokenizer,
+ AutoModelForCausalLM,
+)
@@
-try:
- from qwen_vl_utils import process_vision_info
-except ImportError:
- eval_logger.warning("Failed to import qwen_vl_utils; Please install it via `pip install qwen-vl-utils`")
+try:
+ from qwen_vl_utils import process_vision_info # type: ignore
+except ImportError:
+ process_vision_info = None
+ eval_logger.warning(
+ "qwen_vl_utils not installed; falling back to processor-only vision handling"
+ )
@@
- def loglikelihood(self, requests: List[Instance]) -> List[Tuple[float, bool]]:
- raise NotImplementedError("Loglikelihood is not implemented for Qwen2.5_VL")
+ def loglikelihood(self, requests: List[Instance]) -> List[Tuple[float, bool]]:
+ raise NotImplementedError(
+ "Loglikelihood is not implemented for LLaVA-One-Vision yet"
+ )
@@
- # Avoid using '\n\n' as a stopper for Qwen2.5VL to prevent truncation, which can lead to incorrect results
+ # Avoid overly generic newline stoppers; they can truncate valuable content
+ # in some vision LMs. Prefer explicit task-specific stoppers.
 until = [item for item in until if item != "\n\n"]

Then replace the hard dependency on qwen_vl_utils during batching with a safe fallback (images only):
@@
- texts = [self.processor.apply_chat_template(msg, tokenize=False, add_generation_prompt=True) for msg in batched_messages]
- image_inputs, video_inputs = process_vision_info(batched_messages)
+ texts = [
+ self.processor.apply_chat_template(
+ msg, tokenize=False, add_generation_prompt=True
+ )
+ for msg in batched_messages
+ ]
+ if process_vision_info is not None:
+ image_inputs, video_inputs = process_vision_info(batched_messages)
+ else:
+ # Fallback: collect PIL images from the assembled messages for LLaVA
+ image_inputs, video_inputs = [], None
+ for msg in batched_messages:
+ user = next(p for p in msg if p["role"] == "user")["content"]
+ imgs = [
+ part["image"]
+ for part in user
+ if isinstance(part, dict)
+ and part.get("type") == "image"
+ and isinstance(part.get("image"), Image.Image)
+ ]
+ image_inputs.append(imgs or None)

Note: If video is required without qwen_vl_utils, wire a LLaVA-native video path (or raise for now). I can provide that if you want it in this PR.
Also applies to: 25-29, 171-173, 215-217
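If that LLaVA-native video path is wanted later, a rough sketch of uniform frame sampling with decord follows (an assumption about the desired behaviour, reusing the max_num_frames idea already in this PR; not the author's code):

# Sketch: decode a video file and uniformly sample up to `max_num_frames` frames
# as PIL images, so they can be fed to the processor like ordinary images.
# Assumes decord, numpy, and Pillow are available.
from typing import List

import decord
import numpy as np
from PIL import Image


def sample_video_frames(path: str, max_num_frames: int = 32) -> List[Image.Image]:
    vr = decord.VideoReader(path)
    total = len(vr)
    indices = np.linspace(0, total - 1, min(max_num_frames, total), dtype=int)
    frames = vr.get_batch(indices).asnumpy()  # (n, H, W, 3) uint8
    return [Image.fromarray(frame) for frame in frames]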
Actionable comments posted: 2
♻️ Duplicate comments (6)
examples/models/llava_onevision1_5.sh (1)
6-10: Fix double comma and quote model_args; optionally factor TASKS.
Unquoted --model_args will split on spaces, and there's a stray ",," after pretrained.
 accelerate launch --num_processes=8 --main_process_port 12399 -m lmms_eval \
     --model=llava_onevision1_5 \
-    --model_args=pretrained=Deep-VLM/LLaVA-One-Vision-1.5-8B-Instruct,,attn_implementation=flash_attention_2,max_pixels=3240000 \
-    --tasks=mmerealworld,mmerealworld_cn,chartqa,docvqa_val,infovqa_val,mmstar,ocrbench \
+    --model_args="pretrained=Deep-VLM/LLaVA-One-Vision-1.5-8B-Instruct,attn_implementation=flash_attention_2,max_pixels=3240000" \
+    --tasks="${TASKS:-mmerealworld,mmerealworld_cn,chartqa,docvqa_val,infovqa_val,mmstar,ocrbench}" \
     --batch_size=1

lmms_eval/models/simple/llava_onevision1_5.py (5)
170-172: Fix model name in error; it references Qwen.
Update it to LLaVA to avoid confusion.

-        raise NotImplementedError("Loglikelihood is not implemented for Qwen2.5_VL")
+        raise NotImplementedError(
+            "Loglikelihood is not implemented for LLaVA-One-Vision yet"
+        )
24-28: Import fallback bug: process_vision_info may be undefined (NameError).
If qwen_vl_utils isn't installed, the later use at line 282 will crash.

 try:
     from qwen_vl_utils import process_vision_info
 except ImportError:
-    eval_logger.warning("Failed to import qwen_vl_utils; Please install it via `pip install qwen-vl-utils`")
+    process_vision_info = None  # type: ignore[assignment]
+    eval_logger.warning(
+        "qwen_vl_utils not installed; falling back to processor-only vision handling"
+    )
281-291: Guard the Qwen helper and add a LLaVA-compatible fallback (PIL images).
Unconditionally calling process_vision_info will fail when it is absent and mismatches LLaVA's expected PIL images.

-            texts = [self.processor.apply_chat_template(msg, tokenize=False, add_generation_prompt=True) for msg in batched_messages]
-            image_inputs, video_inputs = process_vision_info(batched_messages)
+            texts = [
+                self.processor.apply_chat_template(
+                    msg, tokenize=False, add_generation_prompt=True
+                )
+                for msg in batched_messages
+            ]
+            if process_vision_info is not None:
+                image_inputs, video_inputs = process_vision_info(batched_messages)
+            else:
+                # Fallback: collect PIL images from message content for LLaVA
+                image_inputs, video_inputs = [], None
+                for msg in batched_messages:
+                    user = next(p for p in msg if p["role"] == "user")["content"]
+                    imgs = [
+                        part["image"]
+                        for part in user
+                        if isinstance(part, dict)
+                        and part.get("type") == "image"
+                        and isinstance(part.get("image"), Image.Image)
+                    ]
+                    image_inputs.append(imgs or None)
234-249: Use raw PIL images; drop the base64 data-URL path.
Base64 is Qwen-style and wastes memory; LLaVA processors accept PIL directly.

             for visual in visual_list[i]:
@@
-                elif isinstance(visual, Image.Image):  # Handle both single and multiple images
-                    base64_image = visual.convert("RGB")
-                    buffer = BytesIO()
-                    base64_image.save(buffer, format="JPEG")
-                    base64_bytes = base64.b64encode(buffer.getvalue())
-                    base64_string = base64_bytes.decode("utf-8")
-                    processed_visuals.append({"type": "image", "image": f"data:image/jpeg;base64,{base64_string}", "max_pixels": self.max_pixels, "min_pixels": self.min_pixels})
+                elif isinstance(visual, Image.Image):
+                    processed_visuals.append(
+                        {"type": "image", "image": visual.convert("RGB")}
+                    )
297-325: Never pass None into generate; add inference_mode and a pad fallback.
Passing temperature/top_p=None can break transformers; also ensure a pad_token_id fallback and use inference mode.

-                pad_token_id = self.tokenizer.pad_token_id
-
-                if current_gen_kwargs["temperature"] > 0:
-                    current_gen_kwargs["do_sample"] = True
-                else:
-                    current_gen_kwargs["do_sample"] = False
-                    current_gen_kwargs["temperature"] = None
-                    current_gen_kwargs["top_p"] = None
-
-                cont = self.model.generate(
-                    **inputs,
-                    eos_token_id=self.tokenizer.eos_token_id,
-                    pad_token_id=pad_token_id,
-                    do_sample=current_gen_kwargs["do_sample"],
-                    temperature=current_gen_kwargs["temperature"],
-                    top_p=current_gen_kwargs["top_p"],
-                    num_beams=current_gen_kwargs["num_beams"],
-                    max_new_tokens=current_gen_kwargs["max_new_tokens"],
-                    use_cache=self.use_cache,
-                )
+                pad_token_id = self.tokenizer.pad_token_id or self.tokenizer.eos_token_id
+                do_sample = bool(
+                    current_gen_kwargs.get("temperature", 0)
+                    and current_gen_kwargs["temperature"] > 0
+                )
+                gen_args = {
+                    **inputs,
+                    "eos_token_id": self.tokenizer.eos_token_id,
+                    "pad_token_id": pad_token_id,
+                    "num_beams": int(current_gen_kwargs["num_beams"]),
+                    "max_new_tokens": int(current_gen_kwargs["max_new_tokens"]),
+                    "use_cache": self.use_cache,
+                }
+                if do_sample:
+                    gen_args.update(
+                        do_sample=True,
+                        temperature=float(current_gen_kwargs.get("temperature", 1.0)),
+                        top_p=float(current_gen_kwargs.get("top_p", 1.0)),
+                    )
+                with torch.inference_mode():
+                    cont = self.model.generate(**gen_args)
🧹 Nitpick comments (6)
examples/models/llava_onevision1_5.sh (1)
1-1: Add a shebang; fix HF_HOME per shellcheck.
Declare bash, enable strict mode, and avoid the quoted tilde, which doesn't expand.

+#!/usr/bin/env bash
+set -euo pipefail
-export HF_HOME="~/.cache/huggingface"
+export HF_HOME="${HF_HOME:-$HOME/.cache/huggingface}"

lmms_eval/models/simple/llava_onevision1_5.py (5)
173-179: Type hints + small cleanup for flatten; avoid shadowing builtins.
Rename the param and add typing; a list comprehension is simpler.

-    def flatten(self, input):
-        new_list = []
-        for i in input:
-            for j in i:
-                new_list.append(j)
-        return new_list
+    def flatten(self, items: List[List[object]]) -> List[object]:
+        return [j for i in items for j in i]
214-216: Generalize the stopper comment (remove Qwen-specific wording).
Keep the guidance model-agnostic.

-        # Avoid using '\n\n' as a stopper for Qwen2.5VL to prevent truncation, which can lead to incorrect results
+        # Avoid overly generic newline stoppers; they can truncate useful content
283-289: Only the first video stream is sampled.
If a batch contains multiple videos, the others aren't frame-sampled. Iterate over all video_inputs.

-            if video_inputs is not None:
-                total_frames = video_inputs[0].shape[0]
-                indices = np.linspace(0, total_frames - 1, self.max_num_frames, dtype=int)
-                if total_frames - 1 not in indices:
-                    indices = np.append(indices, total_frames - 1)
-                video_inputs[0] = video_inputs[0][indices]
+            if video_inputs is not None:
+                for vi in range(len(video_inputs)):
+                    total_frames = video_inputs[vi].shape[0]
+                    indices = np.linspace(0, total_frames - 1, self.max_num_frames, dtype=int)
+                    if total_frames - 1 not in indices:
+                        indices = np.append(indices, total_frames - 1)
+                    video_inputs[vi] = video_inputs[vi][indices]
229-233: Add a separator before reasoning_prompt.
Avoid concatenating words when the context doesn't end with whitespace.

-            if self.reasoning_prompt:
-                context = context.strip() + self.reasoning_prompt
+            if self.reasoning_prompt:
+                context = context.rstrip() + "\n" + self.reasoning_prompt.lstrip()
1-7: Drop now-unused imports if you switch to PIL-only images.
base64 and BytesIO become dead after the suggested change.

-import base64
@@
-from io import BytesIO
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
examples/models/llava_onevision1_5.sh (1 hunks)
lmms_eval/models/simple/llava_onevision1_5.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
lmms_eval/models/simple/llava_onevision1_5.py (4)
lmms_eval/api/instance.py (2)
  Instance (6-29), args (25-29)
lmms_eval/api/model.py (2)
  lmms (26-331), add_partial (348-352)
lmms_eval/api/registry.py (1)
  register_model (11-24)
lmms_eval/utils.py (3)
  Collator (865-1020), chunks (136-171), get_batched (891-913)
🪛 Shellcheck (0.10.0)
examples/models/llava_onevision1_5.sh
[error] 1-1: Tips depend on target shell and yours is unknown. Add a shebang or a 'shell' directive.
(SC2148)
[warning] 1-1: Tilde does not expand in quotes. Use $HOME.
(SC2088)
🪛 Ruff (0.12.2)
lmms_eval/models/simple/llava_onevision1_5.py
63-63: Avoid specifying long messages outside the exception class
(TRY003)
71-71: Avoid specifying long messages outside the exception class
(TRY003)
212-212: Prefer TypeError exception for invalid type
(TRY004)
212-212: Avoid specifying long messages outside the exception class
(TRY003)
254-254: Consider iterable unpacking instead of concatenation
Replace with iterable unpacking
(RUF005)
    @property
    def config(self):
        # return the associated transformers.AutoConfig for the given pretrained model.
        return self._config

    @property
    def tokenizer(self):
        return self._tokenizer

    @property
    def model(self):
        # returns the model, unwrapping it if using Accelerate
        if hasattr(self, "accelerator"):
            return self.accelerator.unwrap_model(self._model)
        else:
            return self._model

    @property
    def eot_token_id(self):
        return self.tokenizer.eos_token_id

    @property
    def max_length(self):
        return self._max_length

    @property
    def batch_size(self):
        return self.batch_size_per_gpu

    @property
    def device(self):
        return self._device

    @property
    def rank(self):
        return self._rank

    @property
    def world_size(self):
        return self._world_size
🛠️ Refactor suggestion
🧩 Analysis chain
Add return type annotations to public properties.
config/tokenizer/model/eot_token_id/max_length/batch_size/device/rank/world_size should declare return types to satisfy “Type hints required for all Python code.”
Run this to list untyped defs/props:
🏁 Script executed:
#!/usr/bin/env bash
rg -nP '^\s*(def|@property)\s+\w+\([^)]*\)(?!\s*->)' -n lmms_eval/models/simple/llava_onevision1_5.py

Length of output: 83
🏁 Script executed:
#!/usr/bin/env bash
set -euo pipefail
python3 - <<'PY'
import ast, sys, os
path = "lmms_eval/models/simple/llava_onevision1_5.py"
if not os.path.exists(path):
print(f"ERROR: {path} not found", file=sys.stderr)
sys.exit(2)
with open(path, "r", encoding="utf-8") as f:
src = f.read()
tree = ast.parse(src, filename=path)
def ann_to_str(n):
if n is None:
return None
try:
return ast.unparse(n)
except Exception:
try:
import astunparse
return astunparse.unparse(n).strip()
except Exception:
return repr(n)
entries = []
for node in tree.body:
if isinstance(node, ast.FunctionDef):
entries.append((node.name, node.lineno, any(isinstance(d, ast.Name) and d.id == "property" for d in node.decorator_list), ann_to_str(node.returns)))
if isinstance(node, ast.ClassDef):
for n in node.body:
if isinstance(n, ast.FunctionDef):
has_prop = any(
(isinstance(d, ast.Name) and d.id == "property")
or (isinstance(d, ast.Attribute) and getattr(d, "attr", None) == "property")
for d in n.decorator_list
)
entries.append((f"{node.name}.{n.name}", n.lineno, has_prop, ann_to_str(n.returns)))
entries.sort(key=lambda x: x[1])
for name, lineno, is_prop, ann in entries:
print(f"{name} (line {lineno}) {'@property' if is_prop else 'def'} -> return: {ann}")
missing = [e for e in entries if e[3] is None]
print()
print(f"Total functions/properties: {len(entries)}")
print(f"Without return annotations: {len(missing)}")
for name, lineno, is_prop, ann in missing:
print(f" - {name} (line {lineno}) {'@property' if is_prop else 'def'}")
PY

Length of output: 1522
Add return type annotations to public properties (required)
These @property methods and the flatten function in lmms_eval/models/simple/llava_onevision1_5.py lack return type annotations; add explicit return types to satisfy the repo rule "Type hints required for all Python code."
- Llava_OneVision1_5.config — line 130 — e.g. transformers.PretrainedConfig / transformers.AutoConfig
- Llava_OneVision1_5.tokenizer — line 135 — e.g. transformers.PreTrainedTokenizerBase
- Llava_OneVision1_5.model — line 139 — e.g. torch.nn.Module | transformers.PreTrainedModel
- Llava_OneVision1_5.eot_token_id — line 147 — int
- Llava_OneVision1_5.max_length — line 151 — int
- Llava_OneVision1_5.batch_size — line 155 — int
- Llava_OneVision1_5.device — line 159 — torch.device | str
- Llava_OneVision1_5.rank — line 163 — int
- Llava_OneVision1_5.world_size — line 167 — int
- Llava_OneVision1_5.flatten — line 173 — add explicit return annotation (pick the precise type from implementation)
Actionable comments posted: 1
♻️ Duplicate comments (2)
examples/models/llava_onevision1_5.sh (2)
8-8: Quote --model_args to avoid accidental word splitting; the content otherwise looks good.
Quoting is safer and mirrors earlier feedback; your previous double-comma issue is resolved.

-    --model_args=pretrained=lmms-lab/LLaVA-OneVision-1.5-8B-Instruct,attn_implementation=flash_attention_2,max_pixels=3240000 \
+    --model_args="pretrained=lmms-lab/LLaVA-OneVision-1.5-8B-Instruct,attn_implementation=flash_attention_2,max_pixels=3240000" \
8-8: Resolved: the redundant interleave_visuals override noted earlier has been removed.
No duplicate interleave_visuals flag anymore. Looks good.
🧹 Nitpick comments (3)
examples/models/llava_onevision1_5.sh (3)
1-1: Add a shebang and strict mode for predictable execution.
Without a shebang, the target shell is unknown (ShellCheck SC2148). Add a bash shebang and strict flags.

+#!/usr/bin/env bash
+set -euo pipefail
 export HF_HOME="~/.cache/huggingface"
1-1: Fix tilde expansion in HF_HOME.
A quoted ~ does not expand (ShellCheck SC2088). Use $HOME.

-export HF_HOME="~/.cache/huggingface"
+export HF_HOME="$HOME/.cache/huggingface"
9-10: Optional: factor tasks into a variable and quote it for readability and safe expansion.
Keeps the command cleaner and prevents surprises if a task name ever includes spaces.

-    --tasks=mmerealworld,mmerealworld_cn,chartqa,docvqa_val,infovqa_val,mmstar,ocrbench \
+TASKS="mmerealworld,mmerealworld_cn,chartqa,docvqa_val,infovqa_val,mmstar,ocrbench"
+accelerate launch --num_processes=8 --main_process_port 12399 -m lmms_eval \
+    --model=llava_onevision1_5 \
+    --model_args="pretrained=lmms-lab/LLaVA-OneVision-1.5-8B-Instruct,attn_implementation=flash_attention_2,max_pixels=3240000" \
+    --tasks="$TASKS" \
     --batch_size=1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
examples/models/llava_onevision1_5.sh(1 hunks)
🧰 Additional context used
🪛 Shellcheck (0.10.0)
examples/models/llava_onevision1_5.sh
[error] 1-1: Tips depend on target shell and yours is unknown. Add a shebang or a 'shell' directive.
(SC2148)
[warning] 1-1: Tilde does not expand in quotes. Use $HOME.
(SC2088)
accelerate launch --num_processes=8 --main_process_port 12399 -m lmms_eval \
    --model=llava_onevision1_5 \
    --model_args=pretrained=lmms-lab/LLaVA-OneVision-1.5-8B-Instruct,attn_implementation=flash_attention_2,max_pixels=3240000 \
    --tasks=mmerealworld,mmerealworld_cn,chartqa,docvqa_val,infovqa_val,mmstar,ocrbench \
    --batch_size=1
🧩 Analysis chain
Sanity check: ensure flash_attention_2 is available on the target machines.
If FA2 isn’t installed or supported for the GPU/driver stack, runs will fail. Verify ahead of time.
🏁 Script executed:
#!/bin/bash
# Verify flash-attn import and CUDA compatibility.
python - <<'PY'
try:
import flash_attn # type: ignore
print("flash_attn import: OK")
except Exception as e:
print("flash_attn import failed:", e)
PY

Length of output: 54
Critical: FlashAttention2 missing — install or guard attn_implementation usage (examples/models/llava_onevision1_5.sh: lines 6–10)
Local check failed: python import error "No module named 'flash_attn'". Runs with attn_implementation=flash_attention_2 will fail.
- Fix A: Install FlashAttention2 on target machines and document required CUDA/PyTorch/driver versions.
- Fix B: Detect availability at runtime in the script and fall back to a supported/default attention implementation or remove the flag.
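One way to realize Fix B is to guard inside the model wrapper rather than the shell script; a hedged sketch, assuming the attn_implementation constructor argument and eval_logger names used in this PR:

# Sketch only: downgrade gracefully when flash-attn is not importable,
# instead of letting from_pretrained fail at launch time.
import importlib.util

if attn_implementation == "flash_attention_2" and importlib.util.find_spec("flash_attn") is None:
    eval_logger.warning("flash_attn not installed; falling back to sdpa attention.")
    attn_implementation = "sdpa"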
🤖 Prompt for AI Agents
In examples/models/llava_onevision1_5.sh around lines 6 to 10, the script forces
attn_implementation=flash_attention_2 which will fail if FlashAttention2 is not
installed (python import error: No module named 'flash_attn'); either document
and instruct installing FlashAttention2 with required CUDA/PyTorch/driver
versions, or modify the script to detect FlashAttention2 availability at runtime
and if absent remove or replace the attn_implementation flag with a supported
default (e.g., None or a safe implementation) so the launch command won’t crash
on machines without flash_attn.
Actionable comments posted: 2
♻️ Duplicate comments (3)
lmms_eval/models/simple/llava_onevision1_5.py (3)
215-217: Generalize the Qwen-specific stopper comment.
Keep the guidance but remove model-specific wording.
-        # Avoid using '\n\n' as a stopper for Qwen2.5VL to prevent truncation, which can lead to incorrect results
+        # Avoid overly generic newline stoppers; they can truncate useful content.
24-28: Remove the Qwen dependency and add a safe vision fallback (prevents NameError and LLaVA mismatch).
qwen_vl_utils is optional but later called unguarded; this will crash if it is not installed and is Qwen-specific. Guard the import and provide a processor-only PIL image fallback aligned with LLaVA.

Apply:

 try:
     from qwen_vl_utils import process_vision_info
 except ImportError:
-    eval_logger.warning("Failed to import qwen_vl_utils; Please install it via `pip install qwen-vl-utils`")
+    process_vision_info = None  # type: ignore[assignment]
+    eval_logger.warning(
+        "qwen_vl_utils not installed; falling back to processor-only vision handling"
+    )
@@
-            texts = [self.processor.apply_chat_template(msg, tokenize=False, add_generation_prompt=True) for msg in batched_messages]
-            image_inputs, video_inputs = process_vision_info(batched_messages)
+            texts = [
+                self.processor.apply_chat_template(
+                    msg, tokenize=False, add_generation_prompt=True
+                )
+                for msg in batched_messages
+            ]
+            if process_vision_info is not None:
+                image_inputs, video_inputs = process_vision_info(batched_messages)
+            else:
+                # LLaVA: collect PIL images directly; no video support without qwen_vl_utils
+                image_inputs, video_inputs = [], None
+                for msg in batched_messages:
+                    user = next(p for p in msg if p["role"] == "user")["content"]
+                    imgs = [
+                        part["image"]
+                        for part in user
+                        if isinstance(part, dict)
+                        and part.get("type") == "image"
+                        and isinstance(part.get("image"), Image.Image)
+                    ]
+                    image_inputs.append(imgs or None)

Also applies to: 279-289
296-323: Never pass None to generate; add a pad_token fallback and inference_mode.
Avoid None for temperature/top_p, ensure a pad_token fallback, and wrap generation in inference mode.

-                pad_token_id = self.tokenizer.pad_token_id
-
-                if current_gen_kwargs["temperature"] > 0:
-                    current_gen_kwargs["do_sample"] = True
-                else:
-                    current_gen_kwargs["do_sample"] = False
-                    current_gen_kwargs["temperature"] = None
-                    current_gen_kwargs["top_p"] = None
-
-                cont = self.model.generate(
-                    **inputs,
-                    eos_token_id=self.tokenizer.eos_token_id,
-                    pad_token_id=pad_token_id,
-                    do_sample=current_gen_kwargs["do_sample"],
-                    temperature=current_gen_kwargs["temperature"],
-                    top_p=current_gen_kwargs["top_p"],
-                    num_beams=current_gen_kwargs["num_beams"],
-                    max_new_tokens=current_gen_kwargs["max_new_tokens"],
-                    use_cache=self.use_cache,
-                )
+                pad_token_id = self.tokenizer.pad_token_id or self.tokenizer.eos_token_id
+                do_sample = bool(current_gen_kwargs.get("temperature", 0) and current_gen_kwargs["temperature"] > 0)
+                gen_args = {
+                    **inputs,
+                    "eos_token_id": self.tokenizer.eos_token_id,
+                    "pad_token_id": pad_token_id,
+                    "num_beams": current_gen_kwargs["num_beams"],
+                    "max_new_tokens": current_gen_kwargs["max_new_tokens"],
+                    "use_cache": self.use_cache,
+                }
+                if do_sample:
+                    gen_args.update(
+                        do_sample=True,
+                        temperature=float(current_gen_kwargs.get("temperature", 1.0)),
+                        top_p=float(current_gen_kwargs.get("top_p", 1.0)),
+                    )
+                with torch.inference_mode():
+                    cont = self.model.generate(**gen_args)
🧹 Nitpick comments (6)
lmms_eval/models/simple/llava_onevision1_5.py (6)
1-4: Remove unused imports.
base64andBytesIOare unused.-import base64 import re -from io import BytesIO -from typing import List, Optional, Tuple, Union +from typing import Any, Iterable, List, Optional, Tuple, Union, TypeVar
171-173: Fix stale Qwen reference in the error.
The message should reference LLaVA.
- raise NotImplementedError("Loglikelihood is not implemented for Qwen2.5_VL") + raise NotImplementedError( + "Loglikelihood is not implemented for LLaVA-One-Vision yet" + )
221-224: Remove duplicate `<image>` placeholder stripping.
The context cleanup happens twice.
- for i in range(len(contexts)): - if "<image>" in contexts[i]: - contexts[i] = contexts[i].replace("<image>", "") - batched_messages = [] for i, context in enumerate(contexts): - if "<image>" in context: - context = context.replace("<image>", "") - message = [{"role": "system", "content": self.system_prompt}]Also applies to: 226-230
237-246: Avoid unnecessary video I/O on the hot path.
Reading the first frame only to compute dims is wasted; you don't use it.
- if isinstance(visual, str) and visual.endswith((".mp4", ".avi", ".mov")): # Video file - vr = decord.VideoReader(visual) - first_frame = vr[0].asnumpy() - height, width = first_frame.shape[:2] - # max_pixels = height * width - processed_visuals.append({"type": "video", "video": visual, "max_pixels": self.max_pixels, "min_pixels": self.min_pixels}) + if isinstance(visual, str) and visual.endswith((".mp4", ".avi", ".mov")): # Video file + processed_visuals.append({"type": "video", "video": visual})
252-253: Ruff RUF005: prefer iterable unpacking over list concat.- "content": processed_visuals + [{"type": "text", "text": context}], + "content": [*processed_visuals, {"type": "text", "text": context}],
30-35: Optional: rename the class to PascalCase.
PEP 8: class names shouldn't use underscores. Suggested: LlavaOneVision15. The registry alias can stay "llava_onevision1_5".
LlavaOneVision15. The registry alias can stay"llava_onevision1_5".-@register_model("llava_onevision1_5") -class Llava_OneVision1_5(lmms): +@register_model("llava_onevision1_5") +class LlavaOneVision15(lmms):
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
lmms_eval/models/simple/llava_onevision1_5.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
lmms_eval/models/simple/llava_onevision1_5.py (4)
lmms_eval/api/instance.py (2)
  Instance (6-29), args (25-29)
lmms_eval/api/model.py (2)
  lmms (26-331), add_partial (348-352)
lmms_eval/api/registry.py (1)
  register_model (11-24)
lmms_eval/utils.py (3)
  Collator (865-1020), chunks (136-171), get_batched (891-913)
🪛 Ruff (0.12.2)
lmms_eval/models/simple/llava_onevision1_5.py
64-64: Avoid specifying long messages outside the exception class
(TRY003)
72-72: Avoid specifying long messages outside the exception class
(TRY003)
213-213: Prefer TypeError exception for invalid type
(TRY004)
213-213: Avoid specifying long messages outside the exception class
(TRY003)
252-252: Consider iterable unpacking instead of concatenation
Replace with iterable unpacking
(RUF005)
except ImportError:
    eval_logger.warning("Failed to import qwen_vl_utils; Please install it via `pip install qwen-vl-utils`")
🛠️ Refactor suggestion
Add required type hints; avoid shadowing builtins in flatten.
Properties lack return annotations; flatten lacks types and uses input name.
@@
+T = TypeVar("T")
@@
- def config(self):
+ def config(self) -> Any:
@@
- def tokenizer(self):
+ def tokenizer(self) -> Any:
@@
- def model(self):
+ def model(self) -> torch.nn.Module:
@@
- def eot_token_id(self):
+ def eot_token_id(self) -> int:
@@
- def max_length(self):
+ def max_length(self) -> int:
@@
- def batch_size(self):
+ def batch_size(self) -> int:
@@
- def device(self):
+ def device(self) -> torch.device:
@@
- def rank(self):
+ def rank(self) -> int:
@@
- def world_size(self):
+ def world_size(self) -> int:
@@
- def flatten(self, input):
- new_list = []
- for i in input:
- for j in i:
- new_list.append(j)
- return new_list
+ def flatten(self, items: Iterable[Iterable[T]]) -> List[T]:
+ return [j for i in items for j in i]

Also applies to: 130-169, 174-179
🤖 Prompt for AI Agents
In lmms_eval/models/simple/llava_onevision1_5.py around line 29 (and also for
the functions spanning lines 130-169 and 174-179), add explicit type hints for
property return types and function signatures and stop shadowing builtins:
change the flatten function parameter name from input to inputs (or similar),
add parameter and return types (e.g., inputs: Iterable[Any] -> List[Any] or
Sequence[Any] -> List[Any]), and annotate all properties with their concrete
return types (for example -> torch.Tensor or -> Sequence[torch.Tensor] as
appropriate) and add typing imports (from typing import Any, Iterable, Sequence,
List, Optional) at the top. Ensure signature types for the functions at 130-169
and 174-179 are updated consistently (parameters typed and return types
declared) to satisfy static type checkers.
                new_list.append(j)
        return new_list

    def generate_until(self, requests: List[Instance]) -> List[str]:
🛠️ Refactor suggestion
Add docstrings for public APIs.
Repo rule: public APIs must have docstrings.
def generate_until(self, requests: List[Instance]) -> List[str]:
+ """
+ Generate responses until stop conditions for a batch of requests.
+ Expects each Instance to provide (context, gen_kwargs, doc_to_visual, doc_id, task, split).
+ """
@@
def generate_until_multi_round(self, requests) -> List[str]:
- raise NotImplementedError("TODO: Implement multi-round generation")
+ """
+ Multi-round chat generation for LLaVA-One-Vision.
+ """
+ raise NotImplementedError("TODO: Implement multi-round generation")Also applies to: 343-345
🤖 Prompt for AI Agents
In lmms_eval/models/simple/llava_onevision1_5.py around line 181 (method
generate_until) and lines 343-345 (other public APIs), add proper docstrings for
these public methods: include a one-line summary, parameter descriptions (types
and meanings for requests and any other args), the return value (type and
meaning), and any raised exceptions or side effects; follow the repo's docstring
style (e.g., Google or NumPy) used elsewhere in the project and keep it concise
and informative.
Actionable comments posted: 3
♻️ Duplicate comments (2)
lmms_eval/models/simple/llava_onevision1_5.py (2)
171-173: Fix model name in NotImplemented message.- raise NotImplementedError("Loglikelihood is not implemented for Qwen2.5_VL") + raise NotImplementedError( + "Loglikelihood is not implemented for LLaVA-One-Vision yet" + )
207-217: Generalize stopper comment; remove Qwen-specific wording.- # Avoid using '\n\n' as a stopper for Qwen2.5VL to prevent truncation, which can lead to incorrect results + # Avoid overly generic newline stoppers; they can truncate valuable content + # in some vision LMs. Prefer explicit, task-specific stoppers. until = [item for item in until if item != "\n\n"]
🧹 Nitpick comments (14)
lmms_eval/models/simple/llava_onevision1_5.py (14)
1-5: Remove unused imports (ruff F401).base64 and BytesIO are unused.
-import base64 import re -from io import BytesIO -from typing import List, Optional, Tuple, Union +from typing import List, Optional, Tuple, Union
61-65: Promote valid attn implementations to a constant; wrap long message.Improves readability and respects 88c limit.
+VALID_ATTN_IMPLEMENTATIONS = (None, "flash_attention_2", "sdpa", "eager") @@ - valid_attn_implementations = [None, "flash_attention_2", "sdpa", "eager"] - if attn_implementation not in valid_attn_implementations: - raise ValueError(f"attn_implementation must be one of {valid_attn_implementations}, got {attn_implementation}") + if attn_implementation not in VALID_ATTN_IMPLEMENTATIONS: + raise ValueError( + f"attn_implementation must be one of " + f"{VALID_ATTN_IMPLEMENTATIONS}, got {attn_implementation}" + )
74-81: Add CUDA availability fallback.Avoids device errors on CPU-only hosts.
- if accelerator.num_processes > 1: + if accelerator.num_processes > 1: self._device = torch.device(f"cuda:{accelerator.local_process_index}") self.device_map = f"cuda:{accelerator.local_process_index}" else: - self._device = torch.device(device) + if device == "cuda" and not torch.cuda.is_available(): + eval_logger.warning("CUDA not available; falling back to CPU.") + device = "cpu" + self._device = torch.device(device) self.device_map = device_map if device_map else device
98-106: Avoid Optional for booleans; keep types tight.interleave_visuals and use_custom_video_loader need not be Optional.
- use_custom_video_loader: Optional[bool] = False, + use_custom_video_loader: bool = False, @@ - interleave_visuals: Optional[bool] = False, + interleave_visuals: bool = False,
130-170: Add required return type annotations for public properties.Repo rule: type hints required. Add concrete return types.
-from transformers import ( +from transformers import ( AutoProcessor, AutoTokenizer, - AutoModelForCausalLM + AutoModelForCausalLM, + PreTrainedModel, + PretrainedConfig, + PreTrainedTokenizerBase, )@property - def config(self): + def config(self) -> PretrainedConfig: @@ @property - def tokenizer(self): + def tokenizer(self) -> PreTrainedTokenizerBase: @@ @property - def model(self): + def model(self) -> PreTrainedModel: @@ @property - def eot_token_id(self): + def eot_token_id(self) -> int: @@ @property - def max_length(self): + def max_length(self) -> int: @@ @property - def batch_size(self): + def batch_size(self) -> int: @@ @property - def device(self): + def device(self) -> torch.device: @@ @property - def rank(self): + def rank(self) -> int: @@ @property - def world_size(self): + def world_size(self) -> int:
174-180: Avoid shadowing builtins; type the helper; simplify.- def flatten(self, input): - new_list = [] - for i in input: - for j in i: - new_list.append(j) - return new_list + from typing import Iterable, TypeVar, List + T = TypeVar("T") + + def flatten(self, items: Iterable[Iterable[T]]) -> List[T]: + return [j for i in items for j in i]
181-183: Add docstring for public API.- def generate_until(self, requests: List[Instance]) -> List[str]: + def generate_until(self, requests: List[Instance]) -> List[str]: + """ + Generate responses for a batch of Instances until stop tokens. + Args: + requests: List of Instances with (context, gen_kwargs, doc_to_visual, + doc_id, task, split). + Returns: + List[str]: Generated texts in the original request order. + """ res = []
218-224: Remove redundant placeholder stripping loop.Context replacement is already handled below.
- if isinstance(contexts, tuple): - contexts = list(contexts) - - for i in range(len(contexts)): - if "<image>" in contexts[i]: - contexts[i] = contexts[i].replace("<image>", "") + if isinstance(contexts, tuple): + contexts = list(contexts)
248-255: Prefer iterable unpacking over concatenation (ruff RUF005).- message.append( - { - "role": "user", - "content": processed_visuals + [{"type": "text", "text": context}], - } - ) + message.append( + { + "role": "user", + "content": [*processed_visuals, {"type": "text", "text": context}], + } + )
323-330: Prefer tokenizer for decoding.Processor may proxy, but tokenizer is clearer.
- answers = self.processor.batch_decode(generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False) + answers = self.tokenizer.batch_decode( + generated_ids_trimmed, + skip_special_tokens=True, + clean_up_tokenization_spaces=False, + )
341-343: Add docstring and type hints for generate_until_multi_round.- def generate_until_multi_round(self, requests) -> List[str]: - raise NotImplementedError("TODO: Implement multi-round generation") + def generate_until_multi_round(self, requests: List[Instance]) -> List[str]: + """ + Multi-round chat generation for LLaVA-One-Vision. + """ + raise NotImplementedError("TODO: Implement multi-round generation")
31-35: Class name style vs. repo ergonomics.Guideline says PascalCase; current name includes underscores. Consider adding a PascalCase alias to avoid churn.
@register_model("llava_onevision1_5") class Llava_OneVision1_5(lmms): @@ """And optionally:
class LlavaOneVision15(Llava_OneVision1_5): """PascalCase alias for style compliance.""" pass
210-214: Use TypeError for invalid type (ruff TRY004).- elif not isinstance(until, list): - raise ValueError(f"Expected `gen_kwargs['until']` to be of type Union[str, list], but got {type(until)}") + elif not isinstance(until, list): + raise TypeError( + f"Expected gen_kwargs['until'] to be Union[str, list], " + f"got {type(until)}" + )
288-294: Device move: use self.device consistently.
self._device is set earlier (torch.device(...)); moving inputs to the literal "cuda" when device_map == "auto" can mismatch model placement. Use inputs = inputs.to(self.device) (or inputs.to(next(self.model.parameters()).device) for absolute alignment).
File: lmms_eval/models/simple/llava_onevision1_5.py lines 288-294
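A small sketch of that change inside generate_until, assuming the inputs/device_map/self.device names used in this PR:

# Sketch: resolve the target device from the model itself when device_map="auto",
# otherwise use the device chosen in __init__, rather than a hard-coded "cuda".
target_device = (
    next(self.model.parameters()).device
    if self.device_map == "auto"
    else self.device
)
inputs = inputs.to(target_device)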
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
lmms_eval/models/simple/llava_onevision1_5.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
lmms_eval/models/simple/llava_onevision1_5.py (3)
lmms_eval/api/instance.py (2)
  Instance (6-29), args (25-29)
lmms_eval/api/model.py (2)
  lmms (26-331), add_partial (348-352)
lmms_eval/utils.py (3)
  Collator (865-1020), chunks (136-171), get_batched (891-913)
🪛 Ruff (0.12.2)
lmms_eval/models/simple/llava_onevision1_5.py
64-64: Avoid specifying long messages outside the exception class
(TRY003)
72-72: Avoid specifying long messages outside the exception class
(TRY003)
213-213: Prefer TypeError exception for invalid type
(TRY004)
213-213: Avoid specifying long messages outside the exception class
(TRY003)
252-252: Consider iterable unpacking instead of concatenation
Replace with iterable unpacking
(RUF005)
🔇 Additional comments (4)
lmms_eval/models/simple/llava_onevision1_5.py (4)
107-121: Accelerate wrapping looks correct.
Unwrap in the property is handled; logging is helpful.
237-247: Drop the unused video shape inspection; keep a minimal video descriptor.
height/width are unused; opening the video just to read frame 0 is wasteful.
-                if isinstance(visual, str) and visual.endswith((".mp4", ".avi", ".mov")):  # Video file
-                    vr = decord.VideoReader(visual)
-                    first_frame = vr[0].asnumpy()
-                    height, width = first_frame.shape[:2]
-                    # max_pixels = height * width
-                    processed_visuals.append({"type": "video", "video": visual, "max_pixels": self.max_pixels, "min_pixels": self.min_pixels})
+                if isinstance(visual, str) and visual.endswith((".mp4", ".avi", ".mov")):
+                    processed_visuals.append(
+                        {
+                            "type": "video",
+                            "video": visual,
+                            "max_pixels": self.max_pixels,
+                            "min_pixels": self.min_pixels,
+                        }
+                    )
295-321: Never pass None into generate; inference_mode added.
Good fix: do_sample gating, pad_token fallback, and inference_mode use.
14-17: Sort/expand imports to include types used in type hints.
Import any types referenced by property/type hints (e.g. PreTrainedModel, PretrainedConfig, PreTrainedTokenizerBase, TypeVar, Iterable). The supplied search returned no matches in lmms_eval/models/simple/llava_onevision1_5.py; verify and add missing imports if you changed signatures.
Location: lmms_eval/models/simple/llava_onevision1_5.py (lines 14–17).
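One possible import block that would satisfy this comment, assuming the hints listed above really are referenced in the signatures (adjust to what the file actually uses):

```python
from typing import Any, Iterable, List, Optional, Tuple, TypeVar, Union

from transformers import (
    PretrainedConfig,
    PreTrainedModel,
    PreTrainedTokenizerBase,
)

T = TypeVar("T")
```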
try:
    from qwen_vl_utils import process_vision_info
except ImportError:
    eval_logger.warning("Failed to import qwen_vl_utils; Please install it via `pip install qwen-vl-utils`")
Fix hard dependency on qwen_vl_utils; add safe fallback for images and guard videos.
As written, ImportError leaves process_vision_info undefined, causing a NameError at runtime. Provide a guarded import and a fallback path that collects PIL images; explicitly raise for videos when qwen_vl_utils is unavailable.
@@
-try:
- from qwen_vl_utils import process_vision_info
-except ImportError:
- eval_logger.warning("Failed to import qwen_vl_utils; Please install it via `pip install qwen-vl-utils`")
+try:
+ from qwen_vl_utils import process_vision_info # type: ignore
+except ImportError:
+ process_vision_info = None # fallback in generate_until
+ eval_logger.warning(
+ "qwen_vl_utils not installed; falling back to processor-only image handling."
+ )
@@
- texts = [self.processor.apply_chat_template(msg, tokenize=False, add_generation_prompt=True) for msg in batched_messages]
- image_inputs, video_inputs = process_vision_info(batched_messages)
+ texts = [
+ self.processor.apply_chat_template(
+ msg, tokenize=False, add_generation_prompt=True
+ )
+ for msg in batched_messages
+ ]
+ if process_vision_info is not None:
+ image_inputs, video_inputs = process_vision_info(batched_messages)
+ else:
+ # Fallback: collect PIL images only; videos unsupported without qwen_vl_utils
+ image_inputs, video_inputs = [], None
+ for msg in batched_messages:
+ user = next(p for p in msg if p["role"] == "user")["content"]
+ imgs = [
+ part["image"]
+ for part in user
+ if isinstance(part, dict)
+ and part.get("type") == "image"
+ and isinstance(part.get("image"), Image.Image)
+ ]
+ if any(isinstance(part, dict) and part.get("type") == "video" for part in user):
+ raise NotImplementedError(
+ "Video inputs require qwen_vl_utils or a native LLaVA video path."
+ )
+ image_inputs.append(imgs or None)
Also applies to: 279-289
        self.use_custom_video_loader = use_custom_video_loader
        self.fps = fps
        # if self.fps and not self.use_custom_video_loader:
        #     raise ValueError("FPS is only applicable if use_custom_video_loader is True")
        self.max_image_size = max_image_size
        if self.max_image_size and not self.use_custom_video_loader:
            raise ValueError("max_image_size is only applicable if use_custom_video_loader is True")
Enforce fps/max_image_size coupling with use_custom_video_loader.
The fps check is commented out; enforce both constraints.
self.use_custom_video_loader = use_custom_video_loader
self.fps = fps
- # if self.fps and not self.use_custom_video_loader:
- # raise ValueError("FPS is only applicable if use_custom_video_loader is True")
+ if self.fps is not None and not self.use_custom_video_loader:
+ raise ValueError(
+ "fps is only applicable if use_custom_video_loader is True"
+ )
self.max_image_size = max_image_size
if self.max_image_size and not self.use_custom_video_loader:
raise ValueError("max_image_size is only applicable if use_custom_video_loader is True")📝 Committable suggestion
🧰 Tools
🪛 Ruff (0.12.2)
72-72: Avoid specifying long messages outside the exception class
(TRY003)
🤖 Prompt for AI Agents
In lmms_eval/models/simple/llava_onevision1_5.py around lines 66 to 73, the fps
validation is currently commented out while max_image_size is enforced;
re-enable and enforce the same coupling for fps: if self.fps is set and not
self.use_custom_video_loader, raise a ValueError with a clear message like "fps
is only applicable if use_custom_video_loader is True"; keep the existing
max_image_size check as-is so both parameters require use_custom_video_loader to
be True.
                if video_inputs is not None:
                    total_frames = video_inputs[0].shape[0]
                    indices = np.linspace(0, total_frames - 1, self.max_num_frames, dtype=int)
                    # Append the last frame index if not already included
                    if total_frames - 1 not in indices:
                        indices = np.append(indices, total_frames - 1)
                    video_inputs[0] = video_inputs[0][indices]
Process all videos, not just index 0.
Only the first video is subsampled; others are ignored.
- if video_inputs is not None:
- total_frames = video_inputs[0].shape[0]
- indices = np.linspace(0, total_frames - 1, self.max_num_frames, dtype=int)
- # Append the last frame index if not already included
- if total_frames - 1 not in indices:
- indices = np.append(indices, total_frames - 1)
- video_inputs[0] = video_inputs[0][indices]
+ if video_inputs is not None:
+ for vi in range(len(video_inputs)):
+ total_frames = video_inputs[vi].shape[0]
+ indices = np.linspace(0, total_frames - 1, self.max_num_frames, dtype=int)
+ if total_frames - 1 not in indices:
+ indices = np.append(indices, total_frames - 1)
+ video_inputs[vi] = video_inputs[vi][indices]
🤖 Prompt for AI Agents
In lmms_eval/models/simple/llava_onevision1_5.py around lines 281 to 287, the
current logic only subsamples video_inputs[0], ignoring other videos; update it
to iterate over all entries in video_inputs (e.g., for i in
range(len(video_inputs)) or for idx, vid in enumerate(video_inputs)) and for
each video compute total_frames = vid.shape[0], create indices = np.linspace(0,
total_frames - 1, self.max_num_frames, dtype=int), append total_frames - 1 if
not present, and then assign video_inputs[i] = vid[indices] so every video is
subsampled consistently. Ensure you handle None and empty videos the same way as
before.
Actionable comments posted: 2
♻️ Duplicate comments (5)
lmms_eval/models/simple/llava_onevision1_5.py (5)
62-69: Enforce fps/max_image_size coupling to the custom loader. Validate both only when use_custom_video_loader=True; use explicit None checks.
         self.use_custom_video_loader = use_custom_video_loader
         self.fps = fps
-        # if self.fps and not self.use_custom_video_loader:
-        #     raise ValueError("FPS is only applicable if use_custom_video_loader is True")
+        if self.fps is not None and not self.use_custom_video_loader:
+            raise ValueError("fps is only applicable if use_custom_video_loader is True")
         self.max_image_size = max_image_size
-        if self.max_image_size and not self.use_custom_video_loader:
+        if self.max_image_size is not None and not self.use_custom_video_loader:
             raise ValueError("max_image_size is only applicable if use_custom_video_loader is True")
285-312: LGTM: safer generate() args and inference_mode. Pad fallback, conditional sampling args, and inference context resolve prior pitfalls.
20-24: Guard qwen_vl_utils and add an image-only fallback (prevents NameError and a mismatched vision path). process_vision_info is unbound if qwen_vl_utils isn't installed, and LLaVA can operate on raw PIL images without qwen helpers. Add a guarded import and a safe fallback that collects PIL images; explicitly raise for videos without a native path. Also keep the processor's video subsampling consistent.
Apply:
@@
-try:
-    from qwen_vl_utils import process_vision_info
-except ImportError:
-    eval_logger.warning("Failed to import qwen_vl_utils; Please install it via `pip install qwen-vl-utils`")
+try:
+    from qwen_vl_utils import process_vision_info  # type: ignore
+except ImportError:
+    process_vision_info = None  # fallback in generate_until
+    eval_logger.warning(
+        "qwen_vl_utils not installed; falling back to processor-only image handling."
+    )
@@
-                texts = [self.processor.apply_chat_template(msg, tokenize=False, add_generation_prompt=True) for msg in batched_messages]
-                image_inputs, video_inputs = process_vision_info(batched_messages)
-                if video_inputs is not None:
-                    total_frames = video_inputs[0].shape[0]
-                    indices = np.linspace(0, total_frames - 1, self.max_num_frames, dtype=int)
-                    # Append the last frame index if not already included
-                    if total_frames - 1 not in indices:
-                        indices = np.append(indices, total_frames - 1)
-                    video_inputs[0] = video_inputs[0][indices]
-                inputs = self.processor(text=texts, images=image_inputs, videos=video_inputs, padding=True, return_tensors="pt")
+                texts = [
+                    self.processor.apply_chat_template(
+                        msg, tokenize=False, add_generation_prompt=True
+                    )
+                    for msg in batched_messages
+                ]
+                if process_vision_info is not None:
+                    image_inputs, video_inputs = process_vision_info(batched_messages)
+                else:
+                    # Fallback: collect PIL images only; videos unsupported without qwen_vl_utils
+                    image_inputs, video_inputs = [], None
+                    for msg in batched_messages:
+                        user = next(p for p in msg if p["role"] == "user")["content"]
+                        imgs = [
+                            part["image"]
+                            for part in user
+                            if isinstance(part, dict)
+                            and part.get("type") == "image"
+                            and isinstance(part.get("image"), Image.Image)
+                        ]
+                        if any(
+                            isinstance(part, dict) and part.get("type") == "video"
+                            for part in user
+                        ):
+                            raise NotImplementedError(
+                                "Video inputs require qwen_vl_utils or a native LLaVA video path."
+                            )
+                        image_inputs.append(imgs or None)
+                if video_inputs is not None:
+                    for vi in range(len(video_inputs)):
+                        total_frames = video_inputs[vi].shape[0]
+                        indices = np.linspace(
+                            0, total_frames - 1, self.max_num_frames, dtype=int
+                        )
+                        if total_frames - 1 not in indices:
+                            indices = np.append(indices, total_frames - 1)
+                        video_inputs[vi] = video_inputs[vi][indices]
+                inputs = self.processor(
+                    text=texts,
+                    images=image_inputs,
+                    videos=video_inputs,
+                    padding=True,
+                    return_tensors="pt",
+                )
Also applies to: 269-279
122-162: Add required type hints for properties and fix flatten's signature/name. Repo rule: type hints are required; avoid shadowing builtins with the parameter name `input`.
     @property
-    def config(self):
+    def config(self) -> Any:
@@
     @property
-    def tokenizer(self):
+    def tokenizer(self) -> Any:
@@
     @property
-    def model(self):
+    def model(self) -> torch.nn.Module:
@@
     @property
-    def eot_token_id(self):
+    def eot_token_id(self) -> int:
@@
     @property
-    def max_length(self):
+    def max_length(self) -> int:
@@
     @property
-    def batch_size(self):
+    def batch_size(self) -> int:
@@
     @property
-    def device(self):
+    def device(self) -> torch.device:
@@
     @property
-    def rank(self):
+    def rank(self) -> int:
@@
     @property
-    def world_size(self):
+    def world_size(self) -> int:
         return self._world_size
@@
-    def flatten(self, input):
-        new_list = []
-        for i in input:
-            for j in i:
-                new_list.append(j)
-        return new_list
+    def flatten(self, items: Iterable[Iterable[T]]) -> List[T]:
+        return [j for i in items for j in i]
Also applies to: 166-171
173-175: Add docstrings for public APIs (repo requirement). Brief summaries and param/return docs are required.
     def generate_until(self, requests: List[Instance]) -> List[str]:
-        res = []
+        """
+        Generate text for a batch of Instances until stop conditions.
+
+        Args:
+            requests: Batch providing (context, gen_kwargs, doc_to_visual,
+                doc_id, task, split).
+        Returns:
+            Generated strings in the original request order.
+        """
+        res = []
@@
-    def generate_until_multi_round(self, requests) -> List[str]:
-        raise NotImplementedError("TODO: Implement multi-round generation")
+    def generate_until_multi_round(self, requests) -> List[str]:
+        """
+        Multi-round chat generation for LLaVA-One-Vision.
+        """
+        raise NotImplementedError("TODO: Implement multi-round generation")
Also applies to: 331-333
🧹 Nitpick comments (6)
lmms_eval/models/simple/llava_onevision1_5.py (6)
1-4: Remove unused imports; expand typing for upcoming hints. base64 and BytesIO are unused; add Any/Iterable/TypeVar for annotations.
-import base64
-import re
-from io import BytesIO
-from typing import List, Optional, Tuple, Union
+import re
+from typing import Any, Iterable, List, Optional, Tuple, TypeVar, Union
Add near the imports:
+T = TypeVar("T")
163-165: Fix copy/paste in the NotImplementedError message (Qwen → LLaVA). The message should reference this model.
-        raise NotImplementedError("Loglikelihood is not implemented for Qwen2.5_VL")
+        raise NotImplementedError(
+            "Loglikelihood is not implemented for LLaVA-One-Vision yet"
+        )
199-206: Raise TypeError for an invalid 'until' type; keep the message concise. Aligns with Ruff TRY004 and clearer API erroring.
                 if isinstance(until, str):
                     until = [until]
                 elif not isinstance(until, list):
-                    raise ValueError(f"Expected `gen_kwargs['until']` to be of type Union[str, list], but got {type(until)}")
+                    raise TypeError(
+                        "gen_kwargs['until'] must be a str or List[str]; "
+                        f"got {type(until).__name__}"
+                    )
238-244: Use iterable unpacking instead of list concatenation. Cleaner and satisfies Ruff RUF005.
-                message.append(
-                    {
-                        "role": "user",
-                        "content": processed_visuals + [{"type": "text", "text": context}],
-                    }
-                )
+                message.append(
+                    {
+                        "role": "user",
+                        "content": [*processed_visuals, {"type": "text", "text": context}],
+                    }
+                )
210-217: Remove the duplicated `<image>` placeholder cleanup (done again per-message).
Avoid double replace; keep per-message cleanup only.
-        for i in range(len(contexts)):
-            if "<image>" in contexts[i]:
-                contexts[i] = contexts[i].replace("<image>", "")
+        # Defer <image> placeholder cleanup to the per-message loop below.
26-31: Class name style nit: consider PascalCase (no underscores). The repo guideline says class names use PascalCase. If a rename would break registry/imports, consider adding a PascalCase alias.
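If the registered name has to stay, one low-risk way to satisfy the guideline is a module-level alias; the stub below is only an illustration of the idea, not code from the PR.

```python
class Llava_OneVision1_5:
    """Stub standing in for the PR's model class (illustration only)."""


# PascalCase alias so new call sites follow the naming guideline while the
# registry keeps resolving the original name.
LlavaOneVision15 = Llava_OneVision1_5
```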
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
lmms_eval/models/simple/llava_onevision1_5.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Type hints are required for all Python code
Public APIs must have docstrings
Maximum line length is 88 characters
Use PEP 8 naming: snake_case for functions/variables
Class names must use PascalCase
Constants should be in UPPER_SNAKE_CASE
Use f-strings for string formatting
Use early returns to avoid nested conditions
Use descriptive names; prefix handler functions with 'handle'
Prefer constants over functions where possible
Prefer functional, immutable approaches when not verbose
Define composing (higher-level) functions before their components
Mark issues in existing code with TODO: prefix in comments
Use functional and stateless approaches where they improve clarity
Use Ruff to enforce: import sorting (I001) and no unused imports
For long strings, wrap using parentheses rather than backslashes
Format long function calls over multiple lines with proper indentation
Split long import lists across multiple lines
Use Pyright type checking: add explicit None checks for Optional values
Use Pyright type narrowing for strings where applicable
Use Ruff (via pre-commit) to format and lint Python files
Document public APIs and test thoroughly
Files:
lmms_eval/models/simple/llava_onevision1_5.py
🧬 Code graph analysis (1)
lmms_eval/models/simple/llava_onevision1_5.py (4)
lmms_eval/api/instance.py (2): Instance (6-29), args (25-29)
lmms_eval/api/model.py (2): lmms (26-331), add_partial (348-352)
lmms_eval/api/registry.py (1): register_model (11-24)
lmms_eval/utils.py (3): Collator (865-1020), chunks (136-171), get_batched (891-913)
🪛 Ruff (0.12.2)
lmms_eval/models/simple/llava_onevision1_5.py
60-60: Avoid specifying long messages outside the exception class
(TRY003)
68-68: Avoid specifying long messages outside the exception class
(TRY003)
205-205: Prefer TypeError exception for invalid type
(TRY004)
205-205: Avoid specifying long messages outside the exception class
(TRY003)
242-242: Consider iterable unpacking instead of concatenation
Replace with iterable unpacking
(RUF005)
| inputs = inputs.to("cuda") | ||
| else: | ||
| inputs = inputs.to(self.device) |
🧩 Analysis chain
Verify CPU path when device_map='auto'.
inputs.to("cuda") on the auto path may break CPU-only runs. Consider always using inputs.to(self.device) or detect the first model device.
🏁 Script executed:
#!/usr/bin/env bash
# Grep for other generate_until implementations to confirm device handling patterns.
rg -n -C2 'inputs\.to\(' lmms_eval/models | sed -n '1,120p'
Length of output: 5931
Do not unconditionally use "cuda" when device_map == 'auto'.
inputs.to("cuda") breaks CPU-only runs — use inputs = inputs.to(self.device) or determine the model device (e.g. self.model.device or next(self.model.parameters()).device) and preserve dtype with .to(self.model.dtype) if needed. Affected: lmms_eval/models/simple/llava_onevision1_5.py:281-283 (same pattern found in whisper.py:180, qwen2_vl.py:353, qwen2_audio.py:242, qwen2_5_vl_interleave.py:355, qwen2_5_vl.py:300, chat/thyme.py:135, chat/qwen2_5_vl.py:81, chat/huggingface.py:242).
🤖 Prompt for AI Agents
In lmms_eval/models/simple/llava_onevision1_5.py around lines 281-283, the code
unconditionally moves inputs to "cuda" when device_map == 'auto', which breaks
CPU-only runs; change this to move inputs to the actual model/device (e.g. use
inputs = inputs.to(self.device) or determine the model device via
self.model.device or next(self.model.parameters()).device) and preserve dtype if
needed by using .to(device, dtype=self.model.dtype) so CPU-only environments
work and tensor dtype is consistent with the model.
            answers = self.processor.batch_decode(generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False)
            for i, ans in enumerate(answers):
Decode with tokenizer, not processor (avoids AttributeError on some processors).
AutoProcessor may not expose batch_decode; tokenizer does.
- answers = self.processor.batch_decode(generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False)
+ answers = self.tokenizer.batch_decode(
+ generated_ids_trimmed,
+ skip_special_tokens=True,
+ clean_up_tokenization_spaces=False,
+ )
🤖 Prompt for AI Agents
In lmms_eval/models/simple/llava_onevision1_5.py around lines 314-315, the code
calls self.processor.batch_decode which can raise AttributeError because
AutoProcessor may not implement batch_decode; replace this call to use the
tokenizer's batch_decode (e.g.,
self.tokenizer.batch_decode(generated_ids_trimmed, skip_special_tokens=True,
clean_up_tokenization_spaces=False)) or, if tokenizer might be missing, guard
with a fallback to processor by checking hasattr(self, "tokenizer") and
hasattr(self.tokenizer, "batch_decode") and using processor only as fallback.
@@ -0,0 +1,10 @@
+export HF_HOME="~/.cache/huggingface"
+
+# pip install git+https://github.com/LLaVA-VL/LLaVA-NeXT.git
Is this link correct?
The link should be fixed.
Dear authors, I notice that Hugging Face transformers does not currently include the Llava-OneVision-1.5 model. Is this PR ready to use for evaluation?
Sure! We are working hard on it.
Summary by CodeRabbit
New Features
Chores