Skip to content

Conversation

@ngquangtrung57
Copy link
Collaborator

Before you open a pull-request, please check if a similar issue already exists or has been closed before.

When you open a pull-request, please be sure to include the following

  • A descriptive title: [xxx] XXXX
  • A detailed description

If you meet the lint warnings, you can use following scripts to reformat code.

pip install pre-commit
pre-commit install
pre-commit run --all-files

Ask for review

Once you feel comfortable for your PR, feel free to @ one of the contributors to review

Thank you for your contributions!

Add support for InternVL3 and InternVL3.5 models including:
- InternVL3-8B (OpenGVLab/InternVL3-8B)
- InternVL3.5-30B-A3B (OpenGVLab/InternVL3_5-30B-A3B)

The implementation supports both single GPU and multi-GPU inference
with automatic device mapping.
Add internvl3_5.py as thin wrapper around InternVL3 with different
default pretrained model. Both share same logic since they have
identical interfaces.
@claude
Copy link

claude bot commented Dec 31, 2025

Code Review: InternVL3 and InternVL3.5 Implementation

Summary

This PR adds support for InternVL3 and InternVL3.5 models. The implementation follows existing patterns from InternVL2 and includes support for both image and video modalities. Overall, the code is well-structured, but there are several issues that need to be addressed before merging.

Critical Issues

1. Missing Type Hints

Location: Multiple functions throughout internvl3.py

The codebase requires type hints for all functions per CLAUDE.md guidelines. The following functions are missing type annotations:

  • build_transform() - line 32
  • find_closest_aspect_ratio() - line 45
  • dynamic_preprocess() - line 61
  • load_image() - line 92
  • get_index() - line 100
  • load_video() - line 112

Fix: Add proper type hints to all function signatures.

2. Line Length Violations

Location: internvl3.py:64 and other locations

The codebase enforces an 88-character line limit. Line 64 with the target_ratios set comprehension exceeds this.

Fix: Break long lines appropriately using parentheses and multi-line formatting.

3. Missing _config Property

Location: internvl3.py:216-217

The config property returns self._model.config, but InternVL2 returns self._config. The _config attribute should be set during initialization.

Fix: Add self._config = self._model.config in init

4. Unused Parameter

Location: internvl3.py:141, 143

The num_layers parameter is defined but never used in InternVL3.

Fix: Remove the num_layers parameter from the init signature.

5. Missing Docstrings

Location: All public functions and the class

Per CLAUDE.md: Public APIs must have docstrings. The InternVL3 class and all public functions lack docstrings.

Fix: Add comprehensive docstrings to the class and all public functions.

Positive Aspects

  • Consistent with existing patterns
  • Good parameter names
  • Proper error handling
  • Comprehensive distributed training support
  • Clean inheritance for InternVL3_5
  • Nice addition of auto device map support

Required Actions Before Merge

  1. Add type hints to all functions (Critical)
  2. Fix line length violations (Critical - will fail CI)
  3. Add docstrings to public APIs (Required by CLAUDE.md)
  4. Remove unused num_layers parameter (Code quality)
  5. Fix _config property (Bug)
  6. Add tests (Required by CLAUDE.md)
  7. Run: uv run ruff format . && uv run ruff check . --fix
  8. Run: uv run pyright

Overall Assessment

The implementation is solid and follows established patterns. The main issues are code style compliance that will cause CI failures. Once addressed and tests are added, this will be ready to merge.

Recommendation: Request changes - address critical issues and add tests before merging.

@ngquangtrung57
Copy link
Collaborator Author

ngquangtrung57 commented Dec 31, 2025

image Tested on ai2d, this is for intern 3.5vl 30B3A, reported in its paper is 86.8

@ngquangtrung57
Copy link
Collaborator Author

image Tested on ai2d, this is for intern 3vl 8B, reported in its paper is 85.2

@claude
Copy link

claude bot commented Dec 31, 2025

Pull Request Review: Add InternVL3 and InternVL3.5 Support

Thank you for contributing these new model implementations! Overall, the code is well-structured and follows good practices. Below is a detailed review with suggestions for improvement.

Code Quality and Best Practices ✅

Strengths:

  1. Excellent Type Annotations: All functions and methods have comprehensive type hints, which is required per CLAUDE.md
  2. Comprehensive Docstrings: Public APIs are well-documented with clear Args/Returns sections
  3. Good Code Organization: Logical separation of preprocessing functions and model class
  4. Consistent Formatting: Code appears to follow the 88-character line limit and PEP 8 conventions
  5. Smart Reuse: InternVL3_5 correctly inherits from InternVL3, avoiding code duplication

Areas for Improvement:

1. Inconsistent Variable Naming in load_image (Line 151)

# Current (line 151):
pixel_values = [transform(img) for img in images]

# Issue: 'img' shadows the parameter name 'image'
# Suggested fix:
pixel_values = [transform(tile) for tile in images]

This matches the pattern used in load_video (line 216) and is clearer.

2. Device Handling Inconsistency (Lines 402, 423)

The code uses self._device instead of self.device in some places:

# Lines 402, 423:
.to(self._device)

# Should be consistent with InternVL2 pattern:
.to(self.device)

Or use .cuda() like InternVL2 does (lines 307, 321 in internvl2.py). However, using the property is better for flexibility.

3. Missing Config Initialization

The config property returns self._config, but unlike InternVL2, this is only set from the model. Consider if config should be available even if model loading fails.

Potential Bugs and Issues ⚠️

1. Critical: Line 111 Potential Logic Error

# Line 111:
target_ratios: Set[Tuple[int, int]] = set(
    (i, j) for n in range(min_num, max_num + 1) 
    for i in range(1, n + 1) 
    for j in range(1, n + 1) 
    if min_num <= i * j <= max_num
)

Compare with InternVL2 (line 57):

target_ratios = set(
    (i, j) for n in range(min_num, max_num + 1) 
    for i in range(1, n + 1) 
    for j in range(1, n + 1) 
    if i * j <= max_num and i * j >= min_num
)

Question: Is the change from i * j <= max_num and i * j >= min_num to min_num <= i * j <= max_num intentional? While semantically similar, the condition in InternVL3 may produce different results when min_num > 1. Please verify this is correct for InternVL3's requirements.

2. Unused device Parameter

The __init__ method accepts a device parameter (line 245) but never uses it directly. It's only used in one place (line 272) in a specific conditional. Consider if this parameter is necessary or should be documented as deprecated.

3. Missing Error Handling

Unlike some other models in the codebase, there's no validation that:

  • Images/videos exist before processing
  • The modality parameter is valid (should be "image" or "video")
  • Visual inputs are in expected format

Consider adding basic validation.

Performance Considerations 🚀

1. Tensor Operations

The code efficiently uses:

  • torch.stack for batching tiles
  • torch.cat for concatenating frames
  • Proper dtype conversion (torch.bfloat16)

2. Memory Efficiency

Good practices observed:

  • low_cpu_mem_usage=True in model loading
  • Proper device placement
  • Flash attention support

3. Potential Optimization

The flatten method (line 363) could use a list comprehension:

def flatten(self, input: List[List]) -> List:
    """Flatten a nested list."""
    return [j for i in input for j in i]

More Pythonic and potentially faster.

Security Concerns 🔒

1. Trust Remote Code ⚠️

Lines 284, 288 use trust_remote_code=True:

self._model = AutoModel.from_pretrained(
    self.path,
    trust_remote_code=True,  # Security consideration
    ...
)

This is necessary for custom model architectures but be aware:

  • Only use with trusted model sources
  • Users should be warned about potential code execution
  • This is consistent with InternVL2, so acceptable for this codebase

2. Path Handling

video_path (line 421) is used directly without validation. Consider:

  • Path traversal prevention
  • File existence checks
  • Proper error messages for missing files

Test Coverage 📝

Missing Tests

The PR adds no tests for the new functionality. According to CLAUDE.md:

  • "New features require tests"
  • Framework should be: uv run pytest

Recommendation: Add at least basic tests:

  1. Test model initialization
  2. Test image preprocessing pipeline
  3. Test video preprocessing pipeline
  4. Test error cases (invalid modality, missing files)
  5. Mock tests for generate_until

Example test structure:

# tests/models/test_internvl3.py
import pytest
from lmms_eval.models.simple.internvl3 import InternVL3, build_transform, dynamic_preprocess

def test_build_transform():
    transform = build_transform(448)
    assert transform is not None
    # Add more assertions

def test_dynamic_preprocess():
    # Test with mock image
    ...

Additional Recommendations

1. Line Length (CLAUDE.md requirement: 88 chars max)

Line 111 exceeds this. Consider reformatting:

target_ratios: Set[Tuple[int, int]] = set(
    (i, j)
    for n in range(min_num, max_num + 1)
    for i in range(1, n + 1)
    for j in range(1, n + 1)
    if min_num <= i * j <= max_num
)

2. Run Pre-commit Hooks

According to CLAUDE.md, run:

uv run ruff format .
uv run ruff check . --fix
uv run pyright

3. Consider Adding Usage Example

A docstring example in the class showing how to use the model would be helpful.

Summary

Approval Status: Approve with minor changes suggested

Priority Issues to Address:

  1. ✅ Verify the logic change in dynamic_preprocess (line 111) is intentional
  2. ⚠️ Add basic tests (required per CLAUDE.md)
  3. 💡 Consider the minor improvements suggested above

What's Great:

  • Clean, well-documented code
  • Excellent type annotations
  • Follows existing patterns from InternVL2
  • Smart code reuse with InternVL3_5 inheriting from InternVL3

Great work overall! Once tests are added and the logic verification is confirmed, this will be ready to merge.

@kcz358 kcz358 merged commit 8da34f3 into main Jan 6, 2026
4 checks passed
@kcz358 kcz358 deleted the add_intern_vl branch January 6, 2026 09:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants