Add intern vl3 and internvl3_5 #963

ngquangtrung57 · 2025-12-31T04:27:28Z

Before you open a pull-request, please check if a similar issue already exists or has been closed before.

When you open a pull-request, please be sure to include the following

A descriptive title: [xxx] XXXX
A detailed description

If you meet the lint warnings, you can use following scripts to reformat code.

pip install pre-commit
pre-commit install
pre-commit run --all-files

Ask for review

Once you feel comfortable for your PR, feel free to @ one of the contributors to review

Thank you for your contributions!

Add support for InternVL3 and InternVL3.5 models including: - InternVL3-8B (OpenGVLab/InternVL3-8B) - InternVL3.5-30B-A3B (OpenGVLab/InternVL3_5-30B-A3B) The implementation supports both single GPU and multi-GPU inference with automatic device mapping.

Add internvl3_5.py as thin wrapper around InternVL3 with different default pretrained model. Both share same logic since they have identical interfaces.

claude · 2025-12-31T04:30:17Z

Code Review: InternVL3 and InternVL3.5 Implementation

Summary

This PR adds support for InternVL3 and InternVL3.5 models. The implementation follows existing patterns from InternVL2 and includes support for both image and video modalities. Overall, the code is well-structured, but there are several issues that need to be addressed before merging.

Critical Issues

1. Missing Type Hints

Location: Multiple functions throughout internvl3.py

The codebase requires type hints for all functions per CLAUDE.md guidelines. The following functions are missing type annotations:

build_transform() - line 32
find_closest_aspect_ratio() - line 45
dynamic_preprocess() - line 61
load_image() - line 92
get_index() - line 100
load_video() - line 112

Fix: Add proper type hints to all function signatures.

2. Line Length Violations

Location: internvl3.py:64 and other locations

The codebase enforces an 88-character line limit. Line 64 with the target_ratios set comprehension exceeds this.

Fix: Break long lines appropriately using parentheses and multi-line formatting.

3. Missing _config Property

Location: internvl3.py:216-217

The config property returns self._model.config, but InternVL2 returns self._config. The _config attribute should be set during initialization.

Fix: Add self._config = self._model.config in init

4. Unused Parameter

Location: internvl3.py:141, 143

The num_layers parameter is defined but never used in InternVL3.

Fix: Remove the num_layers parameter from the init signature.

5. Missing Docstrings

Location: All public functions and the class

Per CLAUDE.md: Public APIs must have docstrings. The InternVL3 class and all public functions lack docstrings.

Fix: Add comprehensive docstrings to the class and all public functions.

Positive Aspects

Consistent with existing patterns
Good parameter names
Proper error handling
Comprehensive distributed training support
Clean inheritance for InternVL3_5
Nice addition of auto device map support

Required Actions Before Merge

Add type hints to all functions (Critical)
Fix line length violations (Critical - will fail CI)
Add docstrings to public APIs (Required by CLAUDE.md)
Remove unused num_layers parameter (Code quality)
Fix _config property (Bug)
Add tests (Required by CLAUDE.md)
Run: uv run ruff format . && uv run ruff check . --fix
Run: uv run pyright

Overall Assessment

The implementation is solid and follows established patterns. The main issues are code style compliance that will cause CI failures. Once addressed and tests are added, this will be ready to merge.

Recommendation: Request changes - address critical issues and add tests before merging.

ngquangtrung57 · 2025-12-31T05:00:40Z

Tested on ai2d, this is for intern 3.5vl 30B3A, reported in its paper is 86.8

ngquangtrung57 · 2025-12-31T05:02:06Z

Tested on ai2d, this is for intern 3vl 8B, reported in its paper is 85.2

claude · 2025-12-31T05:16:16Z

Pull Request Review: Add InternVL3 and InternVL3.5 Support

Thank you for contributing these new model implementations! Overall, the code is well-structured and follows good practices. Below is a detailed review with suggestions for improvement.

Code Quality and Best Practices ✅

Strengths:

Excellent Type Annotations: All functions and methods have comprehensive type hints, which is required per CLAUDE.md
Comprehensive Docstrings: Public APIs are well-documented with clear Args/Returns sections
Good Code Organization: Logical separation of preprocessing functions and model class
Consistent Formatting: Code appears to follow the 88-character line limit and PEP 8 conventions
Smart Reuse: InternVL3_5 correctly inherits from InternVL3, avoiding code duplication

Areas for Improvement:

1. Inconsistent Variable Naming in `load_image` (Line 151)

# Current (line 151):
pixel_values = [transform(img) for img in images]

# Issue: 'img' shadows the parameter name 'image'
# Suggested fix:
pixel_values = [transform(tile) for tile in images]

This matches the pattern used in load_video (line 216) and is clearer.

2. Device Handling Inconsistency (Lines 402, 423)

The code uses self._device instead of self.device in some places:

# Lines 402, 423:
.to(self._device)

# Should be consistent with InternVL2 pattern:
.to(self.device)

Or use .cuda() like InternVL2 does (lines 307, 321 in internvl2.py). However, using the property is better for flexibility.

3. Missing Config Initialization

The config property returns self._config, but unlike InternVL2, this is only set from the model. Consider if config should be available even if model loading fails.

Potential Bugs and Issues ⚠️

1. Critical: Line 111 Potential Logic Error

# Line 111:
target_ratios: Set[Tuple[int, int]] = set(
    (i, j) for n in range(min_num, max_num + 1) 
    for i in range(1, n + 1) 
    for j in range(1, n + 1) 
    if min_num <= i * j <= max_num
)

Compare with InternVL2 (line 57):

target_ratios = set(
    (i, j) for n in range(min_num, max_num + 1) 
    for i in range(1, n + 1) 
    for j in range(1, n + 1) 
    if i * j <= max_num and i * j >= min_num
)

Question: Is the change from i * j <= max_num and i * j >= min_num to min_num <= i * j <= max_num intentional? While semantically similar, the condition in InternVL3 may produce different results when min_num > 1. Please verify this is correct for InternVL3's requirements.

2. Unused `device` Parameter

The __init__ method accepts a device parameter (line 245) but never uses it directly. It's only used in one place (line 272) in a specific conditional. Consider if this parameter is necessary or should be documented as deprecated.

3. Missing Error Handling

Unlike some other models in the codebase, there's no validation that:

Images/videos exist before processing
The modality parameter is valid (should be "image" or "video")
Visual inputs are in expected format

Consider adding basic validation.

Performance Considerations 🚀

1. Tensor Operations

The code efficiently uses:

torch.stack for batching tiles
torch.cat for concatenating frames
Proper dtype conversion (torch.bfloat16)

2. Memory Efficiency

Good practices observed:

low_cpu_mem_usage=True in model loading
Proper device placement
Flash attention support

3. Potential Optimization

The flatten method (line 363) could use a list comprehension:

def flatten(self, input: List[List]) -> List:
    """Flatten a nested list."""
    return [j for i in input for j in i]

More Pythonic and potentially faster.

Security Concerns 🔒

1. Trust Remote Code ⚠️

Lines 284, 288 use trust_remote_code=True:

self._model = AutoModel.from_pretrained(
    self.path,
    trust_remote_code=True,  # Security consideration
    ...
)

This is necessary for custom model architectures but be aware:

Only use with trusted model sources
Users should be warned about potential code execution
This is consistent with InternVL2, so acceptable for this codebase

2. Path Handling

video_path (line 421) is used directly without validation. Consider:

Path traversal prevention
File existence checks
Proper error messages for missing files

Test Coverage 📝

Missing Tests

The PR adds no tests for the new functionality. According to CLAUDE.md:

"New features require tests"
Framework should be: uv run pytest

Recommendation: Add at least basic tests:

Test model initialization
Test image preprocessing pipeline
Test video preprocessing pipeline
Test error cases (invalid modality, missing files)
Mock tests for generate_until

Example test structure:

# tests/models/test_internvl3.py
import pytest
from lmms_eval.models.simple.internvl3 import InternVL3, build_transform, dynamic_preprocess

def test_build_transform():
    transform = build_transform(448)
    assert transform is not None
    # Add more assertions

def test_dynamic_preprocess():
    # Test with mock image
    ...

Additional Recommendations

1. Line Length (CLAUDE.md requirement: 88 chars max)

Line 111 exceeds this. Consider reformatting:

target_ratios: Set[Tuple[int, int]] = set(
    (i, j)
    for n in range(min_num, max_num + 1)
    for i in range(1, n + 1)
    for j in range(1, n + 1)
    if min_num <= i * j <= max_num
)

2. Run Pre-commit Hooks

According to CLAUDE.md, run:

uv run ruff format .
uv run ruff check . --fix
uv run pyright

3. Consider Adding Usage Example

A docstring example in the class showing how to use the model would be helpful.

Summary

Approval Status: Approve with minor changes suggested

Priority Issues to Address:

✅ Verify the logic change in dynamic_preprocess (line 111) is intentional
⚠️ Add basic tests (required per CLAUDE.md)
💡 Consider the minor improvements suggested above

What's Great:

Clean, well-documented code
Excellent type annotations
Follows existing patterns from InternVL2
Smart code reuse with InternVL3_5 inheriting from InternVL3

Great work overall! Once tests are added and the logic verification is confirmed, this will be ready to merge.

ngquangtrung57 added 4 commits December 30, 2025 20:24

add InternVL3 model support

78e08cc

Add support for InternVL3 and InternVL3.5 models including: - InternVL3-8B (OpenGVLab/InternVL3-8B) - InternVL3.5-30B-A3B (OpenGVLab/InternVL3_5-30B-A3B) The implementation supports both single GPU and multi-GPU inference with automatic device mapping.

separate InternVL3 and InternVL3.5 models

673fd95

Add internvl3_5.py as thin wrapper around InternVL3 with different default pretrained model. Both share same logic since they have identical interfaces.

minor changes

4a3c68b

lint

c837f50

address review

bc18a25

kcz358 approved these changes Jan 6, 2026

View reviewed changes

kcz358 merged commit 8da34f3 into main Jan 6, 2026
4 checks passed

kcz358 deleted the add_intern_vl branch January 6, 2026 09:51

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add intern vl3 and internvl3_5 #963

Add intern vl3 and internvl3_5 #963

Uh oh!

ngquangtrung57 commented Dec 31, 2025

Uh oh!

claude bot commented Dec 31, 2025

Uh oh!

ngquangtrung57 commented Dec 31, 2025 •

edited

Loading

Uh oh!

ngquangtrung57 commented Dec 31, 2025

Uh oh!

claude bot commented Dec 31, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Add intern vl3 and internvl3_5 #963

Add intern vl3 and internvl3_5 #963

Uh oh!

Conversation

ngquangtrung57 commented Dec 31, 2025

When you open a pull-request, please be sure to include the following

Ask for review

Uh oh!

claude bot commented Dec 31, 2025

Code Review: InternVL3 and InternVL3.5 Implementation

Summary

Critical Issues

1. Missing Type Hints

2. Line Length Violations

3. Missing _config Property

4. Unused Parameter

5. Missing Docstrings

Positive Aspects

Required Actions Before Merge

Overall Assessment

Uh oh!

ngquangtrung57 commented Dec 31, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ngquangtrung57 commented Dec 31, 2025

Uh oh!

claude bot commented Dec 31, 2025

Pull Request Review: Add InternVL3 and InternVL3.5 Support

Code Quality and Best Practices ✅

Strengths:

Areas for Improvement:

1. Inconsistent Variable Naming in load_image (Line 151)

2. Device Handling Inconsistency (Lines 402, 423)

3. Missing Config Initialization

Potential Bugs and Issues ⚠️

1. Critical: Line 111 Potential Logic Error

2. Unused device Parameter

3. Missing Error Handling

Performance Considerations 🚀

1. Tensor Operations

2. Memory Efficiency

3. Potential Optimization

Security Concerns 🔒

1. Trust Remote Code ⚠️

2. Path Handling

Test Coverage 📝

Missing Tests

Additional Recommendations

1. Line Length (CLAUDE.md requirement: 88 chars max)

2. Run Pre-commit Hooks

3. Consider Adding Usage Example

Summary

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

ngquangtrung57 commented Dec 31, 2025 •

edited

Loading

1. Inconsistent Variable Naming in `load_image` (Line 151)

2. Unused `device` Parameter