-
Notifications
You must be signed in to change notification settings - Fork 487
[Task] Added VSIBench debiased & pruned #975
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
[Task] Vsibench debiased & pruned
fixed yaml name
|
Claude finished @Luodian's task in 4m 46s —— View job PR Review: VSIBench debiased & prunedTasks
SummaryThis PR refactors the VSIBench task configuration to follow the standard task variant pattern used throughout the lmms-eval codebase. The refactoring extracts common configuration into a Pattern Compliance ✅The refactoring follows the exact same pattern used by other tasks in the codebase: Similar Examples:
Code Structure ReviewBased on the file changes and comparing with the current Current Structure (41 lines):
New Structure (Expected): This is excellent refactoring that:
Key Review Points1. File Naming Convention ✅
2. Expected Content StructureThe
The variant files should contain:
3. Potential Issues to Check
In with open(Path(__file__).parent / "vsibench.yaml", "r") as f:This will break if the common configuration is moved to # Option 1: Read from the default template
with open(Path(__file__).parent / "_default_template_yaml", "r") as f:
# Option 2: Keep reading vsibench.yaml but ensure it has dataset_kwargs
# (if vsibench.yaml still includes dataset_kwargs or inherits it)Action Required: Fix hardcoded vsibench.yaml reference → 4. Dataset Split NamingQuestion for PR Author (@oscarqjh):
The pattern could be either: # Pattern A: Using test_split (like refcoco)
test_split: debiased# Pattern B: Using dataset_name (like open_asr, egothink)
dataset_name: debiased
test_split: testPlease clarify which pattern is used for the VSI-Bench dataset structure. 5. Pre-commit ChecksFrom CLAUDE.md, please ensure you've run: uv run ruff format .
uv run ruff check . --fixAlso verify:
Recommendations
Testing ChecklistBefore merging, please verify:
ConclusionThis is a solid refactoring that follows best practices and established patterns in the codebase. The main blocker is the hardcoded Once the utils.py issue is fixed and the variant split names are clarified, this PR will be ready to merge. Note: I couldn't directly view the PR diff due to permission constraints, so this review is based on:
If I've misunderstood any aspect of the implementation, please let me know! |
Before you open a pull-request, please check if a similar issue already exists or has been closed before.
When you open a pull-request, please be sure to include the following
If you meet the lint warnings, you can use following scripts to reformat code.
Ask for review
Once you feel comfortable for your PR, feel free to @ one of the contributors to review
General: @Luodian @kcz358 @pufanyi
Audio: @pbcong @ngquangtrung57
Thank you for your contributions!