fix: filter multimodal content from log samples while preserving metadata #962
- Fix FileExistsError -> FileNotFoundError (correct exception type)
- Replace print() with eval_logger for consistent logging
- Add type hints to all functions
- Fix missing comma bug in final_answer_patterns list
- Remove redundant image_path = image_path assignment
- Initialize op variable to prevent potential UnboundLocalError
- Break long prompt string for readability (88-character line limit)
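Two of the fixes above address classic Python pitfalls. The sketch below reproduces them in isolation. The pattern strings and the parse_op helper are hypothetical stand-ins, because the actual final_answer_patterns list and the code that sets op are not shown on this page.

```python
from typing import Optional

# Pitfall 1: a missing comma between string literals in a list.
# Python concatenates adjacent string literals at compile time, so the list
# silently loses an element instead of raising an error.
buggy_patterns = [
    "The answer is",
    "Final answer:"  # <- missing comma: joined with the next literal
    "Answer:",
]
fixed_patterns = [
    "The answer is",
    "Final answer:",
    "Answer:",
]


# Pitfall 2: a variable bound only inside some branches.
# Without the up-front initialization, text matching no branch would raise
# UnboundLocalError at the return statement.
def parse_op(text: str) -> Optional[str]:
    op = None  # initialize so every code path has a binding
    if "+" in text:
        op = "add"
    elif "-" in text:
        op = "sub"
    return op


print(buggy_patterns)  # ['The answer is', 'Final answer:Answer:']
print(len(fixed_patterns))  # 3
print(parse_op("3 * 4"))  # None
```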
…data
When using --log_samples, the previous implementation either saved all fields (causing serialization issues with images/audio) or filtered based on key names (dropping useful metadata such as image_id and image_path).
This fix introduces is_multimodal_content(), which detects actual multimodal data types (PIL.Image, numpy arrays, torch tensors, HuggingFace audio/image dicts) while preserving all scalar metadata fields for dataset traceability.
GitHub-Issue: #943
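The commit describes the check, but this page does not show its code. Below is a minimal sketch of what such a type-based check could look like. It assumes numpy, torch, and Pillow are optional imports, and it treats any dict with an "array" or "bytes" key as a HuggingFace audio/image payload, which is the rule the review's suggested tests imply. It is an illustration, not the lmms_eval implementation.

```python
from typing import Any

# Optional heavy dependencies: if a library is missing, its type check is skipped.
try:
    import numpy as np
except ImportError:
    np = None
try:
    import torch
except ImportError:
    torch = None
try:
    from PIL import Image
except ImportError:
    Image = None


def is_multimodal_content(value: Any) -> bool:
    """Return True for binary/media payloads, False for loggable metadata."""
    # Raw encoded media, e.g. image or audio file bytes.
    if isinstance(value, (bytes, bytearray)):
        return True
    # Decoded media objects from common libraries (only checked if importable).
    if Image is not None and isinstance(value, Image.Image):
        return True
    if np is not None and isinstance(value, np.ndarray):
        return True
    if torch is not None and isinstance(value, torch.Tensor):
        return True
    # HuggingFace-style dicts: decoded Audio has used {"array", "sampling_rate", "path"},
    # and an undecoded (decode=False) Image is {"bytes", "path"}.
    if isinstance(value, dict) and ("array" in value or "bytes" in value):
        return True
    # Everything else (strings, numbers, None, plain lists) counts as metadata.
    # Note: elements inside a list are not inspected here.
    return False
```

Checking types rather than key names means a field like image_path (a string) survives, while a field holding the decoded image is dropped regardless of what it is called.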
Code Review: PR #962
Summary
This PR addresses filtering of multimodal content from log samples while preserving metadata. The overall approach is solid and well thought out, introducing a type-based detection system instead of key-name matching.
Positive Aspects
✅ Good Solution Design: Type-based detection via
Issues & Recommendations
🔴 Critical Issues
🟡 Medium Priority Issues
🟢 Minor Issues
Code Style Compliance
✅ CLAUDE.md Compliance:
Performance Considerations
Security Considerations
✅ No security concerns identified: the code doesn't execute user input or introduce injection risks.
Testing Recommendations
Based on CLAUDE.md requirements, please add:
# tests/test_utils.py or similar
import numpy as np
import torch
from PIL import Image
from lmms_eval.utils import is_multimodal_content


def test_is_multimodal_content_with_images():
    img = Image.new('RGB', (100, 100))
    assert is_multimodal_content(img) is True


def test_is_multimodal_content_with_numpy():
    arr = np.array([1, 2, 3])
    assert is_multimodal_content(arr) is True


def test_is_multimodal_content_with_torch():
    tensor = torch.tensor([1, 2, 3])
    assert is_multimodal_content(tensor) is True


def test_is_multimodal_content_with_bytes():
    assert is_multimodal_content(b'binary') is True
    assert is_multimodal_content(bytearray(b'binary')) is True


def test_is_multimodal_content_with_hf_formats():
    # HuggingFace audio format
    assert is_multimodal_content({'array': np.array([1])}) is True
    # HuggingFace image format
    assert is_multimodal_content({'bytes': b'data'}) is True


def test_is_multimodal_content_preserves_metadata():
    # These should NOT be filtered
    assert is_multimodal_content('image_path.jpg') is False
    assert is_multimodal_content({'image_id': 123}) is False
    assert is_multimodal_content({'doc_id': 'abc'}) is False
    assert is_multimodal_content(123) is False
    assert is_multimodal_content([1, 2, 3]) is False

Overall Assessment
This is a well-intentioned fix that addresses a real problem. The main concerns are:
Recommendation: Request changes to address the critical issues before merging.
Review based on CLAUDE.md guidelines and best practices.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
 for key, value in doc.items():
-    saved_doc[key] = value
+    if not is_multimodal_content(value):
+        saved_doc[key] = value
Filter lists of multimodal content before logging
The new filter only checks the top-level value type, so a list of images/tensors still passes through unchanged. Several tasks populate doc["images"] with a list of PIL images (e.g., lmms_eval/tasks/stare/utils.py around lines 105–109 and live_bench/utils.py), meaning --log_samples will still attempt to serialize those lists and can hit the same serialization/size problems this change is meant to fix. Consider recursively filtering lists/tuples or stripping multimodal elements before saving.
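One way to act on this suggestion, sketched under assumptions: is_media below is a simplified stand-in for is_multimodal_content (raw bytes and PIL images only), and strip_multimodal and build_saved_doc are hypothetical helpers, not functions in lmms_eval. Lists, tuples, and dicts are walked recursively and media elements are dropped before the doc is saved.

```python
from typing import Any, Dict

try:
    from PIL import Image
except ImportError:  # Pillow missing: only raw bytes are treated as media here
    Image = None


def is_media(value: Any) -> bool:
    """Simplified stand-in for is_multimodal_content (raw bytes and PIL images only)."""
    if isinstance(value, (bytes, bytearray)):
        return True
    return Image is not None and isinstance(value, Image.Image)


def strip_multimodal(value: Any) -> Any:
    """Recursively drop media elements from lists, tuples, and dicts."""
    if isinstance(value, (list, tuple)):
        kept = [strip_multimodal(v) for v in value if not is_media(v)]
        # Preserve the container kind so tuples stay tuples.
        return tuple(kept) if isinstance(value, tuple) else kept
    if isinstance(value, dict):
        return {k: strip_multimodal(v) for k, v in value.items() if not is_media(v)}
    return value


def build_saved_doc(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical replacement for the save loop: skip top-level media, clean nested values."""
    saved_doc = {}
    for key, value in doc.items():
        if is_media(value):
            continue
        saved_doc[key] = strip_multimodal(value)
    return saved_doc
```

A list made entirely of images comes back as an empty list, so the field name still appears in the log; dropping the key, or replacing each element with a placeholder such as its type name, would be equally reasonable choices.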
Summary
- is_multimodal_content() helper function to detect actual multimodal data types
- … --log_samples output
- … (doc_id, image_id, image_path) for dataset traceability
Problem
The previous implementation in #943 removed all filtering, which could cause:
The original filtering ("image" not in key) was too aggressive and removed useful metadata fields.
Solution
Type-based detection instead of key-name matching:
This preserves all ID/path/metadata fields while excluding actual binary content.
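To make the difference concrete, here is a small comparison on a hypothetical doc. keep_by_key_name mirrors the old "image" not in key rule; keep_by_type uses looks_binary, a deliberately simplified stand-in for is_multimodal_content (raw bytes plus HuggingFace-style dicts only), so treat it as an illustration rather than the PR's code.

```python
from typing import Any, Dict


def keep_by_key_name(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Old rule: drop every field whose key contains "image"."""
    return {k: v for k, v in doc.items() if "image" not in k}


def looks_binary(value: Any) -> bool:
    """Simplified stand-in for is_multimodal_content (bytes and HF-style dicts only)."""
    if isinstance(value, (bytes, bytearray)):
        return True
    return isinstance(value, dict) and ("bytes" in value or "array" in value)


def keep_by_type(doc: Dict[str, Any]) -> Dict[str, Any]:
    """New rule: drop a field only if its value is media, whatever the key is."""
    return {k: v for k, v in doc.items() if not looks_binary(v)}


# Hypothetical sample doc; field names and values are illustrative.
doc = {
    "doc_id": 42,
    "image_id": "COCO_000123",
    "image_path": "images/000123.jpg",
    "image": {"bytes": b"\x89PNG", "path": "images/000123.jpg"},
    "question": "What color is the bus?",
}

print(sorted(keep_by_key_name(doc)))  # ['doc_id', 'question']
print(sorted(keep_by_type(doc)))  # ['doc_id', 'image_id', 'image_path', 'question']
```

The key-name rule loses image_id and image_path along with the actual image; the type-based rule drops only the field whose value is the encoded image.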
Test plan
- … --log_samples on a multimodal task
Closes #943