-
Notifications
You must be signed in to change notification settings - Fork 472
[New Benchmark] Request for supporting TimeScope #756
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughNew YAML configuration files and utility modules were introduced for the "timescope" and "longtimescope" tasks in the Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant TaskConfig (YAML)
participant Utils
participant Model
User->>TaskConfig: Load task YAML (timescope/longtimescope)
TaskConfig->>Utils: Assign processing, aggregation functions
User->>Model: Run model with visual/text inputs
Model-->>User: Generate predictions
User->>Utils: Process results (extract answer, metadata)
Utils->>Utils: Aggregate results (compute accuracy)
Utils-->>User: Return evaluation metrics
Poem
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🔭 Outside diff range comments (1)
lmms_eval/tasks/longtimescope/utils.py (1)
1-143: Significant code duplication with timescope/utils.py.This file is nearly identical to
lmms_eval/tasks/timescope/utils.pywith only minor differences (line 34: loadinglongtimescope.yamlvstimescope.yaml). This violates the DRY principle and creates maintenance overhead.Consider refactoring to share common code between both tasks by:
- Creating a shared base utility module with common functions
- Parameterizing the configuration file name
- Having task-specific modules inherit or import from the shared module
This would reduce code duplication from ~140 lines to potentially ~10 lines per task.
Would you like me to propose a refactored structure that eliminates this duplication?
♻️ Duplicate comments (1)
lmms_eval/tasks/longtimescope/utils.py (1)
1-15: Apply the same fixes as timescope/utils.py.This file has identical issues to the timescope utilities:
- Remove unused imports (lines 1-2, 6, 8, 10-11, 15)
- Fix unused loop variable (line 37)
- Fix f-string without placeholders (lines 50, 101)
- Remove redundant assignment (line 54)
- Replace sys.exit() with exception (line 60)
- Fix missing commas in answer_prefixes list (lines 71-72)
Please apply the same fixes as recommended for
lmms_eval/tasks/timescope/utils.py.Also applies to: 37-37, 50-50, 53-54, 60-60, 71-72, 101-101
🧹 Nitpick comments (7)
lmms_eval/tasks/timescope/timescope.yaml (1)
37-37: Fix trailing spaces.Remove the trailing spaces at the end of line 37 to comply with YAML formatting standards.
- # qwen_vl: + # qwen_vl:lmms_eval/tasks/longtimescope/longtimescope.yaml (2)
37-37: Fix trailing spaces.Remove the trailing spaces at the end of line 37 to comply with YAML formatting standards.
- # qwen_vl: + # qwen_vl:
11-11: Consider renaming shared utility functions.The longtimescope task references
utils.timescope_*functions, which may be confusing since these are shared between timescope and longtimescope. Consider renaming to more generic names likedoc_to_visual,process_results, andaggregate_resultsto reflect their shared usage.Also applies to: 21-21, 25-25
lmms_eval/tasks/timescope/utils.py (4)
37-37: Fix unused loop variable.The loop variable
iis not used within the loop body.- for i, line in enumerate(raw_data): + for _, line in enumerate(raw_data):
50-50: Fix f-string without placeholders.Remove the unnecessary
fprefix from the logging statement.- eval_logger.info(f"base_cache_dir", base_cache_dir, "cache_name", cache_name) + eval_logger.info("base_cache_dir", base_cache_dir, "cache_name", cache_name)
53-54: Remove redundant assignment.Line 54 performs a redundant assignment where
video_pathis assigned to itself.if os.path.exists(video_path): - video_path = video_path + pass
101-101: Fix f-string without placeholders.Remove the unnecessary
fprefix from the return statement.- return {f"timescope_perception_score": data_dict} + return {"timescope_perception_score": data_dict}
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
lmms_eval/tasks/longtimescope/longtimescope.yaml(1 hunks)lmms_eval/tasks/longtimescope/utils.py(1 hunks)lmms_eval/tasks/timescope/timescope.yaml(1 hunks)lmms_eval/tasks/timescope/utils.py(1 hunks)
🧰 Additional context used
🪛 YAMLlint (1.37.1)
lmms_eval/tasks/timescope/timescope.yaml
[error] 37-37: trailing spaces
(trailing-spaces)
lmms_eval/tasks/longtimescope/longtimescope.yaml
[error] 37-37: trailing spaces
(trailing-spaces)
🪛 Ruff (0.11.9)
lmms_eval/tasks/timescope/utils.py
1-1: datetime imported but unused
Remove unused import: datetime
(F401)
2-2: json imported but unused
Remove unused import: json
(F401)
6-6: collections.defaultdict imported but unused
Remove unused import: collections.defaultdict
(F401)
8-8: typing.Dict imported but unused
Remove unused import
(F401)
8-8: typing.List imported but unused
Remove unused import
(F401)
8-8: typing.Optional imported but unused
Remove unused import
(F401)
8-8: typing.Union imported but unused
Remove unused import
(F401)
10-10: cv2 imported but unused
Remove unused import: cv2
(F401)
11-11: numpy imported but unused
Remove unused import: numpy
(F401)
15-15: lmms_eval.tasks._task_utils.file_utils.generate_submission_file imported but unused
Remove unused import: lmms_eval.tasks._task_utils.file_utils.generate_submission_file
(F401)
37-37: Loop control variable i not used within loop body
Rename unused i to _i
(B007)
50-50: f-string without any placeholders
Remove extraneous f prefix
(F541)
101-101: f-string without any placeholders
Remove extraneous f prefix
(F541)
139-139: Loop control variable k not used within loop body
(B007)
lmms_eval/tasks/longtimescope/utils.py
1-1: datetime imported but unused
Remove unused import: datetime
(F401)
2-2: json imported but unused
Remove unused import: json
(F401)
6-6: collections.defaultdict imported but unused
Remove unused import: collections.defaultdict
(F401)
8-8: typing.Dict imported but unused
Remove unused import
(F401)
8-8: typing.List imported but unused
Remove unused import
(F401)
8-8: typing.Optional imported but unused
Remove unused import
(F401)
8-8: typing.Union imported but unused
Remove unused import
(F401)
10-10: cv2 imported but unused
Remove unused import: cv2
(F401)
11-11: numpy imported but unused
Remove unused import: numpy
(F401)
15-15: lmms_eval.tasks._task_utils.file_utils.generate_submission_file imported but unused
Remove unused import: lmms_eval.tasks._task_utils.file_utils.generate_submission_file
(F401)
37-37: Loop control variable i not used within loop body
Rename unused i to _i
(B007)
50-50: f-string without any placeholders
Remove extraneous f prefix
(F541)
101-101: f-string without any placeholders
Remove extraneous f prefix
(F541)
139-139: Loop control variable k not used within loop body
(B007)
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (2)
lmms_eval/tasks/timescope/utils.py (2)
1-15: Address unused imports flagged by static analysis.Multiple imports are not used in this module and should be removed to improve code clarity and reduce dependencies.
60-60: Replace sys.exit() with exception.Using
sys.exit()terminates the entire program, which may not be appropriate in a library context. Consider raising an exception instead to allow proper error handling.
🧹 Nitpick comments (6)
lmms_eval/tasks/timescope/utils.py (6)
20-28: Remove commented-out code.This commented-out code block appears to be leftover from development and should be removed to keep the codebase clean.
-# with open(Path(__file__).parent / "_default_template_yaml", "r") as f: -# raw_data = f.readlines() -# safe_data = [] -# for i, line in enumerate(raw_data): -# # remove function definition since yaml load cannot handle it -# if "!function" not in line: -# safe_data.append(line) - -# config = yaml.safe_load("".join(safe_data))
37-37: Use underscore for unused loop variable.The loop variable
iis not used within the loop body. Use_to indicate it's intentionally unused.- for i, line in enumerate(raw_data): + for _, line in enumerate(raw_data):
50-50: Remove unnecessary f-string prefix.This f-string doesn't contain any placeholders, so the
fprefix is unnecessary.- eval_logger.info(f"base_cache_dir", base_cache_dir, "cache_name", cache_name) + eval_logger.info("base_cache_dir", base_cache_dir, "cache_name", cache_name)
103-103: Remove unnecessary f-string prefix.This f-string doesn't contain any placeholders, so the
fprefix is unnecessary.- return {f"timescope_perception_score": data_dict} + return {"timescope_perception_score": data_dict}
141-141: Use underscore for unused loop variable.The loop variable
kis not used within the loop body. Use_to indicate it's intentionally unused.- for k, v in category2score.items(): + for _, v in category2score.items():
88-144: Consider adding type hints for better code documentation.The functions would benefit from type hints to improve code clarity and enable better IDE support.
+from typing import Dict, List, Any -def timescope_process_results(doc, results): +def timescope_process_results(doc: Dict[str, Any], results: List[str]) -> Dict[str, Dict[str, Any]]: -def timescope_aggregate_results(results): +def timescope_aggregate_results(results: List[Dict[str, Any]]) -> float:
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
lmms_eval/tasks/timescope/utils.py(1 hunks)
🧰 Additional context used
🪛 Ruff (0.11.9)
lmms_eval/tasks/timescope/utils.py
1-1: datetime imported but unused
Remove unused import: datetime
(F401)
2-2: json imported but unused
Remove unused import: json
(F401)
6-6: collections.defaultdict imported but unused
Remove unused import: collections.defaultdict
(F401)
8-8: typing.Dict imported but unused
Remove unused import
(F401)
8-8: typing.List imported but unused
Remove unused import
(F401)
8-8: typing.Optional imported but unused
Remove unused import
(F401)
8-8: typing.Union imported but unused
Remove unused import
(F401)
10-10: cv2 imported but unused
Remove unused import: cv2
(F401)
11-11: numpy imported but unused
Remove unused import: numpy
(F401)
15-15: lmms_eval.tasks._task_utils.file_utils.generate_submission_file imported but unused
Remove unused import: lmms_eval.tasks._task_utils.file_utils.generate_submission_file
(F401)
37-37: Loop control variable i not used within loop body
Rename unused i to _i
(B007)
50-50: f-string without any placeholders
Remove extraneous f prefix
(F541)
103-103: f-string without any placeholders
Remove extraneous f prefix
(F541)
141-141: Loop control variable k not used within loop body
(B007)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Cursor BugBot
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Incorrect F-string Usage in Logging
The eval_logger.info call incorrectly uses an f-string f"base_cache_dir" without placeholders, leading to base_cache_dir and cache_name being logged as separate, unformatted arguments. This results in "base_cache_dir" being logged literally, rather than a single formatted message. The call should be eval_logger.info(f"base_cache_dir: {base_cache_dir}, cache_name: {cache_name}").
lmms_eval/tasks/longtimescope/utils.py#L49-L50
lmms-eval/lmms_eval/tasks/longtimescope/utils.py
Lines 49 to 50 in ea8ff21
| cache_dir = os.path.join(base_cache_dir, cache_name) | |
| eval_logger.info(f"base_cache_dir", base_cache_dir, "cache_name", cache_name) |
lmms_eval/tasks/timescope/utils.py#L49-L50
lmms-eval/lmms_eval/tasks/timescope/utils.py
Lines 49 to 50 in ea8ff21
| cache_dir = os.path.join(base_cache_dir, cache_name) | |
| eval_logger.info(f"base_cache_dir", base_cache_dir, "cache_name", cache_name) |
Bug: Comma Omission Causes Prefix Concatenation
Missing commas in the answer_prefixes list cause unintended string concatenation (e.g., "The best option is" "The correct option is" becomes "The best option isThe correct option is"). This prevents the extract_characters_regex function from correctly identifying and removing the intended individual answer prefixes from responses.
lmms_eval/tasks/longtimescope/utils.py#L70-L73
lmms-eval/lmms_eval/tasks/longtimescope/utils.py
Lines 70 to 73 in ea8ff21
| "The answer", | |
| "The best option is" "The correct option is", | |
| "Best answer:" "Best option:", | |
| ] |
BugBot free trial expires on July 22, 2025
Learn more in the Cursor dashboard.
Was this report helpful? Give feedback by reacting with 👍 or 👎
Before you open a pull-request, please check if a similar issue already exists or has been closed before.
When you open a pull-request, please be sure to include the following
If you meet the lint warnings, you can use following scripts to reformat code.
Thank you for your contributions!
Summary for the Request
Hi EvolvingLMMs-Lab,
This pull request introduces support for the TimeScope benchmark, a new benchmark for evaluating long-form video comprehension in Large Multimodal Models. Further information on the design and methodology of TimeScope is available here: https://github.com/orrzohar/blog/blob/main/timescope.md.
Please let us know any changes or additions are needed. We're happy to adjust to align with the repo's standards.
We appreciate your efforts in reviewing this contribution.
Best Regards,
The Apollo Team
Summary by CodeRabbit