Skip to content

Commit 2dad280

Browse files
Luodiancursoragent
andauthored
Fix three bugs in the codebase (#734)
* Fix exception handling and remove dead code in utils and worldqa modules Co-authored-by: drluodian <[email protected]> * Refactor path handling in cambench_doc_to_visual function for improved readability --------- Co-authored-by: Cursor Agent <[email protected]>
1 parent 0cf20a2 commit 2dad280

File tree

3 files changed

+133
-3
lines changed

3 files changed

+133
-3
lines changed

bug_report.md

Lines changed: 131 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,131 @@
1+
# Bug Report: LMMs-Eval Codebase
2+
3+
## Overview
4+
This report documents 3 significant bugs identified and fixed in the LMMs-Eval codebase. The bugs include incorrect exception handling, dead code, and poor error management practices.
5+
6+
## Bug #1: Incorrect Exception Handling
7+
8+
**Location**: `lmms_eval/utils.py:582`
9+
**Type**: Logic Error / Syntax Issue
10+
**Severity**: High
11+
12+
### Problem Description
13+
The `get_git_commit_hash()` function had incorrect exception handling syntax that would only catch `subprocess.CalledProcessError` but not `FileNotFoundError`, despite the intention to catch both exceptions.
14+
15+
### Original Code
16+
```python
17+
except subprocess.CalledProcessError or FileNotFoundError:
18+
```
19+
20+
### Issue Explanation
21+
In Python, using `or` in an except clause doesn't work as intended. The expression `subprocess.CalledProcessError or FileNotFoundError` evaluates to `subprocess.CalledProcessError` (since it's truthy), meaning only `CalledProcessError` would be caught, not `FileNotFoundError`.
22+
23+
### Fixed Code
24+
```python
25+
except (subprocess.CalledProcessError, FileNotFoundError):
26+
```
27+
28+
### Impact
29+
- **Before Fix**: `FileNotFoundError` (when git is not installed) would not be caught, causing the application to crash
30+
- **After Fix**: Both exceptions are properly handled, making the function more robust
31+
32+
---
33+
34+
## Bug #2: Dead Code - Duplicate Return Statement
35+
36+
**Location**: `lmms_eval/utils.py:596-597`
37+
**Type**: Dead Code
38+
**Severity**: Medium
39+
40+
### Problem Description
41+
The `get_datetime_str()` function contained a duplicate return statement, making the second return unreachable dead code.
42+
43+
### Original Code
44+
```python
45+
def get_datetime_str(timezone="Asia/Singapore"):
46+
# ... function body ...
47+
return local_time.strftime("%Y%m%d_%H%M%S")
48+
return local_time.strftime("%Y%m%d_%H%M%S") # Dead code
49+
```
50+
51+
### Issue Explanation
52+
The second return statement is unreachable because the first return statement exits the function. This represents dead code that serves no purpose and could confuse developers.
53+
54+
### Fixed Code
55+
```python
56+
def get_datetime_str(timezone="Asia/Singapore"):
57+
# ... function body ...
58+
return local_time.strftime("%Y%m%d_%H%M%S")
59+
```
60+
61+
### Impact
62+
- **Before Fix**: Confusing dead code that could mislead developers
63+
- **After Fix**: Clean, maintainable code with no unreachable statements
64+
65+
---
66+
67+
## Bug #3: Bare Exception Clause - Poor Error Handling
68+
69+
**Location**: `lmms_eval/tasks/worldqa/utils.py:212`
70+
**Type**: Security/Reliability Issue
71+
**Severity**: Medium
72+
73+
### Problem Description
74+
The `worldq_gen_gpt_eval()` function used a bare `except:` clause that catches all exceptions, including system-critical ones like `KeyboardInterrupt` and `SystemExit`.
75+
76+
### Original Code
77+
```python
78+
try:
79+
eval_score = float(eval_score)
80+
except:
81+
eval_score = 0.0
82+
```
83+
84+
### Issue Explanation
85+
Bare `except:` clauses are considered poor practice because they:
86+
- Catch system exceptions like `KeyboardInterrupt` (Ctrl+C) and `SystemExit`
87+
- Make debugging difficult by hiding unexpected errors
88+
- Can mask programming errors that should be fixed rather than ignored
89+
90+
### Fixed Code
91+
```python
92+
try:
93+
eval_score = float(eval_score)
94+
except (ValueError, TypeError, AttributeError):
95+
eval_score = 0.0
96+
```
97+
98+
### Impact
99+
- **Before Fix**: Could prevent proper program termination and hide important errors
100+
- **After Fix**: Only catches expected conversion errors while allowing system exceptions to propagate properly
101+
102+
---
103+
104+
## Additional Observations
105+
106+
### Potential Security Concerns
107+
The codebase contains numerous other instances of bare `except:` clauses that should be reviewed:
108+
- `lmms_eval/tasks/tempcompass/utils.py` (multiple instances)
109+
- `lmms_eval/tasks/videomathqa/utils.py`
110+
- `lmms_eval/tasks/videomme/utils.py`
111+
- And many others across the task modules
112+
113+
### Performance Considerations
114+
Several modules import `random` but don't consistently set seeds, which could affect reproducibility in evaluation tasks. The codebase does have some seed setting in the main evaluator, but individual task modules often use `random` without explicit seeding.
115+
116+
### Code Quality Issues
117+
- Multiple files contain `len(collection) == 0` patterns that could be optimized to `not collection`
118+
- Some modules have inconsistent error handling patterns
119+
- Several TODO comments indicate incomplete implementations
120+
121+
## Recommendations
122+
123+
1. **Conduct a comprehensive audit** of all exception handling throughout the codebase
124+
2. **Establish coding standards** for error handling and exception catching
125+
3. **Implement consistent seeding** across all modules that use randomization
126+
4. **Add linting rules** to catch bare except clauses and other problematic patterns
127+
5. **Consider adding unit tests** for error handling scenarios
128+
129+
## Summary
130+
131+
The three bugs fixed represent important improvements to the codebase's reliability, maintainability, and error handling. While these fixes address immediate issues, a broader review of error handling practices across the entire codebase would be beneficial for long-term code quality.

lmms_eval/tasks/worldqa/utils.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -210,7 +210,7 @@ def worldq_gen_gpt_eval(results, args):
210210
eval_score = eval_answer.split("\n")[-1].strip()
211211
try:
212212
eval_score = float(eval_score)
213-
except:
213+
except (ValueError, TypeError, AttributeError):
214214
eval_score = 0.0
215215
score += eval_score
216216

lmms_eval/utils.py

Lines changed: 1 addition & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -580,7 +580,7 @@ def get_git_commit_hash():
580580
try:
581581
git_hash = subprocess.check_output(["git", "describe", "--always"]).strip()
582582
git_hash = git_hash.decode()
583-
except subprocess.CalledProcessError or FileNotFoundError:
583+
except (subprocess.CalledProcessError, FileNotFoundError):
584584
# FileNotFoundError occurs when git not installed on system
585585
git_hash = None
586586
return git_hash
@@ -595,7 +595,6 @@ def get_datetime_str(timezone="Asia/Singapore"):
595595
utc_now = datetime.datetime.now(datetime.timezone.utc)
596596
local_time = utc_now.astimezone(tz)
597597
return local_time.strftime("%Y%m%d_%H%M%S")
598-
return local_time.strftime("%Y%m%d_%H%M%S")
599598

600599

601600
def sanitize_long_string(s, max_length=40):

0 commit comments

Comments
 (0)