Fix three bugs in the codebase (#734)

Luodian · cursoragent · web-flow · commit 2dad280d5276 · 2025-07-02T01:05:10.000+08:00
* Fix exception handling and remove dead code in utils and worldqa modules

Co-authored-by: drluodian <drluodian@gmail.com>

* Refactor path handling in cambench_doc_to_visual function for improved readability

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
diff --git a/bug_report.md b/bug_report.md
@@ -0,0 +1,131 @@
+# Bug Report: LMMs-Eval Codebase
+
+## Overview
+This report documents three bugs identified and fixed in the LMMs-Eval codebase: an `except` clause that silently ignored one of its two intended exception types, an unreachable duplicate return statement, and a bare `except:` clause.
+
+## Bug #1: Incorrect Exception Handling
+
+**Location**: `lmms_eval/utils.py:582`  
+**Type**: Logic Error  
+**Severity**: High  
+
+### Problem Description
+The `get_git_commit_hash()` function joined two exception classes with `or` in its `except` clause. The clause is syntactically valid, but it catches only `subprocess.CalledProcessError` and not `FileNotFoundError`, although the intent was to catch both.
+
+### Original Code
+```python
+except subprocess.CalledProcessError or FileNotFoundError:
+```
+
+### Issue Explanation
+In Python, `or` in an `except` clause does not combine exception types. The expression `subprocess.CalledProcessError or FileNotFoundError` is evaluated first and yields its first truthy operand, `subprocess.CalledProcessError` (classes are truthy by default), so the clause is equivalent to `except subprocess.CalledProcessError:`. To catch several exception types, pass them as a tuple.
+
+### Fixed Code
+```python
+except (subprocess.CalledProcessError, FileNotFoundError):
+```
+
+### Impact
+- **Before Fix**: `FileNotFoundError` (raised when git is not installed) was not caught and propagated to the caller, crashing the application if unhandled
+- **After Fix**: Both exceptions are caught and the function returns `None` instead of raising
+
+---
+
+## Bug #2: Dead Code - Duplicate Return Statement
+
+**Location**: `lmms_eval/utils.py:596-597`  
+**Type**: Dead Code  
+**Severity**: Medium  
+
+### Problem Description
+The `get_datetime_str()` function contained a duplicate return statement, making the second return unreachable dead code.
+
+### Original Code
+```python
+def get_datetime_str(timezone="Asia/Singapore"):
+    # ... function body ...
+    return local_time.strftime("%Y%m%d_%H%M%S")
+    return local_time.strftime("%Y%m%d_%H%M%S")  # Dead code
+```
+
+### Issue Explanation
+The second return statement is unreachable because the first return statement exits the function. This represents dead code that serves no purpose and could confuse developers.
+
+### Fixed Code
+```python
+def get_datetime_str(timezone="Asia/Singapore"):
+    # ... function body ...
+    return local_time.strftime("%Y%m%d_%H%M%S")
+```
+
+### Impact
+- **Before Fix**: Confusing dead code that could mislead developers
+- **After Fix**: Clean, maintainable code with no unreachable statements
+
+---
+
+## Bug #3: Bare Exception Clause - Poor Error Handling
+
+**Location**: `lmms_eval/tasks/worldqa/utils.py:212`  
+**Type**: Reliability Issue  
+**Severity**: Medium  
+
+### Problem Description
+The `worldq_gen_gpt_eval()` function used a bare `except:` clause, which catches every exception, including `KeyboardInterrupt` and `SystemExit`, which derive from `BaseException` rather than `Exception` and are normally meant to terminate the program.
+
+### Original Code
+```python
+try:
+    eval_score = float(eval_score)
+except:
+    eval_score = 0.0
+```
+
+### Issue Explanation
+Bare `except:` clauses are considered poor practice because they:
+- Catch system exceptions like `KeyboardInterrupt` (Ctrl+C) and `SystemExit`
+- Make debugging difficult by hiding unexpected errors
+- Can mask programming errors that should be fixed rather than ignored
+
+### Fixed Code
+```python
+try:
+    eval_score = float(eval_score)
+except (ValueError, TypeError, AttributeError):
+    eval_score = 0.0
+```
+
+### Impact
+- **Before Fix**: Could prevent proper program termination and hide important errors
+- **After Fix**: Only catches expected conversion errors while allowing system exceptions to propagate properly
+
+---
+
+## Additional Observations
+
+### Remaining Bare `except:` Clauses
+The codebase contains numerous other instances of bare `except:` clauses that should be reviewed:
+- `lmms_eval/tasks/tempcompass/utils.py` (multiple instances)
+- `lmms_eval/tasks/videomathqa/utils.py`  
+- `lmms_eval/tasks/videomme/utils.py`
+- And many others across the task modules
+
+### Reproducibility Considerations
+Several modules import `random` but don't consistently set seeds, which could affect reproducibility in evaluation tasks. The codebase does have some seed setting in the main evaluator, but individual task modules often use `random` without explicit seeding.
+
+### Code Quality Issues
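+- Multiple files contain `len(collection) == 0` patterns that could be written more idiomatically as `not collection` for built-in containers (this does not apply to NumPy arrays, whose truth value is ambiguous when they hold more than one element)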
+- Some modules have inconsistent error handling patterns
+- Several TODO comments indicate incomplete implementations
+
+## Recommendations
+
+1. **Conduct a comprehensive audit** of all exception handling throughout the codebase
+2. **Establish coding standards** for error handling and exception catching
+3. **Implement consistent seeding** across all modules that use randomization
+4. **Add linting rules** to catch bare except clauses and other problematic patterns
+5. **Consider adding unit tests** for error handling scenarios
+
+## Summary
+
+The three bugs fixed represent important improvements to the codebase's reliability, maintainability, and error handling. While these fixes address immediate issues, a broader review of error handling practices across the entire codebase would be beneficial for long-term code quality.
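
Recommendation 4 in the report (linting for bare `except:` clauses) can be prototyped with the standard-library `ast` module. The following is a minimal sketch and not part of this commit; `find_bare_excepts` and `sample` are hypothetical names introduced for illustration. In practice an existing lint rule such as pycodestyle's E722 performs the same check.

```python
import ast
from typing import List


def find_bare_excepts(source: str) -> List[int]:
    """Return the line numbers of bare `except:` clauses in Python source."""
    tree = ast.parse(source)
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        # A handler with no exception type is a bare `except:`.
        if isinstance(node, ast.ExceptHandler) and node.type is None
    )


# Hypothetical input: one bare clause (line 4) and one typed clause.
sample = '''
try:
    value = float("n/a")
except:
    value = 0.0

try:
    value = float("n/a")
except ValueError:
    value = 0.0
'''

print(find_bare_excepts(sample))  # [4]
```

Running such a function over the task modules listed under "Additional Observations" would give a concrete list of lines to review, assuming each file parses as valid Python.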
diff --git a/lmms_eval/tasks/worldqa/utils.py b/lmms_eval/tasks/worldqa/utils.py
@@ -210,7 +210,7 @@ def worldq_gen_gpt_eval(results, args):
         eval_score = eval_answer.split("\n")[-1].strip()
         try:
             eval_score = float(eval_score)
-        except:
+        except (ValueError, TypeError, AttributeError):
             eval_score = 0.0
         score += eval_score
 
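
The narrowed clause in this hunk can be checked in isolation. The sketch below uses a hypothetical helper, `parse_score`, that is not in the codebase. It assumes the input is a string or `None`, for which `float()` raises only `ValueError` or `TypeError`. The `AttributeError` entry kept in the commit is harmless but is not needed for those inputs.

```python
def parse_score(raw, default=0.0):
    """Convert the judge's final line to a float, falling back to `default`."""
    try:
        return float(raw)
    except (ValueError, TypeError):
        # ValueError: non-numeric text such as "N/A"
        # TypeError: an unsupported type such as None
        return default


print(parse_score("7.5"))  # 7.5
print(parse_score("N/A"))  # 0.0
print(parse_score(None))   # 0.0

# Unlike a bare `except:`, the typed clause leaves these alone,
# because neither is a subclass of Exception.
print(issubclass(KeyboardInterrupt, Exception))  # False
print(issubclass(SystemExit, Exception))         # False
```

Note that `float()` accepts strings such as `"nan"` and `"inf"`, so a fallback of `0.0` does not by itself guarantee a finite score.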
diff --git a/lmms_eval/utils.py b/lmms_eval/utils.py
@@ -580,7 +580,7 @@ def get_git_commit_hash():
     try:
         git_hash = subprocess.check_output(["git", "describe", "--always"]).strip()
         git_hash = git_hash.decode()
-    except subprocess.CalledProcessError or FileNotFoundError:
+    except (subprocess.CalledProcessError, FileNotFoundError):
         # FileNotFoundError occurs when git not installed on system
         git_hash = None
     return git_hash
@@ -595,7 +595,6 @@ def get_datetime_str(timezone="Asia/Singapore"):
     utc_now = datetime.datetime.now(datetime.timezone.utc)
     local_time = utc_now.astimezone(tz)
     return local_time.strftime("%Y%m%d_%H%M%S")
-    return local_time.strftime("%Y%m%d_%H%M%S")
 
 
 def sanitize_long_string(s, max_length=40):
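
The behavioral difference between the two `except` forms in the `get_git_commit_hash()` hunk can be reproduced without git. This is a minimal sketch: `MISSING`, `buggy_lookup`, and `fixed_lookup` are hypothetical names, and it assumes no executable called `MISSING` exists on the system.

```python
import subprocess

# Assumed not to exist on PATH, so launching it raises FileNotFoundError.
MISSING = "surely-not-an-installed-binary-734"

# `or` is evaluated before the except clause uses the result.
caught = subprocess.CalledProcessError or FileNotFoundError
print(caught is subprocess.CalledProcessError)  # True


def buggy_lookup():
    try:
        return subprocess.check_output([MISSING])
    except subprocess.CalledProcessError or FileNotFoundError:
        # Only CalledProcessError reaches this branch.
        return None


def fixed_lookup():
    try:
        return subprocess.check_output([MISSING])
    except (subprocess.CalledProcessError, FileNotFoundError):
        # A tuple matches either exception type.
        return None


print(fixed_lookup())  # None

try:
    buggy_lookup()
except FileNotFoundError:
    print("buggy version let FileNotFoundError escape")
```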