TagStudioDev
diff --git a/CODEBASE_OPTIMIZATION.md b/CODEBASE_OPTIMIZATION.md
@@ -0,0 +1,320 @@
+# Codebase-Wide Optimization Opportunities
+
+This issue tracks performance, code quality, and maintainability optimizations across the entire TagStudio codebase that were identified during code analysis.
+
+## Category 1: Logging and Debugging (HIGHEST IMPACT)
+
+### Issue 1.1: Print Statements in Production Code
+**Severity:** HIGH | **Impact:** Code Quality, Maintainability | **Count:** 4 instances in 1 file
+
+Direct `print()` calls should be replaced with logger calls for consistency and runtime control.
+
+**Locations:**
+- `src/tagstudio/core/library/json/library.py:715-729` (4 print statements)
+  ```python
+  print("[LIBRARY] Formatting Tags to JSON...")
+  # Should be:
+  logger.info("Formatting Tags to JSON")
+  ```
+
+**Why:**
+- Inconsistent with the rest of the codebase, which uses structlog
+- Can't be controlled at runtime (log level, formatting, redirection)
+- Writes directly to stdout, bypassing the shared logging configuration
+- Violates the style guide requirement: "Use the logger system instead of print statements"
+
+**Solution:**
+```python
+import structlog
+logger = structlog.get_logger(__name__)
+# Replace print() with logger.info(), logger.debug(), etc.
+```
+
+---
+
+### Issue 1.2: Broad Exception Handling Without Error Type Distinction
+**Severity:** MEDIUM | **Impact:** Debugging, Error Handling | **Count:** 40+ instances
+
+Catching `Exception` (with or without `as e`) instead of specific exception types makes debugging harder and can mask unexpected errors.
+
+**Locations:**
+- `src/tagstudio/qt/previews/renderer.py:774` - `except Exception as e:`
+- `src/tagstudio/core/library/alchemy/library.py:527` - `except Exception as e:`
+- `src/tagstudio/qt/ts_qt.py:1588` - `except Exception as e:`
+- `src/tagstudio/core/ts_core.py:78` - `except Exception:`
+- Many others (40+ total instances)
+
+**Why:**
+- Can't distinguish between expected errors (e.g., missing file) and bugs
+- Makes monitoring harder (all errors look the same)
+- Difficult to implement proper error recovery strategies
+
+**Best Practice Pattern:**
+```python
+# Instead of:
+try:
+    data = path.read_bytes()
+except Exception as e:
+    logger.error("Failed", error=str(e))
+
+# Use:
+try:
+    data = path.read_bytes()
+except (FileNotFoundError, PermissionError) as e:
+    logger.error("Expected error", error_type=type(e).__name__)
+except Exception:
+    logger.exception("Unexpected error")  # Records the traceback
+```
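+
+A self-contained sketch of the same split, built around a hypothetical helper `read_text_or_none()`. It uses the stdlib `logging` module only so that it runs without structlog; in TagStudio the logger would come from `structlog.get_logger(__name__)`. Expected I/O failures return `None`, while anything else is logged with its traceback and re-raised.
+
+```python
+from __future__ import annotations
+
+import logging
+from pathlib import Path
+
+# Stdlib logging stands in for structlog so the sketch runs on its own.
+logger = logging.getLogger(__name__)
+
+
+def read_text_or_none(path: Path) -> str | None:
+    """Hypothetical helper: return the file's text, or None for expected failures."""
+    try:
+        return path.read_text(encoding="utf-8")
+    except (FileNotFoundError, PermissionError) as e:
+        # Expected, recoverable conditions: note the error type and carry on.
+        logger.warning("Could not read %s (%s)", path, type(e).__name__)
+        return None
+    except Exception:
+        # Anything else (e.g. UnicodeDecodeError) is probably a bug: keep the
+        # traceback and let the caller see the failure.
+        logger.exception("Unexpected error reading %s", path)
+        raise
+```
+
+Re-raising after `logger.exception()` is a design choice: the traceback still lands in the log, but a bug is not silently turned into a `None` result.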
+
+---
+
+## Category 2: Resource Management (MEDIUM IMPACT)
+
+### Issue 2.1: Manual File Close Statements
+**Severity:** LOW | **Impact:** Code Safety, Maintainability | **Count:** 30+ instances
+
+Some code calls `.close()` manually, which leaks the resource if an exception is raised before the call runs.
+
+**Locations:**
+- `src/tagstudio/qt/previews/renderer.py:1228, 1320, 1346, 1534`
+- `src/tagstudio/qt/previews/vendored/pydub/audio_segment.py:519, 542, 557, etc.`
+
+**Why:**
+- Not resilient to exceptions between open and close
+- Verbose and error-prone
+
+**Solution - Use Context Managers:**
+```python
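+# Instead of (the handle leaks if read() raises):
+f = open(path)
+data = f.read()
+f.close()
+
+# or the safe but verbose form:
+f = open(path)
+try:
+    data = f.read()
+finally:
+    f.close()
+
+# Use:
+with open(path) as f:
+    data = f.read()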
+```
+
+**Note:** The `pydub` locations above are under the `vendored/` directory and shouldn't be modified. Focus on core code.
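+
+Objects that expose `close()` but do not implement the context-manager protocol can still be used in a `with` block via `contextlib.closing`. The `FakeDecoder` class below is hypothetical: it stands in for any resource that only offers `close()`, and it simulates a failure during a read.
+
+```python
+from contextlib import closing
+
+
+class FakeDecoder:
+    """Hypothetical resource that has close() but no __enter__/__exit__."""
+
+    def __init__(self) -> None:
+        self.closed = False
+
+    def read_frame(self) -> bytes:
+        raise ValueError("corrupt frame")  # Simulate a failure mid-read
+
+    def close(self) -> None:
+        self.closed = True
+
+
+def first_frame(decoder: FakeDecoder) -> bytes:
+    # closing() calls decoder.close() when the block exits, even on an exception.
+    with closing(decoder) as d:
+        return d.read_frame()
+```
+
+When several resources must be released together, `contextlib.ExitStack` generalizes the same idea.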
+
+---
+
+### Issue 2.2: Missing Null Checks Before Operations
+**Severity:** MEDIUM | **Impact:** Runtime Safety | **Count:** Scattered
+
+Some code calls methods on attributes that can be `None` without checking them first.
+
+**Example Pattern to Look For:**
+```python
+# Risky:
+self.lib.library_dir.exists()  # Raises AttributeError if library_dir is None
+
+# Safe:
+if self.lib.library_dir and self.lib.library_dir.exists():
+    pass
+```
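+
+Annotating the value as `Path | None` lets a static type checker such as mypy or pyright flag the unguarded call above. A minimal sketch, using a hypothetical free function rather than TagStudio's actual library class:
+
+```python
+from __future__ import annotations
+
+from pathlib import Path
+
+
+def library_dir_exists(library_dir: Path | None) -> bool:
+    """Hypothetical helper: False when no library is open, else whether the dir exists."""
+    if library_dir is None:
+        return False
+    # After the None check, type checkers narrow library_dir to Path.
+    return library_dir.exists()
+```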
+
+---
+
+## Category 3: Iterator and Loop Optimization
+
+### Issue 3.1: Unused Loop Variables
+**Severity:** LOW | **Impact:** Code Clarity | **Count:** 2-3 instances
+
+Loops that don't use their iteration variable should use `_`.
+
+**Already Fixed Examples:**
+- `src/tagstudio/core/cli_driver.py:73` - Now correctly uses `_`
+
+**Pattern to Search For:**
+```python
+# Bad:
+for item in collection:
+    pass  # item not used
+
+# Good:
+for _ in collection:
+    pass
+```
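+
+Ruff's `B007` rule (unused loop control variable) automates this search. To illustrate what such a check looks for, here is a small hypothetical `unused_loop_vars()` helper built on the stdlib `ast` module. It reports simple `for name in ...` loops whose variable is never read inside the loop, ignores names starting with `_`, and skips tuple targets; because it only looks inside the loop, a variable read after the loop is still reported.
+
+```python
+import ast
+
+
+def unused_loop_vars(source: str) -> list[tuple[int, str]]:
+    """Hypothetical helper: (line, name) for `for name in ...` loops that never read name."""
+    found = []
+    for node in ast.walk(ast.parse(source)):
+        if not isinstance(node, (ast.For, ast.AsyncFor)):
+            continue
+        if not isinstance(node.target, ast.Name):
+            continue  # Tuple targets such as `for k, v in ...` are skipped
+        name = node.target.id
+        if name.startswith("_"):
+            continue  # `_` and `_name` already signal "unused"
+        # Look for any read (Load) of the name in the loop body or its else clause.
+        used = any(
+            isinstance(n, ast.Name) and n.id == name and isinstance(n.ctx, ast.Load)
+            for stmt in node.body + node.orelse
+            for n in ast.walk(stmt)
+        )
+        if not used:
+            found.append((node.lineno, name))
+    return sorted(found)
+```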
+
+---
+
+### Issue 3.2: Iterator Value Tracking
+**Severity:** LOW | **Impact:** Code Quality | **Count:** 1-2 instances
+
+Some iterators yield running progress counts, but the loop only keeps the latest value and reports nothing in between.
+
+**Example:**
+```python
+# Current (keeps only the last count, reports no progress):
+files_scanned = 0
+for count in tracker.refresh_dir(path):
+    files_scanned = count
+
+# Better (if progress feedback is needed):
+files_scanned = 0
+for count in tracker.refresh_dir(path):
+    files_scanned = count  # The last value is the final total
+    if verbose_mode:
+        logger.debug("Progress", files_scanned=count)
+```
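+
+A runnable sketch of the same loop, assuming a hypothetical `fake_refresh_dir()` generator in place of `tracker.refresh_dir()` and an optional, hypothetical `on_progress` callback in place of the `verbose_mode` flag. Initializing `files_scanned` before the loop keeps the result defined when the directory is empty.
+
+```python
+from __future__ import annotations
+
+from collections.abc import Callable, Iterator
+
+
+def fake_refresh_dir(n_files: int) -> Iterator[int]:
+    """Hypothetical stand-in for tracker.refresh_dir(): yields running counts."""
+    for i in range(1, n_files + 1):
+        yield i
+
+
+def scan(counts: Iterator[int], on_progress: Callable[[int], None] | None = None) -> int:
+    files_scanned = 0  # Stays 0 if the iterator yields nothing
+    for count in counts:
+        files_scanned = count
+        if on_progress is not None:
+            on_progress(count)  # Report each intermediate count
+    return files_scanned
+```
+
+Passing a callback keeps the scanning loop independent of how progress is shown (a log line, a CLI counter, or nothing at all).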
+
+---
+
+## Category 4: Code Standardization
+
+### Issue 4.1: Inconsistent Error Message Formatting
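+**Severity:** LOW | **Impact:** Consistency, Readability | **Count:** Scattered
+
+Error messages use different formats across the codebase: some start with a bracketed component tag such as `[Preview Panel]` and others do not, and paths are logged under different field names (`path` vs. `folder`).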
+
+**Examples:**
+- `logger.error("[Preview Panel] Error updating selection", error=e)`
+- `logger.error("Library path does not exist", path=path)`
+- `logger.error("[CacheManager] Failed to remove folder", folder=folder)`
+
+**Recommendation:**
+Establish a consistent format:
+```python
+# Preferred format (no bracketed prefix; details as structured fields),
+# inside an `except` block that binds the exception as `e`:
+logger.error("Failed to remove folder", path=folder, error_type=type(e).__name__)
+```
+
+---
+
+### Issue 4.2: Mixed Logger Initialization
+**Severity:** LOW | **Impact:** Consistency | **Count:** Scattered
+
+Some modules use `structlog.get_logger()` and others use `structlog.get_logger(__name__)`.
+
+**Current Patterns:**
+- Most use: `logger = structlog.get_logger(__name__)`
+- Tests use: `logger = structlog.get_logger()`
+
+**Recommendation:**
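+Standardize to `structlog.get_logger(__name__)` everywhere for better context. The name only appears in log output if structlog is configured to record it (for example with `structlog.stdlib.LoggerFactory()` and the `structlog.stdlib.add_logger_name` processor); with structlog's default configuration the argument is ignored.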
+
+---
+
+## Category 5: Performance Bottlenecks
+
+### Issue 5.1: Repeated Path Operations
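+**Severity:** LOW | **Impact:** Performance | **Count:** Scattered
+
+Some code repeats `.exists()` or `.is_dir()` checks on the same path. Each check is a separate filesystem call, and `is_dir()` already returns `False` for a missing path, so `path.exists() and path.is_dir()` does the same work twice. Reusing a result is only safe while the path is not expected to change in between.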
+
+**Pattern:**
+```python
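+# Instead of:
+if path.exists() and path.is_dir():  # is_dir() already implies exists()
+    # ... do work
+    if path.exists():  # Repeated filesystem call
+        pass
+
+# Use:
+is_dir = path.is_dir()  # One call; False if the path is missing
+if is_dir:
+    # ... do work, reusing is_dir instead of re-checking the path
+    pass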
+```
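+
+When code needs several facts about the same path, a single `Path.stat()` call can answer all of them. A minimal sketch with a hypothetical `classify()` helper; like `is_dir()`, `stat()` follows symlinks.
+
+```python
+from __future__ import annotations
+
+import stat
+from pathlib import Path
+
+
+def classify(path: Path) -> str:
+    """Hypothetical helper: one stat() call instead of exists()/is_dir()/is_file()."""
+    try:
+        st = path.stat()
+    except (FileNotFoundError, NotADirectoryError):
+        return "missing"
+    if stat.S_ISDIR(st.st_mode):
+        return "dir"
+    if stat.S_ISREG(st.st_mode):
+        return "file"
+    return "other"  # e.g. a FIFO or device node
+```
+
+The same `st` result also carries `st_size` and `st_mtime`, if the caller needs them.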
+
+---
+
+### Issue 5.2: String Concatenation in Loops
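+**Severity:** LOW | **Impact:** Performance | **Count:** 3-5 instances
+
+Building a string with `+` or `+=` inside a loop can take quadratic time, because each step may copy the whole string built so far. CPython often optimizes `s += t` in place, but PEP 8 advises against relying on that; collecting the pieces in a list and calling `"".join()` once is the portable fix. f-strings help with formatting each piece, not with the accumulation.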
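+
+A minimal sketch of the `join` pattern, using a hypothetical `format_tag_list()` function:
+
+```python
+def format_tag_list(names: list[str]) -> str:
+    """Hypothetical example: build "#a, #b, ..." without repeated concatenation."""
+    parts: list[str] = []
+    for name in names:
+        parts.append(f"#{name}")  # f-string formats one piece
+    return ", ".join(parts)  # Single join at the end
+
+
+# When each piece is a simple expression, a generator does the same in one line:
+def format_tag_list_short(names: list[str]) -> str:
+    return ", ".join(f"#{name}" for name in names)
+```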
+
+---
+
+## Category 6: Type Safety
+
+### Issue 6.1: Missing Type Hints
+**Severity:** LOW | **Impact:** Code Quality, IDE Support | **Count:** Widespread
+
+While many functions have type hints, some older code lacks them.
+
+**Example:**
+```python
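+# Missing parameter and return types:
+def get_error_message(exception):
+    return str(exception)
+
+def process(data):
+    return {"status": "ok"}
+
+# Annotated (the parameter type for `process` is illustrative):
+def get_error_message(exception: Exception) -> str:
+    return str(exception)
+
+def process(data: object) -> dict[str, str]:
+    return {"status": "ok"}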
+```
+
+---
+
+## Optimization Priority Matrix
+
+| Category | Severity | Effort | Value | Priority |
+|----------|----------|--------|-------|----------|
+| Print statements → Logger | HIGH | Low | HIGH | **CRITICAL** |
+| Exception handling | MEDIUM | Medium | HIGH | **HIGH** |
+| File context managers | LOW | Medium | MEDIUM | **HIGH** |
+| Unused loop variables | LOW | Very Low | LOW | **LOW** |
+| Error message consistency | LOW | Medium | LOW | **MEDIUM** |
+| Logger initialization | LOW | Low | LOW | **MEDIUM** |
+| Type hints | LOW | Medium | MEDIUM | **MEDIUM** |
+| Path operation caching | LOW | Low | LOW | **LOW** |
+
+---
+
+## Quick Wins (< 30 minutes each)
+
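+1. **Fix Print Statements** (4 instances in `json/library.py`)
+   - 4 call sites in a single file
+   - Immediate improvement
+
+2. **Unused Loop Variables** (already partially done in the CLI refresh)
+   - Find loops whose variable is never read in the body; a regex such as `for \w+ in` only finds candidates, while Ruff's `B007` rule detects them directly
+   - Rename the variable to `_`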
+
+3. **Logger Initialization** (Standardize to `__name__`)
+   - Search and replace in test files
+   - 1-2 minutes per file
+
+---
+
+## Medium-Effort Improvements (1-3 hours)
+
+1. **Exception Handling Audit** (renderer.py, library.py)
+   - Identify exceptions that can be caught specifically
+   - Better error categorization
+   - Improved debugging
+
+2. **File Context Managers**
+   - Focus on core code (not vendored)
+   - `src/tagstudio/qt/previews/renderer.py` - 4 instances
+   - ~5 minutes per instance
+
+---
+
+## Long-term Improvements (Nice to have)
+
+1. Complete type hint coverage
+2. Centralized error handling strategy
+3. Structured error response objects
+4. Performance profiling and bottleneck analysis
+
+---
+
+## Related Issues
+- #1270 - Command-line library refresh (uses these patterns)
+- Style guide violations
+- Code quality improvements
+
+## Notes for Contributors
+
+- **Print statements:** Use `logger.info()` for user-facing messages and `logger.debug()` for developer info
+- **Exception handling:** Be specific when possible; catch plain `Exception` only for truly unexpected errors
+- **File operations:** Always use context managers (`with` statements)
+- **Loop variables:** Use `_` for unused loop variables (a widespread convention; linters such as Ruff flag violations with rule `B007`)
+- **Error messages:** Keep a consistent format, with details in structured logging fields
+
+## Implementation Suggestions
+
+These optimizations can be implemented incrementally:
+1. **Phase 1:** Fix print statements + unused variables (quick wins)
+2. **Phase 2:** Improve exception handling in hot paths (renderer, library)
+3. **Phase 3:** Standardize error messages and logging
+4. **Phase 4:** Add missing type hints (the broadest change, since gaps are widespread)