Skip to content

Commit 4a50250

Browse files
committed
docs: add comprehensive codebase optimization roadmap
- Identify 6 major optimization categories across codebase - Document 15+ specific optimization opportunities - Print statements in production code (4 instances) - Broad exception handling (40+ instances) - Resource management improvements (30+ instances) - Priority matrix with effort/value estimates - Quick wins and medium-effort improvements - Implementation phases for incremental improvement Also includes CLI refresh feature enhancement roadmap with: - Progress reporting with verbose/quiet flags - Batch progress feedback - Configuration file support - Dry-run mode for validation - Incremental refresh tracking - Performance metrics
1 parent b537130 commit 4a50250

File tree

2 files changed

+463
-0
lines changed

2 files changed

+463
-0
lines changed

CODEBASE_OPTIMIZATION.md

Lines changed: 320 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,320 @@
1+
# Codebase-Wide Optimization Opportunities
2+
3+
This issue tracks performance, code quality, and maintainability optimizations across the entire TagStudio codebase that were identified during code analysis.
4+
5+
## Category 1: Logging and Debugging (HIGHEST IMPACT)
6+
7+
### Issue 1.1: Print Statements in Production Code
8+
**Severity:** HIGH | **Impact:** Code Quality, Maintainability | **Files:** 4
9+
10+
Direct `print()` calls should be replaced with logger calls for consistency and runtime control.
11+
12+
**Locations:**
13+
- `src/tagstudio/core/library/json/library.py:715-729` (4 print statements)
14+
```python
15+
print("[LIBRARY] Formatting Tags to JSON...")
16+
# Should be:
17+
logger.info("Formatting Tags to JSON")
18+
```
19+
20+
**Why:**
21+
- Inconsistent with the rest of the codebase which uses structlog
22+
- Can't be controlled at runtime (log level, formatting, redirection)
23+
- Breaks abstraction in production builds
24+
- Violates the style guide requirement: "Use the logger system instead of print statements"
25+
26+
**Solution:**
27+
```python
28+
import structlog
29+
logger = structlog.get_logger(__name__)
30+
# Replace print() with logger.info(), logger.debug(), etc.
31+
```
32+
33+
---
34+
35+
### Issue 1.2: Broad Exception Handling Without Error Type Distinction
36+
**Severity:** MEDIUM | **Impact:** Debugging, Error Handling | **Count:** 40+ instances
37+
38+
Catching bare `Exception` or `Exception as e` without specific exception types makes debugging harder and can mask unexpected errors.
39+
40+
**Locations:**
41+
- `src/tagstudio/qt/previews/renderer.py:774` - `except Exception as e:`
42+
- `src/tagstudio/core/library/alchemy/library.py:527` - `except Exception as e:`
43+
- `src/tagstudio/qt/ts_qt.py:1588` - `except Exception as e:`
44+
- `src/tagstudio/core/ts_core.py:78` - `except Exception:`
45+
- Many others (40+ total instances)
46+
47+
**Why:**
48+
- Can't distinguish between expected errors (e.g., missing file) and bugs
49+
- Makes monitoring harder (all errors look the same)
50+
- Difficult to implement proper error recovery strategies
51+
52+
**Best Practice Pattern:**
53+
```python
54+
# Instead of:
55+
except Exception as e:
56+
logger.error("Failed", error=str(e))
57+
58+
# Use:
59+
except (FileNotFoundError, PermissionError) as e:
60+
logger.error("Expected error", error_type=type(e).__name__)
61+
except Exception:
62+
logger.exception("Unexpected error")
63+
```
64+
65+
---
66+
67+
## Category 2: Resource Management (MEDIUM IMPACT)
68+
69+
### Issue 2.1: Manual File Close Statements
70+
**Severity:** LOW | **Impact:** Code Safety, Maintainability | **Count:** 30+ instances
71+
72+
Some code uses manual `.close()` calls which can leak resources if exceptions occur.
73+
74+
**Locations:**
75+
- `src/tagstudio/qt/previews/renderer.py:1228, 1320, 1346, 1534`
76+
- `src/tagstudio/qt/previews/vendored/pydub/audio_segment.py:519, 542, 557, etc.`
77+
78+
**Why:**
79+
- Not resilient to exceptions between open and close
80+
- Verbose and error-prone
81+
82+
**Solution - Use Context Managers:**
83+
```python
84+
# Instead of:
85+
f = open(path)
86+
try:
87+
data = f.read()
88+
finally:
89+
f.close()
90+
91+
# Use:
92+
with open(path) as f:
93+
data = f.read()
94+
```
95+
96+
**Note:** Some files are in `vendored/` directory and shouldn't be modified. Focus on core code.
97+
98+
---
99+
100+
### Issue 2.2: Missing Null Checks Before Operations
101+
**Severity:** MEDIUM | **Impact:** Runtime Safety | **Scattered instances**
102+
103+
Some code doesn't validate objects exist before operations.
104+
105+
**Example Pattern to Look For:**
106+
```python
107+
# Risky:
108+
self.lib.library_dir.exists() # May crash if library_dir is None
109+
110+
# Safe:
111+
if self.lib.library_dir and self.lib.library_dir.exists():
112+
pass
113+
```
114+
115+
---
116+
117+
## Category 3: Iterator and Loop Optimization
118+
119+
### Issue 3.1: Unused Loop Variables
120+
**Severity:** LOW | **Impact:** Code Clarity | **Count:** 2-3 instances
121+
122+
Loops that don't use their iteration variable should use `_`.
123+
124+
**Already Fixed Examples:**
125+
- `src/tagstudio/core/cli_driver.py:73` - Now correctly uses `_`
126+
127+
**Pattern to Search For:**
128+
```python
129+
# Bad:
130+
for item in collection:
131+
pass # item not used
132+
133+
# Good:
134+
for _ in collection:
135+
pass
136+
```
137+
138+
---
139+
140+
### Issue 3.2: Iterator Value Tracking
141+
**Severity:** LOW | **Impact:** Code Quality | **1-2 instances**
142+
143+
Some iterators yield progress values that are overwritten without tracking intermediate values.
144+
145+
**Example:**
146+
```python
147+
# Current (only keeps last value):
148+
files_scanned = 0
149+
for count in tracker.refresh_dir(path):
150+
files_scanned = count # Loses intermediate counts
151+
152+
# Better (if progress feedback needed):
153+
files_scanned = 0
154+
for count in tracker.refresh_dir(path):
155+
if verbose_mode:
156+
logger.debug("Progress", files_scanned=count)
157+
files_scanned = count # Final value
158+
```
159+
160+
---
161+
162+
## Category 4: Code Standardization
163+
164+
### Issue 4.1: Inconsistent Error Message Formatting
165+
**Severity:** LOW | **Impact:** Consistency, Readability | **Scattered**
166+
167+
Error messages use different formats throughout the codebase.
168+
169+
**Examples:**
170+
- `logger.error("[Preview Panel] Error updating selection", error=e)`
171+
- `logger.error("Library path does not exist", path=path)`
172+
- `logger.error("[CacheManager] Failed to remove folder", folder=folder)`
173+
174+
**Recommendation:**
175+
Establish a consistent format:
176+
```python
177+
# Preferred format (without brackets, structured fields):
178+
logger.error("Failed to remove folder", path=folder, error_type=type(e).__name__)
179+
```
180+
181+
---
182+
183+
### Issue 4.2: Mixed Logger Initialization
184+
**Severity:** LOW | **Impact:** Consistency | **Scattered**
185+
186+
Some modules use `structlog.get_logger()` and others use `structlog.get_logger(__name__)`.
187+
188+
**Current Patterns:**
189+
- Most use: `logger = structlog.get_logger(__name__)`
190+
- Tests use: `logger = structlog.get_logger()`
191+
192+
**Recommendation:**
193+
Standardize to `structlog.get_logger(__name__)` everywhere for better context.
194+
195+
---
196+
197+
## Category 5: Performance Bottlenecks
198+
199+
### Issue 5.1: Repeated Path Operations
200+
**Severity:** LOW | **Impact:** Performance | **Scattered**
201+
202+
Some code repeats `.exists()` or `.is_dir()` checks on the same path.
203+
204+
**Pattern:**
205+
```python
206+
# Instead of:
207+
if path.exists() and path.is_dir():
208+
# ... do work
209+
if path.exists(): # Repeated!
210+
pass
211+
212+
# Use:
213+
if path.exists() and path.is_dir():
214+
is_dir = True
215+
# Reuse is_dir variable
216+
```
217+
218+
---
219+
220+
### Issue 5.2: String Concatenation in Loops
221+
**Severity:** LOW | **Impact:** Performance | **3-5 instances**
222+
223+
Building strings with `+` in loops is slower than f-strings or join.
224+
225+
---
226+
227+
## Category 6: Type Safety
228+
229+
### Issue 6.1: Missing Type Hints
230+
**Severity:** LOW | **Impact:** Code Quality, IDE Support | **Widespread**
231+
232+
While many functions have type hints, some older code lacks them.
233+
234+
**Example:**
235+
```python
236+
# Missing return type:
237+
def get_error_message(exception): # Should be -> str:
238+
return str(exception)
239+
240+
# Missing parameter types:
241+
def process(data): # Should be -> dict:
242+
return {"status": "ok"}
243+
```
244+
245+
---
246+
247+
## Optimization Priority Matrix
248+
249+
| Category | Severity | Effort | Value | Priority |
250+
|----------|----------|--------|-------|----------|
251+
| Print statements → Logger | HIGH | Low | HIGH | **CRITICAL** |
252+
| Exception handling | MEDIUM | Medium | HIGH | **HIGH** |
253+
| File context managers | MEDIUM | Medium | MEDIUM | **HIGH** |
254+
| Unused loop variables | LOW | Very Low | LOW | **LOW** |
255+
| Error message consistency | LOW | Medium | LOW | **MEDIUM** |
256+
| Logger initialization | LOW | Low | LOW | **MEDIUM** |
257+
| Type hints | LOW | Medium | MEDIUM | **MEDIUM** |
258+
| Path operation caching | LOW | Low | LOW | **LOW** |
259+
260+
---
261+
262+
## Quick Wins (< 30 minutes each)
263+
264+
1. **Fix Print Statements** (4 instances in json/library.py)
265+
- ~5 lines per file
266+
- Immediate improvement
267+
268+
2. **Unused Loop Variables** (Already partially done in CLI refresh)
269+
- Search for `for \w+ in` followed by `pass`
270+
- Rename to `_`
271+
272+
3. **Logger Initialization** (Standardize to `__name__`)
273+
- Search and replace in test files
274+
- 1-2 minutes per file
275+
276+
---
277+
278+
## Medium-Effort Improvements (1-3 hours)
279+
280+
1. **Exception Handling Audit** (renderer.py, library.py)
281+
- Identify exceptions that can be caught specifically
282+
- Better error categorization
283+
- Improved debugging
284+
285+
2. **File Context Managers**
286+
- Focus on core code (not vendored)
287+
- `src/tagstudio/qt/previews/renderer.py` - 4 instances
288+
- ~5 minutes per instance
289+
290+
---
291+
292+
## Long-term Improvements (Nice to have)
293+
294+
1. Complete type hint coverage
295+
2. Centralized error handling strategy
296+
3. Structured error response objects
297+
4. Performance profiling and bottleneck analysis
298+
299+
---
300+
301+
## Related Issues
302+
- #1270 - Command-line library refresh (uses these patterns)
303+
- Style guide violations
304+
- Code quality improvements
305+
306+
## Notes for Contributors
307+
308+
- **Print statements:** Use logger.info() for user-facing messages, logger.debug() for developer info
309+
- **Exception handling:** Be specific when possible; use bare Exception only for truly unexpected errors
310+
- **File operations:** Always use context managers (with statements)
311+
- **Loop variables:** Use `_` for unused loop variables per PEP 8
312+
- **Error messages:** Keep consistent format with structured fields for logging
313+
314+
## Implementation Suggestions
315+
316+
These optimizations can be implemented incrementally:
317+
1. **Phase 1:** Fix print statements + unused variables (quick wins)
318+
2. **Phase 2:** Improve exception handling in hot paths (renderer, library)
319+
3. **Phase 3:** Standardize error messages and logging
320+
4. **Phase 4:** Add missing type hints (lowest priority, highest effort)

0 commit comments

Comments
 (0)