FIX: --no-gen-tests Flag Causes Path Resolution Error with Module-Root #1086
…oss-platform compatibility. Remove redundant path matching logic and enhance logging in parse_test_xml for better debugging of test type registration issues.
The optimized code achieves a **702% speedup** (from 4.19ms to 522μs) by adding a single, strategic optimization: **`@lru_cache(maxsize=1024)` on the `_normalize_path_for_comparison` method**.

## Why This Works

The original line profiler shows that **98.1% of the normalization time** is spent in `path.resolve()` - an expensive filesystem operation that converts paths to absolute canonical form. When `get_by_original_file_path` searches through test files, it calls `_normalize_path_for_comparison` repeatedly for:

1. The input `file_path` (once per search)
2. Each `test_file.original_file_path` in the collection (potentially many times)

Without caching, identical paths are re-normalized on every search, repeating the expensive `resolve()` operation unnecessarily.

## The Optimization

By adding `@lru_cache(maxsize=1024)`, Python memoizes the normalization results. When the same `Path` object is normalized multiple times:

- **First call**: Performs the expensive `resolve()` operation and caches the result
- **Subsequent calls**: Returns the cached string instantly (hash table lookup)

Since `Path` objects are hashable and the function is stateless, this is a perfect caching scenario.

## Test Results Analysis

The annotated tests confirm the optimization excels when:

- **Repeated path lookups** occur: `test_large_scale_many_entries_with_single_match` shows **778% speedup** (3.73ms → 424μs) because the query path is normalized once and cached, then each comparison against 500+ entries reuses cached normalizations for stored paths
- **Multiple searches** use the same paths: Tests like `test_basic_match_with_exact_path_string` (734% faster) and `test_multiple_files_first_match_returned` (544% faster) benefit from cached normalizations across test runs
- **Cache hits dominate**: Most tests show 540-730% speedups, indicating the cache effectively eliminates repeated `resolve()` calls

The one exception (`test_resolve_exception_uses_absolute_fallback` at 9% slower) involves exception handling with custom path objects that don't benefit from caching, but this represents an edge case.

## Impact

This optimization is particularly valuable if `get_by_original_file_path` is called frequently in a hot path (e.g., during test collection, file matching, or validation loops where the same paths are queried repeatedly). The 1024-entry cache is large enough to handle typical project sizes while avoiding memory bloat.
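As a rough illustration of the memoization described above, the following sketch caches a standalone normalization function. It is not the PR's actual helper: the function name, the fallback to `absolute()`, and the caught exception types are assumptions made for this example, loosely suggested by the `test_resolve_exception_uses_absolute_fallback` test name.

```python
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=1024)
def normalize_path_for_comparison(path: Path) -> str:
    """Return a canonical string form of `path` for equality checks.

    Hypothetical stand-in for the PR's helper; the fallback is an assumption.
    """
    try:
        resolved = path.resolve()  # touches the filesystem, so this is the slow step
    except (OSError, RuntimeError):
        resolved = path.absolute()  # assumed fallback: absolute path, symlinks not resolved
    return str(resolved)


query = Path("tests") / "test_example.py"
first = normalize_path_for_comparison(query)         # cache miss: resolve() runs
second = normalize_path_for_comparison(Path(query))  # cache hit: an equal Path hashes the same
info = normalize_path_for_comparison.cache_info()
print(info.hits, info.misses)
```

Because the cache key is the `Path` argument itself, two distinct but equal `Path` objects share one entry. One consequence worth keeping in mind is that a cached result is not refreshed if the filesystem or working directory changes after the first call.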
⚡️ Codeflash found optimizations for this PR📄 702% (7.02x) speedup for
The optimized code achieves a **72% speedup** (from 597μs to 346μs) by adding `@lru_cache(maxsize=4096)` to the `_normalize_path_for_comparison` method. This single change provides substantial performance gains because it caches the results of expensive path normalization operations.

**Why this optimization works:**

1. **Eliminates redundant I/O operations**: The line profiler shows that `path.resolve()` consumes 79.2% of the normalization time (2.07ms out of 2.61ms). This operation requires filesystem I/O to resolve symbolic links and compute absolute paths. With caching, repeated calls with the same `Path` object return instantly from memory.
2. **Exploits repetitive access patterns**: In `get_test_type_by_original_file_path`, the method normalizes both the query path AND every `original_file_path` in `test_files` during iteration. When the same paths are queried multiple times or when the same test files are checked repeatedly, the cache eliminates these redundant normalizations.
3. **Negligible memory cost**: With `maxsize=4096`, the cache can store up to 4096 path normalizations. Since each cache entry stores a path string (typically <200 bytes), total memory overhead is minimal (<1MB worst case).

**Performance characteristics from test results:**

- **Best case** (cache hits): 504-635% faster for repeated queries of the same paths (e.g., `test_returns_matching_test_type_for_equivalent_paths`)
- **Worst case** (cache misses): 10-36% slower for large-scale searches through many unique paths, where cache overhead slightly exceeds benefits
- **Typical case**: Most real-world scenarios involve querying a limited set of file paths repeatedly, making this optimization highly effective

**Key behavioral note**: The cache persists across method calls, so applications that repeatedly query the same test files will see compounding benefits over time.
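The best-case and worst-case behavior described above can be observed with `cache_info()`. The sketch below is illustrative only: the `normalize` function and the 500 synthetic file names are assumptions for this example, not code or data from the PR. The first scan over unique paths produces only misses (the worst case), and a second scan over the same paths produces only hits (the best case).

```python
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=4096)
def normalize(path: Path) -> str:
    # Hypothetical stand-in for _normalize_path_for_comparison.
    return str(path.resolve())


# Synthetic, unique paths; the files do not need to exist for resolve() to work.
paths = [Path(f"tests/test_{i}.py") for i in range(500)]

for p in paths:  # first scan: every path is new, so every call is a miss
    normalize(p)
after_first = normalize.cache_info()

for p in paths:  # second scan: every path is already cached, so every call is a hit
    normalize(p)
after_second = normalize.cache_info()

print(after_first)
print(after_second)
```

On the first scan the cache adds bookkeeping on top of each `resolve()` call without saving any work, which is consistent with the slowdown reported for searches through many unique paths.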
⚡️ Codeflash found optimizations for this PR📄 73% (0.73x) speedup for
…2026-01-17T11.16.14 ⚡️ Speed up method `TestFiles.get_test_type_by_original_file_path` by 73% in PR #1086 (`fix-path-resolution/no-gen-tests`)
…ze-pr1086-2026-01-17T11.10.22
…2026-01-17T11.10.22 ⚡️ Speed up method `TestFiles.get_by_original_file_path` by 702% in PR #1086 (`fix-path-resolution/no-gen-tests`)
⚡️ Codeflash found optimizations for this PR📄 14,620% (146.20x) speedup for
⚡️ Codeflash found optimizations for this PR📄 2,276% (22.76x) speedup for
@claude review this
Claude finished @aseembits93's task in 2m 39s

## Code Review

### Summary

This PR successfully addresses the crash when using

### ✅ Strengths
### 🔍 Issues Found

#### 1. Performance: Inefficient loop patterns (codeflash/models/models.py:375-405)

All three methods now use explicit for-loops that call normalization on every iteration:

```python
# get_by_original_file_path (lines 376-383)
normalized = self._normalize_path_for_comparison(file_path)
for test_file in self.test_files:
    if test_file.original_file_path is None:
        continue
    normalized_test_path = self._normalize_path_for_comparison(test_file.original_file_path)
    if normalized == normalized_test_path:
        return test_file
return None
```

Why this matters:

Good news: The codeflash-ai bot has already identified and created optimization PRs:
These dependent PRs should be reviewed and merged to address the performance concerns.

### 💡 Suggestions

#### 2. Error message could be more actionable (codeflash/verification/parse_test_output.py:320-326)

Current message:

```python
logger.warning(
    f"Test type not found for '{test_file_path}'. "
    f"Registered test files: {registered_paths}. Skipping test case."
)
```

Suggestion: Add context about the likely cause:

```python
logger.warning(
    f"Test type not found for '{test_file_path}'. "
    f"This may occur when using --no-gen-tests with module-root tests. "
    f"Registered test files: {registered_paths}. Skipping test case."
)
```

This helps users understand when this is expected behavior vs. a configuration issue.

#### 3. Consider path normalization at registration time

Instead of normalizing on every lookup, consider normalizing once when

Trade-off: Adds complexity vs. performance gain. Given the dependent optimization PRs, this may not be necessary. Monitor performance after merging those PRs.

#### 4. Cache size could be documented (codeflash/models/models.py:408)

```python
@staticmethod
@lru_cache(maxsize=4096)  # Reasonable for most projects; increase for large monorepos
def _normalize_path_for_comparison(path: Path) -> str:
```

Rationale: 4096 unique paths should handle most projects, but large monorepos might benefit from a larger cache or making this configurable.

### 🎯 Testing Recommendations
### 📋 Code Quality Notes

Good practices observed:

Minor style notes:

### Verdict

✅ Approve with recommendations - The code is functionally correct and solves the reported issue. The performance concerns are already being addressed in dependent PRs.

Action Items:

Correctness: ✅ Fixes the bug
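Suggestion 3 in the review (normalizing at registration time rather than on every lookup) could look roughly like the sketch below. It is only one possible shape under stated assumptions: `FileEntry`, `IndexedFiles`, the `add` method, and the `test_type` string are hypothetical names for this example and do not reflect the project's actual models.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from pathlib import Path


def _normalize(path: Path) -> str:
    # Hypothetical stand-in for _normalize_path_for_comparison.
    return str(path.resolve())


@dataclass
class FileEntry:  # hypothetical, simplified record of a registered test file
    original_file_path: Path | None
    test_type: str


@dataclass
class IndexedFiles:  # hypothetical container that indexes entries on registration
    entries: list[FileEntry] = field(default_factory=list)
    _by_original: dict[str, FileEntry] = field(default_factory=dict)

    def add(self, entry: FileEntry) -> None:
        self.entries.append(entry)
        if entry.original_file_path is not None:
            # Normalize once here; setdefault keeps the first registered match,
            # mirroring the first-match behavior of a linear scan.
            self._by_original.setdefault(_normalize(entry.original_file_path), entry)

    def get_by_original_file_path(self, file_path: Path) -> FileEntry | None:
        # One normalization for the query, then a dictionary lookup.
        return self._by_original.get(_normalize(file_path))


files = IndexedFiles()
files.add(FileEntry(Path("tests/test_x.py"), "existing"))
files.add(FileEntry(None, "generated"))
match = files.get_by_original_file_path(Path.cwd() / "tests" / "test_x.py")
print(match.test_type if match else None)
```

The trade-off noted in the review applies here too: the index must be kept in sync whenever entries are added or changed, which is extra complexity compared with caching the normalization function.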
Fix: Handle missing test file registration gracefully with improved path matching
Path normalization: Added a helper that normalizes paths before comparison:
Error handling: Replaced the assertion with a warning that:
## Changes

- models.py: Added `_normalize_path_for_comparison()` and updated path comparison methods to use it
- parse_test_output.py: Replaced assertion with warning and continue

## Impact

Fixes the crash and improves robustness when test file paths don't match exactly.
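The error-handling change described above (a warning followed by `continue` instead of an assertion) might be sketched as follows. This is a simplified illustration, not the code in `parse_test_output.py`: the function name, the `registered` dictionary, and the test type label are assumptions made for the example.

```python
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def collect_known_test_types(reported_paths, registered):
    """Pair each reported test file with its registered test type.

    `registered` is a hypothetical dict mapping a normalized path string to a
    test type label. Unregistered files are logged and skipped, not asserted on.
    """
    results = []
    for test_file_path in reported_paths:
        test_type = registered.get(str(test_file_path.resolve()))
        if test_type is None:
            logger.warning(
                "Test type not found for '%s'. Registered test files: %s. Skipping test case.",
                test_file_path,
                sorted(registered),
            )
            continue  # skip this test case and keep processing the rest
        results.append((test_file_path, test_type))
    return results


registered = {str(Path("tests/test_a.py").resolve()): "existing_unit_test"}
results = collect_known_test_types(
    [Path("tests/test_a.py"), Path("tests/test_missing.py")], registered
)
print(results)
```

With an assertion, the unregistered second path would abort the whole run; here it only produces a warning and the first result is still returned.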