Optimize dependency resolution performance by caching parsed dependencies
PROBLEM IDENTIFIED:
Before this optimization, dependencies were being parsed twice during
candidate resolution:
1. During _check_metadata_consistency() for validation (line 233 in original code)
2. During iter_dependencies() for actual dependency resolution (line 258)
This caused significant performance issues because:
- dist.iter_provided_extras() was called multiple times
- dist.iter_dependencies() was called multiple times
- Parsing requirements from package metadata is computationally expensive
- The TODO comment at line 230 specifically noted this performance problem
SOLUTION IMPLEMENTED:
Added a caching mechanism with two new instance variables:
- _cached_dependencies: stores list[Requirement] after parsing once
- _cached_extras: stores list[NormalizedName] after parsing once
HOW THE CACHING WORKS:
1. Cache variables are initialized as None in __init__()
2. During _prepare() -> _check_metadata_consistency(), dependencies are parsed
once for validation and the parsed result is cached
3. During iter_dependencies(), the cached results are reused via
_get_cached_dependencies()
4. Cache is populated lazily - only when first accessed
5. Subsequent calls to iter_dependencies() use cached data (no re-parsing)
6. Each candidate instance has its own cache, so no cached state is shared
between candidates
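
The steps above can be sketched roughly as follows. This is a minimal,
self-contained illustration and not the actual code in candidates.py:
FakeDistribution, CachingCandidate and _get_cached_extras() are hypothetical
names introduced for the example, and plain strings stand in for parsed
Requirement objects.

```python
from typing import Iterator, List, Optional


class FakeDistribution:
    """Hypothetical stand-in for a distribution; counts metadata parses."""

    def __init__(self, requires: List[str], extras: List[str]) -> None:
        self._requires = requires
        self._extras = extras
        self.dependency_parses = 0
        self.extras_parses = 0

    def iter_dependencies(self) -> Iterator[str]:
        self.dependency_parses += 1  # stands in for costly requirement parsing
        return iter(self._requires)

    def iter_provided_extras(self) -> Iterator[str]:
        self.extras_parses += 1
        return iter(self._extras)


class CachingCandidate:
    """Illustrative candidate (assumption), not pip's actual class."""

    def __init__(self, dist: FakeDistribution) -> None:
        self.dist = dist
        # Step 1: both caches start out empty.
        self._cached_dependencies: Optional[List[str]] = None
        self._cached_extras: Optional[List[str]] = None

    def _get_cached_dependencies(self) -> List[str]:
        # Steps 3-5: parse on first access, reuse the list afterwards.
        if self._cached_dependencies is None:
            self._cached_dependencies = list(self.dist.iter_dependencies())
        return self._cached_dependencies

    def _get_cached_extras(self) -> List[str]:
        # Hypothetical helper name; mirrors the dependency cache.
        if self._cached_extras is None:
            self._cached_extras = list(self.dist.iter_provided_extras())
        return self._cached_extras

    def _check_metadata_consistency(self) -> None:
        # Step 2: validation triggers the parse and fills both caches.
        self._get_cached_dependencies()
        self._get_cached_extras()

    def iter_dependencies(self) -> Iterator[str]:
        yield from self._get_cached_dependencies()


dist = FakeDistribution(["requests>=2", "idna"], ["socks"])
candidate = CachingCandidate(dist)
candidate._check_metadata_consistency()
first = list(candidate.iter_dependencies())
second = list(candidate.iter_dependencies())
```

In this sketch the distribution is asked for its dependencies and extras
once each, no matter how many times iter_dependencies() is called afterwards.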
ADDITIONAL OPTIMIZATIONS:
- Also optimized ExtrasCandidate.iter_dependencies() to cache
iter_provided_extras() results
- Ensures consistency between validation and dependency resolution phases
TESTING PERFORMED:
1. Created comprehensive test script (test_performance_optimization.py)
2. Used mock objects to verify iter_provided_extras() and iter_dependencies()
are called at most once
3. Verified pip install --dry-run works correctly with caching
4. Test results showed 0 additional calls to parsing methods during multiple
iter_dependencies() invocations
5. Functional testing confirmed dependency resolution still works correctly
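
A mock-based check along the lines of item 2 might look like the sketch
below. It is not the contents of test_performance_optimization.py;
CachedCandidate and count_parses() are hypothetical names used only to show
how unittest.mock can count calls to iter_dependencies().

```python
from unittest.mock import MagicMock


class CachedCandidate:
    """Hypothetical minimal candidate that caches what it reads from dist."""

    def __init__(self, dist):
        self.dist = dist
        self._cached_dependencies = None

    def iter_dependencies(self):
        if self._cached_dependencies is None:
            self._cached_dependencies = list(self.dist.iter_dependencies())
        return iter(self._cached_dependencies)


def count_parses(repeats):
    """Return how often dist.iter_dependencies() ran across `repeats` calls."""
    dist = MagicMock()
    # Each call to the mock yields a fresh iterator over the same data.
    dist.iter_dependencies.side_effect = lambda: iter(["requests>=2", "idna"])
    candidate = CachedCandidate(dist)
    results = [list(candidate.iter_dependencies()) for _ in range(repeats)]
    # Every invocation must observe the same dependencies.
    assert all(result == results[0] for result in results)
    return dist.iter_dependencies.call_count


parse_calls = count_parses(repeats=5)
```

With caching in place, the mock records a single call even though
iter_dependencies() is invoked five times on the candidate.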
PERFORMANCE IMPACT:
- Eliminates duplicate parsing during metadata consistency checks
- Reduces CPU time for packages with complex dependency trees
- Especially beneficial for packages with many dependencies
- Memory overhead is minimal (only stores parsed results, not raw metadata)
Resolves TODO comment about performance in candidates.py line 230