Skip to content

Commit 79d29c4

Browse files
authored
fix: enrichment application and contract extraction fixes (#94)
* fix: enrichment application and contract extraction improvements - Fix enrichment not being applied when no source files changed - Check for enrichment before early exit in _check_incremental_changes - Mark bundle for regeneration when enrichment is provided - Ensure bundle is saved after enrichment is applied - Fix --force flag performance regression - Skip hash checking when --force is used for contract extraction - Process all features directly without expensive hash computation - Significantly improves performance for large bundles with --force - Fix type checking errors - Fix possibly unbound variables (is_test_mode, compute_file_hash, features_with_files) - Properly scope variables within conditional blocks - Ensure all variables are initialized before use - Add comprehensive test coverage - Add integration tests for enrichment and contract extraction bugs - Add unit tests for contract extraction logic - All tests now passing (11/11 integration tests) - Update version to 0.23.1 - Sync version across pyproject.toml, setup.py, src/__init__.py, src/specfact_cli/__init__.py - Update CHANGELOG.md with bug fixes and test coverage additions * fix: keep progress bars visible during enhanced analysis - Remove progress.remove_task() calls for relationship and graph analysis - Keep final progress bars visible with completion state instead of removing them - Prevents blank lines from appearing when progress bars disappear - Progress bars now show final completion message and remain visible * feat: show current feature/contract in contract extraction progress - Add detailed progress updates showing which feature is currently being processed - For sequential mode: show feature name before and after processing - For parallel mode: show completed feature name and pending count - Progress now displays: 'Extracting contract from FEATURE-NAME... (X/Total, Y pending)' - Improves visibility during long-running contract extraction operations - Helps identify which features are taking longer to process * docs: add contract extraction performance analysis - Document current performance bottlenecks - Identify AST parsing as primary bottleneck - Propose file-level caching optimization (3-5x speedup) - Suggest batch processing and early exit optimizations - Estimate 5-10x overall speedup potential * perf: implement AST caching and early exit optimizations for contract extraction - Add file-level AST caching to prevent redundant parsing (3-5x speedup) - Cache AST trees and file hashes for reuse across features - Invalidate cache when file content changes - Thread-safe cache operations - Add early exit optimization for non-API files (1.5-2x speedup) - Quick regex check before expensive AST parsing - Skip files without API endpoints (models, utilities) - Pre-compiled regex patterns for performance - Add comprehensive tests for optimizations: - Test AST caching prevents redundant parsing - Test early exit skips non-API files - Test cache invalidation on file changes Expected overall improvement: 5-10x speedup for contract extraction For SQLAlchemy (320 features): ~8 minutes -> ~45-90 seconds * fix: disable aggressive early exit that skipped all SQLAlchemy files - Early exit optimization was too aggressive for ORM/class-based codebases - SQLAlchemy doesn't use FastAPI/Flask decorators, so all files were skipped - Contract extractor also processes class-based APIs and interfaces - Disabled early exit to restore functionality (AST caching still provides 3-5x speedup) - Updated test to reflect that early exit detection works but is disabled in extraction Fixes: 0 contracts generated when using --force flag * perf: optimize class-based extraction to skip non-API classes and limit method processing - Skip non-API class types: Protocol, TypedDict, Enum, ABC, Mixin, Base, Meta, Descriptor, Property - Skip classes that inherit from non-API base types - Filter methods more selectively: skip utility methods (processor, adapter, factory, etc.) - Limit methods processed per class to 15 (skip classes with more methods) - Only process methods that strongly suggest API endpoints (CRUD patterns or short names) Performance improvements: - FEATURE-TYPERESOLVE: Skips Protocol/TypedDict classes and TypeEngine utility methods - FEATURE-COLLECTIONADAPTER: Skips non-API classes, processes only relevant methods - Reduces processing time for large ORM/library codebases Expected improvement: 2-3x faster for features with many utility classes * Apply format * fix: resolve CI test failures and deprecation warnings - Fix interface extraction: Check for interfaces (ABC/Protocol with abstract methods) BEFORE skipping base classes - Interfaces should be processed for contract extraction - Non-interface ABC/Protocol classes are still skipped for performance - Fix progress callback tests: Update tests to expect two calls (total, then completed+description) - Progress callback now sets total on first call, then updates with completed count - Fix deprecation warnings: - Suppress ast.NameConstant deprecation warning (Python 3.8+ compatibility) - Replace datetime.utcnow() with datetime.now(UTC) for Python 3.11+ compatibility - Use timezone.utc fallback for older Python versions Fixes: - test_extract_interface_abstract_methods (was skipping ABC interfaces) - test_create_callback_with_prefix (expected single call, got two) - test_create_callback_without_prefix (expected single call, got two) - DeprecationWarning: ast.NameConstant (Python 3.14) - DeprecationWarning: datetime.utcnow() (future removal) * fix: correct UTC import for type checking - Import timezone before try/except block to ensure UTC is defined - Add type: ignore comment to suppress false positive type checker warning - Fixes type checker error: 'UTC' is unbound in except block - Maintains backward compatibility with Python < 3.11 * perf: reduce lock contention in contract extraction for parallel processing Critical performance fix for large feature sets (320+ features): 1. **Moved file I/O outside lock**: File reading and hash calculation now happen outside the lock, eliminating I/O-bound blocking - Lock only held for cache lookups and updates (minimal scope) - Double-check pattern prevents race conditions 2. **Removed unnecessary locks from openapi_spec writes**: - Each feature has its own openapi_spec dict (no sharing) - Python dict assignment is atomic for single operations - Removed locks from: path initialization, schema addition, security schemes, operation addition - Only cache operations (shared across features) use lock now 3. **Separated cache lock**: Renamed _lock to _cache_lock for clarity - Cache is shared resource (needs protection) - openapi_spec dicts are per-feature (no shared lock needed) Performance impact: - Before: Lock held during file I/O (10-100ms per file) blocks all other threads - After: Lock only held for cache access (<1ms), file I/O happens in parallel - Expected: 5-10x faster for 320 features with parallel processing This fixes the 3-hour extraction time for 320 contracts by eliminating lock contention bottleneck. * perf: optimize AST cache to avoid redundant file reads - Reuse file content when checking hash vs parsing - Read file only once per cache check/parse cycle - Reduces I/O operations by 50% for cache hits - Maintains thread safety with minimal lock scope This addresses the performance issue where contract extraction was extremely slow (5+ hours) by eliminating redundant file I/O. * docs: update CHANGELOG for 0.23.1 with contract extraction performance fixes * Fix slowness bug in contract extraction * fix: correct SourceTracking import in profiling script - Changed import from specfact_cli.models.project to specfact_cli.models.source_tracking - Fixes CrossHair import error in contract validation workflow - Resolves ImportError: cannot import name 'SourceTracking' * Fix parallel processing of contract analysis * Revert venv config --------- Co-authored-by: Dominikus Nold <[email protected]>
1 parent 1f88cb7 commit 79d29c4

18 files changed

+2226
-189
lines changed

CHANGELOG.md

Lines changed: 57 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -9,6 +9,63 @@ All notable changes to this project will be documented in this file.
99

1010
---
1111

12+
## [0.23.1] - 2026-01-07
13+
14+
### Fixed (0.23.1)
15+
16+
- **Contract Extraction Performance**: Fixed critical performance bottleneck causing extremely slow contract extraction
17+
- **Nested Parallelism Removal**: Eliminated GIL contention from nested ThreadPoolExecutor instances
18+
- Removed file-level parallelism within features (features already processed in parallel at command level)
19+
- Files within each feature now processed sequentially to avoid thread contention
20+
- Performance improvement: contract extraction for large codebases (300+ features) now completes in reasonable time instead of hours
21+
- Resolves issue where CPU usage was low despite long processing times due to GIL contention
22+
- **Cache Invalidation Logic**: Fixed cache update logic to properly detect and handle file changes
23+
- Changed double-check pattern to compare file hashes before updating cache
24+
- Cache now correctly updates when file content changes, not just on cache misses
25+
- Ensures AST cache reflects current file state after modifications
26+
- **Test Robustness**: Enhanced cache invalidation test to handle Path object differences
27+
- Test now handles both `test_file` and `resolved_file` as cache keys
28+
- Path objects are compared by value, ensuring correct cache lookups
29+
- Added assertions to verify cache keys exist before accessing
30+
31+
- **Import Command Bug Fixes**: Fixed critical bugs in enrichment and contract extraction workflow
32+
- **Unhashable Type Error**: Fixed `TypeError: unhashable type: 'Feature'` when applying enrichment reports
33+
- Changed `dict[Feature, list[Path]]` to `dict[str, list[Path]]` using feature keys instead of Feature objects
34+
- Added `feature_objects: dict[str, Feature]` mapping to maintain Feature object references
35+
- Prevents runtime errors during contract extraction when enrichment adds new features
36+
- **Enrichment Performance Regression**: Fixed severe performance issue where enrichment forced full contract regeneration
37+
- Removed `or enrichment` condition from `_check_incremental_changes` that forced full regeneration
38+
- Enrichment now only triggers contract extraction for new features (without contracts)
39+
- Existing contracts are not regenerated when only metadata changes (confidence adjustments, business context)
40+
- Performance improvement: enrichment with unchanged files now completes in seconds instead of 80+ minutes for large bundles
41+
- **Contract Extraction Order**: Fixed contract extraction to run after enrichment application
42+
- Ensures new features from enrichment reports are included in contract extraction
43+
- New features without contracts now correctly get contracts extracted
44+
45+
### Added (0.23.1)
46+
47+
- **Contract Extraction Profiling Tool**: Added diagnostic tool for performance analysis
48+
- New `tools/profile_contract_extraction.py` script for profiling contract extraction bottlenecks
49+
- Helps identify performance issues in contract extraction process
50+
- Provides detailed timing and profiling information for individual features
51+
52+
- **Comprehensive Test Coverage**: Added extensive test suite for import and enrichment bugs
53+
- **Integration Tests**: New `test_import_enrichment_contracts.py` with 5 test cases (552 lines)
54+
- Tests enrichment not forcing full contract regeneration
55+
- Tests new features from enrichment getting contracts extracted
56+
- Tests incremental contract extraction with enrichment
57+
- Tests feature objects not used as dictionary keys
58+
- Tests performance regression prevention
59+
- **Unit Tests**: New `test_import_contract_extraction.py` with 5 test cases (262 lines)
60+
- Tests Feature objects not being hashable (regression test)
61+
- Tests contract extraction using feature keys, not objects
62+
- Tests incremental contract regeneration logic
63+
- Tests enrichment not forcing contract regeneration
64+
- Tests new features from enrichment getting contracts
65+
- **Updated Existing Tests**: Enhanced `test_import_command.py` with enrichment regression test
66+
67+
---
68+
1269
## [0.23.0] - 2026-01-07
1370

1471
### Added (0.23.0)

pyproject.toml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
44

55
[project]
66
name = "specfact-cli"
7-
version = "0.23.0"
7+
version = "0.23.1"
88
description = "Brownfield-first CLI: Reverse engineer legacy Python → specs → enforced contracts. Automate legacy code documentation and prevent modernization regressions."
99
readme = "README.md"
1010
requires-python = ">=3.11"

setup.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -7,7 +7,7 @@
77
if __name__ == "__main__":
88
_setup = setup(
99
name="specfact-cli",
10-
version="0.23.0",
10+
version="0.23.1",
1111
description="SpecFact CLI - Spec -> Contract -> Sentinel tool for contract-driven development",
1212
packages=find_packages(where="src"),
1313
package_dir={"": "src"},

src/__init__.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -3,4 +3,4 @@
33
"""
44

55
# Define the package version (kept in sync with pyproject.toml and setup.py)
6-
__version__ = "0.23.0"
6+
__version__ = "0.23.1"

src/specfact_cli/__init__.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -9,6 +9,6 @@
99
- Validating reproducibility
1010
"""
1111

12-
__version__ = "0.23.0"
12+
__version__ = "0.23.1"
1313

1414
__all__ = ["__version__"]

src/specfact_cli/analyzers/contract_extractor.py

Lines changed: 9 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -260,8 +260,15 @@ def _ast_to_value_string(self, node: ast.AST) -> str:
260260
return repr(node.value)
261261
if isinstance(node, ast.Name):
262262
return node.id
263-
if isinstance(node, ast.NameConstant): # Python < 3.8
264-
return str(node.value)
263+
# Python < 3.8 compatibility - suppress deprecation warning
264+
import warnings
265+
266+
with warnings.catch_warnings():
267+
warnings.simplefilter("ignore", DeprecationWarning)
268+
# ast.NameConstant is deprecated in Python 3.8+, removed in 3.14
269+
# Keep for backward compatibility with older Python versions
270+
if hasattr(ast, "NameConstant") and isinstance(node, ast.NameConstant):
271+
return str(node.value)
265272

266273
# Use ast.unparse if available
267274
if hasattr(ast, "unparse"):

0 commit comments

Comments
 (0)