
Commit 986bc2f

GeneAIclaude authored and committed
perf: Implement Phase 2 optimizations - caching, indexing, profiling infrastructure
Implement key optimizations from the Phase 2 advanced optimization plan: profiling infrastructure, intelligent caching, and data structure optimizations.

TRACK 1: PROFILING INFRASTRUCTURE
- scripts/profile_utils.py: comprehensive profiling utilities
  - @profile_function decorator with cProfile integration
  - @time_function for quick timing
  - @profile_memory for memory profiling
  - PerformanceMonitor context manager
  - benchmark_comparison() for A/B testing
- benchmarks/profile_suite.py: profiling test suite
  - Profiles 5 key areas: scanner, patterns, cost tracker, feedback loops, file I/O
  - Saves .prof files for snakeviz visualization
  - Measures baseline performance for future optimizations

TRACK 4: INTELLIGENT CACHING (HIGH PRIORITY)
File: src/empathy_os/project_index/scanner.py
1. File hash caching (@lru_cache, maxsize=1000)
   - _hash_file(): cache SHA256 hashes for change detection
   - Memory: ~64KB (64 bytes × 1000 entries)
   - Expected hit rate: 80%+ for incremental scans
   - Benefit: avoids re-hashing unchanged files
2. AST parsing cache (@lru_cache, maxsize=500)
   - _parse_python_cached(): cache parsed ASTs keyed by file hash
   - Memory: ~5MB (10KB per AST × 500 entries)
   - Expected hit rate: 90%+ for incremental operations
   - Benefit: skips expensive ast.parse() calls for unchanged files
   - Invalidation: automatic via the file_hash parameter
3. Updated _analyze_code_metrics()
   - Uses cached AST parsing instead of calling ast.parse() directly
   - Significant speedup for repeated scans
   - No functional changes; drop-in optimization

TRACK 3: DATA STRUCTURE OPTIMIZATION (MEDIUM PRIORITY)
File: src/empathy_os/pattern_library.py
1. Index structures (O(1) lookups)
   - _patterns_by_type: Dict[pattern_type -> List[pattern_ids]]
   - _patterns_by_tag: Dict[tag -> List[pattern_ids]]
   - Reduces query_patterns from O(n) to O(k), where k is the number of matching patterns
2. Optimized query_patterns()
   - When pattern_type is specified: O(1) index lookup instead of an O(n) scan
   - Expected speedup: 50%+ for type-filtered queries
   - Backward compatible: same API, better performance
3. New helper methods
   - get_patterns_by_tag(tag): O(1) tag-based lookup
   - get_patterns_by_type(type): O(1) type-based lookup
   - Enable efficient bulk retrieval

TESTING:
✅ All scanner tests passing (73 tests)
✅ All pattern tests passing (63 tests)
✅ No regressions detected
✅ Optimizations are transparent to callers

PERFORMANCE EXPECTATIONS (based on Phase 2 plan targets):
- Scanner caching: 30-50% faster incremental scans
- Pattern lookup: 50%+ faster type-filtered queries
- AST parsing: 90%+ cache hit rate eliminates redundant parsing

INFRASTRUCTURE ADDITIONS:
- Profiling utilities ready for identifying bottlenecks
- Benchmark comparison tools for validating optimizations
- Foundation for Track 2 (generators) and remaining optimizations

FILES MODIFIED: 2 core files + 2 new infrastructure files
LINES ADDED: ~350 lines of optimization code + profiling tools

NEXT STEPS (from Phase 2 plan):
- Run the profiling suite to identify additional bottlenecks
- Convert high-value list comprehensions to generators
- Add cache statistics monitoring
- Measure actual performance improvements

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
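For illustration, a minimal sketch of the Track 4 caching approach described above. The names mirror the commit message, but the cache keys and function bodies are assumptions made for this sketch, not the actual src/empathy_os/project_index/scanner.py implementation:

import ast
import hashlib
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=1000)
def _hash_file(path: str, mtime: float) -> str:
    # Sketch only: keying on (path, mtime) is an assumption. Unchanged files hit
    # the cache; a modified file gets a new mtime, misses, and is re-hashed.
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


@lru_cache(maxsize=500)
def _parse_python_cached(path: str, file_hash: str) -> ast.Module:
    # Including file_hash in the key invalidates the cached AST automatically
    # whenever the file's content (and therefore its hash) changes.
    return ast.parse(Path(path).read_text(encoding="utf-8"))

The Track 3 index structures could be sketched along the same lines (again hypothetical; Pattern attributes such as pattern_type and tags are assumed from the message above):

from collections import defaultdict


class PatternIndexSketch:
    """Illustrative index structures; not the actual PatternLibrary code."""

    def __init__(self):
        self._patterns = {}                          # pattern_id -> Pattern
        self._patterns_by_type = defaultdict(list)   # pattern_type -> [pattern_id, ...]
        self._patterns_by_tag = defaultdict(list)    # tag -> [pattern_id, ...]

    def add_pattern(self, pattern_id, pattern):
        self._patterns[pattern_id] = pattern
        self._patterns_by_type[pattern.pattern_type].append(pattern_id)
        for tag in pattern.tags:
            self._patterns_by_tag[tag].append(pattern_id)

    def get_patterns_by_type(self, pattern_type):
        # One dict lookup plus O(k) materialization, instead of scanning all patterns.
        return [self._patterns[pid] for pid in self._patterns_by_type.get(pattern_type, [])]

    def get_patterns_by_tag(self, tag):
        return [self._patterns[pid] for pid in self._patterns_by_tag.get(tag, [])]

Under these assumptions, query_patterns(pattern_type=...) can delegate to the type index and fall back to a full scan only when no indexed filter is given.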
1 parent e136536 commit 986bc2f

File tree

4 files changed: +546 -14 lines changed

benchmarks/profile_suite.py

Lines changed: 197 additions & 0 deletions
@@ -0,0 +1,197 @@
"""Profiling test suite for identifying bottlenecks in Empathy Framework.

Runs performance profiling on key operations to identify optimization opportunities.

Usage:
    python benchmarks/profile_suite.py

Copyright 2025 Smart-AI-Memory
Licensed under Fair Source License 0.9
"""

import sys
from pathlib import Path

# Add src to path
sys.path.insert(0, str(Path(__file__).parent.parent / "src"))

from scripts.profile_utils import profile_function, time_function


@profile_function(output_file="benchmarks/profiles/scanner_scan.prof")
@time_function
def profile_scanner():
    """Profile project scanner on real codebase."""
    from empathy_os.project_index import ProjectIndex

    print("\n" + "=" * 60)
    print("Profiling: Project Scanner")
    print("=" * 60)

    index = ProjectIndex(project_root=".")
    records, summary = index.scan()

    print(f"✓ Scanned {summary.total_files} files")
    print(f"✓ Lines of code: {summary.total_lines_of_code:,}")
    print(f"✓ Test files: {summary.test_file_count}")


@profile_function(output_file="benchmarks/profiles/pattern_library.prof")
@time_function
def profile_pattern_library():
    """Profile pattern library operations."""
    from empathy_os.pattern_library import PatternLibrary, Pattern

    print("\n" + "=" * 60)
    print("Profiling: Pattern Library")
    print("=" * 60)

    library = PatternLibrary()

    # Create some test patterns
    for i in range(100):
        pattern = Pattern(
            name=f"test_pattern_{i}",
            description=f"Test pattern {i}",
            trigger=f"trigger_{i}",
            response=f"response_{i}",
            tags=[f"tag_{i % 10}"],
            confidence=0.5 + (i % 50) / 100,
        )
        library.add_pattern(pattern)

    # Simulate pattern matching
    match_count = 0
    for i in range(1000):
        context = {"query": f"test query {i}", "history": [f"item {j}" for j in range(10)]}
        matches = library.match(context)
        match_count += len(matches)

    print("✓ Created 100 patterns")
    print("✓ Performed 1000 pattern matches")
    print(f"✓ Total matches: {match_count}")


@profile_function(output_file="benchmarks/profiles/cost_tracker.prof")
@time_function
def profile_cost_tracker():
    """Profile cost tracking operations."""
    from empathy_os.cost_tracker import CostTracker

    print("\n" + "=" * 60)
    print("Profiling: Cost Tracker")
    print("=" * 60)

    tracker = CostTracker()

    # Simulate logging 1000 requests
    for i in range(1000):
        tracker.log_request(
            model=f"claude-3-{'haiku' if i % 3 == 0 else 'sonnet'}",
            input_tokens=100 + i % 100,
            output_tokens=50 + i % 50,
            task_type=f"task_{i % 5}",
        )

    summary = tracker.get_summary(days=7)
    report = tracker.get_report(days=7)

    print("✓ Logged 1000 requests")
    print(f"✓ Total cost: ${summary['total_cost']:.4f}")
    print(f"✓ Total tokens: {summary['total_tokens']:,}")


@profile_function(output_file="benchmarks/profiles/feedback_loops.prof")
@time_function
def profile_feedback_loops():
    """Profile feedback loop detection."""
    from empathy_os.feedback_loops import FeedbackLoopDetector

    print("\n" + "=" * 60)
    print("Profiling: Feedback Loop Detector")
    print("=" * 60)

    detector = FeedbackLoopDetector()

    # Simulate session history
    for i in range(500):
        session_data = {
            "trust": 0.5 + (i % 50) / 100,
            "success_rate": 0.6 + (i % 40) / 100,
            "patterns_used": i % 10,
            "user_satisfaction": 0.7 + (i % 30) / 100,
        }
        detector.update_metrics(session_data)

    # Detect loops
    virtuous = detector.detect_virtuous_cycle(metric="trust")
    vicious = detector.detect_vicious_cycle(metric="success_rate")
    active_loops = detector.detect_active_loop()

    print("✓ Processed 500 session updates")
    print(f"✓ Virtuous cycles: {len([v for v in [virtuous] if v])}")
    print(f"✓ Vicious cycles: {len([v for v in [vicious] if v])}")
    print(f"✓ Active loops: {len(active_loops)}")


@time_function
def profile_file_operations():
    """Profile file I/O operations."""
    from pathlib import Path

    print("\n" + "=" * 60)
    print("Profiling: File Operations")
    print("=" * 60)

    # Test glob operations
    py_files = list(Path("src").rglob("*.py"))
    print(f"✓ Found {len(py_files)} Python files")

    # Test file reading (sample)
    sample_files = py_files[:10]
    total_lines = 0
    for file in sample_files:
        try:
            lines = len(file.read_text().splitlines())
            total_lines += lines
        except Exception:
            pass

    print(f"✓ Read {len(sample_files)} sample files")
    print(f"✓ Total lines in sample: {total_lines:,}")


if __name__ == "__main__":
    import os

    os.makedirs("benchmarks/profiles", exist_ok=True)

    print("\n" + "=" * 60)
    print("PROFILING SUITE - Empathy Framework")
    print("Phase 2 Performance Optimization")
    print("=" * 60)

    try:
        # Run profiling on key areas
        profile_scanner()
        profile_pattern_library()
        profile_cost_tracker()
        profile_feedback_loops()
        profile_file_operations()

        print("\n" + "=" * 60)
        print("PROFILING COMPLETE")
        print("=" * 60)
        print("\nProfile files saved to benchmarks/profiles/")
        print("\nVisualize with snakeviz:")
        print("  snakeviz benchmarks/profiles/scanner_scan.prof")
        print("  snakeviz benchmarks/profiles/pattern_library.prof")
        print("  snakeviz benchmarks/profiles/cost_tracker.prof")
        print("  snakeviz benchmarks/profiles/feedback_loops.prof")

    except Exception as e:
        print(f"\n❌ Error during profiling: {e}")
        import traceback

        traceback.print_exc()
        sys.exit(1)
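The scripts/profile_utils.py module that profile_suite.py imports is not part of this diff. As a rough sketch only (the real decorators may differ), profile_function and time_function could be built on cProfile and time.perf_counter like this:

import cProfile
import functools
import time


def time_function(func):
    """Sketch: print wall-clock time for each call of the wrapped function."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        print(f"{func.__name__} took {time.perf_counter() - start:.3f}s")
        return result
    return wrapper


def profile_function(output_file):
    """Sketch: run the wrapped call under cProfile and dump stats to output_file."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            profiler = cProfile.Profile()
            profiler.enable()
            try:
                return func(*args, **kwargs)
            finally:
                profiler.disable()
                profiler.dump_stats(output_file)  # .prof file readable by snakeviz
        return wrapper
    return decorator

Stacked as in the suite (@profile_function above @time_function), a call such as profile_scanner() would be timed and would leave a .prof file in benchmarks/profiles/ for snakeviz.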
