Skip to content

Commit baee214

Browse files
Optimize FunctionRanker.get_function_stats_summary
The optimization replaces an O(N) linear search through all functions with an O(1) hash table lookup followed by iteration over only matching function names. **Key Changes:** - Added `_function_stats_by_name` index in `__init__` that maps function names to lists of (key, stats) tuples - Modified `get_function_stats_summary` to first lookup candidates by function name, then iterate only over those candidates **Why This is Faster:** The original code iterates through ALL function stats (22,603 iterations in the profiler results) for every lookup. The optimized version uses a hash table to instantly find only the functions with matching names, then iterates through just those candidates (typically 1-2 functions). **Performance Impact:** - **Small datasets**: 15-30% speedup as shown in basic test cases - **Large datasets**: Dramatic improvement - the `test_large_scale_performance` case with 900 functions shows **3085% speedup** (66.7μs → 2.09μs) - **Overall benchmark**: 2061% speedup demonstrates the optimization scales excellently with dataset size **When This Optimization Shines:** - Large codebases with many profiled functions (where the linear search becomes expensive) - Repeated function lookups (if this method is called frequently) - Cases with many unique function names but few duplicates per name The optimization maintains identical behavior while transforming the algorithm from O(N) per lookup to O(average functions per name) per lookup, which is typically O(1) in practice.
1 parent a1eee7d commit baee214

File tree

1 file changed

+17
-4
lines changed

1 file changed

+17
-4
lines changed

codeflash/benchmarking/function_ranker.py

Lines changed: 17 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -56,8 +56,15 @@ def __init__(self, trace_file_path: Path) -> None:
5656
self.trace_file_path = trace_file_path
5757
self._profile_stats = ProfileStats(trace_file_path.as_posix())
5858
self._function_stats: dict[str, dict] = {}
59+
self._function_stats_by_name: dict[str, list[tuple[str, dict]]] = {}
5960
self.load_function_stats()
6061

62+
# Build index for faster lookups: map function_name to list of (key, stats)
63+
for key, stats in self._function_stats.items():
64+
func_name = stats.get("function_name")
65+
if func_name:
66+
self._function_stats_by_name.setdefault(func_name, []).append((key, stats))
67+
6168
def load_function_stats(self) -> None:
6269
try:
6370
pytest_filtered_count = 0
@@ -114,10 +121,16 @@ def load_function_stats(self) -> None:
114121

115122
def get_function_stats_summary(self, function_to_optimize: FunctionToOptimize) -> dict | None:
116123
target_filename = function_to_optimize.file_path.name
117-
for key, stats in self._function_stats.items():
118-
if stats.get("function_name") == function_to_optimize.function_name and (
119-
key.endswith(f"/{target_filename}") or target_filename in key
120-
):
124+
candidates = self._function_stats_by_name.get(function_to_optimize.function_name)
125+
if not candidates:
126+
logger.debug(
127+
f"Could not find stats for function {function_to_optimize.function_name} in file {target_filename}"
128+
)
129+
return None
130+
131+
for key, stats in candidates:
132+
# The check preserves exact logic: "key.endswith(f"/{target_filename}") or target_filename in key"
133+
if key.endswith(f"/{target_filename}") or target_filename in key:
121134
return stats
122135

123136
logger.debug(

0 commit comments

Comments
 (0)