Granular async instrumentation #687

KRRT7 · 2025-08-25T22:54:19Z

User description

dependent on #678

PR Type

Enhancement, Tests, Bug fix

Description

Add async function instrumentation and throughput
Update optimizer for async benchmarking
Extend AST utilities to handle async defs
Add comprehensive async test suites

Diagram Walkthrough

flowchart LR
  A["Async code detection"] -- "decorate defs" --> B["Instrument source module"]
  B -- "run tests" --> C["Parse results"]
  C -- "calc runtime/throughput" --> D["Optimizer decisions"]
  D -- "report gains" --> E["Explanations/critics"]

File Walkthrough

Relevant files

Tests

4 files

test_async_run_and_parse_tests.py `End-to-end async run_and_parse_tests scenarios`	+1039/-0
test_instrument_async_tests.py `Async decorator injection and test instrumentation`	+728/-0
test_unused_helper_revert.py `Unused helper revert with async scenarios`	+614/-5
test_async_wrapper_sqlite_validation.py `Validate async wrapper SQLite capture and perf`	+285/-0

Enhancement

2 files

function_optimizer.py `Async-aware baseline, benchmarking, and throughput`	+180/-24
edit_generated_tests.py `Support AsyncFunctionDef and async test pruning`	+15/-5

Additional files

32 files

e2e-async.yaml	+69/-0
pre-commit.yaml	+0/-19
.pre-commit-config.yaml	+2/-1
async_bubble_sort.py	+43/-0
main.py	+16/-0
pyproject.toml	+6/-0
__init__.py	[link]
codeflash.code-workspace	+5/-1
aiservice.py	+14/-0
code_extractor.py	+86/-3
codeflash_wrap_decorator.py	+167/-0
config_consts.py	+1/-0
coverage_utils.py	+3/-1
instrument_existing_tests.py	+370/-2
unused_definition_remover.py	+4/-1
functions_to_optimize.py	+5/-1
models.py	+4/-0
critic.py	+51/-8
explanation.py	+48/-1
comparator.py	+28/-7
concolic_testing.py	+2/-1
equivalence.py	+22/-0
parse_test_output.py	+24/-0
pytest_plugin.py	+22/-0
test_runner.py	+1/-1
pyproject.toml	+1/-0
end_to_end_test_async.py	+27/-0
test_add_runtime_comments.py	+207/-1
test_code_context_extractor.py	+146/-1
test_code_replacement.py	+172/-0
test_code_utils.py	+81/-1
test_critic.py	+163/-1

github-actions · 2025-08-25T22:56:58Z

PR Code Suggestions ✨

Latest suggestions up to b221be4

Explore these optional code suggestions:

Category	Suggestion	Impact
Possible issue	Snapshot and restore pre-instrumented source Preserve and restore the exact pre-benchmark source for the async instrumentation step; using the original baseline code here can discard concurrent candidate edits or prior changes. Snapshot the file immediately before instrumentation and restore from that snapshot. codeflash/optimization/function_optimizer.py [1559-1588] if test_framework == "pytest": line_profile_results = self.line_profiler_step( code_context=code_context, original_helper_code=original_helper_code, candidate_index=0 ) console.rule() + pre_perf_source_snapshot = None if self.function_to_optimize.is_async: - from codeflash.code_utils.instrument_existing_tests import ( - instrument_source_module_with_async_decorators, - ) - + from codeflash.code_utils.instrument_existing_tests import instrument_source_module_with_async_decorators + # Snapshot current on-disk code prior to perf instrumentation + pre_perf_source_snapshot = Path(self.function_to_optimize.file_path).read_text(encoding="utf8") success, instrumented_source = instrument_source_module_with_async_decorators( self.function_to_optimize.file_path, self.function_to_optimize, TestingMode.PERFORMANCE ) if success and instrumented_source: with self.function_to_optimize.file_path.open("w", encoding="utf8") as f: f.write(instrumented_source) - logger.debug( - f"Applied async performance instrumentation to {self.function_to_optimize.file_path}" - ) + logger.debug(f"Applied async performance instrumentation to {self.function_to_optimize.file_path}") try: benchmarking_results, _ = self.run_and_parse_tests( testing_type=TestingMode.PERFORMANCE, ... ) finally: - if self.function_to_optimize.is_async: - self.write_code_and_helpers( - self.function_to_optimize_source_code, - original_helper_code, - self.function_to_optimize.file_path, - ) + if self.function_to_optimize.is_async and pre_perf_source_snapshot is not None: + with self.function_to_optimize.file_path.open("w", encoding="utf8") as f: + f.write(pre_perf_source_snapshot) Suggestion importance[1-10]: 8 __ Why: The proposal to snapshot the source immediately before async performance instrumentation and restore from that snapshot matches the PR code path and prevents losing intermediate changes; it's a high-impact correctness improvement for async benchmarking flows.	Medium
	Build qualified decorator correctly The timeout decorator is being appended as a Call to a Name with dotted path, which is invalid for qualified names and will not resolve. Construct the decorator using an Attribute chain so `timeout_decorator.timeout` binds correctly. codeflash/code_utils/instrument_existing_tests.py [318-336] def visit_ClassDef(self, node: ast.ClassDef) -> ast.ClassDef: # Add timeout decorator for unittest test classes if needed if self.test_framework == "unittest": timeout_decorator = ast.Call( - func=ast.Name(id="timeout_decorator.timeout", ctx=ast.Load()), + func=ast.Attribute( + value=ast.Name(id="timeout_decorator", ctx=ast.Load()), + attr="timeout", + ctx=ast.Load(), + ), args=[ast.Constant(value=15)], keywords=[], ) for item in node.body: if ( isinstance(item, ast.FunctionDef) and item.name.startswith("test_") and not any( isinstance(d, ast.Call) - and isinstance(d.func, ast.Name) - and d.func.id == "timeout_decorator.timeout" + and isinstance(d.func, ast.Attribute) + and isinstance(d.func.value, ast.Name) + and d.func.value.id == "timeout_decorator" + and d.func.attr == "timeout" for d in item.decorator_list ) ): item.decorator_list.append(timeout_decorator) return self.generic_visit(node) Suggestion importance[1-10]: 8 __ Why: Constructing a call with `ast.Name(id="timeout_decorator.timeout")` is invalid; using an `ast.Attribute` ensures the decorator resolves properly, fixing a real bug in AST generation for unittest timeouts.	Medium
	Fix missing dependency import Importing `env_utils` is missing in this file, causing a NameError at runtime. Add the appropriate import to ensure CI detection works. Keep logic unchanged. codeflash/result/critic.py [65-67] +from codeflash.utils import env_utils + noise_floor = 3 * MIN_IMPROVEMENT_THRESHOLD if original_code_runtime < 10000 else MIN_IMPROVEMENT_THRESHOLD if not disable_gh_action_noise and env_utils.is_ci(): noise_floor = noise_floor * 2 # Increase the noise floor in GitHub Actions mode Suggestion importance[1-10]: 7 __ Why: The code uses `env_utils.is_ci()` but this module isn't imported in the shown diff; adding `from codeflash.utils import env_utils` prevents a NameError and preserves existing behavior.	Medium
	Avoid unconditional source overwrite Ensure the source is restored only if it was mutated during this method; otherwise this may clobber concurrent or external edits. Gate the restoration behind a flag tracking whether instrumentation/rewrites were applied. codeflash/optimization/function_optimizer.py [430-436] if not best_optimization: - self.write_code_and_helpers( - self.function_to_optimize_source_code, original_helper_code, self.function_to_optimize.file_path - ) + # Restore original code only if we modified files during this run + if getattr(self, "did_modify_source", False): + self.write_code_and_helpers( + self.function_to_optimize_source_code, original_helper_code, self.function_to_optimize.file_path + ) return Failure(f"No best optimizations found for function {self.function_to_optimize.qualified_name}") Suggestion importance[1-10]: 6 __ Why: The suggestion correctly targets the newly added unconditional restore when `best_optimization` is falsy and proposes guarding it, which can prevent unnecessary or unsafe overwrites; however, it assumes the presence of a tracking flag not shown in the PR and is an improvement rather than a bug fix.	Low
General	Guard restoration with modification flag Only revert files in finally if they were actually modified; otherwise you risk unnecessary disk churn and potential race overwrites. Track modification and wrap each instrumentation phase with its own guarded revert. codeflash/optimization/function_optimizer.py [1426-1459] +did_modify = False if self.function_to_optimize.is_async: - from codeflash.code_utils.instrument_existing_tests import ( - instrument_source_module_with_async_decorators, - ) - + from codeflash.code_utils.instrument_existing_tests import instrument_source_module_with_async_decorators success, instrumented_source = instrument_source_module_with_async_decorators( self.function_to_optimize.file_path, self.function_to_optimize, TestingMode.BEHAVIOR ) if success and instrumented_source: with self.function_to_optimize.file_path.open("w", encoding="utf8") as f: f.write(instrumented_source) + did_modify = True logger.debug(f"Applied async instrumentation to {self.function_to_optimize.file_path}") -# Instrument codeflash capture try: instrument_codeflash_capture( ... ) finally: - self.write_code_and_helpers( - self.function_to_optimize_source_code, original_helper_code, self.function_to_optimize.file_path - ) + if did_modify: + self.write_code_and_helpers( + self.function_to_optimize_source_code, original_helper_code, self.function_to_optimize.file_path + ) Suggestion importance[1-10]: 7 __ Why: This accurately addresses the async instrumentation/restore block in `establish_original_code_baseline`, reducing unnecessary restores and potential race overwrites by tracking modifications; it's context-aware and improves robustness, though it introduces a new flag not present in the PR.	Medium
General	Use generic_visit for traversal Manually iterating `node.body` and then visiting may skip nested functions deeper than one level and diverge from NodeTransformer traversal semantics. Delegate to `generic_visit` to ensure consistent traversal of all nested nodes. codeflash/code_utils/edit_generated_tests.py [33-40] -for inner_node in node.body: - if isinstance(inner_node, ast.FunctionDef): - self.visit_FunctionDef(inner_node) - elif isinstance(inner_node, ast.AsyncFunctionDef): - self.visit_AsyncFunctionDef(inner_node) +def visit_ClassDef(self, node: ast.ClassDef) -> ast.ClassDef: + self.context_stack.append(node.name) + node = self.generic_visit(node) + self.context_stack.pop() + return node Suggestion importance[1-10]: 5 __ Why: Switching to `generic_visit` ensures deeper nested functions are traversed, improving robustness; however, the current manual visit may be intentional to maintain class context and already handles async methods at one level, so impact is moderate.	Low

Previous suggestions

Suggestions up to commit 91b8902

Category	Suggestion	Impact
Possible issue	Correct star import detection The `child.names` attribute is a sequence, not a single node, so `isinstance(child.names, cst.ImportStar)` will never be true. Instead, check each alias in the list and skip the entire import-from if any alias is a star import. codeflash/code_utils/code_extractor.py [275-276] -if isinstance(child.names, cst.ImportStar): +if any(isinstance(alias, cst.ImportStar) for alias in child.names): continue Suggestion importance[1-10]: 8 __ Why: The `isinstance(child.names, cst.ImportStar)` check always fails because `child.names` is a list, so using `any(isinstance(alias, cst.ImportStar) ...)` correctly detects star imports and skips them.	Medium

Suggestions up to commit c9aaaad

Category	Suggestion	Impact
Possible issue	Include full async execution time Remove the second timestamp reset so that the measured duration includes both synchronous and asynchronous execution time. Always start the timer once before calling the function. This ensures the reported duration accurately reflects total runtime. codeflash/code_utils/codeflash_wrap_decorator.py [62-71] counter = time.perf_counter_ns() ret = func(args, *kwargs) if inspect.isawaitable(ret): - counter = time.perf_counter_ns() return_value = await ret else: return_value = ret codeflash_duration = time.perf_counter_ns() - counter Suggestion importance[1-10]: 8 __ Why: The extra `counter = time.perf_counter_ns()` inside the await branch discards the time spent before awaiting, so removing it yields an accurate total runtime measurement.	Medium

Suggestions up to commit 14af1a8

Category	Suggestion	Impact
Possible issue	Support dict parents in qualified_name Update the property to handle both `FunctionParent` objects and dicts in `self.parents`, avoiding attribute errors when parents are passed as mappings. Build a list of names using a conditional that checks for `name` attribute first, then dict key. codeflash/discovery/functions_to_optimize.py [162-166] @property def qualified_name(self) -> str: if not self.parents: return self.function_name - # Join all parent names with dots to handle nested classes properly - parent_path = ".".join(parent.name for parent in self.parents) + parent_names = [ + p.name if hasattr(p, "name") else p["name"] + for p in self.parents + ] + parent_path = ".".join(parent_names) return f"{parent_path}.{self.function_name}" Suggestion importance[1-10]: 9 __ Why: This prevents attribute errors when `self.parents` contains dicts as in tests, ensuring `qualified_name` works in all cases.	High
General	Measure entire async call duration Remove the second reset of `counter` before `await` so that `codeflash_duration` covers the full execution time of the async function, including any synchronous setup. codeflash/code_utils/codeflash_wrap_decorator.py [63-72] counter = time.perf_counter_ns() ret = func(args, *kwargs) if inspect.isawaitable(ret): - counter = time.perf_counter_ns() return_value = await ret else: return_value = ret codeflash_duration = time.perf_counter_ns() - counter Suggestion importance[1-10]: 6 __ Why: Removing the reset of `counter` ensures the duration includes both synchronous setup and asynchronous execution, improving timing accuracy.	Low
General	Log instrumentation errors Log exceptions in the `except` block so failures to instrument async decorators are surfaced during debugging instead of silently ignored. codeflash/code_utils/instrument_existing_tests.py [325-345] def instrument_source_module_with_async_decorators( ... ) -> tuple[bool, str \| None]: ... try: ... except Exception as e: + logger.exception( + f"Failed to instrument async decorator for " + f"{function_to_optimize.qualified_name} in {source_path}: {e}" + ) return False, None Suggestion importance[1-10]: 5 __ Why: Adding `logger.exception` surfaces internal failures during async decorator instrumentation, aiding debugging without altering control flow.	Low

codeflash/code_utils/codeflash_wrap_decorator.py

misrasaurabh1 · 2025-08-26T22:43:07Z

add an e2e test for this

codeflash/code_utils/codeflash_wrap_decorator.py

tests/test_instrument_async_tests.py

misrasaurabh1 · 2025-08-26T23:07:40Z

Add tests in the style of

codeflash/tests/test_instrument_tests.py

Line 289 in 674e69e

def test_perfinjector_bubble_sort_results() -> None:

. This gives us confidence in that the tests run and provide the correct return values

misrasaurabh1 · 2025-08-26T23:10:07Z

codeflash/code_utils/instrument_existing_tests.py

+            import_transformer = AsyncDecoratorImportAdder(mode)
+            module = module.visit(import_transformer)
+
+        return isort.code(module.code, float_to_top=True), decorator_transformer.added_decorator


why are isorting user's code?

codeflash/optimization/function_optimizer.py

apply testgen-async fix bug when iterating over star imports fix cst * import errors

…687 (`granular-async-instrumentation`) The optimization replaces expensive `Path` object creation and method calls with direct string manipulation operations, delivering a **491% speedup**. **Key optimizations:** 1. **Eliminated Path object overhead**: Replaced `Path(filename).stem.startswith("test_")` with `filename.rpartition('/')[-1].rpartition('\\')[-1].rpartition('.')[0].startswith("test_")` - avoiding Path instantiation entirely. 2. **Optimized path parts extraction**: Replaced `Path(filename).parts` with `filename.replace('\\', '/').split('/')` - using simple string operations instead of Path parsing. **Performance impact analysis:** - Original profiler shows lines 25-26 (Path operations) consumed **86.3%** of total runtime (44.7% + 41.6%) - Optimized version reduces these same operations to just **25.4%** of runtime (15% + 10.4%) - The string manipulation operations are ~6x faster per call than Path object creation **Test case benefits:** - **Large-scale tests** see the biggest gains (516% faster for 900-frame stack, 505% faster for 950-frame chain) because the Path overhead multiplies with stack depth - **Edge cases** with complex paths benefit significantly (182-206% faster for subdirectory and pytest frame tests) - **Basic tests** show minimal overhead since Path operations weren't the bottleneck in shallow stacks The optimization maintains identical behavior while eliminating the most expensive operations identified in the profiling data - Path object instantiation and method calls that occurred once per stack frame.

add End to end test for async optimization

codeflash/code_utils/codeflash_wrap_decorator.py

Get throughput from output for async functions

codeflash/verification/equivalence.py

misrasaurabh1 · 2025-09-24T17:55:17Z

codeflash/verification/parse_test_output.py

 matches_re_end = re.compile(r"!######(.*?):(.*?)([^\.:]*?):(.*?):(.*?):(.*?)######!")


+start_pattern = re.compile(r"!\$######([^:]*):([^:]*):([^:]*):([^:]*):([^:]+)######\$!")


why is this pattern different from the one above? i expect the regexes to be the same

misrasaurabh1 · 2025-09-24T18:03:33Z

tests/test_async_run_and_parse_tests.py

+
+        results_list = test_results.test_results
+        async_calls = [r for r in results_list if r.id.function_getting_tested == "async_merge_sort"]
+        assert len(async_calls) >= 1


this test needs to be improved. See all the properties we are testing in the sync version of this, and test those as well.
Things like - the return values, the test id.

This is important to give us confidence that these critical parameters are correct and never broken in the future.

tests/test_async_run_and_parse_tests.py

codeflash/code_utils/instrument_existing_tests.py

misrasaurabh1 · 2025-09-24T18:05:51Z

tests/test_async_run_and_parse_tests.py

+        assert test_results.test_results is not None
+        assert len(test_results.test_results) >= 2
+
+        results_list = test_results.test_results


test for test_ids and return values exactly

misrasaurabh1 · 2025-09-24T18:06:07Z

tests/test_async_run_and_parse_tests.py

+
+        assert test_results is not None
+        assert test_results.test_results is not None
+        assert len(test_results.test_results) >= 2


use equal operator in the test. make it precise

tests/test_async_run_and_parse_tests.py

misrasaurabh1 · 2025-09-24T18:08:15Z

tests/test_add_runtime_comments.py

        # Check that comments were added
        modified_source = result.generated_tests[0].generated_original_test_source
-        assert modified_source == expected
+        assert modified_source == expected


eventually we should also add the throughput annotations when available

codeflash/code_utils/instrument_existing_tests.py

codeflash-ai · 2025-09-24T19:11:32Z

codeflash/result/critic.py

+    # Runtime performance evaluation
    noise_floor = 3 * MIN_IMPROVEMENT_THRESHOLD if original_code_runtime < 10000 else MIN_IMPROVEMENT_THRESHOLD
    if not disable_gh_action_noise and env_utils.is_ci():
        noise_floor = noise_floor * 2  # Increase the noise floor in GitHub Actions mode

    perf_gain = performance_gain(
        original_runtime_ns=original_code_runtime, optimized_runtime_ns=candidate_result.best_test_runtime
    )
-    if best_runtime_until_now is None:
-        # collect all optimizations with this
-        return bool(perf_gain > noise_floor)
-    return bool(perf_gain > noise_floor and candidate_result.best_test_runtime < best_runtime_until_now)
+    runtime_improved = perf_gain > noise_floor
+
+    # Check runtime comparison with best so far
+    runtime_is_best = best_runtime_until_now is None or candidate_result.best_test_runtime < best_runtime_until_now
+
+    throughput_improved = True  # Default to True if no throughput data
+    throughput_is_best = True  # Default to True if no throughput data
+
+    if original_async_throughput is not None and candidate_result.async_throughput is not None:
+        if original_async_throughput > 0:
+            throughput_gain_value = throughput_gain(
+                original_throughput=original_async_throughput, optimized_throughput=candidate_result.async_throughput
+            )
+            throughput_improved = throughput_gain_value > MIN_THROUGHPUT_IMPROVEMENT_THRESHOLD
+
+        throughput_is_best = (
+            best_throughput_until_now is None or candidate_result.async_throughput > best_throughput_until_now
+        )
+
+    if original_async_throughput is not None and candidate_result.async_throughput is not None:
+        # When throughput data is available, accept if EITHER throughput OR runtime improves significantly
+        throughput_acceptance = throughput_improved and throughput_is_best
+        runtime_acceptance = runtime_improved and runtime_is_best
+        return throughput_acceptance or runtime_acceptance


⚡️Codeflash found 16% (0.16x) speedup for speedup_critic in codeflash/result/critic.py

⏱️ Runtime : 3.15 milliseconds → 2.71 milliseconds (best of 98 runs)

📝 Explanation and details

The optimized code achieves a 16% speedup by eliminating function call overhead and streamlining conditional logic in the performance-critical speedup_critic function.

Key optimizations:

Inlined performance calculation: Instead of calling performance_gain(), the performance gain is calculated directly inline as (original_code_runtime - candidate_result.best_test_runtime) / candidate_result.best_test_runtime. This eliminates function call overhead which was consuming 35.2% of the original execution time according to the line profiler.

Inlined throughput calculation: Similarly, the throughput gain calculation is moved inline as (candidate_result.async_throughput - original_async_throughput) / original_async_throughput, removing another function call.

Streamlined conditional structure: The throughput evaluation logic is reorganized to eliminate redundant variable assignments and combine the final decision logic more efficiently. The original code had separate variables for throughput_acceptance and runtime_acceptance, while the optimized version directly returns the combined condition.

Reduced variable assignments: Eliminated unnecessary intermediate variables like throughput_improved = True defaults, handling the logic more directly within the conditional branches.

The line profiler shows the original performance_gain and throughput_gain function calls took significant time (25.8ms and 9.9ms respectively out of 73.3ms total). By inlining these simple calculations, the optimized version reduces total execution time to 31.0ms.

These optimizations are particularly effective for high-volume scenarios where speedup_critic is called frequently, as evidenced by the large-scale test cases showing consistent 13-20% improvements when processing hundreds or thousands of candidates.

✅ Correctness verification report:

Test Status

⚙️ Existing Unit Tests ✅ 14 Passed

🌀 Generated Regression Tests ✅ 5050 Passed

⏪ Replay Tests 🔘 None Found

🔎 Concolic Coverage Tests 🔘 None Found

📊 Tests Coverage 100.0%

⚙️ Existing Unit Tests and Runtime

Test File::Test Function Original ⏱️ Optimized ⏱️ Speedup

test_critic.py::test_speedup_critic 4.67μs 4.26μs 9.65%✅

test_critic.py::test_speedup_critic_with_async_throughput 9.48μs 8.40μs 12.7%✅

🌀 Generated Regression Tests and Runtime

from __future__ import annotations import os from dataclasses import dataclass from functools import lru_cache # imports import pytest # used for our unit tests from codeflash.result.critic import speedup_critic # Simulate codeflash.code_utils.config_consts MIN_IMPROVEMENT_THRESHOLD = 0.01 # 1% MIN_THROUGHPUT_IMPROVEMENT_THRESHOLD = 0.01 # 1% # Simulate codeflash.models.models.OptimizedCandidateResult @dataclass class OptimizedCandidateResult: best_test_runtime: int async_throughput: int | None = None from codeflash.result.critic import speedup_critic # ---- BASIC TEST CASES ---- def test_basic_runtime_improvement_above_threshold(): # Test: optimized code is 10% faster, above 1% threshold orig = 100_000 # ns opt = 90_000 # ns (10% faster) cand = OptimizedCandidateResult(best_test_runtime=opt) codeflash_output = speedup_critic(cand, orig, None) # 2.13μs -> 2.06μs (3.39% faster) def test_basic_runtime_improvement_below_threshold(): # Test: optimized code is 0.5% faster, below 1% threshold orig = 100_000 opt = 99_500 # 0.5% faster cand = OptimizedCandidateResult(best_test_runtime=opt) codeflash_output = speedup_critic(cand, orig, None) # 2.09μs -> 1.88μs (11.2% faster) def test_basic_runtime_no_improvement(): # Test: optimized code is slower orig = 100_000 opt = 110_000 cand = OptimizedCandidateResult(best_test_runtime=opt) codeflash_output = speedup_critic(cand, orig, None) # 1.98μs -> 1.85μs (7.07% faster) def test_basic_runtime_improvement_and_best_so_far(): # Test: improvement and better than previous best orig = 100_000 opt = 90_000 prev_best = 91_000 cand = OptimizedCandidateResult(best_test_runtime=opt) codeflash_output = speedup_critic(cand, orig, prev_best) # 2.11μs -> 1.98μs (6.50% faster) def test_basic_runtime_improvement_not_best_so_far(): # Test: improvement but not better than previous best orig = 100_000 opt = 92_000 prev_best = 90_000 cand = OptimizedCandidateResult(best_test_runtime=opt) codeflash_output = speedup_critic(cand, orig, prev_best) # 2.03μs -> 1.91μs (6.33% faster) def test_basic_throughput_improvement_above_threshold(): # Test: throughput improvement above threshold, no runtime improvement orig = 100_000 opt = 100_000 orig_through = 1000 opt_through = 1100 # 10% improvement cand = OptimizedCandidateResult(best_test_runtime=opt, async_throughput=opt_through) codeflash_output = speedup_critic(cand, orig, None, original_async_throughput=orig_through) # 3.06μs -> 2.65μs (15.1% faster) def test_basic_throughput_improvement_below_threshold(): # Test: throughput improvement below threshold, no runtime improvement orig = 100_000 opt = 100_000 orig_through = 1000 opt_through = 1005 # 0.5% improvement cand = OptimizedCandidateResult(best_test_runtime=opt, async_throughput=opt_through) codeflash_output = speedup_critic(cand, orig, None, original_async_throughput=orig_through) # 2.71μs -> 2.37μs (14.4% faster) def test_basic_throughput_and_runtime_improvement_either_suffices(): # Test: throughput improved, runtime not; should accept orig = 100_000 opt = 100_000 orig_through = 1000 opt_through = 1100 cand = OptimizedCandidateResult(best_test_runtime=opt, async_throughput=opt_through) codeflash_output = speedup_critic(cand, orig, None, original_async_throughput=orig_through) # 2.65μs -> 2.35μs (12.3% faster) # Test: runtime improved, throughput not; should accept orig = 100_000 opt = 90_000 orig_through = 1000 opt_through = 1000 cand = OptimizedCandidateResult(best_test_runtime=opt, async_throughput=opt_through) codeflash_output = speedup_critic(cand, orig, None, original_async_throughput=orig_through) # 1.63μs -> 1.38μs (18.2% faster) def test_basic_throughput_best_so_far(): # Test: throughput improved but not best so far orig = 100_000 opt = 100_000 orig_through = 1000 opt_through = 1100 prev_best_through = 1200 cand = OptimizedCandidateResult(best_test_runtime=opt, async_throughput=opt_through) codeflash_output = speedup_critic( cand, orig, None, original_async_throughput=orig_through, best_throughput_until_now=prev_best_through ) # 2.90μs -> 2.50μs (16.1% faster) def test_basic_runtime_and_throughput_best_so_far(): # Test: runtime and throughput both improved and both best so far orig = 100_000 opt = 90_000 orig_through = 1000 opt_through = 1100 prev_best = 91_000 prev_best_through = 1050 cand = OptimizedCandidateResult(best_test_runtime=opt, async_throughput=opt_through) codeflash_output = speedup_critic( cand, orig, prev_best, original_async_throughput=orig_through, best_throughput_until_now=prev_best_through ) # 3.01μs -> 2.62μs (15.0% faster) # ---- EDGE TEST CASES ---- def test_edge_runtime_exactly_at_threshold(): # Test: improvement exactly at threshold (should be False, must be > threshold) orig = 100_000 opt = int(orig / (1 + MIN_IMPROVEMENT_THRESHOLD)) # exactly 1% faster cand = OptimizedCandidateResult(best_test_runtime=opt) codeflash_output = speedup_critic(cand, orig, None) # 1.89μs -> 1.73μs (9.17% faster) def test_edge_throughput_exactly_at_threshold(): # Test: throughput improvement exactly at threshold (should be False) orig = 100_000 opt = 100_000 orig_through = 1000 opt_through = int(orig_through * (1 + MIN_THROUGHPUT_IMPROVEMENT_THRESHOLD)) cand = OptimizedCandidateResult(best_test_runtime=opt, async_throughput=opt_through) codeflash_output = speedup_critic(cand, orig, None, original_async_throughput=orig_through) # 2.77μs -> 2.30μs (20.0% faster) def test_edge_runtime_below_10us_noise_floor(): # Test: original runtime below 10us, noise floor is 3x orig = 9000 # 9us opt = int(orig / (1 + 3 * MIN_IMPROVEMENT_THRESHOLD)) - 1 # Just above noise floor cand = OptimizedCandidateResult(best_test_runtime=opt) codeflash_output = speedup_critic(cand, orig, None) # 1.91μs -> 1.81μs (5.57% faster) # Now just below noise floor opt = int(orig / (1 + 3 * MIN_IMPROVEMENT_THRESHOLD)) + 1 cand = OptimizedCandidateResult(best_test_runtime=opt) codeflash_output = speedup_critic(cand, orig, None) # 872ns -> 761ns (14.6% faster) def test_edge_runtime_zero_optimized_runtime(): # Test: optimized_runtime_ns == 0 (should return False, as gain is 0) orig = 100_000 opt = 0 cand = OptimizedCandidateResult(best_test_runtime=opt) codeflash_output = speedup_critic(cand, orig, None) # 2.13μs -> 1.73μs (23.1% faster) def test_edge_throughput_zero_original_throughput(): # Test: original throughput is zero (should not crash, gain is 0) orig = 100_000 opt = 100_000 orig_through = 0 opt_through = 1000 cand = OptimizedCandidateResult(best_test_runtime=opt, async_throughput=opt_through) codeflash_output = speedup_critic(cand, orig, None, original_async_throughput=orig_through) # 2.77μs -> 2.50μs (10.4% faster) def test_edge_throughput_none_values(): # Test: async_throughput is None in candidate (should default to runtime logic) orig = 100_000 opt = 90_000 cand = OptimizedCandidateResult(best_test_runtime=opt, async_throughput=None) codeflash_output = speedup_critic(cand, orig, None, original_async_throughput=1000) # 2.54μs -> 2.20μs (15.0% faster) # Test: original_async_throughput is None (should default to runtime logic) cand = OptimizedCandidateResult(best_test_runtime=opt, async_throughput=1100) codeflash_output = speedup_critic(cand, orig, None, original_async_throughput=None) # 1.10μs -> 972ns (13.4% faster) def test_large_scale_many_candidates_runtime(monkeypatch): # Test: 1000 candidates, only one is best and above threshold orig = 100_000 prev_best = 90_000 # All candidates are worse than prev_best except one results = [OptimizedCandidateResult(best_test_runtime=orig - i) for i in range(1000)] # Only candidate at index 950 is below prev_best and above threshold results[950] = OptimizedCandidateResult(best_test_runtime=85_000) count = 0 for cand in results: if speedup_critic(cand, orig, prev_best): count += 1 def test_large_scale_throughput(monkeypatch): # Test: 500 candidates, only a few have throughput above threshold and best so far orig = 100_000 orig_through = 1000 prev_best_through = 1100 results = [] for i in range(500): # Most are below threshold results.append(OptimizedCandidateResult(best_test_runtime=orig, async_throughput=orig_through + i)) # Only candidate at index 499 is above prev_best_through and above threshold results[499] = OptimizedCandidateResult(best_test_runtime=orig, async_throughput=1200) count = 0 for cand in results: if speedup_critic( cand, orig, None, original_async_throughput=orig_through, best_throughput_until_now=prev_best_through ): count += 1 def test_large_scale_runtime_and_throughput_combined(): # Test: 100 candidates, some with runtime improvement, some with throughput, some both orig = 100_000 orig_through = 1000 prev_best = 95_000 prev_best_through = 1050 results = [] # 10 with runtime improvement and best so far for i in range(10): results.append(OptimizedCandidateResult(best_test_runtime=90_000 - i, async_throughput=1000)) # 10 with throughput improvement and best so far for i in range(10): results.append(OptimizedCandidateResult(best_test_runtime=100_000, async_throughput=1100 + i)) # 10 with both for i in range(10): results.append(OptimizedCandidateResult(best_test_runtime=90_000 - i, async_throughput=1100 + i)) # The rest with no improvement for i in range(70): results.append(OptimizedCandidateResult(best_test_runtime=100_000, async_throughput=1000)) count = 0 for cand in results: if speedup_critic( cand, orig, prev_best, original_async_throughput=orig_through, best_throughput_until_now=prev_best_through ): count += 1 def test_large_scale_edge_case_all_equal(): # Test: all candidates have identical performance, none should pass orig = 100_000 prev_best = 90_000 orig_through = 1000 prev_best_through = 1100 results = [OptimizedCandidateResult(best_test_runtime=100_000, async_throughput=1000) for _ in range(500)] for cand in results: codeflash_output = speedup_critic( cand, orig, prev_best, original_async_throughput=orig_through, best_throughput_until_now=prev_best_through ) # 385μs -> 321μs (19.9% faster) def test_large_scale_edge_case_all_best(): # Test: all candidates are best and above threshold, all should pass orig = 100_000 prev_best = 110_000 orig_through = 1000 prev_best_through = 900 results = [OptimizedCandidateResult(best_test_runtime=90_000, async_throughput=1200) for _ in range(500)] for cand in results: codeflash_output = speedup_critic( cand, orig, prev_best, original_async_throughput=orig_through, best_throughput_until_now=prev_best_through ) # 384μs -> 321μs (19.5% faster) # codeflash_output is used to check that the output of the original code is the same as that of the optimized code. #------------------------------------------------ from __future__ import annotations import os from dataclasses import dataclass from functools import lru_cache # imports import pytest # used for our unit tests from codeflash.result.critic import speedup_critic # --- Minimal stubs for external dependencies/constants --- # These are required for the function to run in this test file. MIN_IMPROVEMENT_THRESHOLD = 0.01 # 1% MIN_THROUGHPUT_IMPROVEMENT_THRESHOLD = 0.01 # 1% @dataclass class OptimizedCandidateResult: best_test_runtime: int # in nanoseconds async_throughput: int | None = None from codeflash.result.critic import speedup_critic # ----------- 1. BASIC TEST CASES ----------- def test_basic_significant_runtime_improvement(): # Optimized code is 20% faster than original, above threshold orig = 100_000 # ns opt = 80_000 # ns (20% faster) candidate = OptimizedCandidateResult(best_test_runtime=opt) # No previous best codeflash_output = speedup_critic(candidate, orig, None) # 2.42μs -> 2.11μs (14.7% faster) def test_basic_insufficient_runtime_improvement(): # Only 0.5% improvement, below threshold orig = 100_000 opt = 99_500 # 0.5% improvement candidate = OptimizedCandidateResult(best_test_runtime=opt) codeflash_output = speedup_critic(candidate, orig, None) # 2.08μs -> 1.94μs (7.15% faster) def test_basic_no_improvement(): # Optimized code is slower orig = 100_000 opt = 120_000 candidate = OptimizedCandidateResult(best_test_runtime=opt) codeflash_output = speedup_critic(candidate, orig, None) # 2.00μs -> 1.89μs (5.81% faster) def test_basic_best_runtime_until_now(): # There is a previous best, and this candidate is not better orig = 100_000 opt = 80_000 candidate = OptimizedCandidateResult(best_test_runtime=opt) best_so_far = 75_000 # Already have a better one codeflash_output = speedup_critic(candidate, orig, best_so_far) # 2.13μs -> 1.92μs (11.0% faster) def test_basic_new_best_runtime(): # There is a previous best, and this candidate is better orig = 100_000 opt = 70_000 candidate = OptimizedCandidateResult(best_test_runtime=opt) best_so_far = 75_000 codeflash_output = speedup_critic(candidate, orig, best_so_far) # 1.94μs -> 1.93μs (0.517% faster) def test_basic_throughput_improvement_accepts(): # Throughput improvement is significant, runtime is not orig = 100_000 opt = 99_500 orig_thr = 1000 opt_thr = 1100 # 10% improvement candidate = OptimizedCandidateResult(best_test_runtime=opt, async_throughput=opt_thr) codeflash_output = speedup_critic(candidate, orig, None, original_async_throughput=orig_thr) # 3.14μs -> 2.73μs (14.7% faster) def test_basic_throughput_no_improvement_rejects(): # Throughput improvement is below threshold, runtime is not improved orig = 100_000 opt = 99_500 orig_thr = 1000 opt_thr = 1005 # 0.5% improvement candidate = OptimizedCandidateResult(best_test_runtime=opt, async_throughput=opt_thr) codeflash_output = speedup_critic(candidate, orig, None, original_async_throughput=orig_thr) # 2.71μs -> 2.42μs (12.0% faster) def test_basic_throughput_and_runtime_both_improve(): # Both throughput and runtime improve significantly orig = 100_000 opt = 80_000 orig_thr = 1000 opt_thr = 1200 candidate = OptimizedCandidateResult(best_test_runtime=opt, async_throughput=opt_thr) codeflash_output = speedup_critic(candidate, orig, None, original_async_throughput=orig_thr) # 2.74μs -> 2.32μs (17.7% faster) def test_basic_throughput_is_best_check(): # Throughput improvement is significant but not best so far, should reject orig = 100_000 opt = 99_000 orig_thr = 1000 opt_thr = 1100 best_thr = 1200 # Already have a better throughput candidate = OptimizedCandidateResult(best_test_runtime=opt, async_throughput=opt_thr) codeflash_output = speedup_critic(candidate, orig, None, original_async_throughput=orig_thr, best_throughput_until_now=best_thr) # 3.00μs -> 2.62μs (14.1% faster) def test_basic_throughput_best_but_runtime_not_best(): # Throughput is new best, runtime is not best but throughput is enough orig = 100_000 opt = 99_000 orig_thr = 1000 opt_thr = 1300 best_thr = 1200 candidate = OptimizedCandidateResult(best_test_runtime=opt, async_throughput=opt_thr) codeflash_output = speedup_critic(candidate, orig, 98_000, original_async_throughput=orig_thr, best_throughput_until_now=best_thr) # 2.97μs -> 2.73μs (8.80% faster) # ----------- 2. EDGE TEST CASES ----------- def test_edge_runtime_just_below_threshold(): # Improvement is just below the threshold (should reject) orig = 100_000 opt = int(orig / (1 + MIN_IMPROVEMENT_THRESHOLD - 0.0001)) # Just under threshold candidate = OptimizedCandidateResult(best_test_runtime=opt) codeflash_output = speedup_critic(candidate, orig, None) # 1.94μs -> 1.78μs (8.97% faster) def test_edge_runtime_just_above_threshold(): # Improvement is just above the threshold (should accept) orig = 100_000 opt = int(orig / (1 + MIN_IMPROVEMENT_THRESHOLD + 0.0001)) # Just over threshold candidate = OptimizedCandidateResult(best_test_runtime=opt) codeflash_output = speedup_critic(candidate, orig, None) # 1.74μs -> 1.57μs (10.9% faster) def test_edge_small_original_runtime_noise_floor(monkeypatch): # For original_code_runtime < 10_000, noise floor is 3x threshold orig = 9000 opt = int(orig / (1 + 3 * MIN_IMPROVEMENT_THRESHOLD + 0.0001)) # Just over noise floor candidate = OptimizedCandidateResult(best_test_runtime=opt) codeflash_output = speedup_critic(candidate, orig, None) # 1.94μs -> 1.84μs (5.37% faster) def test_edge_small_original_runtime_just_below_noise_floor(monkeypatch): # For original_code_runtime < 10_000, just below noise floor orig = 9000 opt = int(orig / (1 + 3 * MIN_IMPROVEMENT_THRESHOLD - 0.0001)) # Just under noise floor candidate = OptimizedCandidateResult(best_test_runtime=opt) codeflash_output = speedup_critic(candidate, orig, None) # 1.78μs -> 1.77μs (0.564% faster) def test_edge_zero_optimized_runtime(): # Optimized runtime is zero (should not crash, should return False) orig = 100_000 opt = 0 candidate = OptimizedCandidateResult(best_test_runtime=opt) codeflash_output = speedup_critic(candidate, orig, None) # 2.29μs -> 1.80μs (27.2% faster) def test_edge_zero_original_throughput(): # Original throughput is zero, should not crash, throughput gain is 0 orig = 100_000 opt = 99_000 orig_thr = 0 opt_thr = 1000 candidate = OptimizedCandidateResult(best_test_runtime=opt, async_throughput=opt_thr) codeflash_output = speedup_critic(candidate, orig, None, original_async_throughput=orig_thr) # 2.84μs -> 2.60μs (9.29% faster) def test_edge_none_throughput_values(): # Throughput values are None, should fallback to runtime only orig = 100_000 opt = 80_000 candidate = OptimizedCandidateResult(best_test_runtime=opt, async_throughput=None) codeflash_output = speedup_critic(candidate, orig, None, original_async_throughput=None) # 2.34μs -> 2.19μs (6.89% faster) def test_edge_none_candidate_throughput(): # Candidate throughput is None, should fallback to runtime only orig = 100_000 opt = 80_000 candidate = OptimizedCandidateResult(best_test_runtime=opt, async_throughput=None) codeflash_output = speedup_critic(candidate, orig, None, original_async_throughput=1000) # 2.31μs -> 2.20μs (4.99% faster) def test_edge_none_original_throughput(): # Original throughput is None, should fallback to runtime only orig = 100_000 opt = 80_000 candidate = OptimizedCandidateResult(best_test_runtime=opt, async_throughput=1200) codeflash_output = speedup_critic(candidate, orig, None, original_async_throughput=None) # 2.16μs -> 2.01μs (7.45% faster) def test_edge_throughput_and_runtime_both_worse(): # Both throughput and runtime are worse, must reject orig = 100_000 opt = 120_000 orig_thr = 1000 opt_thr = 900 candidate = OptimizedCandidateResult(best_test_runtime=opt, async_throughput=opt_thr) codeflash_output = speedup_critic(candidate, orig, None, original_async_throughput=orig_thr) # 2.96μs -> 2.54μs (16.1% faster) def test_edge_throughput_better_runtime_worse(): # Throughput is significantly better, runtime is worse, should accept orig = 100_000 opt = 120_000 orig_thr = 1000 opt_thr = 1100 candidate = OptimizedCandidateResult(best_test_runtime=opt, async_throughput=opt_thr) codeflash_output = speedup_critic(candidate, orig, None, original_async_throughput=orig_thr) # 2.79μs -> 2.42μs (14.8% faster) def test_edge_throughput_better_but_not_best(): # Throughput is improved but not the best so far, should reject orig = 100_000 opt = 120_000 orig_thr = 1000 opt_thr = 1100 best_thr = 1200 candidate = OptimizedCandidateResult(best_test_runtime=opt, async_throughput=opt_thr) codeflash_output = speedup_critic(candidate, orig, None, original_async_throughput=orig_thr, best_throughput_until_now=best_thr) # 3.04μs -> 2.59μs (17.4% faster) # ----------- 3. LARGE SCALE TEST CASES ----------- def test_large_scale_many_candidates_runtime(monkeypatch): # Test with a large number of candidates, only the best one should be accepted orig = 1_000_000 best_so_far = 800_000 # Try a batch of 1000 candidates with slightly worse runtimes for i in range(1000): candidate = OptimizedCandidateResult(best_test_runtime=best_so_far + i + 1) codeflash_output = speedup_critic(candidate, orig, best_so_far) # 510μs -> 451μs (13.0% faster) # Now test with a better candidate candidate = OptimizedCandidateResult(best_test_runtime=750_000) codeflash_output = speedup_critic(candidate, orig, best_so_far) # 601ns -> 501ns (20.0% faster) def test_large_scale_throughput_candidates(): # Test with many throughput candidates, only the best and improved one should be accepted orig = 1_000_000 orig_thr = 10_000 best_thr = 12_000 # All these are not best so should be rejected for i in range(1000): candidate = OptimizedCandidateResult(best_test_runtime=900_000, async_throughput=best_thr - 1 - i) codeflash_output = speedup_critic(candidate, orig, None, original_async_throughput=orig_thr, best_throughput_until_now=best_thr) # 783μs -> 653μs (20.0% faster) # Now test with a new best throughput candidate = OptimizedCandidateResult(best_test_runtime=900_000, async_throughput=13_000) codeflash_output = speedup_critic(candidate, orig, None, original_async_throughput=orig_thr, best_throughput_until_now=best_thr) # 872ns -> 721ns (20.9% faster) def test_large_scale_performance(monkeypatch): # Simulate a large batch of candidates, all with significant improvement orig = 1_000_000 for i in range(1000): candidate = OptimizedCandidateResult(best_test_runtime=orig - 20_000 - i) codeflash_output = speedup_critic(candidate, orig, None) # 486μs -> 427μs (13.7% faster) def test_large_scale_edge_thresholds(): # Many candidates, some just above and some just below threshold orig = 1_000_000 # Just below threshold opt = int(orig / (1 + MIN_IMPROVEMENT_THRESHOLD - 0.0001)) for _ in range(500): candidate = OptimizedCandidateResult(best_test_runtime=opt) codeflash_output = speedup_critic(candidate, orig, None) # 244μs -> 215μs (13.4% faster) # Just above threshold opt = int(orig / (1 + MIN_IMPROVEMENT_THRESHOLD + 0.0001)) for _ in range(500): candidate = OptimizedCandidateResult(best_test_runtime=opt) codeflash_output = speedup_critic(candidate, orig, None) # 241μs -> 212μs (13.7% faster) def test_large_scale_throughput_and_runtime_mixed(): # Many candidates, some with only throughput improvement, some with only runtime, some with both, some with neither orig = 1_000_000 orig_thr = 10_000 # Only throughput improves candidate = OptimizedCandidateResult(best_test_runtime=995_000, async_throughput=11_000) codeflash_output = speedup_critic(candidate, orig, None, original_async_throughput=orig_thr) # 3.04μs -> 2.58μs (17.4% faster) # Only runtime improves candidate = OptimizedCandidateResult(best_test_runtime=900_000, async_throughput=10_000) codeflash_output = speedup_critic(candidate, orig, None, original_async_throughput=orig_thr) # 1.45μs -> 1.24μs (16.8% faster) # Both improve candidate = OptimizedCandidateResult(best_test_runtime=900_000, async_throughput=11_000) codeflash_output = speedup_critic(candidate, orig, None, original_async_throughput=orig_thr) # 1.01μs -> 832ns (21.6% faster) # Neither improves candidate = OptimizedCandidateResult(best_test_runtime=1_010_000, async_throughput=9_000) codeflash_output = speedup_critic(candidate, orig, None, original_async_throughput=orig_thr) # 922ns -> 802ns (15.0% faster) # codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To test or edit this optimization locally git merge codeflash/optimize-pr687-2025-09-24T19.11.26

Click to see suggested changes

Suggested change

# Runtime performance evaluation

noise_floor = 3 * MIN_IMPROVEMENT_THRESHOLD if original_code_runtime < 10000 else MIN_IMPROVEMENT_THRESHOLD

if not disable_gh_action_noise and env_utils.is_ci():

noise_floor = noise_floor * 2 # Increase the noise floor in GitHub Actions mode

perf_gain = performance_gain(

original_runtime_ns=original_code_runtime, optimized_runtime_ns=candidate_result.best_test_runtime

)

if best_runtime_until_now is None:

# collect all optimizations with this

return bool(perf_gain > noise_floor)

return bool(perf_gain > noise_floor and candidate_result.best_test_runtime < best_runtime_until_now)

runtime_improved = perf_gain > noise_floor

# Check runtime comparison with best so far

runtime_is_best = best_runtime_until_now is None or candidate_result.best_test_runtime < best_runtime_until_now

throughput_improved = True # Default to True if no throughput data

throughput_is_best = True # Default to True if no throughput data

if original_async_throughput is not None and candidate_result.async_throughput is not None:

if original_async_throughput > 0:

throughput_gain_value = throughput_gain(

original_throughput=original_async_throughput, optimized_throughput=candidate_result.async_throughput

)

throughput_improved = throughput_gain_value > MIN_THROUGHPUT_IMPROVEMENT_THRESHOLD

throughput_is_best = (

best_throughput_until_now is None or candidate_result.async_throughput > best_throughput_until_now

)

if original_async_throughput is not None and candidate_result.async_throughput is not None:

# When throughput data is available, accept if EITHER throughput OR runtime improves significantly

throughput_acceptance = throughput_improved and throughput_is_best

runtime_acceptance = runtime_improved and runtime_is_best

return throughput_acceptance or runtime_acceptance

noise_floor = 3 * MIN_IMPROVEMENT_THRESHOLD if original_code_runtime < 10000 else MIN_IMPROVEMENT_THRESHOLD

if not disable_gh_action_noise and env_utils.is_ci():

noise_floor = noise_floor * 2 # Increase the noise floor in GitHub Actions mode

perf_gain = (

(original_code_runtime - candidate_result.best_test_runtime) / candidate_result.best_test_runtime

if candidate_result.best_test_runtime != 0

else 0.0

)

runtime_improved = perf_gain > noise_floor

runtime_is_best = best_runtime_until_now is None or candidate_result.best_test_runtime < best_runtime_until_now

# Combine throughput logic for tighter critical-path performance

if original_async_throughput is not None and candidate_result.async_throughput is not None:

if original_async_throughput > 0:

throughput_gain_value = (

candidate_result.async_throughput - original_async_throughput

) / original_async_throughput

throughput_improved = throughput_gain_value > MIN_THROUGHPUT_IMPROVEMENT_THRESHOLD

else:

throughput_improved = True

throughput_is_best = (

best_throughput_until_now is None or candidate_result.async_throughput > best_throughput_until_now

)

# Accept if either throughput or runtime improvement is good and is best so far

return (throughput_improved and throughput_is_best) or (runtime_improved and runtime_is_best)

# No async throughput measured: fallback to only runtime logic

codeflash/code_utils/instrument_existing_tests.py

KRRT7 changed the base branch from main to standalone-fto-async August 25, 2025 22:55

KRRT7 force-pushed the granular-async-instrumentation branch from 52dbe88 to b153989 Compare August 26, 2025 10:14

first pass at wrapper deco behavioral

0a57afa

KRRT7 force-pushed the granular-async-instrumentation branch from ef07e94 to 0a57afa Compare August 26, 2025 10:24

don't reapply too early

c9aaaad