
Conversation

@KRRT7
Contributor

@KRRT7 KRRT7 commented Aug 25, 2025

User description

dependent on #678


PR Type

Enhancement, Tests, Bug fix


Description

  • Add async function instrumentation and throughput

  • Update optimizer for async benchmarking

  • Extend AST utilities to handle async defs

  • Add comprehensive async test suites


Diagram Walkthrough

flowchart LR
  A["Async code detection"] -- "decorate defs" --> B["Instrument source module"]
  B -- "run tests" --> C["Parse results"]
  C -- "calc runtime/throughput" --> D["Optimizer decisions"]
  D -- "report gains" --> E["Explanations/critics"]

File Walkthrough

Relevant files

Tests (4 files)
  test_async_run_and_parse_tests.py: End-to-end async run_and_parse_tests scenarios (+1039/-0)
  test_instrument_async_tests.py: Async decorator injection and test instrumentation (+728/-0)
  test_unused_helper_revert.py: Unused helper revert with async scenarios (+614/-5)
  test_async_wrapper_sqlite_validation.py: Validate async wrapper SQLite capture and perf (+285/-0)

Enhancement (2 files)
  function_optimizer.py: Async-aware baseline, benchmarking, and throughput (+180/-24)
  edit_generated_tests.py: Support AsyncFunctionDef and async test pruning (+15/-5)

Additional files (32 files)
  e2e-async.yaml (+69/-0)
  pre-commit.yaml (+0/-19)
  .pre-commit-config.yaml (+2/-1)
  async_bubble_sort.py (+43/-0)
  main.py (+16/-0)
  pyproject.toml (+6/-0)
  __init__.py
  codeflash.code-workspace (+5/-1)
  aiservice.py (+14/-0)
  code_extractor.py (+86/-3)
  codeflash_wrap_decorator.py (+167/-0)
  config_consts.py (+1/-0)
  coverage_utils.py (+3/-1)
  instrument_existing_tests.py (+370/-2)
  unused_definition_remover.py (+4/-1)
  functions_to_optimize.py (+5/-1)
  models.py (+4/-0)
  critic.py (+51/-8)
  explanation.py (+48/-1)
  comparator.py (+28/-7)
  concolic_testing.py (+2/-1)
  equivalence.py (+22/-0)
  parse_test_output.py (+24/-0)
  pytest_plugin.py (+22/-0)
  test_runner.py (+1/-1)
  pyproject.toml (+1/-0)
  end_to_end_test_async.py (+27/-0)
  test_add_runtime_comments.py (+207/-1)
  test_code_context_extractor.py (+146/-1)
  test_code_replacement.py (+172/-0)
  test_code_utils.py (+81/-1)
  test_critic.py (+163/-1)

@KRRT7 KRRT7 changed the base branch from main to standalone-fto-async August 25, 2025 22:55
@github-actions

github-actions bot commented Aug 25, 2025

PR Code Suggestions ✨

Latest suggestions up to b221be4

Explore these optional code suggestions:

Category / Suggestion / Impact
Possible issue
Snapshot and restore pre-instrumented source

Preserve and restore the exact pre-benchmark source for the async instrumentation
step; using the original baseline code here can discard concurrent candidate edits
or prior changes. Snapshot the file immediately before instrumentation and restore
from that snapshot.

codeflash/optimization/function_optimizer.py [1559-1588]

 if test_framework == "pytest":
     line_profile_results = self.line_profiler_step(
         code_context=code_context, original_helper_code=original_helper_code, candidate_index=0
     )
     console.rule()
 
+    pre_perf_source_snapshot = None
     if self.function_to_optimize.is_async:
-        from codeflash.code_utils.instrument_existing_tests import (
-            instrument_source_module_with_async_decorators,
-        )
-
+        from codeflash.code_utils.instrument_existing_tests import instrument_source_module_with_async_decorators
+        # Snapshot current on-disk code prior to perf instrumentation
+        pre_perf_source_snapshot = Path(self.function_to_optimize.file_path).read_text(encoding="utf8")
         success, instrumented_source = instrument_source_module_with_async_decorators(
             self.function_to_optimize.file_path, self.function_to_optimize, TestingMode.PERFORMANCE
         )
         if success and instrumented_source:
             with self.function_to_optimize.file_path.open("w", encoding="utf8") as f:
                 f.write(instrumented_source)
-            logger.debug(
-                f"Applied async performance instrumentation to {self.function_to_optimize.file_path}"
-            )
+            logger.debug(f"Applied async performance instrumentation to {self.function_to_optimize.file_path}")
 
     try:
         benchmarking_results, _ = self.run_and_parse_tests(
             testing_type=TestingMode.PERFORMANCE,
             ...
         )
     finally:
-        if self.function_to_optimize.is_async:
-            self.write_code_and_helpers(
-                self.function_to_optimize_source_code,
-                original_helper_code,
-                self.function_to_optimize.file_path,
-            )
+        if self.function_to_optimize.is_async and pre_perf_source_snapshot is not None:
+            with self.function_to_optimize.file_path.open("w", encoding="utf8") as f:
+                f.write(pre_perf_source_snapshot)
Suggestion importance[1-10]: 8


Why: The proposal to snapshot the source immediately before async performance instrumentation and restore from that snapshot matches the PR code path and prevents losing intermediate changes; it's a high-impact correctness improvement for async benchmarking flows.

Impact: Medium
Build qualified decorator correctly

The timeout decorator is being appended as a Call to a Name with dotted path, which
is invalid for qualified names and will not resolve. Construct the decorator using
an Attribute chain so timeout_decorator.timeout binds correctly.

codeflash/code_utils/instrument_existing_tests.py [318-336]

 def visit_ClassDef(self, node: ast.ClassDef) -> ast.ClassDef:
     # Add timeout decorator for unittest test classes if needed
     if self.test_framework == "unittest":
         timeout_decorator = ast.Call(
-            func=ast.Name(id="timeout_decorator.timeout", ctx=ast.Load()),
+            func=ast.Attribute(
+                value=ast.Name(id="timeout_decorator", ctx=ast.Load()),
+                attr="timeout",
+                ctx=ast.Load(),
+            ),
             args=[ast.Constant(value=15)],
             keywords=[],
         )
         for item in node.body:
             if (
                 isinstance(item, ast.FunctionDef)
                 and item.name.startswith("test_")
                 and not any(
                     isinstance(d, ast.Call)
-                    and isinstance(d.func, ast.Name)
-                    and d.func.id == "timeout_decorator.timeout"
+                    and isinstance(d.func, ast.Attribute)
+                    and isinstance(d.func.value, ast.Name)
+                    and d.func.value.id == "timeout_decorator"
+                    and d.func.attr == "timeout"
                     for d in item.decorator_list
                 )
             ):
                 item.decorator_list.append(timeout_decorator)
     return self.generic_visit(node)
Suggestion importance[1-10]: 8


Why: Constructing a call with ast.Name(id="timeout_decorator.timeout") is invalid; using an ast.Attribute ensures the decorator resolves properly, fixing a real bug in AST generation for unittest timeouts.

Impact: Medium
Fix missing dependency import

Importing env_utils is missing in this file, causing a NameError at runtime. Add the
appropriate import to ensure CI detection works. Keep logic unchanged.

codeflash/result/critic.py [65-67]

+from codeflash.utils import env_utils
+
 noise_floor = 3 * MIN_IMPROVEMENT_THRESHOLD if original_code_runtime < 10000 else MIN_IMPROVEMENT_THRESHOLD
 if not disable_gh_action_noise and env_utils.is_ci():
     noise_floor = noise_floor * 2  # Increase the noise floor in GitHub Actions mode
Suggestion importance[1-10]: 7


Why: The code uses env_utils.is_ci() but this module isn't imported in the shown diff; adding from codeflash.utils import env_utils prevents a NameError and preserves existing behavior.

Impact: Medium
Avoid unconditional source overwrite

Ensure the source is restored only if it was mutated during this method; otherwise
this may clobber concurrent or external edits. Gate the restoration behind a flag
tracking whether instrumentation/rewrites were applied.

codeflash/optimization/function_optimizer.py [430-436]

 if not best_optimization:
-    self.write_code_and_helpers(
-        self.function_to_optimize_source_code, original_helper_code, self.function_to_optimize.file_path
-    )
+    # Restore original code only if we modified files during this run
+    if getattr(self, "did_modify_source", False):
+        self.write_code_and_helpers(
+            self.function_to_optimize_source_code, original_helper_code, self.function_to_optimize.file_path
+        )
     return Failure(f"No best optimizations found for function {self.function_to_optimize.qualified_name}")
Suggestion importance[1-10]: 6


Why: The suggestion correctly targets the newly added unconditional restore when best_optimization is falsy and proposes guarding it, which can prevent unnecessary or unsafe overwrites; however, it assumes the presence of a tracking flag not shown in the PR and is an improvement rather than a bug fix.

Impact: Low
General
Guard restoration with modification flag

Only revert files in finally if they were actually modified; otherwise you risk
unnecessary disk churn and potential race overwrites. Track modification and wrap
each instrumentation phase with its own guarded revert.

codeflash/optimization/function_optimizer.py [1426-1459]

+did_modify = False
 if self.function_to_optimize.is_async:
-    from codeflash.code_utils.instrument_existing_tests import (
-        instrument_source_module_with_async_decorators,
-    )
-
+    from codeflash.code_utils.instrument_existing_tests import instrument_source_module_with_async_decorators
     success, instrumented_source = instrument_source_module_with_async_decorators(
         self.function_to_optimize.file_path, self.function_to_optimize, TestingMode.BEHAVIOR
     )
     if success and instrumented_source:
         with self.function_to_optimize.file_path.open("w", encoding="utf8") as f:
             f.write(instrumented_source)
+        did_modify = True
         logger.debug(f"Applied async instrumentation to {self.function_to_optimize.file_path}")
-# Instrument codeflash capture
 try:
     instrument_codeflash_capture(
         ...
     )
 finally:
-    self.write_code_and_helpers(
-        self.function_to_optimize_source_code, original_helper_code, self.function_to_optimize.file_path
-    )
+    if did_modify:
+        self.write_code_and_helpers(
+            self.function_to_optimize_source_code, original_helper_code, self.function_to_optimize.file_path
+        )
Suggestion importance[1-10]: 7


Why: This accurately addresses the async instrumentation/restore block in establish_original_code_baseline, reducing unnecessary restores and potential race overwrites by tracking modifications; it's context-aware and improves robustness, though it introduces a new flag not present in the PR.

Impact: Medium
Use generic_visit for traversal

Manually iterating node.body and then visiting may skip nested functions deeper than
one level and diverge from NodeTransformer traversal semantics. Delegate to
generic_visit to ensure consistent traversal of all nested nodes.

codeflash/code_utils/edit_generated_tests.py [33-40]

-for inner_node in node.body:
-    if isinstance(inner_node, ast.FunctionDef):
-        self.visit_FunctionDef(inner_node)
-    elif isinstance(inner_node, ast.AsyncFunctionDef):
-        self.visit_AsyncFunctionDef(inner_node)
+def visit_ClassDef(self, node: ast.ClassDef) -> ast.ClassDef:
+    self.context_stack.append(node.name)
+    node = self.generic_visit(node)
+    self.context_stack.pop()
+    return node
Suggestion importance[1-10]: 5


Why: Switching to generic_visit ensures deeper nested functions are traversed, improving robustness; however, the current manual visit may be intentional to maintain class context and already handles async methods at one level, so impact is moderate.

Impact: Low

Previous suggestions

Suggestions up to commit 91b8902
Category / Suggestion / Impact
Possible issue
Correct star import detection

The child.names attribute is a sequence, not a single node, so
isinstance(child.names, cst.ImportStar) will never be true. Instead, check each
alias in the list and skip the entire import-from if any alias is a star import.

codeflash/code_utils/code_extractor.py [275-276]

-if isinstance(child.names, cst.ImportStar):
+if any(isinstance(alias, cst.ImportStar) for alias in child.names):
     continue
Suggestion importance[1-10]: 8


Why: The isinstance(child.names, cst.ImportStar) check always fails because child.names is a list, so using any(isinstance(alias, cst.ImportStar) ...) correctly detects star imports and skips them.

Impact: Medium
Suggestions up to commit c9aaaad
Category / Suggestion / Impact
Possible issue
Include full async execution time

Remove the second timestamp reset so that the measured duration includes both
synchronous and asynchronous execution time. Always start the timer once before
calling the function. This ensures the reported duration accurately reflects total
runtime.

codeflash/code_utils/codeflash_wrap_decorator.py [62-71]

 counter = time.perf_counter_ns()
 ret = func(*args, **kwargs)
 
 if inspect.isawaitable(ret):
-    counter = time.perf_counter_ns()
     return_value = await ret
 else:
     return_value = ret
 
 codeflash_duration = time.perf_counter_ns() - counter
Suggestion importance[1-10]: 8


Why: The extra counter = time.perf_counter_ns() inside the await branch discards the time spent before awaiting, so removing it yields an accurate total runtime measurement.

Impact: Medium
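For illustration, here is a minimal, self-contained sketch of the corrected single-timer pattern applied as a decorator. It is not the actual codeflash_wrap_decorator implementation (the real wrapper also captures results, e.g. into SQLite); it only shows starting the clock once, before the call, so both the synchronous setup and the awaited portion are measured:

import asyncio
import functools
import inspect
import time


def timed(func):
    # Illustrative only: start the timer once, before calling func, so the
    # measured duration covers synchronous setup plus the awaited portion.
    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        counter = time.perf_counter_ns()
        ret = func(*args, **kwargs)
        # Await only if the call returned an awaitable (e.g. an async def coroutine).
        return_value = await ret if inspect.isawaitable(ret) else ret
        duration_ns = time.perf_counter_ns() - counter
        print(f"{func.__name__} took {duration_ns} ns")
        return return_value

    return wrapper


@timed
async def example():
    await asyncio.sleep(0.01)
    return "done"


if __name__ == "__main__":
    asyncio.run(example())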
Suggestions up to commit 14af1a8
Category / Suggestion / Impact
Possible issue
Support dict parents in qualified_name

Update the property to handle both FunctionParent objects and dicts in self.parents,
avoiding attribute errors when parents are passed as mappings. Build a list of names
using a conditional that checks for name attribute first, then dict key.

codeflash/discovery/functions_to_optimize.py [162-166]

 @property
 def qualified_name(self) -> str:
     if not self.parents:
         return self.function_name
-    # Join all parent names with dots to handle nested classes properly
-    parent_path = ".".join(parent.name for parent in self.parents)
+    parent_names = [
+        p.name if hasattr(p, "name") else p["name"]
+        for p in self.parents
+    ]
+    parent_path = ".".join(parent_names)
     return f"{parent_path}.{self.function_name}"
Suggestion importance[1-10]: 9


Why: This prevents attribute errors when self.parents contains dicts as in tests, ensuring qualified_name works in all cases.

Impact: High
General
Measure entire async call duration

Remove the second reset of counter before await so that codeflash_duration covers
the full execution time of the async function, including any synchronous setup.

codeflash/code_utils/codeflash_wrap_decorator.py [63-72]

 counter = time.perf_counter_ns()
 ret = func(*args, **kwargs)
 
 if inspect.isawaitable(ret):
-    counter = time.perf_counter_ns()
     return_value = await ret
 else:
     return_value = ret
 
 codeflash_duration = time.perf_counter_ns() - counter
Suggestion importance[1-10]: 6


Why: Removing the reset of counter ensures the duration includes both synchronous setup and asynchronous execution, improving timing accuracy.

Impact: Low
Log instrumentation errors

Log exceptions in the except block so failures to instrument async decorators are
surfaced during debugging instead of silently ignored.

codeflash/code_utils/instrument_existing_tests.py [325-345]

 def instrument_source_module_with_async_decorators(
     ...
 ) -> tuple[bool, str | None]:
     ...
     try:
         ...
     except Exception as e:
+        logger.exception(
+            f"Failed to instrument async decorator for "
+            f"{function_to_optimize.qualified_name} in {source_path}: {e}"
+        )
         return False, None
Suggestion importance[1-10]: 5

__

Why: Adding logger.exception surfaces internal failures during async decorator instrumentation, aiding debugging without altering control flow.

Impact: Low

@KRRT7 KRRT7 force-pushed the granular-async-instrumentation branch from 52dbe88 to b153989 Compare August 26, 2025 10:14
@KRRT7 KRRT7 force-pushed the granular-async-instrumentation branch from ef07e94 to 0a57afa Compare August 26, 2025 10:24
@misrasaurabh1
Contributor

add an e2e test for this

@misrasaurabh1
Contributor

Add tests in the style of

def test_perfinjector_bubble_sort_results() -> None:
This gives us confidence that the tests run and provide the correct return values.

import_transformer = AsyncDecoratorImportAdder(mode)
module = module.visit(import_transformer)

return isort.code(module.code, float_to_top=True), decorator_transformer.added_decorator
Contributor

why are we isorting the user's code?

codeflash-ai bot added a commit that referenced this pull request Aug 29, 2025
…687 (`granular-async-instrumentation`)

The optimization replaces expensive `Path` object creation and method calls with direct string manipulation operations, delivering a **491% speedup**.

**Key optimizations:**

1. **Eliminated Path object overhead**: Replaced `Path(filename).stem.startswith("test_")` with `filename.rpartition('/')[-1].rpartition('\\')[-1].rpartition('.')[0].startswith("test_")` - avoiding Path instantiation entirely.

2. **Optimized path parts extraction**: Replaced `Path(filename).parts` with `filename.replace('\\', '/').split('/')` - using simple string operations instead of Path parsing.

**Performance impact analysis:**
- Original profiler shows lines 25-26 (Path operations) consumed **86.3%** of total runtime (44.7% + 41.6%)
- Optimized version reduces these same operations to just **25.4%** of runtime (15% + 10.4%)
- The string manipulation operations are ~6x faster per call than Path object creation

**Test case benefits:**
- **Large-scale tests** see the biggest gains (516% faster for 900-frame stack, 505% faster for 950-frame chain) because the Path overhead multiplies with stack depth
- **Edge cases** with complex paths benefit significantly (182-206% faster for subdirectory and pytest frame tests)
- **Basic tests** show minimal overhead since Path operations weren't the bottleneck in shallow stacks

The optimization maintains identical behavior while eliminating the most expensive operations identified in the profiling data - Path object instantiation and method calls that occurred once per stack frame.
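A hedged sketch of the two expressions described above, using a hypothetical filename; the real change lives inside a per-stack-frame helper in codeflash, so only the before/after checks are reproduced here:

from pathlib import Path

filename = "tests/unit/test_sample.py"  # hypothetical input

# Original: Path object creation and method calls per stack frame
is_test_original = Path(filename).stem.startswith("test_")
parts_original = Path(filename).parts  # tuple of path components

# Optimized: plain string manipulation, no Path instantiation
is_test_optimized = (
    filename.rpartition('/')[-1].rpartition('\\')[-1].rpartition('.')[0].startswith("test_")
)
parts_optimized = filename.replace('\\', '/').split('/')  # list of path components

print(is_test_original, is_test_optimized)  # True True
print(parts_original, parts_optimized)

Note that the string variant yields a list rather than Path.parts' tuple and treats dotless filenames slightly differently; per the explanation above, behavior is identical for the frame filenames this code actually inspects.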
KRRT7 and others added 2 commits September 23, 2025 00:42
Get throughput from output for async functions
matches_re_end = re.compile(r"!######(.*?):(.*?)([^\.:]*?):(.*?):(.*?):(.*?)######!")


start_pattern = re.compile(r"!\$######([^:]*):([^:]*):([^:]*):([^:]*):([^:]+)######\$!")
Contributor

Why is this pattern different from the one above? I expect the regexes to be the same.


results_list = test_results.test_results
async_calls = [r for r in results_list if r.id.function_getting_tested == "async_merge_sort"]
assert len(async_calls) >= 1
Contributor

This test needs to be improved. See all the properties we test in the sync version of this and test those as well: things like the return values and the test id.

Contributor

This is important to give us confidence that these critical parameters are correct and never broken in the future.
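For example, a hedged sketch of the kind of precise assertions being asked for, mirroring the sync tests; the attribute names beyond id.function_getting_tested (test_module_path, test_function_name, return_value, did_pass) and the literal values are illustrative assumptions, not verified against the codeflash models:

results_list = test_results.test_results
async_calls = [r for r in results_list if r.id.function_getting_tested == "async_merge_sort"]

# Pin the exact number of invocations instead of ">= 1"
assert len(async_calls) == 2

first = async_calls[0]
# Assert the full test id, not just the function under test (names below are hypothetical)
assert first.id.test_module_path == "code_to_optimize.tests.pytest.test_async_merge_sort"
assert first.id.test_function_name == "test_async_merge_sort_basic"
# Assert the captured return value round-trips exactly (value below is hypothetical)
assert first.return_value == ([1, 2, 3, 4, 5],)
assert first.did_pass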

assert test_results.test_results is not None
assert len(test_results.test_results) >= 2

results_list = test_results.test_results
Contributor

Test for the exact test ids and return values.


assert test_results is not None
assert test_results.test_results is not None
assert len(test_results.test_results) >= 2
Contributor

Use the equality operator in the test; make it precise.

# Check that comments were added
modified_source = result.generated_tests[0].generated_original_test_source
assert modified_source == expected
assert modified_source == expected
Contributor

eventually we should also add the throughput annotations when available

Comment on lines +63 to +94
 # Runtime performance evaluation
 noise_floor = 3 * MIN_IMPROVEMENT_THRESHOLD if original_code_runtime < 10000 else MIN_IMPROVEMENT_THRESHOLD
 if not disable_gh_action_noise and env_utils.is_ci():
     noise_floor = noise_floor * 2  # Increase the noise floor in GitHub Actions mode

 perf_gain = performance_gain(
     original_runtime_ns=original_code_runtime, optimized_runtime_ns=candidate_result.best_test_runtime
 )
-if best_runtime_until_now is None:
-    # collect all optimizations with this
-    return bool(perf_gain > noise_floor)
-return bool(perf_gain > noise_floor and candidate_result.best_test_runtime < best_runtime_until_now)
+runtime_improved = perf_gain > noise_floor
+
+# Check runtime comparison with best so far
+runtime_is_best = best_runtime_until_now is None or candidate_result.best_test_runtime < best_runtime_until_now
+
+throughput_improved = True  # Default to True if no throughput data
+throughput_is_best = True  # Default to True if no throughput data
+
+if original_async_throughput is not None and candidate_result.async_throughput is not None:
+    if original_async_throughput > 0:
+        throughput_gain_value = throughput_gain(
+            original_throughput=original_async_throughput, optimized_throughput=candidate_result.async_throughput
+        )
+        throughput_improved = throughput_gain_value > MIN_THROUGHPUT_IMPROVEMENT_THRESHOLD
+
+    throughput_is_best = (
+        best_throughput_until_now is None or candidate_result.async_throughput > best_throughput_until_now
+    )
+
+if original_async_throughput is not None and candidate_result.async_throughput is not None:
+    # When throughput data is available, accept if EITHER throughput OR runtime improves significantly
+    throughput_acceptance = throughput_improved and throughput_is_best
+    runtime_acceptance = runtime_improved and runtime_is_best
+    return throughput_acceptance or runtime_acceptance
Contributor

⚡️Codeflash found 16% (0.16x) speedup for speedup_critic in codeflash/result/critic.py

⏱️ Runtime: 3.15 milliseconds → 2.71 milliseconds (best of 98 runs)

📝 Explanation and details

The optimized code achieves a 16% speedup by eliminating function call overhead and streamlining conditional logic in the performance-critical speedup_critic function.

Key optimizations:

  1. Inlined performance calculation: Instead of calling performance_gain(), the performance gain is calculated directly inline as (original_code_runtime - candidate_result.best_test_runtime) / candidate_result.best_test_runtime. This eliminates function call overhead which was consuming 35.2% of the original execution time according to the line profiler.

  2. Inlined throughput calculation: Similarly, the throughput gain calculation is moved inline as (candidate_result.async_throughput - original_async_throughput) / original_async_throughput, removing another function call.

  3. Streamlined conditional structure: The throughput evaluation logic is reorganized to eliminate redundant variable assignments and combine the final decision logic more efficiently. The original code had separate variables for throughput_acceptance and runtime_acceptance, while the optimized version directly returns the combined condition.

  4. Reduced variable assignments: Eliminated unnecessary intermediate variables like throughput_improved = True defaults, handling the logic more directly within the conditional branches.

The line profiler shows the original performance_gain and throughput_gain function calls took significant time (25.8ms and 9.9ms respectively out of 73.3ms total). By inlining these simple calculations, the optimized version reduces total execution time to 31.0ms.

These optimizations are particularly effective for high-volume scenarios where speedup_critic is called frequently, as evidenced by the large-scale test cases showing consistent 13-20% improvements when processing hundreds or thousands of candidates.
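For reference, a small sketch of the two gain formulas as the inlined expressions above describe them; these are reconstructions from the explanation, not the verbatim codeflash.result.critic helpers:

def performance_gain(original_runtime_ns: int, optimized_runtime_ns: int) -> float:
    # Relative runtime speedup: 0.16 means the candidate is 16% faster than the original.
    if optimized_runtime_ns == 0:
        return 0.0
    return (original_runtime_ns - optimized_runtime_ns) / optimized_runtime_ns


def throughput_gain(original_throughput: int, optimized_throughput: int) -> float:
    # Relative async throughput improvement.
    if original_throughput == 0:
        return 0.0
    return (optimized_throughput - original_throughput) / original_throughput


# 100_000 ns -> 80_000 ns is a 25% runtime gain; 1000 -> 1100 completed calls is a 10% throughput gain.
assert abs(performance_gain(100_000, 80_000) - 0.25) < 1e-9
assert abs(throughput_gain(1000, 1100) - 0.10) < 1e-9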

Correctness verification report:

Test Status
⚙️ Existing Unit Tests 14 Passed
🌀 Generated Regression Tests 5050 Passed
⏪ Replay Tests 🔘 None Found
🔎 Concolic Coverage Tests 🔘 None Found
📊 Tests Coverage 100.0%
⚙️ Existing Unit Tests and Runtime
Test File::Test Function Original ⏱️ Optimized ⏱️ Speedup
test_critic.py::test_speedup_critic 4.67μs 4.26μs 9.65%✅
test_critic.py::test_speedup_critic_with_async_throughput 9.48μs 8.40μs 12.7%✅
🌀 Generated Regression Tests and Runtime
from __future__ import annotations

import os
from dataclasses import dataclass
from functools import lru_cache

# imports
import pytest  # used for our unit tests
from codeflash.result.critic import speedup_critic

# Simulate codeflash.code_utils.config_consts
MIN_IMPROVEMENT_THRESHOLD = 0.01  # 1%
MIN_THROUGHPUT_IMPROVEMENT_THRESHOLD = 0.01  # 1%

# Simulate codeflash.models.models.OptimizedCandidateResult
@dataclass
class OptimizedCandidateResult:
    best_test_runtime: int
    async_throughput: int | None = None
from codeflash.result.critic import speedup_critic

# ---- BASIC TEST CASES ----

def test_basic_runtime_improvement_above_threshold():
    # Test: optimized code is 10% faster, above 1% threshold
    orig = 100_000  # ns
    opt = 90_000    # ns (10% faster)
    cand = OptimizedCandidateResult(best_test_runtime=opt)
    codeflash_output = speedup_critic(cand, orig, None) # 2.13μs -> 2.06μs (3.39% faster)

def test_basic_runtime_improvement_below_threshold():
    # Test: optimized code is 0.5% faster, below 1% threshold
    orig = 100_000
    opt = 99_500  # 0.5% faster
    cand = OptimizedCandidateResult(best_test_runtime=opt)
    codeflash_output = speedup_critic(cand, orig, None) # 2.09μs -> 1.88μs (11.2% faster)

def test_basic_runtime_no_improvement():
    # Test: optimized code is slower
    orig = 100_000
    opt = 110_000
    cand = OptimizedCandidateResult(best_test_runtime=opt)
    codeflash_output = speedup_critic(cand, orig, None) # 1.98μs -> 1.85μs (7.07% faster)

def test_basic_runtime_improvement_and_best_so_far():
    # Test: improvement and better than previous best
    orig = 100_000
    opt = 90_000
    prev_best = 91_000
    cand = OptimizedCandidateResult(best_test_runtime=opt)
    codeflash_output = speedup_critic(cand, orig, prev_best) # 2.11μs -> 1.98μs (6.50% faster)

def test_basic_runtime_improvement_not_best_so_far():
    # Test: improvement but not better than previous best
    orig = 100_000
    opt = 92_000
    prev_best = 90_000
    cand = OptimizedCandidateResult(best_test_runtime=opt)
    codeflash_output = speedup_critic(cand, orig, prev_best) # 2.03μs -> 1.91μs (6.33% faster)

def test_basic_throughput_improvement_above_threshold():
    # Test: throughput improvement above threshold, no runtime improvement
    orig = 100_000
    opt = 100_000
    orig_through = 1000
    opt_through = 1100  # 10% improvement
    cand = OptimizedCandidateResult(best_test_runtime=opt, async_throughput=opt_through)
    codeflash_output = speedup_critic(cand, orig, None, original_async_throughput=orig_through) # 3.06μs -> 2.65μs (15.1% faster)

def test_basic_throughput_improvement_below_threshold():
    # Test: throughput improvement below threshold, no runtime improvement
    orig = 100_000
    opt = 100_000
    orig_through = 1000
    opt_through = 1005  # 0.5% improvement
    cand = OptimizedCandidateResult(best_test_runtime=opt, async_throughput=opt_through)
    codeflash_output = speedup_critic(cand, orig, None, original_async_throughput=orig_through) # 2.71μs -> 2.37μs (14.4% faster)

def test_basic_throughput_and_runtime_improvement_either_suffices():
    # Test: throughput improved, runtime not; should accept
    orig = 100_000
    opt = 100_000
    orig_through = 1000
    opt_through = 1100
    cand = OptimizedCandidateResult(best_test_runtime=opt, async_throughput=opt_through)
    codeflash_output = speedup_critic(cand, orig, None, original_async_throughput=orig_through) # 2.65μs -> 2.35μs (12.3% faster)

    # Test: runtime improved, throughput not; should accept
    orig = 100_000
    opt = 90_000
    orig_through = 1000
    opt_through = 1000
    cand = OptimizedCandidateResult(best_test_runtime=opt, async_throughput=opt_through)
    codeflash_output = speedup_critic(cand, orig, None, original_async_throughput=orig_through) # 1.63μs -> 1.38μs (18.2% faster)

def test_basic_throughput_best_so_far():
    # Test: throughput improved but not best so far
    orig = 100_000
    opt = 100_000
    orig_through = 1000
    opt_through = 1100
    prev_best_through = 1200
    cand = OptimizedCandidateResult(best_test_runtime=opt, async_throughput=opt_through)
    codeflash_output = speedup_critic(
        cand, orig, None, original_async_throughput=orig_through, best_throughput_until_now=prev_best_through
    ) # 2.90μs -> 2.50μs (16.1% faster)

def test_basic_runtime_and_throughput_best_so_far():
    # Test: runtime and throughput both improved and both best so far
    orig = 100_000
    opt = 90_000
    orig_through = 1000
    opt_through = 1100
    prev_best = 91_000
    prev_best_through = 1050
    cand = OptimizedCandidateResult(best_test_runtime=opt, async_throughput=opt_through)
    codeflash_output = speedup_critic(
        cand, orig, prev_best, original_async_throughput=orig_through, best_throughput_until_now=prev_best_through
    ) # 3.01μs -> 2.62μs (15.0% faster)

# ---- EDGE TEST CASES ----

def test_edge_runtime_exactly_at_threshold():
    # Test: improvement exactly at threshold (should be False, must be > threshold)
    orig = 100_000
    opt = int(orig / (1 + MIN_IMPROVEMENT_THRESHOLD))  # exactly 1% faster
    cand = OptimizedCandidateResult(best_test_runtime=opt)
    codeflash_output = speedup_critic(cand, orig, None) # 1.89μs -> 1.73μs (9.17% faster)

def test_edge_throughput_exactly_at_threshold():
    # Test: throughput improvement exactly at threshold (should be False)
    orig = 100_000
    opt = 100_000
    orig_through = 1000
    opt_through = int(orig_through * (1 + MIN_THROUGHPUT_IMPROVEMENT_THRESHOLD))
    cand = OptimizedCandidateResult(best_test_runtime=opt, async_throughput=opt_through)
    codeflash_output = speedup_critic(cand, orig, None, original_async_throughput=orig_through) # 2.77μs -> 2.30μs (20.0% faster)

def test_edge_runtime_below_10us_noise_floor():
    # Test: original runtime below 10us, noise floor is 3x
    orig = 9000  # 9us
    opt = int(orig / (1 + 3 * MIN_IMPROVEMENT_THRESHOLD)) - 1  # Just above noise floor
    cand = OptimizedCandidateResult(best_test_runtime=opt)
    codeflash_output = speedup_critic(cand, orig, None) # 1.91μs -> 1.81μs (5.57% faster)

    # Now just below noise floor
    opt = int(orig / (1 + 3 * MIN_IMPROVEMENT_THRESHOLD)) + 1
    cand = OptimizedCandidateResult(best_test_runtime=opt)
    codeflash_output = speedup_critic(cand, orig, None) # 872ns -> 761ns (14.6% faster)


def test_edge_runtime_zero_optimized_runtime():
    # Test: optimized_runtime_ns == 0 (should return False, as gain is 0)
    orig = 100_000
    opt = 0
    cand = OptimizedCandidateResult(best_test_runtime=opt)
    codeflash_output = speedup_critic(cand, orig, None) # 2.13μs -> 1.73μs (23.1% faster)

def test_edge_throughput_zero_original_throughput():
    # Test: original throughput is zero (should not crash, gain is 0)
    orig = 100_000
    opt = 100_000
    orig_through = 0
    opt_through = 1000
    cand = OptimizedCandidateResult(best_test_runtime=opt, async_throughput=opt_through)
    codeflash_output = speedup_critic(cand, orig, None, original_async_throughput=orig_through) # 2.77μs -> 2.50μs (10.4% faster)

def test_edge_throughput_none_values():
    # Test: async_throughput is None in candidate (should default to runtime logic)
    orig = 100_000
    opt = 90_000
    cand = OptimizedCandidateResult(best_test_runtime=opt, async_throughput=None)
    codeflash_output = speedup_critic(cand, orig, None, original_async_throughput=1000) # 2.54μs -> 2.20μs (15.0% faster)

    # Test: original_async_throughput is None (should default to runtime logic)
    cand = OptimizedCandidateResult(best_test_runtime=opt, async_throughput=1100)
    codeflash_output = speedup_critic(cand, orig, None, original_async_throughput=None) # 1.10μs -> 972ns (13.4% faster)


def test_large_scale_many_candidates_runtime(monkeypatch):
    # Test: 1000 candidates, only one is best and above threshold
    orig = 100_000
    prev_best = 90_000
    # All candidates are worse than prev_best except one
    results = [OptimizedCandidateResult(best_test_runtime=orig - i) for i in range(1000)]
    # Only candidate at index 950 is below prev_best and above threshold
    results[950] = OptimizedCandidateResult(best_test_runtime=85_000)
    count = 0
    for cand in results:
        if speedup_critic(cand, orig, prev_best):
            count += 1

def test_large_scale_throughput(monkeypatch):
    # Test: 500 candidates, only a few have throughput above threshold and best so far
    orig = 100_000
    orig_through = 1000
    prev_best_through = 1100
    results = []
    for i in range(500):
        # Most are below threshold
        results.append(OptimizedCandidateResult(best_test_runtime=orig, async_throughput=orig_through + i))
    # Only candidate at index 499 is above prev_best_through and above threshold
    results[499] = OptimizedCandidateResult(best_test_runtime=orig, async_throughput=1200)
    count = 0
    for cand in results:
        if speedup_critic(
            cand, orig, None, original_async_throughput=orig_through, best_throughput_until_now=prev_best_through
        ):
            count += 1

def test_large_scale_runtime_and_throughput_combined():
    # Test: 100 candidates, some with runtime improvement, some with throughput, some both
    orig = 100_000
    orig_through = 1000
    prev_best = 95_000
    prev_best_through = 1050
    results = []
    # 10 with runtime improvement and best so far
    for i in range(10):
        results.append(OptimizedCandidateResult(best_test_runtime=90_000 - i, async_throughput=1000))
    # 10 with throughput improvement and best so far
    for i in range(10):
        results.append(OptimizedCandidateResult(best_test_runtime=100_000, async_throughput=1100 + i))
    # 10 with both
    for i in range(10):
        results.append(OptimizedCandidateResult(best_test_runtime=90_000 - i, async_throughput=1100 + i))
    # The rest with no improvement
    for i in range(70):
        results.append(OptimizedCandidateResult(best_test_runtime=100_000, async_throughput=1000))
    count = 0
    for cand in results:
        if speedup_critic(
            cand, orig, prev_best, original_async_throughput=orig_through, best_throughput_until_now=prev_best_through
        ):
            count += 1

def test_large_scale_edge_case_all_equal():
    # Test: all candidates have identical performance, none should pass
    orig = 100_000
    prev_best = 90_000
    orig_through = 1000
    prev_best_through = 1100
    results = [OptimizedCandidateResult(best_test_runtime=100_000, async_throughput=1000) for _ in range(500)]
    for cand in results:
        codeflash_output = speedup_critic(
            cand, orig, prev_best, original_async_throughput=orig_through, best_throughput_until_now=prev_best_through
        ) # 385μs -> 321μs (19.9% faster)

def test_large_scale_edge_case_all_best():
    # Test: all candidates are best and above threshold, all should pass
    orig = 100_000
    prev_best = 110_000
    orig_through = 1000
    prev_best_through = 900
    results = [OptimizedCandidateResult(best_test_runtime=90_000, async_throughput=1200) for _ in range(500)]
    for cand in results:
        codeflash_output = speedup_critic(
            cand, orig, prev_best, original_async_throughput=orig_through, best_throughput_until_now=prev_best_through
        ) # 384μs -> 321μs (19.5% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from __future__ import annotations

import os
from dataclasses import dataclass
from functools import lru_cache

# imports
import pytest  # used for our unit tests
from codeflash.result.critic import speedup_critic

# --- Minimal stubs for external dependencies/constants ---
# These are required for the function to run in this test file.
MIN_IMPROVEMENT_THRESHOLD = 0.01  # 1%
MIN_THROUGHPUT_IMPROVEMENT_THRESHOLD = 0.01  # 1%

@dataclass
class OptimizedCandidateResult:
    best_test_runtime: int  # in nanoseconds
    async_throughput: int | None = None
from codeflash.result.critic import speedup_critic

# ----------- 1. BASIC TEST CASES -----------

def test_basic_significant_runtime_improvement():
    # Optimized code is 20% faster than original, above threshold
    orig = 100_000  # ns
    opt = 80_000    # ns (20% faster)
    candidate = OptimizedCandidateResult(best_test_runtime=opt)
    # No previous best
    codeflash_output = speedup_critic(candidate, orig, None) # 2.42μs -> 2.11μs (14.7% faster)

def test_basic_insufficient_runtime_improvement():
    # Only 0.5% improvement, below threshold
    orig = 100_000
    opt = 99_500  # 0.5% improvement
    candidate = OptimizedCandidateResult(best_test_runtime=opt)
    codeflash_output = speedup_critic(candidate, orig, None) # 2.08μs -> 1.94μs (7.15% faster)

def test_basic_no_improvement():
    # Optimized code is slower
    orig = 100_000
    opt = 120_000
    candidate = OptimizedCandidateResult(best_test_runtime=opt)
    codeflash_output = speedup_critic(candidate, orig, None) # 2.00μs -> 1.89μs (5.81% faster)

def test_basic_best_runtime_until_now():
    # There is a previous best, and this candidate is not better
    orig = 100_000
    opt = 80_000
    candidate = OptimizedCandidateResult(best_test_runtime=opt)
    best_so_far = 75_000  # Already have a better one
    codeflash_output = speedup_critic(candidate, orig, best_so_far) # 2.13μs -> 1.92μs (11.0% faster)

def test_basic_new_best_runtime():
    # There is a previous best, and this candidate is better
    orig = 100_000
    opt = 70_000
    candidate = OptimizedCandidateResult(best_test_runtime=opt)
    best_so_far = 75_000
    codeflash_output = speedup_critic(candidate, orig, best_so_far) # 1.94μs -> 1.93μs (0.517% faster)

def test_basic_throughput_improvement_accepts():
    # Throughput improvement is significant, runtime is not
    orig = 100_000
    opt = 99_500
    orig_thr = 1000
    opt_thr = 1100  # 10% improvement
    candidate = OptimizedCandidateResult(best_test_runtime=opt, async_throughput=opt_thr)
    codeflash_output = speedup_critic(candidate, orig, None, original_async_throughput=orig_thr) # 3.14μs -> 2.73μs (14.7% faster)

def test_basic_throughput_no_improvement_rejects():
    # Throughput improvement is below threshold, runtime is not improved
    orig = 100_000
    opt = 99_500
    orig_thr = 1000
    opt_thr = 1005  # 0.5% improvement
    candidate = OptimizedCandidateResult(best_test_runtime=opt, async_throughput=opt_thr)
    codeflash_output = speedup_critic(candidate, orig, None, original_async_throughput=orig_thr) # 2.71μs -> 2.42μs (12.0% faster)

def test_basic_throughput_and_runtime_both_improve():
    # Both throughput and runtime improve significantly
    orig = 100_000
    opt = 80_000
    orig_thr = 1000
    opt_thr = 1200
    candidate = OptimizedCandidateResult(best_test_runtime=opt, async_throughput=opt_thr)
    codeflash_output = speedup_critic(candidate, orig, None, original_async_throughput=orig_thr) # 2.74μs -> 2.32μs (17.7% faster)

def test_basic_throughput_is_best_check():
    # Throughput improvement is significant but not best so far, should reject
    orig = 100_000
    opt = 99_000
    orig_thr = 1000
    opt_thr = 1100
    best_thr = 1200  # Already have a better throughput
    candidate = OptimizedCandidateResult(best_test_runtime=opt, async_throughput=opt_thr)
    codeflash_output = speedup_critic(candidate, orig, None, original_async_throughput=orig_thr, best_throughput_until_now=best_thr) # 3.00μs -> 2.62μs (14.1% faster)

def test_basic_throughput_best_but_runtime_not_best():
    # Throughput is new best, runtime is not best but throughput is enough
    orig = 100_000
    opt = 99_000
    orig_thr = 1000
    opt_thr = 1300
    best_thr = 1200
    candidate = OptimizedCandidateResult(best_test_runtime=opt, async_throughput=opt_thr)
    codeflash_output = speedup_critic(candidate, orig, 98_000, original_async_throughput=orig_thr, best_throughput_until_now=best_thr) # 2.97μs -> 2.73μs (8.80% faster)

# ----------- 2. EDGE TEST CASES -----------

def test_edge_runtime_just_below_threshold():
    # Improvement is just below the threshold (should reject)
    orig = 100_000
    opt = int(orig / (1 + MIN_IMPROVEMENT_THRESHOLD - 0.0001))  # Just under threshold
    candidate = OptimizedCandidateResult(best_test_runtime=opt)
    codeflash_output = speedup_critic(candidate, orig, None) # 1.94μs -> 1.78μs (8.97% faster)

def test_edge_runtime_just_above_threshold():
    # Improvement is just above the threshold (should accept)
    orig = 100_000
    opt = int(orig / (1 + MIN_IMPROVEMENT_THRESHOLD + 0.0001))  # Just over threshold
    candidate = OptimizedCandidateResult(best_test_runtime=opt)
    codeflash_output = speedup_critic(candidate, orig, None) # 1.74μs -> 1.57μs (10.9% faster)

def test_edge_small_original_runtime_noise_floor(monkeypatch):
    # For original_code_runtime < 10_000, noise floor is 3x threshold
    orig = 9000
    opt = int(orig / (1 + 3 * MIN_IMPROVEMENT_THRESHOLD + 0.0001))  # Just over noise floor
    candidate = OptimizedCandidateResult(best_test_runtime=opt)
    codeflash_output = speedup_critic(candidate, orig, None) # 1.94μs -> 1.84μs (5.37% faster)

def test_edge_small_original_runtime_just_below_noise_floor(monkeypatch):
    # For original_code_runtime < 10_000, just below noise floor
    orig = 9000
    opt = int(orig / (1 + 3 * MIN_IMPROVEMENT_THRESHOLD - 0.0001))  # Just under noise floor
    candidate = OptimizedCandidateResult(best_test_runtime=opt)
    codeflash_output = speedup_critic(candidate, orig, None) # 1.78μs -> 1.77μs (0.564% faster)




def test_edge_zero_optimized_runtime():
    # Optimized runtime is zero (should not crash, should return False)
    orig = 100_000
    opt = 0
    candidate = OptimizedCandidateResult(best_test_runtime=opt)
    codeflash_output = speedup_critic(candidate, orig, None) # 2.29μs -> 1.80μs (27.2% faster)

def test_edge_zero_original_throughput():
    # Original throughput is zero, should not crash, throughput gain is 0
    orig = 100_000
    opt = 99_000
    orig_thr = 0
    opt_thr = 1000
    candidate = OptimizedCandidateResult(best_test_runtime=opt, async_throughput=opt_thr)
    codeflash_output = speedup_critic(candidate, orig, None, original_async_throughput=orig_thr) # 2.84μs -> 2.60μs (9.29% faster)

def test_edge_none_throughput_values():
    # Throughput values are None, should fallback to runtime only
    orig = 100_000
    opt = 80_000
    candidate = OptimizedCandidateResult(best_test_runtime=opt, async_throughput=None)
    codeflash_output = speedup_critic(candidate, orig, None, original_async_throughput=None) # 2.34μs -> 2.19μs (6.89% faster)

def test_edge_none_candidate_throughput():
    # Candidate throughput is None, should fallback to runtime only
    orig = 100_000
    opt = 80_000
    candidate = OptimizedCandidateResult(best_test_runtime=opt, async_throughput=None)
    codeflash_output = speedup_critic(candidate, orig, None, original_async_throughput=1000) # 2.31μs -> 2.20μs (4.99% faster)

def test_edge_none_original_throughput():
    # Original throughput is None, should fallback to runtime only
    orig = 100_000
    opt = 80_000
    candidate = OptimizedCandidateResult(best_test_runtime=opt, async_throughput=1200)
    codeflash_output = speedup_critic(candidate, orig, None, original_async_throughput=None) # 2.16μs -> 2.01μs (7.45% faster)

def test_edge_throughput_and_runtime_both_worse():
    # Both throughput and runtime are worse, must reject
    orig = 100_000
    opt = 120_000
    orig_thr = 1000
    opt_thr = 900
    candidate = OptimizedCandidateResult(best_test_runtime=opt, async_throughput=opt_thr)
    codeflash_output = speedup_critic(candidate, orig, None, original_async_throughput=orig_thr) # 2.96μs -> 2.54μs (16.1% faster)

def test_edge_throughput_better_runtime_worse():
    # Throughput is significantly better, runtime is worse, should accept
    orig = 100_000
    opt = 120_000
    orig_thr = 1000
    opt_thr = 1100
    candidate = OptimizedCandidateResult(best_test_runtime=opt, async_throughput=opt_thr)
    codeflash_output = speedup_critic(candidate, orig, None, original_async_throughput=orig_thr) # 2.79μs -> 2.42μs (14.8% faster)

def test_edge_throughput_better_but_not_best():
    # Throughput is improved but not the best so far, should reject
    orig = 100_000
    opt = 120_000
    orig_thr = 1000
    opt_thr = 1100
    best_thr = 1200
    candidate = OptimizedCandidateResult(best_test_runtime=opt, async_throughput=opt_thr)
    codeflash_output = speedup_critic(candidate, orig, None, original_async_throughput=orig_thr, best_throughput_until_now=best_thr) # 3.04μs -> 2.59μs (17.4% faster)

# ----------- 3. LARGE SCALE TEST CASES -----------

def test_large_scale_many_candidates_runtime(monkeypatch):
    # Test with a large number of candidates, only the best one should be accepted
    orig = 1_000_000
    best_so_far = 800_000
    # Try a batch of 1000 candidates with slightly worse runtimes
    for i in range(1000):
        candidate = OptimizedCandidateResult(best_test_runtime=best_so_far + i + 1)
        codeflash_output = speedup_critic(candidate, orig, best_so_far) # 510μs -> 451μs (13.0% faster)
    # Now test with a better candidate
    candidate = OptimizedCandidateResult(best_test_runtime=750_000)
    codeflash_output = speedup_critic(candidate, orig, best_so_far) # 601ns -> 501ns (20.0% faster)

def test_large_scale_throughput_candidates():
    # Test with many throughput candidates, only the best and improved one should be accepted
    orig = 1_000_000
    orig_thr = 10_000
    best_thr = 12_000
    # All these are not best so should be rejected
    for i in range(1000):
        candidate = OptimizedCandidateResult(best_test_runtime=900_000, async_throughput=best_thr - 1 - i)
        codeflash_output = speedup_critic(candidate, orig, None, original_async_throughput=orig_thr, best_throughput_until_now=best_thr) # 783μs -> 653μs (20.0% faster)
    # Now test with a new best throughput
    candidate = OptimizedCandidateResult(best_test_runtime=900_000, async_throughput=13_000)
    codeflash_output = speedup_critic(candidate, orig, None, original_async_throughput=orig_thr, best_throughput_until_now=best_thr) # 872ns -> 721ns (20.9% faster)

def test_large_scale_performance(monkeypatch):
    # Simulate a large batch of candidates, all with significant improvement
    orig = 1_000_000
    for i in range(1000):
        candidate = OptimizedCandidateResult(best_test_runtime=orig - 20_000 - i)
        codeflash_output = speedup_critic(candidate, orig, None) # 486μs -> 427μs (13.7% faster)

def test_large_scale_edge_thresholds():
    # Many candidates, some just above and some just below threshold
    orig = 1_000_000
    # Just below threshold
    opt = int(orig / (1 + MIN_IMPROVEMENT_THRESHOLD - 0.0001))
    for _ in range(500):
        candidate = OptimizedCandidateResult(best_test_runtime=opt)
        codeflash_output = speedup_critic(candidate, orig, None) # 244μs -> 215μs (13.4% faster)
    # Just above threshold
    opt = int(orig / (1 + MIN_IMPROVEMENT_THRESHOLD + 0.0001))
    for _ in range(500):
        candidate = OptimizedCandidateResult(best_test_runtime=opt)
        codeflash_output = speedup_critic(candidate, orig, None) # 241μs -> 212μs (13.7% faster)

def test_large_scale_throughput_and_runtime_mixed():
    # Many candidates, some with only throughput improvement, some with only runtime, some with both, some with neither
    orig = 1_000_000
    orig_thr = 10_000
    # Only throughput improves
    candidate = OptimizedCandidateResult(best_test_runtime=995_000, async_throughput=11_000)
    codeflash_output = speedup_critic(candidate, orig, None, original_async_throughput=orig_thr) # 3.04μs -> 2.58μs (17.4% faster)
    # Only runtime improves
    candidate = OptimizedCandidateResult(best_test_runtime=900_000, async_throughput=10_000)
    codeflash_output = speedup_critic(candidate, orig, None, original_async_throughput=orig_thr) # 1.45μs -> 1.24μs (16.8% faster)
    # Both improve
    candidate = OptimizedCandidateResult(best_test_runtime=900_000, async_throughput=11_000)
    codeflash_output = speedup_critic(candidate, orig, None, original_async_throughput=orig_thr) # 1.01μs -> 832ns (21.6% faster)
    # Neither improves
    candidate = OptimizedCandidateResult(best_test_runtime=1_010_000, async_throughput=9_000)
    codeflash_output = speedup_critic(candidate, orig, None, original_async_throughput=orig_thr) # 922ns -> 802ns (15.0% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To test or edit this optimization locally, run: git merge codeflash/optimize-pr687-2025-09-24T19.11.26

Click to see suggested changes
Suggested change
-# Runtime performance evaluation
-noise_floor = 3 * MIN_IMPROVEMENT_THRESHOLD if original_code_runtime < 10000 else MIN_IMPROVEMENT_THRESHOLD
-if not disable_gh_action_noise and env_utils.is_ci():
-    noise_floor = noise_floor * 2  # Increase the noise floor in GitHub Actions mode
-perf_gain = performance_gain(
-    original_runtime_ns=original_code_runtime, optimized_runtime_ns=candidate_result.best_test_runtime
-)
-if best_runtime_until_now is None:
-    # collect all optimizations with this
-    return bool(perf_gain > noise_floor)
-return bool(perf_gain > noise_floor and candidate_result.best_test_runtime < best_runtime_until_now)
-runtime_improved = perf_gain > noise_floor
-# Check runtime comparison with best so far
-runtime_is_best = best_runtime_until_now is None or candidate_result.best_test_runtime < best_runtime_until_now
-throughput_improved = True  # Default to True if no throughput data
-throughput_is_best = True  # Default to True if no throughput data
-if original_async_throughput is not None and candidate_result.async_throughput is not None:
-    if original_async_throughput > 0:
-        throughput_gain_value = throughput_gain(
-            original_throughput=original_async_throughput, optimized_throughput=candidate_result.async_throughput
-        )
-        throughput_improved = throughput_gain_value > MIN_THROUGHPUT_IMPROVEMENT_THRESHOLD
-    throughput_is_best = (
-        best_throughput_until_now is None or candidate_result.async_throughput > best_throughput_until_now
-    )
-if original_async_throughput is not None and candidate_result.async_throughput is not None:
-    # When throughput data is available, accept if EITHER throughput OR runtime improves significantly
-    throughput_acceptance = throughput_improved and throughput_is_best
-    runtime_acceptance = runtime_improved and runtime_is_best
-    return throughput_acceptance or runtime_acceptance
+noise_floor = 3 * MIN_IMPROVEMENT_THRESHOLD if original_code_runtime < 10000 else MIN_IMPROVEMENT_THRESHOLD
+if not disable_gh_action_noise and env_utils.is_ci():
+    noise_floor = noise_floor * 2  # Increase the noise floor in GitHub Actions mode
+perf_gain = (
+    (original_code_runtime - candidate_result.best_test_runtime) / candidate_result.best_test_runtime
+    if candidate_result.best_test_runtime != 0
+    else 0.0
+)
+runtime_improved = perf_gain > noise_floor
+runtime_is_best = best_runtime_until_now is None or candidate_result.best_test_runtime < best_runtime_until_now
+# Combine throughput logic for tighter critical-path performance
+if original_async_throughput is not None and candidate_result.async_throughput is not None:
+    if original_async_throughput > 0:
+        throughput_gain_value = (
+            candidate_result.async_throughput - original_async_throughput
+        ) / original_async_throughput
+        throughput_improved = throughput_gain_value > MIN_THROUGHPUT_IMPROVEMENT_THRESHOLD
+    else:
+        throughput_improved = True
+    throughput_is_best = (
+        best_throughput_until_now is None or candidate_result.async_throughput > best_throughput_until_now
+    )
+    # Accept if either throughput or runtime improvement is good and is best so far
+    return (throughput_improved and throughput_is_best) or (runtime_improved and runtime_is_best)
+# No async throughput measured: fallback to only runtime logic

@KRRT7 KRRT7 closed this Sep 26, 2025
@KRRT7 KRRT7 deleted the granular-async-instrumentation branch September 26, 2025 00:12
@KRRT7 KRRT7 restored the granular-async-instrumentation branch September 26, 2025 00:12
@KRRT7 KRRT7 reopened this Sep 26, 2025
@KRRT7 KRRT7 requested a review from misrasaurabh1 September 26, 2025 02:08
@KRRT7 KRRT7 merged commit 1e103bd into standalone-fto-async Sep 26, 2025
17 of 21 checks passed
@KRRT7 KRRT7 deleted the granular-async-instrumentation branch September 27, 2025 00:33

Labels

workflow-modified This PR modifies GitHub Actions workflows
