Conversation

@KRRT7 (Contributor) commented Sep 17, 2025

PR Type

Enhancement, Tests, Bug fix


Description

  • Derive test context from pytest hooks

  • Add async throughput calculation support

  • Extend baseline model with throughput

  • Update tests for new interfaces


Diagram Walkthrough

```mermaid
flowchart LR
  pytestHook["pytest hooks set test context env"] -- "env vars" --> extractor["extract_test_context_from_frame uses env"]
  runParse["run_and_parse_tests returns results thrice"] -- "behavior/perf results" --> calcTP["calculate_async_throughput aggregates stdout"]
  calcTP -- "function throughput" --> baseline["OriginalCodeBaseline.async_throughput"]
```

File Walkthrough

Relevant files

Enhancement (6 files)

| File | Summary | Changes |
| --- | --- | --- |
| codeflash_wrap_decorator.py | Replace stack introspection with env-based test context | +12/-174 |
| models.py | Add async throughput to baseline model | +1/-0 |
| function_optimizer.py | Plumb results for throughput; compute and store async throughput | +44/-10 |
| parse_test_output.py | Add helpers to compute throughput from stdout | +63/-0 |
| pytest_plugin.py | Set/clear test context env vars per test | +22/-0 |
| test_runner.py | Log executed pytest command for visibility | +1/-1 |

Tests (8 files)

| File | Summary | Changes |
| --- | --- | --- |
| test_async_run_and_parse_tests.py | Adapt to new API and class-name expectations | +10/-10 |
| test_async_wrapper_sqlite_validation.py | Provide env context; fix sync context extraction test | +5/-2 |
| test_codeflash_capture.py | Update for triple-return API and expectations | +12/-12 |
| test_extract_test_context_from_frame.py | Remove obsolete stack-based context tests | +0/-217 |
| test_instrument_all_and_run.py | Adjust tests to new run_and_parse_tests signature | +5/-5 |
| test_instrument_tests.py | Update behavioral/perf/line profile calls to new API | +22/-22 |
| test_instrumentation_run_results_aiservice.py | Align to new return tuple from test runner | +5/-5 |
| test_pickle_patcher.py | Adapt pickle patch tests to new API | +4/-4 |

Configuration changes (1 file)

| File | Summary | Changes |
| --- | --- | --- |
| codeflash.code-workspace | Update launch args to async concurrency example | +5/-1 |

@github-actions

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Regex Robustness

The new throughput parsers rely on strict colon-delimited patterns without escaping, which may fail if module/function names contain colons or unexpected characters, or if stdout interleaves tags. Consider anchoring, stricter grouping, and normalization to avoid false positives/negatives.

```python
start_pattern = re.compile(r"!\$######([^:]*):([^:]*):([^:]*):([^:]*):([^:]+)######\$!")
end_pattern = re.compile(r"!######([^:]*):([^:]*):([^:]*):([^:]*):([^:]+)######!")


def calculate_function_throughput_from_stdout(stdout: str, function_name: str) -> int:
    """A completed execution is defined as having both a start tag and matching end tag:
    Start: !$######test_module:test_function:function_name:loop_index:iteration_id######$!
    End:   !######test_module:test_function:function_name:loop_index:iteration_id######!
    """
    start_matches = start_pattern.findall(stdout)
    end_matches = end_pattern.findall(stdout)
    end_matches_set = set(end_matches)

    # Count completed executions for the specific function only
    function_throughput = 0

    for start_match in start_matches:
        # Check if this execution is for the function we're interested in and has a matching end tag
        # function_name is at index 2 in the match tuple
        if start_match in end_matches_set and len(start_match) > 2 and start_match[2] == function_name:
            function_throughput += 1

    return function_throughput
```
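
For a sense of how these patterns pair start and end tags, here is a minimal usage sketch; the sample stdout is fabricated for illustration, and the import path follows the file walkthrough above:

```python
from codeflash.verification.parse_test_output import calculate_function_throughput_from_stdout

stdout = (
    "!$######tests.test_mod:test_fn:target_fn:0:1######$!\n"
    "interleaved program output\n"
    "!######tests.test_mod:test_fn:target_fn:0:1######!\n"
    # A start tag with no matching end tag is not a completed execution:
    "!$######tests.test_mod:test_fn:target_fn:0:2######$!\n"
)

# Only the first execution completed, so the computed throughput is 1.
assert calculate_function_throughput_from_stdout(stdout, "target_fn") == 1
```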
Excessive Logging

Logging the full pytest command at info level on every run could be noisy and leak paths; consider using debug level or redacting sensitive parts.

```python
blocklist_args = [f"-p no:{plugin}" for plugin in BEHAVIORAL_BLOCKLISTED_PLUGINS if plugin != "cov"]
logger.info(f"{' '.join(coverage_cmd + common_pytest_args + blocklist_args + result_args + test_files)}")
results = execute_test_subprocess(
```
Env Var Dependence

extract_test_context_from_frame now hard-requires environment variables and raises if missing; ensure all runners (including unittest and non-pytest paths) set these vars or add a safer fallback to prevent crashes.

```python
def extract_test_context_from_frame() -> tuple[str, str | None, str]:
    # test_module = os.environ.get("CODEFLASH_TEST_MODULE")
    test_module = os.environ["CODEFLASH_TEST_MODULE"]
    test_class = os.environ.get("CODEFLASH_TEST_CLASS", None)
    # test_function = os.environ.get("CODEFLASH_TEST_FUNCTION")
    test_function = os.environ["CODEFLASH_TEST_FUNCTION"]

    if test_module and test_function:
        return (test_module, test_class if test_class else None, test_function)

    raise RuntimeError(
        "Test context environment variables not set - ensure tests are run through codeflash test runner"
    )
```
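
For context, a minimal sketch of what the extractor sees once the plugin's hooks have populated the environment; the variable values here are illustrative, and note the call site is later switched to extract_test_context_from_env in a diff further down:

```python
import os

from codeflash.code_utils.codeflash_wrap_decorator import extract_test_context_from_frame

# Values the pytest hooks would set for a test like tests/test_mod.py::TestCase::test_fn
os.environ["CODEFLASH_TEST_MODULE"] = "tests.test_mod"
os.environ["CODEFLASH_TEST_CLASS"] = "TestCase"
os.environ["CODEFLASH_TEST_FUNCTION"] = "test_fn"

assert extract_test_context_from_frame() == ("tests.test_mod", "TestCase", "test_fn")
```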

@github-actions

PR Code Suggestions ✨

Explore these optional code suggestions:

Possible issue
Normalize end tags before matching

The start/end tag formats elsewhere include optional extra timing appended to the
end tag, making tuple equality fail and undercounting throughput. Normalize end
matches (strip trailing timing after the last colon) before comparing to start
tuples. This aligns with the normalization used above and ensures completed
executions are counted correctly.

codeflash/verification/parse_test_output.py [81-103]

```diff
 start_pattern = re.compile(r"!\$######([^:]*):([^:]*):([^:]*):([^:]*):([^:]+)######\$!")
 end_pattern = re.compile(r"!######([^:]*):([^:]*):([^:]*):([^:]*):([^:]+)######!")


 def calculate_function_throughput_from_stdout(stdout: str, function_name: str) -> int:
     """A completed execution is defined as having both a start tag and matching end tag:
     Start: !$######test_module:test_function:function_name:loop_index:iteration_id######$!
-    End:   !######test_module:test_function:function_name:loop_index:iteration_id######!
+    End:   !######test_module:test_function:function_name:loop_index:iteration_id[:timing]######!
     """
     start_matches = start_pattern.findall(stdout)
-    end_matches = end_pattern.findall(stdout)
-    end_matches_set = set(end_matches)
+    raw_end_matches = end_pattern.findall(stdout)

-    # Count completed executions for the specific function only
+    # Normalize end matches by stripping optional trailing ':timing' from the last group
+    normalized_end_matches = []
+    for m in raw_end_matches:
+        if m and len(m) == 5:
+            last = m[4]
+            parts = last.split(":")
+            if len(parts) > 1:
+                # keep everything except the last timing segment
+                last = ":".join(parts[:-1])
+            normalized_end_matches.append((m[0], m[1], m[2], m[3], last))
+        else:
+            normalized_end_matches.append(m)
+
+    end_matches_set = set(normalized_end_matches)
+
     function_throughput = 0
-
     for start_match in start_matches:
-        # Check if this execution is for the function we're interested in and has a matching end tag
-        # function_name is at index 2 in the match tuple
-        if start_match in end_matches_set and len(start_match) > 2 and start_match[2] == function_name:
+        if len(start_match) > 2 and start_match[2] == function_name and start_match in end_matches_set:
             function_throughput += 1

     return function_throughput
```
Suggestion importance[1-10]: 8

Why: This addresses a subtle correctness bug: end tags may include timing info, causing tuple mismatches; the normalization aligns with earlier logic and prevents undercounting. High impact for accurate throughput metrics, and consistent with the PR context.

Impact: Medium
Prevent KeyError on env access

Using direct indexing on os.environ will raise KeyError in environments outside
pytest or if hooks fail, causing hard crashes. Safely read the variables with .get()
and raise the existing RuntimeError only when required fields are missing. This
keeps previous behavior while avoiding unexpected KeyError.

codeflash/code_utils/codeflash_wrap_decorator.py [33-45]

```diff
 def extract_test_context_from_frame() -> tuple[str, str | None, str]:
-    # test_module = os.environ.get("CODEFLASH_TEST_MODULE")
-    test_module = os.environ["CODEFLASH_TEST_MODULE"]
+    test_module = os.environ.get("CODEFLASH_TEST_MODULE")
     test_class = os.environ.get("CODEFLASH_TEST_CLASS", None)
-    # test_function = os.environ.get("CODEFLASH_TEST_FUNCTION")
-    test_function = os.environ["CODEFLASH_TEST_FUNCTION"]
+    test_function = os.environ.get("CODEFLASH_TEST_FUNCTION")

     if test_module and test_function:
         return (test_module, test_class if test_class else None, test_function)

     raise RuntimeError(
         "Test context environment variables not set - ensure tests are run through codeflash test runner"
     )
```
Suggestion importance[1-10]: 7

Why: The suggestion correctly identifies that direct os.environ indexing can raise KeyError and proposes a safe .get() fallback without changing semantics. It's contextually accurate and reduces unexpected crashes, though not critical.

Impact: Medium
General
Avoid blank class env leakage

Writing an empty string to CODEFLASH_TEST_CLASS forces downstream consumers to distinguish "no class" from "unset". Use pop(..., None) before setup to avoid stale values, and only set CODEFLASH_TEST_CLASS when a class exists. This prevents confusing blank class names and ensures clean state per test.

codeflash/verification/pytest_plugin.py [454-474]

```diff
 @pytest.hookimpl(tryfirst=True)
 def pytest_runtest_setup(self, item: pytest.Item) -> None:
+    # Clear any previous test context to avoid leakage between tests
+    for var in ("CODEFLASH_TEST_MODULE", "CODEFLASH_TEST_CLASS", "CODEFLASH_TEST_FUNCTION"):
+        os.environ.pop(var, None)
+
     test_module_name = item.module.__name__ if item.module else "unknown_module"

-    test_class_name = None
-    if item.cls:
-        test_class_name = item.cls.__name__
+    test_class_name = item.cls.__name__ if item.cls else None

     test_function_name = item.name
     if "[" in test_function_name:
         test_function_name = test_function_name.split("[", 1)[0]

     os.environ["CODEFLASH_TEST_MODULE"] = test_module_name
-    os.environ["CODEFLASH_TEST_CLASS"] = test_class_name or ""
+    if test_class_name:
+        os.environ["CODEFLASH_TEST_CLASS"] = test_class_name
     os.environ["CODEFLASH_TEST_FUNCTION"] = test_function_name

 @pytest.hookimpl(trylast=True)
 def pytest_runtest_teardown(self, item: pytest.Item) -> None:
     """Clean up test context environment variables after each test."""
     for var in ["CODEFLASH_TEST_MODULE", "CODEFLASH_TEST_CLASS", "CODEFLASH_TEST_FUNCTION"]:
         os.environ.pop(var, None)
```
Suggestion importance[1-10]: 6

Why: Clearing env vars before setup and omitting empty class values improves robustness and state isolation. It's a reasonable enhancement with moderate impact; existing code already cleans up in teardown, so the issue is not severe.

Impact: Low

KRRT7 and others added 4 commits September 22, 2025 12:21
The optimization achieves a 25% speedup by **eliminating redundant AST node creation** inside the loop. 

**Key change:** The `timeout_decorator` AST node is now created once before the loop instead of being recreated for every test method that needs it. In the original code, this AST structure was built 3,411 times during profiling, consuming significant time in object allocation and initialization.

**Why this works:** AST nodes are immutable once created, so the same `timeout_decorator` instance can be safely appended to multiple method decorator lists. This eliminates:
- Repeated `ast.Call()` constructor calls
- Redundant `ast.Name()` and `ast.Constant()` object creation
- Multiple attribute assignments for the same decorator structure

**Performance characteristics:** The optimization is most effective for large test classes with many test methods (showing 24-33% improvements in tests with 500+ methods), while having minimal impact on classes with few or no test methods. This makes it particularly valuable for comprehensive test suites where classes commonly contain dozens of test methods.

The line profiler shows the AST node creation operations dropped from ~3,400 hits to just ~25 hits, directly correlating with the observed speedup.
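
A minimal sketch of the hoisting pattern described above (illustrative only: the real AsyncCallInstrumenter.visit_ClassDef differs, and the `timeout` decorator name and its value are assumptions):

```python
import ast

def add_timeout_to_test_methods(tree: ast.Module) -> ast.Module:
    # Build the decorator node once, outside the loop; since it is never
    # mutated afterwards, the same instance can safely be appended to every
    # matching method's decorator_list.
    timeout_decorator = ast.Call(
        func=ast.Name(id="timeout", ctx=ast.Load()),
        args=[ast.Constant(value=15)],
        keywords=[],
    )
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            for item in node.body:
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)) and item.name.startswith("test_"):
                    item.decorator_list.append(timeout_decorator)
    ast.fix_missing_locations(tree)
    return tree
```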
@codeflash-ai (Contributor) commented Sep 22, 2025

⚡️ Codeflash found optimizations for this PR

📄 26% (0.26x) speedup for AsyncCallInstrumenter.visit_ClassDef in codeflash/code_utils/instrument_existing_tests.py

⏱️ Runtime: 17.4 milliseconds → 13.9 milliseconds (best of 141 runs)

I created a new dependent PR with the suggested changes. Please review:

If you approve, it will be merged into this PR (branch get-throughput-from-output).

@codeflash-ai (Contributor) commented Sep 22, 2025

⚡️ Codeflash found optimizations for this PR

📄 69% (0.69x) speedup for AsyncCallInstrumenter._call_in_positions in codeflash/code_utils/instrument_existing_tests.py

⏱️ Runtime: 313 microseconds → 185 microseconds (best of 137 runs)

I created a new dependent PR with the suggested changes. Please review:

If you approve, it will be merged into this PR (branch get-throughput-from-output).

…25-09-22T19.41.32

⚡️ Speed up method `AsyncCallInstrumenter.visit_ClassDef` by 26% in PR #739 (`get-throughput-from-output`)

@github-actions github-actions bot added the workflow-modified label Sep 22, 2025
@codeflash-ai (Contributor) commented Sep 23, 2025

⚡️ Codeflash found optimizations for this PR

📄 50% (0.50x) speedup for AsyncCallInstrumenter.visit_AsyncFunctionDef in codeflash/code_utils/instrument_existing_tests.py

⏱️ Runtime: 6.85 milliseconds → 4.56 milliseconds (best of 103 runs)

I created a new dependent PR with the suggested changes. Please review:

If you approve, it will be merged into this PR (branch get-throughput-from-output).

@KRRT7 requested a review from misrasaurabh1, September 23, 2025 07:58
```diff
@@ -1,19 +0,0 @@
-name: Lint
```

Contributor: why are we removing this?

Contributor Author (@KRRT7): I'll restore it later. It's behaving differently locally vs CI.

```diff
 line_id = os.environ["CODEFLASH_CURRENT_LINE_ID"]
 loop_index = int(os.environ["CODEFLASH_LOOP_INDEX"])
-test_module_name, test_class_name, test_name = extract_test_context_from_frame()
+test_module_name, test_class_name, test_name = extract_test_context_from_env()
```

Contributor: in a line below, why do we have a default of 0 - `iteration = os.environ.get("CODEFLASH_TEST_ITERATION", "0")`?


```python
blocklist_args = [f"-p no:{plugin}" for plugin in BEHAVIORAL_BLOCKLISTED_PLUGINS if plugin != "cov"]

logger.info(f"{' '.join(coverage_cmd + common_pytest_args + blocklist_args + result_args + test_files)}")
```

Contributor: info level looks too high to me for this log

```python
# When throughput data is available, accept if EITHER throughput OR runtime improves significantly
throughput_acceptance = throughput_improved and throughput_is_best
runtime_acceptance = runtime_improved and runtime_is_best
return throughput_acceptance or runtime_acceptance
```

Contributor: I think this might have cases where one improves and the other degrades. This might require more thought in the future.

Contributor Author (@KRRT7): yes, currently runtime "degrades" (codeflash-ai/optimize-me#121) but throughput improves; this still needs work.
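
To make the concern concrete, a toy illustration with hypothetical flag values (the names mirror the snippet above):

```python
# Throughput improved and is the best seen so far, but runtime regressed.
throughput_improved, throughput_is_best = True, True
runtime_improved, runtime_is_best = False, False

throughput_acceptance = throughput_improved and throughput_is_best
runtime_acceptance = runtime_improved and runtime_is_best

# The OR-gate still accepts the candidate despite the runtime regression.
assert (throughput_acceptance or runtime_acceptance) is True
```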

@misrasaurabh1 (Contributor) left a comment

approving but leaving a few comments

@KRRT7 merged commit b221be4 into granular-async-instrumentation on Sep 24, 2025
19 of 21 checks passed

Labels

Review effort 3/5, workflow-modified (This PR modifies GitHub Actions workflows)

2 participants