[LSP] log the errors from the failed behavior test results (CF-757) #830
Conversation
Codeflash Bot seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account. Have you signed the CLA already but the status is still pending? Let us recheck it.
PR Reviewer Guide 🔍
Here are some key observations to aid the review process:

PR Code Suggestions ✨
Explore these optional code suggestions:
```python
)

try:
    benchmarking_results, _ = self.run_and_parse_tests(
```
_ is needed
Co-authored-by: codeflash-ai[bot] <148906541+codeflash-ai[bot]@users.noreply.github.com>
This PR is now faster! 🚀 mohammed ahmed accepted my code suggestion above.
```python
did_pass_all_tests = all(result.did_pass for result in behavioral_results)
if not did_pass_all_tests:
    return Failure("Tests failed to pass for the original code.")
```
We're not yet at a point with our tests where this will work smoothly; we currently have tests that fail frequently.
So you're saying that if coverage wasn't sufficient and some tests failed, the optimization didn't necessarily fail because of those failed tests? What do you suggest instead? @KRRT7
Yeah, that's correct: if we don't have good tests, then the optimizations will fail. Let's merge this in and I'll work on it.
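For context, a minimal sketch of how the failure path above might surface errors to the editor, using the `lsp_log` and `extract_unique_errors` names from the File Walkthrough below; the exact call site and signatures are assumptions, not the PR's literal code:

```python
def _report_baseline_failure(behavioral_results, pytest_output):
    """Hypothetical call site: log distinct pytest errors, then fail the baseline."""
    if not all(result.did_pass for result in behavioral_results):
        # Surface each distinct error line to the LSP client so the user
        # sees why the baseline run failed, not just that it failed.
        for error in sorted(extract_unique_errors(pytest_output)):
            lsp_log(error)  # assumed LSP logging helper from this PR
        return Failure("Tests failed to pass for the original code.")
```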
```python
def extract_unique_errors(pytest_output: str) -> set[str]:
    unique_errors = set()

    # Regex pattern to match error lines:
    # - Start with 'E' followed by optional whitespace
    # - Capture the actual error message
    pattern = r"^E\s+(.*)$"
```
codeflash should've found an optimization here
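The diff collapses the rest of the helper. A minimal sketch of how the body might continue, assuming line-by-line matching with `re.MULTILINE`; only the signature, comments, and pattern appear above, so the loop and return are assumptions:

```python
import re


def extract_unique_errors(pytest_output: str) -> set[str]:
    unique_errors = set()

    # Regex pattern to match error lines:
    # - Start with 'E' followed by optional whitespace
    # - Capture the actual error message
    pattern = r"^E\s+(.*)$"

    # Assumed completion: scan each line of the pytest output and
    # deduplicate the captured messages via the set.
    for match in re.finditer(pattern, pytest_output, re.MULTILINE):
        unique_errors.add(match.group(1).strip())

    return unique_errors
```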
User description
PR Type
Enhancement
Description
- Log unique pytest error lines to LSP
- Add error extraction utility function
- Improve baseline failure messaging
- Adjust performance tests return handling
Diagram Walkthrough
File Walkthrough
| File | Changes |
| --- | --- |
| `code_utils.py` (codeflash/code_utils/code_utils.py) | Utility to extract unique pytest error lines: adds `extract_unique_errors(pytest_output)`, matching lines that start with `E` |
| `function_optimizer.py` (codeflash/optimization/function_optimizer.py) | Log behavior test errors and refine test flow: uses `extract_unique_errors` for LSP logging and calls `lsp_log` when behavior tests fail |
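For illustration, a hypothetical run of the helper against invented pytest output; given the sketch above, duplicate error lines collapse because the result is a set:

```python
sample_output = """
E   AssertionError: expected 3, got 4
E   AssertionError: expected 3, got 4
E   TypeError: unsupported operand type(s) for +: 'int' and 'str'
"""

print(extract_unique_errors(sample_output))
# {'AssertionError: expected 3, got 4',
#  "TypeError: unsupported operand type(s) for +: 'int' and 'str'"}
```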