AST deduplication #615

aseembits93 · 2025-08-05T23:14:49Z

PR Type

Enhancement

Description

Use a persistent ThreadPoolExecutor for all async tasks
Deduplicate candidates by AST normalization
Queue AI refinement calls via futures and aggregate results
Log optimizations_post mapping postprocessed code
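
As a rough illustration of the deduplication idea, the sketch below round-trips each candidate through `ast.parse` and `ast.unparse` and treats candidates with identical normalized text as duplicates. The helper names `normalize` and `dedupe` and the sample candidates are hypothetical and are not taken from the codeflash code base.

```python
import ast


def normalize(source: str) -> str:
    """Round-trip source through the AST; drops comments, redundant parentheses and spacing."""
    return ast.unparse(ast.parse(source.strip()))


def dedupe(candidates: dict[str, str]) -> dict[str, str]:
    """Map each candidate id to the id of the first candidate with the same normalized code."""
    first_seen: dict[str, str] = {}  # normalized code -> first candidate id
    canonical: dict[str, str] = {}   # candidate id -> representative id
    for cand_id, source in candidates.items():
        key = normalize(source)
        canonical[cand_id] = first_seen.setdefault(key, cand_id)
    return canonical


# Hypothetical candidates: "b" differs from "a" only in a comment and parentheses.
candidates = {
    "a": "def f(x):\n    return x + 1\n",
    "b": "def f(x):  # same logic\n    return (x + 1)\n",
    "c": "def f(x):\n    return 1 + x\n",
}
result = dedupe(candidates)
```

Here `"b"` collapses onto `"a"`, while `"c"` stays distinct: the round trip removes only syntactic noise and does not recognize that `1 + x` and `x + 1` are equivalent.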

Diagram Walkthrough

flowchart LR
  A["Submit profiling & test gen futures"] --> B["Collect initial candidates"]
  B --> C["Normalize AST & dedupe candidates"]
  C --> D["Run optimized candidates"]
  D --> E["Queue refinement futures"]
  E --> F["Aggregate refinement results"]
  F --> G["Select best candidate"]
  G --> H["log_results with optimizations_post"]
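
The "queue refinement futures, then aggregate" steps in the flowchart can be sketched with the standard library alone. This is a minimal sketch: `refine` is a hypothetical stand-in for the remote refinement call, and the candidate ids are made up.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def refine(candidate: str) -> str:
    # Hypothetical stand-in for a slow AI refinement request.
    return candidate + "_refined"


executor = ThreadPoolExecutor(max_workers=4)
try:
    candidates = ["c1", "c2", "c3"]
    # Queue every refinement call up front; submit() returns immediately.
    futures = {executor.submit(refine, c): c for c in candidates}
    # Aggregate results as they finish, keyed by the originating candidate.
    refined = {}
    for future in as_completed(futures):
        refined[futures[future]] = future.result()
finally:
    executor.shutdown(wait=True)
```

Keying the futures by candidate keeps the aggregation independent of completion order.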

File Walkthrough

Relevant files

Enhancement

- aiservice.py: `Add optimizations_post to log_results` (codeflash/api/aiservice.py, +3/-0)
  - Added `optimizations_post` param to `log_results` signature
  - Included `optimizations_post` in request payload
- function_optimizer.py: `Refactor candidate loop with async executor & AST dedup` (codeflash/optimization/function_optimizer.py, +299/-262)
  - Introduce instance-level ThreadPoolExecutor
  - Deduplicate optimization candidates via AST parse
  - Track and map shortest postprocessed code strings
  - Change `refine_optimizations` to return Future
  - Collect and apply async refinement before final selection
  - Pass `optimizations_post` into `log_results` call
- optimizer.py: `Ensure executor shutdown on exit` (codeflash/optimization/optimizer.py, +1/-0)
  - Shutdown executor in `finally` block
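
For context on the executor lifecycle, the following sketch shows the general behaviour of `ThreadPoolExecutor` when it is shut down in a `finally` block. It is an illustration with made-up function names, not the code from `optimizer.py`: one executor serves every iteration, and any `submit` after `shutdown` raises `RuntimeError`.

```python
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=2)
results = []
try:
    for name in ["func_a", "func_b"]:
        # The same executor is reused on every iteration of the loop.
        results.append(executor.submit(str.upper, name).result())
finally:
    # Shut down once, after all iterations have finished.
    executor.shutdown(wait=True)

# Once shut down, the executor rejects any further work.
try:
    executor.submit(str.upper, "func_c")
    rejected = False
except RuntimeError:
    rejected = True
```

This is why the position of the `shutdown` call relative to the loop matters: placed inside the loop, it would make the second iteration's `submit` fail.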

github-actions · 2025-08-05T23:15:52Z

PR Reviewer Guide 🔍

(Review updated until commit `86e1a72`)

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Executor Shutdown

The persistent executor (`function_optimizer.executor`) is shut down after each optimization in the CLI loop, which can cancel or prevent pending tasks from completing. Consider managing executor lifecycle outside the per-optimization loop.

    function_optimizer.executor.shutdown(wait=True)

Missing Future Error Handling

Futures from test generation and refinement are retrieved without try/except blocks; exceptions within those tasks will propagate and potentially crash the pipeline. Wrap `future.result()` calls with appropriate error handling and logging.

    for future in future_tests:
        res = future.result()
        if res:
            (
                generated_test_source,
                instrumented_behavior_test_source,
                instrumented_perf_test_source,
                test_behavior_path,
                test_perf_path,
            ) = res

AST Normalization Robustness

AST-based deduplication uses `ast.parse` and `ast.unparse`, which may not normalize all semantically equivalent code and can miss edge cases. Validate the normalization strategy on diverse code patterns and consider fallback mechanisms.

    normalized_code = ast.unparse(ast.parse(candidate.source_code.flat.strip()))
    if normalized_code in ast_code_to_id:
        past_opt_id = ast_code_to_id[normalized_code]["optimization_id"]
        # update speedup ratio, is_correct, optimizations_post, optimized_line_profiler_results, optimized_runtimes
        speedup_ratios[candidate.optimization_id] = speedup_ratios[past_opt_id]
        is_correct[candidate.optimization_id] = is_correct[past_opt_id]
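
One way to act on the future error handling point is sketched below. The `generate_test` task and the `collect` helper are hypothetical. The sketch wraps each `future.result()` call so that one failed task is logged and skipped instead of aborting the whole loop.

```python
import logging
from concurrent.futures import Future, ThreadPoolExecutor

logger = logging.getLogger(__name__)


def generate_test(index: int) -> str:
    # Hypothetical task that fails for one particular input.
    if index == 1:
        raise ValueError("test generation failed")
    return f"test_{index}"


def collect(futures: list[Future]) -> list[str]:
    """Gather results in submission order, logging and skipping failed tasks."""
    results = []
    for future in futures:
        try:
            res = future.result()  # re-raises any exception from the worker
        except Exception:
            logger.exception("task failed; skipping its result")
            continue
        if res:
            results.append(res)
    return results


with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(generate_test, i) for i in range(3)]
    collected = collect(futures)
```

Whether a failed task should be skipped, retried, or treated as fatal depends on the pipeline; the sketch only shows the skip-and-log option.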

github-actions · 2025-08-05T23:16:58Z

PR Code Suggestions ✨

Latest suggestions up to 86e1a72
Explore these optional code suggestions:

Category: Possible issue · Impact: Low

Check metrics exist before copy

Guard against missing entries when copying metrics from a past optimization to avoid KeyErrors. Check that `past_opt_id` exists in each dict before assignment and skip deduplication if any metric is missing.

codeflash/optimization/function_optimizer.py [445-447]

    +if past_opt_id not in speedup_ratios or past_opt_id not in is_correct or past_opt_id not in optimized_runtimes:
    +    logger.warning(f"Missing metrics for {past_opt_id}, skipping deduplication")
    +    continue
     speedup_ratios[candidate.optimization_id] = speedup_ratios[past_opt_id]
     is_correct[candidate.optimization_id] = is_correct[past_opt_id]
     optimized_runtimes[candidate.optimization_id] = optimized_runtimes[past_opt_id]

Suggestion importance [1-10]: 5

Why: Guarding against missing `past_opt_id` entries avoids `KeyError` when copying metrics, improving robustness without significantly altering functionality.

Category: Possible issue · Impact: Low

Preserve code whitespace for AST

Avoid stripping leading/trailing whitespace before parsing, as it can remove necessary indentation and potentially introduce syntax errors. Parse the raw code string to preserve its structure.

codeflash/optimization/function_optimizer.py [440]

    -normalized_code = ast.unparse(ast.parse(candidate.source_code.flat.strip()))
    +normalized_code = ast.unparse(ast.parse(candidate.source_code.flat))

Suggestion importance [1-10]: 4

Why: Avoiding `strip()` preserves leading/trailing whitespace and prevents potential indentation errors when parsing, though `ast.parse` typically tolerates extra whitespace, so the impact is minor.

Category: General · Impact: Low

Do not overwrite original mapping

Avoid overriding the original post-processed code entry for the past optimization ID. Only set `optimizations_post` for the new candidate to preserve the initial mapping.

codeflash/optimization/function_optimizer.py [453-456]

     optimizations_post[candidate.optimization_id] = ast_code_to_id[normalized_code]["shorter_source_code"].markdown
    -optimizations_post[past_opt_id] = ast_code_to_id[normalized_code]["shorter_source_code"].markdown

Suggestion importance [1-10]: 5

Why: Removing the assignment to `optimizations_post[past_opt_id]` prevents overwriting the original mapping and preserves the initial post-processed code entry.
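
The metric-copy guard from the first suggestion above can be generalized to any number of metric dicts. The helper `copy_metrics` and the sample ids below are hypothetical. The sketch copies either every metric or none, so a duplicate candidate never ends up with a partial set of entries.

```python
def copy_metrics(new_id: str, past_id: str, *metric_tables: dict) -> bool:
    """Copy past_id's entry to new_id in every table, or change nothing if any entry is missing."""
    if any(past_id not in table for table in metric_tables):
        return False
    for table in metric_tables:
        table[new_id] = table[past_id]
    return True


# Hypothetical state: "opt-1" has no recorded runtime yet.
speedup_ratios = {"opt-1": 1.8}
is_correct = {"opt-1": True}
optimized_runtimes: dict = {}

first_try = copy_metrics("opt-2", "opt-1", speedup_ratios, is_correct, optimized_runtimes)

# Once the runtime exists, the copy goes through for all three tables.
optimized_runtimes["opt-1"] = 1200
second_try = copy_metrics("opt-2", "opt-1", speedup_ratios, is_correct, optimized_runtimes)
```

Checking all tables before writing to any of them is what keeps the first, failed attempt from leaving `opt-2` half-populated.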

Previous suggestions

Suggestions up to commit ecb10ab

Category: Possible issue · Impact: High

Add missing helper classes argument

The call to `replace_function_and_helpers_with_optimized_code` is missing the `file_path_to_helper_classes` argument, which will raise a `TypeError`. Add the missing parameter so the helper classes can be correctly located.

codeflash/optimization/function_optimizer.py [421-425]

     did_update = self.replace_function_and_helpers_with_optimized_code(
         code_context=code_context,
         optimized_code=candidate.source_code,
         original_helper_code=original_helper_code,
    +    file_path_to_helper_classes=file_path_to_helper_classes,
     )

Suggestion importance [1-10]: 9

Why: The call to `replace_function_and_helpers_with_optimized_code` is missing the required `file_path_to_helper_classes` argument, which will raise a `TypeError` and prevent helper classes from being located correctly.

Category: General · Impact: Medium

Defer shared executor shutdown

Shutting down the shared executor inside the per-function loop can lead to rejected submissions on subsequent calls. Defer or remove this shutdown so the executor remains available until all tasks complete.

codeflash/optimization/optimizer.py [335-336]

    -if function_optimizer is not None:
    -    function_optimizer.executor.shutdown(wait=True)
    +# if function_optimizer is not None:
    +#     function_optimizer.executor.shutdown(wait=True)

Suggestion importance [1-10]: 8

Why: Calling `executor.shutdown` inside the per-function loop prematurely closes the shared ThreadPoolExecutor and leads to rejected task submissions on subsequent iterations.

Category: General · Impact: Low

Guard AST normalization with try/except

Parsing and unparsing arbitrary `candidate.source_code` may throw syntax errors and crash the loop. Wrap the AST normalization in a `try/except` to skip invalid code strings gracefully.

codeflash/optimization/function_optimizer.py [438-439]

    -normalized_code = ast.unparse(ast.parse(candidate.source_code.strip()))
    +try:
    +    normalized_code = ast.unparse(ast.parse(candidate.source_code.strip()))
    +except (SyntaxError, ValueError, cst.ParserSyntaxError):
    +    continue
     if normalized_code in ast_code_to_id:

Suggestion importance [1-10]: 5

Why: Wrapping `ast.parse`/`ast.unparse` in a `try/except` prevents uncaught `SyntaxError` or `ParserSyntaxError` from crashing the candidate processing loop.
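
A standalone version of the guarded normalization might look like the sketch below. The function name `safe_normalize` is hypothetical, and the sketch catches only the standard-library exceptions, leaving out the `cst.ParserSyntaxError` case mentioned in the suggestion, which would need libcst.

```python
import ast
from typing import Optional


def safe_normalize(source: str) -> Optional[str]:
    """Return AST-normalized code, or None when the candidate cannot be parsed."""
    try:
        return ast.unparse(ast.parse(source.strip()))
    except (SyntaxError, ValueError):
        # Invalid candidate: the caller can skip deduplication for it.
        return None


valid = safe_normalize("x=1")
invalid = safe_normalize("def f(:")
```

Returning `None` instead of raising lets the candidate loop decide whether to skip the candidate or fall back to comparing raw source text.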

github-actions · 2025-08-07T21:19:22Z

Persistent review updated to latest commit 86e1a72

aseembits93 and others added 5 commits August 4, 2025 15:45

async refinement calls for better queing

83ee9c9

quickfixes

1d5c2f9

Merge branch 'main' into opt-candidate-loop

7f3bcea

common threadpool executor

0d93128

todo cleanup

ecb10ab

github-actions bot added the Review effort 4/5 label Aug 5, 2025

aseembits93 added 6 commits August 5, 2025 16:20

precommit mypy fix

01f3f2c

cleaning up

5e64776

Merge remote-tracking branch 'origin/main' into ast-deduplication

bd320ee

almost ready for review

ff8215a

Merge branch 'main' into ast-deduplication

02ce034

line profiler only available for successful runs

86e1a72

aseembits93 marked this pull request as ready for review August 7, 2025 21:18

aseembits93 mentioned this pull request Aug 7, 2025

Non-Blocking Async refinement calls for better queing CF-666 #609

Closed

aseembits93 requested review from KRRT7 and misrasaurabh1 August 7, 2025 21:23

KRRT7 approved these changes Aug 7, 2025


aseembits93 merged commit 5823492 into main Aug 7, 2025
20 checks passed

aseembits93 deleted the ast-deduplication branch August 17, 2025 19:26
