Skip to content

Conversation

@aseembits93
Copy link
Contributor

@aseembits93 aseembits93 commented Aug 5, 2025

PR Type

Enhancement


Description

  • Use a persistent ThreadPoolExecutor for all async tasks

  • Deduplicate candidates by AST normalization

  • Queue AI refinement calls via futures and aggregate results

  • Log optimizations_post mapping postprocessed code


Diagram Walkthrough

flowchart LR
  A["Submit profiling & test gen futures"] --> B["Collect initial candidates"]
  B --> C["Normalize AST & dedupe candidates"]
  C --> D["Run optimized candidates"]
  D --> E["Queue refinement futures"]
  E --> F["Aggregate refinement results"]
  F --> G["Select best candidate"]
  G --> H["log_results with optimizations_post"]
Loading

File Walkthrough

Relevant files
Enhancement
aiservice.py
Add optimizations_post to log_results                                       

codeflash/api/aiservice.py

  • Added optimizations_post param to log_results signature
  • Included optimizations_post in request payload
+3/-0     
function_optimizer.py
Refactor candidate loop with async executor & AST dedup   

codeflash/optimization/function_optimizer.py

  • Introduce instance‐level ThreadPoolExecutor
  • Deduplicate optimization candidates via AST parse
  • Track and map shortest postprocessed code strings
  • Change refine_optimizations to return Future
  • Collect and apply async refinement before final selection
  • Pass optimizations_post into log_results call
+299/-262
optimizer.py
Ensure executor shutdown on exit                                                 

codeflash/optimization/optimizer.py

  • Shutdown executor in finally block
+1/-0     

@github-actions
Copy link

github-actions bot commented Aug 5, 2025

PR Reviewer Guide 🔍

(Review updated until commit 86e1a72)

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Executor Shutdown

The persistent executor (function_optimizer.executor) is shut down after each optimization in the CLI loop, which can cancel or prevent pending tasks from completing. Consider managing executor lifecycle outside the per-optimization loop.

function_optimizer.executor.shutdown(wait=True)
Missing Future Error Handling

Futures from test generation and refinement are retrieved without try/except blocks; exceptions within those tasks will propagate and potentially crash the pipeline. Wrap future.result() calls with appropriate error handling and logging.

for future in future_tests:
    res = future.result()
    if res:
        (
            generated_test_source,
            instrumented_behavior_test_source,
            instrumented_perf_test_source,
            test_behavior_path,
            test_perf_path,
        ) = res
AST Normalization Robustness

AST-based deduplication uses ast.parse and ast.unparse, which may not normalize all semantically equivalent code and can miss edge cases. Validate the normalization strategy on diverse code patterns and consider fallback mechanisms.

normalized_code = ast.unparse(ast.parse(candidate.source_code.flat.strip()))
if normalized_code in ast_code_to_id:
    past_opt_id = ast_code_to_id[normalized_code]["optimization_id"]
    # update speedup ratio, is_correct, optimizations_post, optimized_line_profiler_results, optimized_runtimes
    speedup_ratios[candidate.optimization_id] = speedup_ratios[past_opt_id]
    is_correct[candidate.optimization_id] = is_correct[past_opt_id]

@github-actions
Copy link

github-actions bot commented Aug 5, 2025

PR Code Suggestions ✨

Latest suggestions up to 86e1a72
Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Check metrics exist before copy

Guard against missing entries when copying metrics from a past optimization to avoid
KeyErrors. Check that past_opt_id exists in each dict before assignment and skip
deduplication if any metric is missing.

codeflash/optimization/function_optimizer.py [445-447]

+if past_opt_id not in speedup_ratios or past_opt_id not in is_correct or past_opt_id not in optimized_runtimes:
+    logger.warning(f"Missing metrics for {past_opt_id}, skipping deduplication")
+    continue
 speedup_ratios[candidate.optimization_id] = speedup_ratios[past_opt_id]
 is_correct[candidate.optimization_id] = is_correct[past_opt_id]
 optimized_runtimes[candidate.optimization_id] = optimized_runtimes[past_opt_id]
Suggestion importance[1-10]: 5

__

Why: Guarding against missing past_opt_id entries avoids KeyError when copying metrics, improving robustness without significantly altering functionality.

Low
Preserve code whitespace for AST

Avoid stripping leading/trailing whitespace before parsing, as it can remove
necessary indentation and potentially introduce syntax errors. Parse the raw code
string to preserve its structure.

codeflash/optimization/function_optimizer.py [440]

-normalized_code = ast.unparse(ast.parse(candidate.source_code.flat.strip()))
+normalized_code = ast.unparse(ast.parse(candidate.source_code.flat))
Suggestion importance[1-10]: 4

__

Why: Avoiding strip() preserves leading/trailing whitespace and prevents potential indentation errors when parsing, though AST.parse typically tolerates extra whitespace so impact is minor.

Low
General
Do not overwrite original mapping

Avoid overriding the original post-processed code entry for the past optimization
ID. Only set optimizations_post for the new candidate to preserve the initial
mapping.

codeflash/optimization/function_optimizer.py [453-456]

 optimizations_post[candidate.optimization_id] = ast_code_to_id[normalized_code]["shorter_source_code"].markdown
-optimizations_post[past_opt_id] = ast_code_to_id[normalized_code]["shorter_source_code"].markdown
Suggestion importance[1-10]: 5

__

Why: Removing the assignment to optimizations_post[past_opt_id] prevents overwriting the original mapping and preserves the initial post-processed code entry.

Low

Previous suggestions

Suggestions up to commit ecb10ab
CategorySuggestion                                                                                                                                    Impact
Possible issue
Add missing helper classes argument

The call to replace_function_and_helpers_with_optimized_code is missing the
file_path_to_helper_classes argument, which will raise a TypeError. Add the missing
parameter so the helper classes can be correctly located.

codeflash/optimization/function_optimizer.py [421-425]

 did_update = self.replace_function_and_helpers_with_optimized_code(
     code_context=code_context,
     optimized_code=candidate.source_code,
     original_helper_code=original_helper_code,
+    file_path_to_helper_classes=file_path_to_helper_classes,
 )
Suggestion importance[1-10]: 9

__

Why: The call to replace_function_and_helpers_with_optimized_code is missing the required file_path_to_helper_classes argument, which will raise a TypeError and prevent helper classes from being located correctly.

High
General
Defer shared executor shutdown

Shutting down the shared executor inside the per-function loop can lead to rejected
submissions on subsequent calls. Defer or remove this shutdown so the executor
remains available until all tasks complete.

codeflash/optimization/optimizer.py [335-336]

-if function_optimizer is not None:
-    function_optimizer.executor.shutdown(wait=True)
+# if function_optimizer is not None:
+#     function_optimizer.executor.shutdown(wait=True)
Suggestion importance[1-10]: 8

__

Why: Calling executor.shutdown inside the per-function loop prematurely closes the shared ThreadPoolExecutor and leads to rejected task submissions on subsequent iterations.

Medium
Guard AST normalization with try/except

Parsing and unparsing arbitrary candidate.source_code may throw syntax errors and
crash the loop. Wrap the AST normalization in a try/except to skip invalid code
strings gracefully.

codeflash/optimization/function_optimizer.py [438-439]

-normalized_code = ast.unparse(ast.parse(candidate.source_code.strip()))
+try:
+    normalized_code = ast.unparse(ast.parse(candidate.source_code.strip()))
+except (SyntaxError, ValueError, cst.ParserSyntaxError):
+    continue
 if normalized_code in ast_code_to_id:
Suggestion importance[1-10]: 5

__

Why: Wrapping ast.parse/ast.unparse in a try/except prevents uncaught SyntaxError or ParserSyntaxError from crashing the candidate processing loop.

Low

@aseembits93 aseembits93 marked this pull request as ready for review August 7, 2025 21:18
@github-actions
Copy link

github-actions bot commented Aug 7, 2025

Persistent review updated to latest commit 86e1a72

@aseembits93 aseembits93 merged commit 5823492 into main Aug 7, 2025
20 checks passed
@aseembits93 aseembits93 deleted the ast-deduplication branch August 17, 2025 19:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants