
Conversation

@aseembits93 (Contributor) commented Sep 5, 2025

PR Type

Enhancement, Bug fix


Description

  • Add AI-driven candidate ranking endpoint usage

  • Compute unified diffs for candidates (see the sketch after this list)

  • Fallback to local ranking when API fails

  • Log and handle ranking request errors
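
The "unified diffs" item above refers to the new unified_diff_strings helper in codeflash/code_utils/code_utils.py. As a minimal sketch of what such a helper could look like, assuming it simply wraps difflib.unified_diff (the actual implementation in this PR may use different parameter names or header labels):

import difflib

def unified_diff_strings(original: str, modified: str) -> str:
    # Illustrative only: join the unified diff of two code strings into one string.
    return "".join(
        difflib.unified_diff(
            original.splitlines(keepends=True),
            modified.splitlines(keepends=True),
            fromfile="original",
            tofile="optimized",
        )
    )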


Diagram Walkthrough

flowchart LR
  A["Generate candidates"] -- "compute diff/speedup" --> B["Build payload (diffs, ids, speedups)"]
  B -- "POST /ranker" --> C["AI service ranking"]
  C -- "ranking ok" --> D["Select best candidate by AI order"]
  C -- "error/None" --> E["Fallback: diff+runtimes ranking"]
  A --> F["Unified diff strings helper"]

File Walkthrough

Relevant files

Enhancement

aiservice.py (codeflash/api/aiservice.py, +53/-0): Add AI service ranking request with error handling

  • Add generate_ranking method calling /ranker.
  • Build payload with diffs, ids, speedups, python version.
  • Robust error handling and logging; returns list or None.
  • Telemetry events for errors.

code_utils.py (codeflash/code_utils/code_utils.py, +17/-0): Add unified diff string utility

  • Introduce unified_diff_strings utility.
  • Produce unified diff between two code strings.

function_optimizer.py (codeflash/optimization/function_optimizer.py, +30/-5): Use AI ranking with diff/speedup and fallback

  • Compute diffs, speedups, and ids per candidate.
  • Call generate_ranking and use ranking result.
  • Fallback to local ranking if API unavailable.
  • Import and use unified_diff_strings.

github-actions bot commented Sep 5, 2025

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Possible Issue

The result of submitting the async ranking task is used directly as if it were the ranking list. self.executor.submit(...) returns a Future; you likely need to call .result() and handle timeouts/exceptions. As written, the list comprehension over the Future raises a TypeError, and a truthiness check on a Future object would always pass.

ranking = self.executor.submit(
    ai_service_client.generate_ranking,
    diffs=diff_strs,
    optimization_ids=optimization_ids,
    speedups=speedups_list,
    trace_id=self.function_trace_id[:-4] + exp_type if self.experiment_id else self.function_trace_id,
)
ranking = [x - 1 for x in ranking]
if ranking:

None Handling

After adjusting indices with [x - 1 for x in ranking], there is no validation that the values are within range and non-negative. Also, if the API returns None, the comprehension will raise. Ensure type/length checks before indexing valid_candidates_with_shorter_code.

if ranking:
    min_key = ranking[0]
else:
    diff_lens_ranking = create_rank_dictionary_compact(diff_lens_list)
    runtimes_ranking = create_rank_dictionary_compact(runtimes_list)
    # TODO: better way to resolve conflicts with same min ranking
    overall_ranking = {key: diff_lens_ranking[key] + runtimes_ranking[key] for key in diff_lens_ranking.keys()}  # noqa: SIM118
    min_key = min(overall_ranking, key=overall_ranking.get)
best_optimization = valid_candidates_with_shorter_code[min_key]

Mismatched Types

The speedups parameter is typed as list[int], but speedups are built as 1 + performance_gain(...) which likely yields floats. Align typing and JSON payload expectations to avoid serialization/precision issues.

def generate_ranking(  # noqa: D417
    self, trace_id: str, diffs: list[str], optimization_ids: list[str], speedups: list[int]
) -> list[int] | None:
    """Optimize the given python code for performance by making a request to the Django endpoint.

    Parameters
    ----------
    - source_code (str): The python code to optimize.
    - optimized_code (str): The python code generated by the AI service.
    - dependency_code (str): The dependency code used as read-only context for the optimization
    - original_line_profiler_results: str - line profiler results for the baseline code
    - optimized_line_profiler_results: str - line profiler results for the optimized code
    - original_code_runtime: str - runtime for the baseline code
    - optimized_code_runtime: str - runtime for the optimized code
    - speedup: str - speedup of the optimized code
    - annotated_tests: str - test functions annotated with runtime
    - optimization_id: str - unique id of opt candidate
    - original_explanation: str - original_explanation generated for the opt candidate

    Returns
    -------
    - List[OptimizationCandidate]: A list of Optimization Candidates.

    """
    payload = {
        "trace_id": trace_id,
        "diffs": diffs,
        "speedups": speedups,
        "optimization_ids": optimization_ids,
        "python_version": platform.python_version(),
    }
    logger.info("Generating ranking")
    console.rule()
    try:
        response = self.make_ai_service_request("/ranker", payload=payload, timeout=60)
    except requests.exceptions.RequestException as e:
        logger.exception(f"Error generating ranking: {e}")
        ph("cli-optimize-error-caught", {"error": str(e)})
        return None

    if response.status_code == 200:
        ranking: list[int] = response.json()["ranking"]
        console.rule()
        return ranking
    try:
        error = response.json()["error"]
    except Exception:
        error = response.text
    logger.error(f"Error generating ranking: {response.status_code} - {error}")
    ph("cli-optimize-error-response", {"response_status_code": response.status_code, "error": error})
    console.rule()
    return None
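
On the typing point above: floats produced by 1 + performance_gain(...) serialize to JSON without trouble, so the mismatch is mainly about the annotation and the backend contract. A small self-contained sketch, using made-up values and assuming the backend accepts floating-point speedups:

import json
import platform

# Hypothetical payload values; real diffs and ids come from the optimizer.
payload = {
    "trace_id": "abc123",
    "diffs": ["--- original\n+++ optimized\n@@ -1 +1 @@\n"],
    "speedups": [1.8, 2.3],  # floats from 1 + performance_gain(...)
    "optimization_ids": ["opt-1", "opt-2"],
    "python_version": platform.python_version(),
}

# Floats round-trip through JSON fine; the open question is only whether the
# /ranker endpoint and the list[int] annotation agree on the numeric type.
print(json.dumps(payload))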

github-actions bot commented Sep 5, 2025

PR Code Suggestions ✨

Explore these optional code suggestions:

Possible issue
Await Future and validate result

You are using self.executor.submit(...) but never call .result(), so ranking is a
Future and list operations will fail at runtime. Retrieve the result with a timeout
and handle None or exceptions gracefully. Also validate the result type before
indexing.

codeflash/optimization/function_optimizer.py [700-715]

-ranking = self.executor.submit(
+future = self.executor.submit(
     ai_service_client.generate_ranking,
     diffs=diff_strs,
     optimization_ids=optimization_ids,
     speedups=speedups_list,
     trace_id=self.function_trace_id[:-4] + exp_type if self.experiment_id else self.function_trace_id,
 )
-ranking = [x - 1 for x in ranking]
-if ranking:
-    min_key = ranking[0]
+try:
+    ranking = future.result(timeout=70)
+except Exception:
+    ranking = None
+if isinstance(ranking, list) and ranking:
+    # service returns 1-based indices; convert to 0-based safely
+    zero_based = [x - 1 for x in ranking if isinstance(x, int)]
+    min_key = zero_based[0] if zero_based else None
 else:
+    min_key = None
+if min_key is None:
     diff_lens_ranking = create_rank_dictionary_compact(diff_lens_list)
     runtimes_ranking = create_rank_dictionary_compact(runtimes_list)
     # TODO: better way to resolve conflicts with same min ranking
     overall_ranking = {key: diff_lens_ranking[key] + runtimes_ranking[key] for key in diff_lens_ranking.keys()}  # noqa: SIM118
     min_key = min(overall_ranking, key=overall_ranking.get)
Suggestion importance[1-10]: 9


Why: The code assigns the Future from self.executor.submit(...) to ranking and then treats it as a list, which would fail; adding .result() with error handling is correct and critical. The improved code accurately reflects the fix and preserves the fallback path if the service fails.

Impact: High
Guard index normalization and bounds

Blindly subtracting 1 can produce negative indices if the service returns 0-based
indices or invalid values, leading to wrong selection. Clamp to valid range and
ignore out-of-bounds entries before choosing.

codeflash/optimization/function_optimizer.py [707-710]

-ranking = [x - 1 for x in ranking]
-if ranking:
-    min_key = ranking[0]
+if isinstance(ranking, list):
+    zero_based = []
+    for x in ranking:
+        if isinstance(x, int):
+            idx = x - 1
+            if 0 <= idx < len(valid_candidates_with_shorter_code):
+                zero_based.append(idx)
+    if zero_based:
+        min_key = zero_based[0]
Suggestion importance[1-10]: 7


Why: Validates the assumption about 1-based indices and prevents negative or out-of-range indices, improving robustness. Moderate impact since it depends on service contract but reduces potential runtime errors.

Impact: Medium
General
Use unshifted speedup metric

performance_gain likely returns a ratio; adding 1 shifts the baseline and can
mislead the ranker. Pass the actual speedup (baseline/optimized) or ratio
consistently with the backend expectation to avoid incorrect ordering.

codeflash/optimization/function_optimizer.py [692-697]

 speedups_list.append(
-    1
-    + performance_gain(
-        original_runtime_ns=original_code_baseline.runtime, optimized_runtime_ns=new_best_opt.runtime
+    performance_gain(
+        original_runtime_ns=original_code_baseline.runtime,
+        optimized_runtime_ns=new_best_opt.runtime,
     )
 )
Suggestion importance[1-10]: 5


Why: The concern about adding 1 to the performance_gain output may be valid but depends on backend expectations not shown in the diff. It’s a reasonable improvement suggestion but uncertain in correctness without more context.

Impact: Low
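
To make the arithmetic behind this suggestion concrete (assuming, without confirmation from the diff, that performance_gain returns (original - optimized) / optimized), adding 1 simply converts that gain ratio into the plain speedup factor original / optimized, so the real question is which of the two the /ranker backend expects:

# Toy runtimes to illustrate the relationship; not taken from the PR.
original_runtime_ns = 2_000_000
optimized_runtime_ns = 1_000_000

gain = (original_runtime_ns - optimized_runtime_ns) / optimized_runtime_ns  # 1.0
speedup_factor = original_runtime_ns / optimized_runtime_ns                 # 2.0

assert 1 + gain == speedup_factor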
