
Conversation

@aseembits93 (Contributor) commented Sep 5, 2025

PR Type

Enhancement, Bug fix


Description

  • Add AI-driven candidate ranking endpoint usage

  • Compute unified diffs for candidates (see the sketch after this list)

  • Fallback to local ranking when API fails

  • Log and handle ranking request errors
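
The "unified diffs" item above refers to the new unified_diff_strings helper in codeflash/code_utils/code_utils.py. As a minimal sketch of what such a helper could look like, assuming it simply wraps difflib.unified_diff (the actual implementation in this PR may use different parameter names or header labels):

import difflib

def unified_diff_strings(original: str, modified: str) -> str:
    # Illustrative only: join the unified diff of two code strings into one string.
    return "".join(
        difflib.unified_diff(
            original.splitlines(keepends=True),
            modified.splitlines(keepends=True),
            fromfile="original",
            tofile="optimized",
        )
    )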


Diagram Walkthrough

flowchart LR
  A["Generate candidates"] -- "compute diff/speedup" --> B["Build payload (diffs, ids, speedups)"]
  B -- "POST /ranker" --> C["AI service ranking"]
  C -- "ranking ok" --> D["Select best candidate by AI order"]
  C -- "error/None" --> E["Fallback: diff+runtimes ranking"]
  A --> F["Unified diff strings helper"]

File Walkthrough

Relevant files

Enhancement

aiservice.py (codeflash/api/aiservice.py, +53/-0): Add AI service ranking request with error handling

  • Add generate_ranking method calling /ranker.
  • Build payload with diffs, ids, speedups, python version.
  • Robust error handling and logging; returns list or None.
  • Telemetry events for errors.

code_utils.py (codeflash/code_utils/code_utils.py, +17/-0): Add unified diff string utility

  • Introduce unified_diff_strings utility.
  • Produce unified diff between two code strings.

function_optimizer.py (codeflash/optimization/function_optimizer.py, +30/-5): Use AI ranking with diff/speedup and fallback

  • Compute diffs, speedups, and ids per candidate.
  • Call generate_ranking and use ranking result.
  • Fallback to local ranking if API unavailable.
  • Import and use unified_diff_strings.

github-actions bot commented Sep 5, 2025

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Possible Issue

The result of submitting the async ranking task is used directly as if it were the ranking list. self.executor.submit(...) returns a Future; you likely need to call .result() and handle timeouts/exceptions. As written, the list comprehension over the Future raises a TypeError, and a truthiness check on a Future object would always pass.

ranking = self.executor.submit(
    ai_service_client.generate_ranking,
    diffs=diff_strs,
    optimization_ids=optimization_ids,
    speedups=speedups_list,
    trace_id=self.function_trace_id[:-4] + exp_type if self.experiment_id else self.function_trace_id,
)
ranking = [x - 1 for x in ranking]
if ranking:

None Handling

After adjusting indices with [x - 1 for x in ranking], there is no validation that the values are within range and non-negative. Also, if the API returns None, the comprehension will raise. Ensure type/length checks before indexing valid_candidates_with_shorter_code.

if ranking:
    min_key = ranking[0]
else:
    diff_lens_ranking = create_rank_dictionary_compact(diff_lens_list)
    runtimes_ranking = create_rank_dictionary_compact(runtimes_list)
    # TODO: better way to resolve conflicts with same min ranking
    overall_ranking = {key: diff_lens_ranking[key] + runtimes_ranking[key] for key in diff_lens_ranking.keys()}  # noqa: SIM118
    min_key = min(overall_ranking, key=overall_ranking.get)
best_optimization = valid_candidates_with_shorter_code[min_key]

Mismatched Types

The speedups parameter is typed as list[int], but speedups are built as 1 + performance_gain(...) which likely yields floats. Align typing and JSON payload expectations to avoid serialization/precision issues.

def generate_ranking(  # noqa: D417
    self, trace_id: str, diffs: list[str], optimization_ids: list[str], speedups: list[int]
) -> list[int] | None:
    """Optimize the given python code for performance by making a request to the Django endpoint.

    Parameters
    ----------
    - source_code (str): The python code to optimize.
    - optimized_code (str): The python code generated by the AI service.
    - dependency_code (str): The dependency code used as read-only context for the optimization
    - original_line_profiler_results: str - line profiler results for the baseline code
    - optimized_line_profiler_results: str - line profiler results for the optimized code
    - original_code_runtime: str - runtime for the baseline code
    - optimized_code_runtime: str - runtime for the optimized code
    - speedup: str - speedup of the optimized code
    - annotated_tests: str - test functions annotated with runtime
    - optimization_id: str - unique id of opt candidate
    - original_explanation: str - original_explanation generated for the opt candidate

    Returns
    -------
    - List[OptimizationCandidate]: A list of Optimization Candidates.

    """
    payload = {
        "trace_id": trace_id,
        "diffs": diffs,
        "speedups": speedups,
        "optimization_ids": optimization_ids,
        "python_version": platform.python_version(),
    }
    logger.info("Generating ranking")
    console.rule()
    try:
        response = self.make_ai_service_request("/ranker", payload=payload, timeout=60)
    except requests.exceptions.RequestException as e:
        logger.exception(f"Error generating ranking: {e}")
        ph("cli-optimize-error-caught", {"error": str(e)})
        return None

    if response.status_code == 200:
        ranking: list[int] = response.json()["ranking"]
        console.rule()
        return ranking
    try:
        error = response.json()["error"]
    except Exception:
        error = response.text
    logger.error(f"Error generating ranking: {response.status_code} - {error}")
    ph("cli-optimize-error-response", {"response_status_code": response.status_code, "error": error})
    console.rule()
    return None
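
On the typing point above: floats produced by 1 + performance_gain(...) serialize to JSON without trouble, so the mismatch is mainly about the annotation and the backend contract. A small self-contained sketch, using made-up values and assuming the backend accepts floating-point speedups:

import json
import platform

# Hypothetical payload values; real diffs and ids come from the optimizer.
payload = {
    "trace_id": "abc123",
    "diffs": ["--- original\n+++ optimized\n@@ -1 +1 @@\n"],
    "speedups": [1.8, 2.3],  # floats from 1 + performance_gain(...)
    "optimization_ids": ["opt-1", "opt-2"],
    "python_version": platform.python_version(),
}

# Floats round-trip through JSON fine; the open question is only whether the
# /ranker endpoint and the list[int] annotation agree on the numeric type.
print(json.dumps(payload))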

github-actions bot commented Sep 5, 2025

PR Code Suggestions ✨

Explore these optional code suggestions:

Possible issue
Await Future and validate result

You are using self.executor.submit(...) but never call .result(), so ranking is a
Future and list operations will fail at runtime. Retrieve the result with a timeout
and handle None or exceptions gracefully. Also validate the result type before
indexing.

codeflash/optimization/function_optimizer.py [700-715]

-ranking = self.executor.submit(
+future = self.executor.submit(
     ai_service_client.generate_ranking,
     diffs=diff_strs,
     optimization_ids=optimization_ids,
     speedups=speedups_list,
     trace_id=self.function_trace_id[:-4] + exp_type if self.experiment_id else self.function_trace_id,
 )
-ranking = [x - 1 for x in ranking]
-if ranking:
-    min_key = ranking[0]
+try:
+    ranking = future.result(timeout=70)
+except Exception:
+    ranking = None
+if isinstance(ranking, list) and ranking:
+    # service returns 1-based indices; convert to 0-based safely
+    zero_based = [x - 1 for x in ranking if isinstance(x, int)]
+    min_key = zero_based[0] if zero_based else None
 else:
+    min_key = None
+if min_key is None:
     diff_lens_ranking = create_rank_dictionary_compact(diff_lens_list)
     runtimes_ranking = create_rank_dictionary_compact(runtimes_list)
     # TODO: better way to resolve conflicts with same min ranking
     overall_ranking = {key: diff_lens_ranking[key] + runtimes_ranking[key] for key in diff_lens_ranking.keys()}  # noqa: SIM118
     min_key = min(overall_ranking, key=overall_ranking.get)
Suggestion importance[1-10]: 9


Why: The code assigns the Future from self.executor.submit(...) to ranking and then treats it as a list, which would fail; adding .result() with error handling is correct and critical. The improved code accurately reflects the fix and preserves the fallback path if the service fails.

Impact: High
Guard index normalization and bounds

Blindly subtracting 1 can produce negative indices if the service returns 0-based
indices or invalid values, leading to wrong selection. Clamp to valid range and
ignore out-of-bounds entries before choosing.

codeflash/optimization/function_optimizer.py [707-710]

-ranking = [x - 1 for x in ranking]
-if ranking:
-    min_key = ranking[0]
+if isinstance(ranking, list):
+    zero_based = []
+    for x in ranking:
+        if isinstance(x, int):
+            idx = x - 1
+            if 0 <= idx < len(valid_candidates_with_shorter_code):
+                zero_based.append(idx)
+    if zero_based:
+        min_key = zero_based[0]
Suggestion importance[1-10]: 7


Why: Validates the assumption about 1-based indices and prevents negative or out-of-range indices, improving robustness. Moderate impact since it depends on service contract but reduces potential runtime errors.

Impact: Medium
General
Use unshifted speedup metric

performance_gain likely returns a ratio; adding 1 shifts the baseline and can
mislead the ranker. Pass the actual speedup (baseline/optimized) or ratio
consistently with the backend expectation to avoid incorrect ordering.

codeflash/optimization/function_optimizer.py [692-697]

 speedups_list.append(
-    1
-    + performance_gain(
-        original_runtime_ns=original_code_baseline.runtime, optimized_runtime_ns=new_best_opt.runtime
+    performance_gain(
+        original_runtime_ns=original_code_baseline.runtime,
+        optimized_runtime_ns=new_best_opt.runtime,
     )
 )
Suggestion importance[1-10]: 5


Why: The concern about adding 1 to the performance_gain output may be valid but depends on backend expectations not shown in the diff. It’s a reasonable improvement suggestion but uncertain in correctness without more context.

Impact: Low
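
To make the arithmetic behind this suggestion concrete (assuming, without confirmation from the diff, that performance_gain returns (original - optimized) / optimized), adding 1 simply converts that gain ratio into the plain speedup factor original / optimized, so the real question is which of the two the /ranker backend expects:

# Toy runtimes to illustrate the relationship; not taken from the PR.
original_runtime_ns = 2_000_000
optimized_runtime_ns = 1_000_000

gain = (original_runtime_ns - optimized_runtime_ns) / optimized_runtime_ns  # 1.0
speedup_factor = original_runtime_ns / optimized_runtime_ns                 # 2.0

assert 1 + gain == speedup_factor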
