
Conversation

Contributor

@mohammedahmed18 mohammedahmed18 commented Aug 24, 2025

PR Type

Enhancement

This is still experimental, but the core idea is normalization, which works better here than rank-based sorting: a candidate with a very low runtime or a very small diff stands out relative to the other values instead of only moving up one rank.

Suppose we have these metrics:
runtime = [60, 40, 4]
diffs = [10, 9, 10]

With the current rank summation, the total score dict is {0: 3, 1: 1, 2: 2}, so it picks the second candidate (runtime 40, 9-line diff), which is not the best choice here.

With normalization and weights, the score dict becomes {0: 1.0, 1: 0.4821428571428572, 2: 0.25}, so it picks the candidate with runtime 4 and a 10-line diff.

So the choice is about how good each candidate is relative to the other candidates.

Weights: the 3 and 1 mean runtime is three times more important than diff. I still need to experiment with these two numbers to find the best ratio. A worked example of the scoring is sketched below.
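
To make the arithmetic concrete, here is a minimal standalone sketch of the math described above. It mirrors the PR's normalize / choose_weights / create_score_dictionary_from_metrics helpers but is not the PR's code: min-max normalize each metric, scale the weights so they sum to 1, then take the weighted sum per candidate (lower is better).

def _normalize(values: list[float]) -> list[float]:
    # min-max scaling: the best value maps to 0.0, the worst to 1.0
    mn, mx = min(values), max(values)
    if mx == mn:
        return [0.0] * len(values)
    return [(v - mn) / (mx - mn) for v in values]

runtimes = [60, 40, 4]
diffs = [10, 9, 10]

# runtime weighted 3x as important as diff, scaled to fractions of 1
importance = {"runtime": 3, "diff": 1}
total = sum(importance.values())
weights = [v / total for v in importance.values()]  # [0.75, 0.25]

runtime_norm = _normalize(runtimes)  # [1.0, 0.6428..., 0.0]
diffs_norm = _normalize(diffs)       # [1.0, 0.0, 1.0]

scores = {
    idx: runtime_norm[idx] * weights[0] + diffs_norm[idx] * weights[1]
    for idx in range(len(runtimes))
}
# scores -> {0: 1.0, 1: 0.4821428571428572, 2: 0.25}, the dict quoted above
best_index = min(scores, key=scores.get)  # 2, i.e. the runtime-4 candidate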


Description

  • Add weight utilities for normalized importances

  • Implement metric normalization and scoring

  • Update determine_best_candidate with weighting

  • Remove old rank summation approach


Diagram Walkthrough

flowchart LR
  R["runtimes_list"] --> NR["normalize(runtimes)"]
  D["diff_lens_list"] --> ND["normalize(diffs)"]
  W["weights(runtime=3,diff=1)"] --> E["choose_weights()"]
  NR --> C["create_score_dictionary_from_metrics()"]
  ND --> C
  E --> C
  C --> S["score_dict"]
  S --> M["min_key"]
  M --> B["best_optimization"]

File Walkthrough

Relevant files
Enhancement
code_utils.py
Introduce weighted metrics utility functions                         

codeflash/code_utils/code_utils.py

  • Added choose_weights utility for normalization
  • Added normalize function for metrics scaling
  • Added create_score_dictionary_from_metrics for scoring
  • Added error checks for zero or mismatched inputs
+57/-0   
function_optimizer.py
Switch to weighted metric ranking logic                                   

codeflash/optimization/function_optimizer.py

  • Imported new weighting and normalization functions
  • Replaced rank summation with weighted scoring
  • Normalize runtimes_list and diff_lens_list
  • Removed old create_rank_dictionary_compact logic
+12/-6   


PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Type Annotation Mismatch

The return type annotation of create_score_dictionary_from_metrics specifies dict[int, int], but the function produces float scores. This inconsistency can mislead maintainers or break type checks.

def create_score_dictionary_from_metrics(weights: list[float], *metrics: list[float]) -> dict[int, int]:
    """Combine multiple metrics into a single weighted score dictionary.

    Each metric is a list of values (smaller = better).
    The total score for each index is the weighted sum of its values
    across all metrics:

        score[index] = Σ (value * weight)

    Args:
        weights: A list of weights, one per metric. Larger weight = more influence.
        *metrics: Lists of values (one list per metric, aligned by index).

    Returns:
        A dictionary mapping each index to its combined weighted score.

    """
    if len(weights) != len(metrics):
        raise ValueError("Number of weights must match number of metrics")

    combined: dict[int, float] = {}

    for weight, metric in zip(weights, metrics):
        for idx, value in enumerate(metric):
            combined[idx] = combined.get(idx, 0) + value * weight

    return combined
Metrics Length Validation

The function does not verify that all metric lists have the same length, which may lead to incomplete scoring or unexpected behavior if inputs differ in size.

if len(weights) != len(metrics):
    raise ValueError("Number of weights must match number of metrics")

combined: dict[int, float] = {}

for weight, metric in zip(weights, metrics):
    for idx, value in enumerate(metric):
        combined[idx] = combined.get(idx, 0) + value * weight
Tie-breaking Behavior

When all values of a metric are equal, normalize returns all zeros; if every metric degenerates this way, all scores tie and min(score_dict, key=score_dict.get) always picks the first index, introducing bias. Consider adding explicit tie-break logic (a possible sketch follows the snippet below).

weights = choose_weights(runtime=3, diff=1)

runtime_norm = normalize(runtimes_list)
diffs_norm = normalize(diff_lens_list)
score_dict = create_score_dictionary_from_metrics(weights, runtime_norm, diffs_norm)

min_key = min(score_dict, key=score_dict.get)
best_optimization = valid_candidates_with_shorter_code[min_key]
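
A possible tie-break sketch (an assumption, not part of the PR): among candidates sharing the minimal weighted score, fall back to raw runtime and then raw diff length instead of relying on dict iteration order. It assumes runtimes_list and diff_lens_list are index-aligned with score_dict.

# Hypothetical tie-break, not in the PR: among indices tied on the weighted
# score, prefer the lowest raw runtime, then the smallest diff.
min_score = min(score_dict.values())
tied = [idx for idx, score in score_dict.items() if score == min_score]
min_key = min(tied, key=lambda idx: (runtimes_list[idx], diff_lens_list[idx]))
best_optimization = valid_candidates_with_shorter_code[min_key]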


PR Code Suggestions ✨

Explore these optional code suggestions:

Category: Possible issue
Handle empty input lists

Guard against empty input to avoid a ValueError from min()/max(). Return an empty
list immediately if no values are provided. This ensures callers won’t crash on
zero-length metrics.

codeflash/code_utils/code_utils.py [88-92]

 def normalize(values: list[float]) -> list[float]:
+    if not values:
+        return []
     mn, mx = min(values), max(values)
     if mx == mn:
         return [0.0] * len(values)
     return [(v - mn) / (mx - mn) for v in values]
Suggestion importance[1-10]: 7


Why: Adding a guard for empty values prevents a ValueError from min()/max() and avoids runtime crashes when no metrics are provided.

Impact: Medium
Category: General
Validate metric lengths and types

Enforce that all metric lists have the same length and correct the return type
annotation to float scores. This ensures every index is scored uniformly and the
type hint matches actual output.

codeflash/code_utils/code_utils.py [95-115]

-def create_score_dictionary_from_metrics(weights: list[float], *metrics: list[float]) -> dict[int, int]:
+def create_score_dictionary_from_metrics(weights: list[float], *metrics: list[float]) -> dict[int, float]:
     if len(weights) != len(metrics):
         raise ValueError("Number of weights must match number of metrics")
+    if len({len(m) for m in metrics}) > 1:
+        raise ValueError("All metrics must have equal length")
     combined: dict[int, float] = {}
Suggestion importance[1-10]: 7


Why: Enforcing equal metric lengths and correcting the return annotation fixes a type mismatch and prevents indexing errors across metrics.

Impact: Medium
Reject negative importance values

Validate that no importance value is negative before normalization. Raise an error
if any provided weight is below zero. This prevents unexpected score inversions.

codeflash/code_utils/code_utils.py [67-85]

 def choose_weights(**importance: float) -> list[float]:
     total = sum(importance.values())
     if total == 0:
         raise ValueError("At least one importance value must be > 0")
+    if any(v < 0 for v in importance.values()):
+        raise ValueError("Importance values must be non-negative")
     return [v / total for v in importance.values()]
Suggestion importance[1-10]: 5


Why: Checking for negative weights avoids unintended score inversions, but is a minor validation enhancement rather than a critical fix.

Impact: Low

@misrasaurabh1
Contributor

Talk to @aseembits93 about this, since there are some ideas on this problem.
