
Conversation

Contributor

@mohammedahmed18 mohammedahmed18 commented Aug 24, 2025

PR Type

Enhancement

This is still experimental, but the core idea is normalization, which works better here than rank-based sorting: a candidate with a very low runtime or a very small diff stands out relative to the other values instead of only moving up one rank.

Suppose we have these metrics:
runtime = [60, 40, 4]
diffs = [10, 9, 10]

With the current rank summation, the total score dict is {0: 3, 1: 1, 2: 2}, so it picks the second candidate (runtime 40, 9-line diff), which is not the best choice here.

With normalization and weights, the score dict becomes {0: 1.0, 1: 0.4821428571428572, 2: 0.25}, so it picks the candidate with runtime 4 and a 10-line diff.

So the choice is about how good each candidate is relative to the other candidates.

Weights: the 3 and 1 mean runtime is three times more important than diff. I still need to experiment with these two numbers to find the best ratio. A worked example of the scoring is sketched below.
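
To make the arithmetic concrete, here is a minimal standalone sketch of the math described above. It mirrors the PR's normalize / choose_weights / create_score_dictionary_from_metrics helpers but is not the PR's code: min-max normalize each metric, scale the weights so they sum to 1, then take the weighted sum per candidate (lower is better).

def _normalize(values: list[float]) -> list[float]:
    # min-max scaling: the best value maps to 0.0, the worst to 1.0
    mn, mx = min(values), max(values)
    if mx == mn:
        return [0.0] * len(values)
    return [(v - mn) / (mx - mn) for v in values]

runtimes = [60, 40, 4]
diffs = [10, 9, 10]

# runtime weighted 3x as important as diff, scaled to fractions of 1
importance = {"runtime": 3, "diff": 1}
total = sum(importance.values())
weights = [v / total for v in importance.values()]  # [0.75, 0.25]

runtime_norm = _normalize(runtimes)  # [1.0, 0.6428..., 0.0]
diffs_norm = _normalize(diffs)       # [1.0, 0.0, 1.0]

scores = {
    idx: runtime_norm[idx] * weights[0] + diffs_norm[idx] * weights[1]
    for idx in range(len(runtimes))
}
# scores -> {0: 1.0, 1: 0.4821428571428572, 2: 0.25}, the dict quoted above
best_index = min(scores, key=scores.get)  # 2, i.e. the runtime-4 candidate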


Description

  • Add weight utilities for normalized importances

  • Implement metric normalization and scoring

  • Update determine_best_candidate with weighting

  • Remove old rank summation approach


Diagram Walkthrough

flowchart LR
  R["runtimes_list"] --> NR["normalize(runtimes)"]
  D["diff_lens_list"] --> ND["normalize(diffs)"]
  W["weights(runtime=3,diff=1)"] --> E["choose_weights()"]
  NR --> C["create_score_dictionary_from_metrics()"]
  ND --> C
  E --> C
  C --> S["score_dict"]
  S --> M["min_key"]
  M --> B["best_optimization"]

File Walkthrough

Relevant files
Enhancement
code_utils.py
Introduce weighted metrics utility functions                         

codeflash/code_utils/code_utils.py

  • Added choose_weights utility for normalization
  • Added normalize function for metrics scaling
  • Added create_score_dictionary_from_metrics for scoring
  • Added error checks for zero or mismatched inputs
+57/-0   
function_optimizer.py
Switch to weighted metric ranking logic                                   

codeflash/optimization/function_optimizer.py

  • Imported new weighting and normalization functions
  • Replaced rank summation with weighted scoring
  • Normalize runtimes_list and diff_lens_list
  • Removed old create_rank_dictionary_compact logic
+12/-6   


PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Type Annotation Mismatch

The return type annotation of create_score_dictionary_from_metrics specifies dict[int, int], but the function produces float scores. This inconsistency can mislead maintainers or break type checks.

def create_score_dictionary_from_metrics(weights: list[float], *metrics: list[float]) -> dict[int, int]:
    """Combine multiple metrics into a single weighted score dictionary.

    Each metric is a list of values (smaller = better).
    The total score for each index is the weighted sum of its values
    across all metrics:

        score[index] = Σ (value * weight)

    Args:
        weights: A list of weights, one per metric. Larger weight = more influence.
        *metrics: Lists of values (one list per metric, aligned by index).

    Returns:
        A dictionary mapping each index to its combined weighted score.

    """
    if len(weights) != len(metrics):
        raise ValueError("Number of weights must match number of metrics")

    combined: dict[int, float] = {}

    for weight, metric in zip(weights, metrics):
        for idx, value in enumerate(metric):
            combined[idx] = combined.get(idx, 0) + value * weight

    return combined
Metrics Length Validation

The function does not verify that all metric lists have the same length, which may lead to incomplete scoring or unexpected behavior if inputs differ in size.

if len(weights) != len(metrics):
    raise ValueError("Number of weights must match number of metrics")

combined: dict[int, float] = {}

for weight, metric in zip(weights, metrics):
    for idx, value in enumerate(metric):
        combined[idx] = combined.get(idx, 0) + value * weight
Tie-breaking Behavior

When all values of a metric are equal, normalize returns all zeros; if every metric degenerates this way, all scores tie and min(score_dict, key=score_dict.get) always picks the first index, introducing bias. Consider adding explicit tie-break logic (a possible sketch follows the snippet below).

weights = choose_weights(runtime=3, diff=1)

runtime_norm = normalize(runtimes_list)
diffs_norm = normalize(diff_lens_list)
score_dict = create_score_dictionary_from_metrics(weights, runtime_norm, diffs_norm)

min_key = min(score_dict, key=score_dict.get)
best_optimization = valid_candidates_with_shorter_code[min_key]
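
A possible tie-break sketch (an assumption, not part of the PR): among candidates sharing the minimal weighted score, fall back to raw runtime and then raw diff length instead of relying on dict iteration order. It assumes runtimes_list and diff_lens_list are index-aligned with score_dict.

# Hypothetical tie-break, not in the PR: among indices tied on the weighted
# score, prefer the lowest raw runtime, then the smallest diff.
min_score = min(score_dict.values())
tied = [idx for idx, score in score_dict.items() if score == min_score]
min_key = min(tied, key=lambda idx: (runtimes_list[idx], diff_lens_list[idx]))
best_optimization = valid_candidates_with_shorter_code[min_key]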


PR Code Suggestions ✨

Explore these optional code suggestions:

Category: Possible issue
Handle empty input lists

Guard against empty input to avoid a ValueError from min()/max(). Return an empty
list immediately if no values are provided. This ensures callers won’t crash on
zero-length metrics.

codeflash/code_utils/code_utils.py [88-92]

 def normalize(values: list[float]) -> list[float]:
+    if not values:
+        return []
     mn, mx = min(values), max(values)
     if mx == mn:
         return [0.0] * len(values)
     return [(v - mn) / (mx - mn) for v in values]
Suggestion importance[1-10]: 7


Why: Adding a guard for empty values prevents a ValueError from min()/max() and avoids runtime crashes when no metrics are provided.

Impact: Medium
Category: General
Validate metric lengths and types

Enforce that all metric lists have the same length and correct the return type
annotation to float scores. This ensures every index is scored uniformly and the
type hint matches actual output.

codeflash/code_utils/code_utils.py [95-115]

-def create_score_dictionary_from_metrics(weights: list[float], *metrics: list[float]) -> dict[int, int]:
+def create_score_dictionary_from_metrics(weights: list[float], *metrics: list[float]) -> dict[int, float]:
     if len(weights) != len(metrics):
         raise ValueError("Number of weights must match number of metrics")
+    if len({len(m) for m in metrics}) > 1:
+        raise ValueError("All metrics must have equal length")
     combined: dict[int, float] = {}
Suggestion importance[1-10]: 7


Why: Enforcing equal metric lengths and correcting the return annotation fixes a type mismatch and prevents indexing errors across metrics.

Impact: Medium
Reject negative importance values

Validate that no importance value is negative before normalization. Raise an error
if any provided weight is below zero. This prevents unexpected score inversions.

codeflash/code_utils/code_utils.py [67-85]

 def choose_weights(**importance: float) -> list[float]:
     total = sum(importance.values())
     if total == 0:
         raise ValueError("At least one importance value must be > 0")
+    if any(v < 0 for v in importance.values()):
+        raise ValueError("Importance values must be non-negative")
     return [v / total for v in importance.values()]
Suggestion importance[1-10]: 5


Why: Checking for negative weights avoids unintended score inversions, but is a minor validation enhancement rather than a critical fix.

Impact: Low

@misrasaurabh1
Contributor

Talk to @aseembits93 about this, since there are some ideas on this problem.
