Right now, this is the default code diversity measure:
def _fast_code_diversity(self, code1: str, code2: str) -> float:
    """
    Fast approximation of code diversity using simple metrics
    Returns diversity score (higher = more diverse)
    """
    if code1 == code2:
        return 0.0
    # Length difference (scaled to reasonable range)
    len1, len2 = len(code1), len(code2)
    length_diff = abs(len1 - len2)
    # Line count difference
    lines1 = code1.count("\n")
    lines2 = code2.count("\n")
    line_diff = abs(lines1 - lines2)
    # Simple character set difference
    chars1 = set(code1)
    chars2 = set(code2)
    char_diff = len(chars1.symmetric_difference(chars2))
    # Combine metrics (scaled to match original edit distance range)
    diversity = length_diff * 0.1 + line_diff * 10 + char_diff * 0.5
    return diversity
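This could easily be replaced with a call to python-Levenshtein (https://pypi.org/project/python-Levenshtein/):
import Levenshtein  # provided by the python-Levenshtein package

def _fast_code_diversity(self, code1: str, code2: str) -> float:
    """
    Code diversity measured as Levenshtein (edit) distance
    Returns diversity score (higher = more diverse)
    """
    # distance() returns an int; cast to match the declared float return type
    return float(Levenshtein.distance(code1, code2))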
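I think this is a much more theoretically sound basis for measuring diversity: edit distance is a true metric on strings, whereas the current heuristic scores two different programs as 0 whenever they have the same length, line count, and character set.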
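To make the difference concrete, here is a rough sketch comparing the two measures on a pair of snippets. Assumptions: `fast_code_diversity` is a standalone copy of the heuristic above (without `self`), `edit_distance` is a pure-Python stand-in for `Levenshtein.distance` so the example runs without installing the package, and `code_a` / `code_b` are made-up inputs, not taken from the project.

```python
def fast_code_diversity(code1: str, code2: str) -> float:
    """Standalone copy of the current heuristic (assumption: no self needed)."""
    if code1 == code2:
        return 0.0
    length_diff = abs(len(code1) - len(code2))
    line_diff = abs(code1.count("\n") - code2.count("\n"))
    char_diff = len(set(code1).symmetric_difference(set(code2)))
    return length_diff * 0.1 + line_diff * 10 + char_diff * 0.5


def edit_distance(a: str, b: str) -> int:
    """Pure-Python Levenshtein distance, a stand-in for Levenshtein.distance."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


# Hypothetical inputs: same length, line count, and character set,
# but the operands are swapped, so the programs behave differently.
code_a = "x = a - b\n"
code_b = "x = b - a\n"

heuristic_score = fast_code_diversity(code_a, code_b)
edit_score = edit_distance(code_a, code_b)
print(heuristic_score, edit_score)
```

On this pair the heuristic sees no difference at all, while the edit distance counts the two substituted characters. One trade-off worth keeping in mind: edit distance costs O(len(code1) * len(code2)) per comparison, which is presumably why the current version is a "fast approximation".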