@ai-symphony

Improves performance when calculating the edit distance between large programs (i.e., 3K+ characters).
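For readers without the diff at hand, here is a minimal sketch of one common way to make Levenshtein distance cheaper on long inputs: keep only two rows of the dynamic-programming table instead of the full matrix. This is an illustration under assumptions, not necessarily the change this PR makes; the function name `edit_distance` is hypothetical and is not the project's `calculate_edit_distance`.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using two rolling rows (illustrative sketch)."""
    if a == b:
        return 0
    # Keep the shorter string on the inner axis so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]
```

Time is still proportional to `len(a) * len(b)`, so this mainly reduces memory; for 3K+ character programs a compiled implementation would likely matter more than the row trick.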

@CLAassistant

CLAassistant commented Jun 6, 2025

CLA assistant check
All committers have signed the CLA.

@benjaminy

Is this PR waiting on something? The edit distance calculation can be an annoying bottleneck.

@ai-symphony
Author

ai-symphony commented Jun 16, 2025

> Is this PR waiting on something? The edit distance calculation can be an annoying bottleneck.

I don't see an option to Merge this. Can someone with permissions please do this for me? @codelion, @jvm123, @DavyMorgan

@codelion
Member

This is fixed in the https://github.com/codelion/openevolve/tree/feat/MLX-kernel-optimization branch. I will merge that into main soon.

@codelion
Member

This should be fixed in main now.

@ai-symphony ai-symphony reopened this Jun 20, 2025
@ai-symphony
Author

> This should be fixed in main now.

Looks like calculate_edit_distance is still used here: https://github.com/codelion/openevolve/blob/656e153b8ce1e74dbb0af80ca75fdc62cf545d70/openevolve/database.py#L560
Will this also be fixed?

@codelion
Member

> Looks like calculate_edit_distance is still used here:

Yes, but now we only sample up to 5 programs, so it doesn't take that long to calculate. https://github.com/codelion/openevolve/blob/656e153b8ce1e74dbb0af80ca75fdc62cf545d70/openevolve/database.py#L556

I am open to moving to a more efficient edit distance calculation, but in my experiments it was necessary to restrict the number of programs anyway, as it can get quite large.
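The sampling idea described above can be sketched roughly as follows. This is a hypothetical illustration, not the code in `database.py`: the function name `sampled_diversity`, its parameters, and the choice of averaging the distances are all assumptions. Only the cap of 5 sampled programs comes from the comment.

```python
import random
from typing import Callable, Optional, Sequence


def sampled_diversity(
    reference: str,
    programs: Sequence[str],
    distance: Callable[[str, str], float],
    max_samples: int = 5,  # cap taken from the discussion above
    rng: Optional[random.Random] = None,
) -> float:
    """Average distance from `reference` to a bounded random sample."""
    if not programs:
        return 0.0
    rng = rng or random.Random()
    # Never compare against more than max_samples programs, so the cost
    # stays bounded no matter how large the population grows.
    sample = rng.sample(list(programs), min(max_samples, len(programs)))
    return sum(distance(reference, other) for other in sample) / len(sample)
```

With this shape, the number of distance calls per program is at most `max_samples`, which is why the per-call cost of the edit distance matters less once the cap is in place.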

@benjaminy

Sorry if this is too off-topic, but I'm curious what the thinking is regarding edit distance. I guess the point is to approximate the behavioral diversity of a group of programs. The edit distance of their code seems like at best a pretty rough estimate of that. Maybe a system with app-specific scores/traces/descriptions would be a better approximation?
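To make the suggestion concrete, here is a minimal sketch of one possible behavior-based measure, assuming each program can be summarized by a vector of app-specific scores (for example, one score per test case). The function name `behavioral_diversity` and the choice of mean pairwise Euclidean distance are assumptions for illustration; nothing here is part of openevolve.

```python
import math
from itertools import combinations
from typing import Sequence


def behavioral_diversity(score_vectors: Sequence[Sequence[float]]) -> float:
    """Mean pairwise Euclidean distance between per-program score vectors."""
    pairs = list(combinations(score_vectors, 2))
    if not pairs:
        # Zero or one program: no pair to compare.
        return 0.0
    # math.dist requires both vectors to have the same length.
    return sum(math.dist(u, v) for u, v in pairs) / len(pairs)
```

Two programs with very different source text but identical scores would count as identical under this measure, which is the opposite trade-off from comparing code by edit distance.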

@codelion
Member

> The edit distance of their code seems like at best a pretty rough estimate of that. Maybe a system with app-specific scores/traces/descriptions would be a better approximation?

Sure, the edit distance comes from other evolutionary algorithms, such as the ones that target strings like DNA strands. There can be other choices for diversity, so can we experiment and benchmark with some examples to see what works better? The goal would be to reach the same best_program.py from the given initial_program.py faster for some of the examples. That would show convergence, and we could see whether a particular choice of diversity measure helps.
