Conversation

@MashAliK (Contributor) commented Jun 9, 2025

Adds parallel iterations along with a couple of bug fixes and improvements. I consider this a pretty important feature because of the significant speedup it provides to training. This is the implementation I tried, and it has worked for my use cases.

Primary changes:

  • Use concurrent.futures in controller.py to spawn worker processes that perform the training iterations (see the sketch after this list)
  • Moved most of the iteration logic into the function run_iteration_sync in a new file, iteration.py, which the workers run to allow concurrent execution
  • Pass the config into the spawned processes to initialize new instances of LLMEnsemble and ProgramDatabase (these classes are not pickleable, so this is the best approach I could think of for using them in each worker process)
  • As a consequence of this approach, a snapshot of the database needs to be saved by the root process every iteration, and each new process needs to load it
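
Roughly, the control flow looks like the sketch below. This is a simplified illustration only; the signatures, the snapshot handling, and the run_iterations_in_parallel wrapper are placeholders, not the exact code in controller.py or iteration.py:

  from concurrent.futures import ProcessPoolExecutor

  from openevolve.iteration import run_iteration_sync  # new worker entry point (module path assumed)

  def run_iterations_in_parallel(config, database, num_iterations, num_workers, snapshot_dir):
      for batch_start in range(0, num_iterations, num_workers):
          # ProgramDatabase is not pickleable, so save a snapshot that each
          # worker process reloads instead of passing the object directly.
          database.save(snapshot_dir)

          with ProcessPoolExecutor(max_workers=num_workers) as executor:
              futures = [
                  # Each worker rebuilds LLMEnsemble and ProgramDatabase from
                  # the config and the snapshot, then runs one iteration.
                  executor.submit(run_iteration_sync, config, snapshot_dir, batch_start + i)
                  for i in range(num_workers)
              ]
              results = [future.result() for future in futures]

          # Fold the programs produced by the workers back into the root database.
          for child_program in results:
              if child_program is not None:
                  database.add(child_program)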

Minor changes:

  • The calculate_edit_distance function was crashing the database when I used it, and since there are already libraries for this routine I ended up using one of them (Levenshtein); see the sketch after this list
  • Replaced the use of the Levenshtein distance with a ratio in _calculate_island_diversity since it's easier to read
  • Replaced the use of the Levenshtein distance with a ratio in _calculate_feature_coords since it's normalized for code length
  • Introduced allowed_population_overflow since otherwise the database was adding and removing a program every iteration once it reached the allowed program limit
  • Added logging for MAP-Elites features
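
A minimal sketch of the ratio-based comparison (assuming the Levenshtein package; the helper names are illustrative, not the actual ones in database.py):

  import Levenshtein

  def _code_similarity(code_a: str, code_b: str) -> float:
      # ratio() is normalized to [0, 1] regardless of code length, which is why
      # it replaces the raw edit distance in the diversity and feature-coordinate
      # calculations.
      return Levenshtein.ratio(code_a, code_b)

  def _code_diversity(code_a: str, code_b: str) -> float:
      # Higher value means the two programs are more different.
      return 1.0 - Levenshtein.ratio(code_a, code_b)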

@MashAliK (Contributor, Author) commented Jun 9, 2025

Solves: #32
I believe it also contains the solution to #60, because I remember that moving the call to _enforce_population_limit later in add fixed an error where a program was removed from the list while it was still being added.
Taking a closer look, this bug looks different than the one I saw (I encountered mine when calling add, not sample).

@codelion (Member) commented:

Thanks for contributing. Can you rebase from main? Then I can test and review this PR.

@SuhailB (Contributor) commented Jul 3, 2025

Hello @codelion, thank you for this great project, and thank you @MashAliK for the parallelization effort.

I saw the request to rebase this PR to the latest main. I needed the parallel evaluation as well, so I rebased it and resolved the conflicts in my fork:
https://github.com/SuhailB/openevolve/tree/updated-parallel-iterations

One thing I am not sure about, which was causing loading/storing of large artifacts to fail, is this code snippet:

  # Create directory and remove old path if it exists
  if os.path.exists(save_path):
      shutil.rmtree(save_path)

Lines 361-363 in the save method in database.py

The code now passes all the tests, but I am not sure what I did is correct.

Let me know if you'd like me to open a new PR, or if @MashAliK prefers to pull from this branch to update this one.

Thank you

@codelion (Member) commented Jul 3, 2025

I don't think that code is in main? It was added in this PR, perhaps to handle multiple processes updating the DB:

As a consequence of this approach a snapshot of the database needs to be saved in the root process every iteration and each new process needs to load it

How have you solved it? Happy to look at the PR if you can send it across. Also, can you test some of the existing examples with and without parallel execution to see if there is a speed-up and whether both reach similar convergence in terms of the best_program found?

@SuhailB (Contributor) commented Jul 3, 2025

@codelion yes, it's not in main; I was referring to the PR version done by @MashAliK. Erasing the path was causing a failure when loading large artifacts, which are stored in that path, so I commented it out in my rebased version:

https://github.com/SuhailB/openevolve/tree/updated-parallel-iterations

I am not sure whether this breaks any dependencies; @MashAliK would probably know more about this.
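
Concretely, the change in my branch is essentially the following (a sketch, not the exact diff; the makedirs line is an assumption about how the directory still gets created):

  # The rmtree in the save method is disabled so that large artifacts already
  # stored under save_path are not erased before saving.

  # if os.path.exists(save_path):
  #     shutil.rmtree(save_path)
  os.makedirs(save_path, exist_ok=True)  # assumption: just ensure the directory exists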

I will try to run examples tomorrow.

@MashAliK (Contributor, Author) commented Jul 3, 2025

Hello, sorry I wasn't able to get to updating this PR earlier. I will rebase it soon.

@SuhailB I originally deleted the folder containing the programs each time to prevent duplicate programs from being added, but now I'm seeing that the files are just overwritten inside _save_program, so you're right that this probably isn't necessary.
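
For context, the overwriting behavior I mean is roughly this (a sketch of _save_program that assumes each program is written to a file named after its id; not the actual implementation):

  import json
  import os

  def _save_program(self, program, save_path: str) -> None:
      # Sketch only: because the file name is derived from the program id,
      # saving the database again overwrites the existing files rather than
      # accumulating duplicates, so deleting the folder first isn't needed.
      os.makedirs(save_path, exist_ok=True)
      program_path = os.path.join(save_path, f"{program.id}.json")
      with open(program_path, "w") as f:
          json.dump(program.to_dict(), f, indent=2)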

The only other benefit I can think of for deleting the folder is that if a new database is saved at the same location, the previous programs will all be cleared. Again, this doesn't matter for how it's being used right now because controller.py creates a temp folder each time.

Since you're saying that deleting tends to cause issues for larger databases, I'd prefer to just remove those lines for now, so I will do that here. Thanks for pointing this out!

@SuhailB (Contributor) commented Jul 3, 2025

@MashAliK Sounds good. And to clarify, deleting it is causing an issue for large artifacts (a feature added after your PR), not large databases. Also, feel free to use my branch for rebasing (if it has no issues).

@SuhailB (Contributor) commented Jul 3, 2025

Hi @codelion and @MashAliK,

I've tested the updated-parallel-iteration version for the circle_packing_with_artifacts example for 50 iterations for each stage instead of 100 to reduce API costs. I used gemini-2.0-flash-lite and gemini-2.0-flash-lite.

Results:
Sequential:
Total Runtime: 30 minutes
Best Score (sum_radii): 1.88
Stage1:

{
  "id": "788596e9-6d2b-4134-ade9-c309fdf812c2",
  "generation": 2,
  "iteration": 6,
  "timestamp": 1751574058.320835,
  "parent_id": "6eb73296-9b3c-42e1-91da-2063ce7acfaf",
  "metrics": {
    "validity": 1.0,
    "sum_radii": 1.8801588665719113,
    "target_ratio": 0.7135327766876325,
    "combined_score": 0.7135327766876325,
    "eval_time": 0.11612915992736816
  },
  "language": "python",
  "saved_at": 1751574423.6830008
}

Stage2:

{
  "id": "084465de-efce-4e42-81b0-79f6a044e819",
  "generation": 0,
  "iteration": 0,
  "timestamp": 1751574425.2020102,
  "parent_id": null,
  "metrics": {
    "validity": 1.0,
    "sum_radii": 1.8801588665719113,
    "target_ratio": 0.7135327766876325,
    "combined_score": 0.7135327766876325,
    "eval_time": 0.11661648750305176
  },
  "language": "python",
  "saved_at": 1751575856.2608302
}

Parallel (25 cores):
Total Runtime: 2 minutes
Best Score (sum_radii): 2.038
Stage1:

{
  "id": "b96d6e67-76bb-4765-9f51-584d92e5d2c2",
  "generation": 1,
  "iteration": 20,
  "timestamp": 1751577546.0471964,
  "parent_id": "49307652-3ead-485f-8c21-1ce11a16ec29",
  "metrics": {
    "validity": 1.0,
    "sum_radii": 1.8598312591600656,
    "target_ratio": 0.7058183146717517,
    "combined_score": 0.7058183146717517,
    "eval_time": 0.22881340980529785
  },
  "language": "python",
  "saved_at": 1751577556.0780315
}

Stage2:

{
  "id": "1a044db7-811c-4ea6-92e3-1918c34b28a1",
  "generation": 1,
  "iteration": 16,
  "timestamp": 1751577569.0512633,
  "parent_id": "74a8ac96-f519-49e5-b3da-f4b9058cdc2e",
  "metrics": {
    "validity": 1.0,
    "sum_radii": 2.038107874942713,
    "target_ratio": 0.7734754743615609,
    "combined_score": 0.7734754743615609,
    "eval_time": 0.3030838966369629
  },
  "language": "python",
  "saved_at": 1751577604.0499
}

The rebased version is in here: https://github.com/SuhailB/openevolve/tree/updated-parallel-iterations

@MashAliK could you rebase for @codelion to review, or should I open a new PR?

@MashAliK (Contributor, Author) commented Jul 3, 2025

@SuhailB Got it, thanks.
Since you've already fixed and rebased it, you could open a new PR from your branch and we can test things there. I will then close this one.

@codelion closed this Jul 7, 2025