I was recently running an experiment with openevolve while our compute cluster was, unbeknownst to me, down. This led to some interesting openevolve behavior -- it would fail, retry a number of times based on the config, and then, instead of actually exiting when all the retries failed, it would...keep going?
Here's an example of the output logs:
2025-10-11 21:36:42,862 - ERROR - All 1 attempts failed with error: Connection error.
2025-10-11 21:36:42,862 - DEBUG - Raising connection error
2025-10-11 21:36:42,862 - ERROR - LLM generation failed: Connection error.
2025-10-11 21:36:42,862 - ERROR - LLM generation failed: Connection error.
2025-10-11 21:36:42,862 - ERROR - All 1 attempts failed with error: Connection error.
2025-10-11 21:36:42,862 - ERROR - LLM generation failed: Connection error.
2025-10-11 21:36:42,863 - ERROR - LLM generation failed: Connection error.
2025-10-11 21:36:42,864 - WARNING - Iteration 147 error: LLM generation failed: Connection error.
2025-10-11 21:36:42,864 - DEBUG - Sampled parent 3b530fc4-8c86-4afe-809a-2773e5d8d917 and 0 inspirations from island 0
2025-10-11 21:36:42,864 - WARNING - Iteration 148 error: LLM generation failed: Connection error.
2025-10-11 21:36:42,865 - DEBUG - Sampled parent 3b530fc4-8c86-4afe-809a-2773e5d8d917 and 0 inspirations from island 0
2025-10-11 21:36:42,865 - WARNING - Iteration 149 error: LLM generation failed: Connection error.
2025-10-11 21:36:42,865 - DEBUG - Using selector: EpollSelector
2025-10-11 21:36:42,865 - DEBUG - Sampled parent 3b530fc4-8c86-4afe-809a-2773e5d8d917 and 0 inspirations from island 0
2025-10-11 21:36:42,865 - INFO - Sampled model: gpt-oss-120b
2025-10-11 21:36:42,865 - WARNING - Iteration 150 error: LLM generation failed: Connection error.
2025-10-11 21:36:42,865 - DEBUG - Using selector: EpollSelector
2025-10-11 21:36:42,865 - DEBUG - Sampled parent 3b530fc4-8c86-4afe-809a-2773e5d8d917 and 0 inspirations from island 0
2025-10-11 21:36:42,866 - INFO - Sampled model: gpt-oss-120b
2025-10-11 21:36:42,866 - WARNING - Iteration 151 error: LLM generation failed: Connection error.
2025-10-11 21:36:42,866 - DEBUG - Using selector: EpollSelector
2025-10-11 21:36:42,866 - DEBUG - Sampled parent 3b530fc4-8c86-4afe-809a-2773e5d8d917 and 0 inspirations from island 0
2025-10-11 21:36:42,866 - INFO - Sampled model: gpt-oss-120b
2025-10-11 21:36:42,866 - WARNING - Iteration 152 error: LLM generation failed: Connection error.
2025-10-11 21:36:42,866 - DEBUG - Using selector: EpollSelector
2025-10-11 21:36:42,866 - DEBUG - Sampled parent 3b530fc4-8c86-4afe-809a-2773e5d8d917 and 0 inspirations from island 0
2025-10-11 21:36:42,867 - INFO - Sampled model: gpt-oss-120b
2025-10-11 21:36:42,867 - WARNING - Iteration 153 error: LLM generation failed: Connection error.
2025-10-11 21:36:42,867 - DEBUG - Using selector: EpollSelector
2025-10-11 21:36:42,867 - DEBUG - Sampled parent 3b530fc4-8c86-4afe-809a-2773e5d8d917 and 0 inspirations from island 0
2025-10-11 21:36:42,867 - INFO - Sampled model: gpt-oss-120b
2025-10-11 21:36:42,867 - WARNING - Iteration 154 error: LLM generation failed: Connection error.
Is there some kind of error handling that I'm missing here? This all seems to happen inside the openevolve run function, so I can't see how to stop this behavior without monkeypatching the library.
More generally, I can imagine all kinds of server-side issues that I'd want openevolve to fail robustly on, e.g. hitting a max token limit or timing out against our server.
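For what it's worth, the kind of behavior I'm asking for is essentially a circuit breaker: after N consecutive iteration failures, abort the run instead of looping forever. Here's a minimal sketch of that pattern in plain Python -- the class and method names are my own illustration, not part of openevolve's actual API:

```python
class ConsecutiveFailureBreaker:
    """Hypothetical circuit breaker: abort after N consecutive failures.

    Not an openevolve API -- just a sketch of the desired semantics.
    """

    def __init__(self, max_consecutive_failures: int) -> None:
        self.max_consecutive_failures = max_consecutive_failures
        self.consecutive = 0

    def record_success(self) -> None:
        # Any success resets the failure streak.
        self.consecutive = 0

    def record_failure(self) -> None:
        # Count the failure; raise once the streak hits the limit,
        # so the surrounding run loop terminates instead of spinning.
        self.consecutive += 1
        if self.consecutive >= self.max_consecutive_failures:
            raise RuntimeError(
                f"Aborting run: {self.consecutive} consecutive iteration failures"
            )
```

If every iteration in a loop calls `record_failure()` on an LLM error and `record_success()` otherwise, a dead cluster trips the breaker after N iterations rather than burning through iterations 147, 148, 149, ... as in the logs above.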