
Conversation

@nerdsane
Contributor

@nerdsane nerdsane commented Jun 3, 2025

Why

AlphaEvolve injects “rendered evaluation results – usually a program, the result of executing that program, and the scores assigned by the evaluate function” into every prompt, arguing that rich execution feedback speeds convergence.
OpenEvolve currently forwards only the numeric metrics dictionary, so the LLM never sees build logs, failing-test traces, performance profiles, or any other output that might be useful.

In an example I'm currently experimenting with, I used this to pass formal-model (TLA+) check output and memory-profiling results to the LLM. (That work is still in progress and not included in this PR, so I added the circle-packing example with a slightly updated evaluator to show how artifacts work.)

What’s in this PR

  • EvaluationResult dataclass — retains the original metrics dict and adds an optional artifacts field for text / binary payloads (a sketch of the intended shape follows this list).
  • Two-tier storage — artifacts ≤ 32 KB are JSON-encoded in a new artifacts_json column; larger blobs are written under artifact_dir/ on disk and referenced from the DB.
  • Prompt support — templates now accept {artifacts}; the sampler injects sanitized, size-capped content so the LLM sees exact failure text without blowing context.
  • Config & env flags – ENABLE_ARTIFACTS, max_artifact_bytes, and base-path knobs let users toggle or tune the feature with zero code changes.
  • Examples & tests — updated circle-packing example shows compile-failure recovery; 26 new tests cover unit, integration, and perf to keep coverage green.
  • Backward compatibility — plain metrics dicts are auto-wrapped, so existing tasks run unmodified.
  • No impact on selection logic — best-program ranking is still based on pure floats; artifacts are for the LLM only.
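
For illustration, here is a minimal sketch of the dataclass and the auto-wrapping described above; the field names `metrics` and `artifacts` match the example in the next section, while the `normalize` helper and the exact type hints are assumptions for this sketch rather than OpenEvolve's actual code.

    # Illustrative sketch only; metrics/artifacts field names match the PR,
    # the normalize() helper and type hints are assumptions.
    from dataclasses import dataclass, field
    from typing import Dict, Union

    @dataclass
    class EvaluationResult:
        # Numeric scores used for selection, exactly as before
        metrics: Dict[str, float]
        # Optional side-channel payloads (text or bytes) shown only to the LLM
        artifacts: Dict[str, Union[str, bytes]] = field(default_factory=dict)

    def normalize(result: Union[Dict[str, float], EvaluationResult]) -> EvaluationResult:
        """Auto-wrap a plain metrics dict so existing evaluators keep working."""
        if isinstance(result, EvaluationResult):
            return result
        return EvaluationResult(metrics=dict(result))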

How it works

  1. Evaluator returns

    return EvaluationResult(
        metrics={"build_ok": 0.0},
        artifacts={"stderr": compile_log}
    )
  2. DB stores metrics + artifact blob.

  3. Prompt sampler tacks on a block like

    ### Last-run stderr
    ...undefined reference to `foo`...
    

    giving the LLM concrete tokens to fix next round while selection logic still ranks on pure floats.
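
To make step 3 concrete, here is a hedged sketch of how a sampler could sanitize, cap, and render artifacts into the {artifacts} template placeholder; the function name, cap value, and truncation policy are assumptions for illustration, not the exact sampler code in this PR.

    # Illustrative only: names and the truncation policy are assumptions.
    MAX_ARTIFACT_CHARS = 2000  # assumed per-artifact cap to avoid blowing context

    def render_artifacts(artifacts: dict) -> str:
        sections = []
        for name, payload in artifacts.items():
            # Decode binary payloads defensively, replacing undecodable bytes
            if isinstance(payload, bytes):
                text = payload.decode("utf-8", errors="replace")
            else:
                text = str(payload)
            # Truncate oversized artifacts so the head of the log still fits
            if len(text) > MAX_ARTIFACT_CHARS:
                text = text[:MAX_ARTIFACT_CHARS] + "\n...[truncated]"
            sections.append(f"### Last-run {name}\n{text}")
        return "\n\n".join(sections)

    # Fills the {artifacts} slot of a prompt template
    prompt_block = render_artifacts({"stderr": "undefined reference to `foo`"})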

Impact

  • Richer context – faster self-repair of broken candidates and the ability to steer the LLM with additional data.
  • Zero breaking changes – old evaluators are auto-wrapped, and the artifacts pipeline can be disabled entirely.

@CLAassistant

CLAassistant commented Jun 3, 2025

CLA assistant check
All committers have signed the CLA.

@nerdsane nerdsane changed the title from "Feat artifact side channel" to "Feature: Artifact side channel" on Jun 3, 2025
@codelion
Member

codelion commented Jun 4, 2025

This PR addresses #37 as well I believe.

@codelion
Member

codelion commented Jun 4, 2025

Can you please update from main? We merged a couple of PRs that were already in testing.

@nerdsane
Contributor Author

nerdsane commented Jun 4, 2025

I resolved the merge conflicts that came with the changes from PRs #47 and #54, and cleaned up a unit test that had a Python version-specific assertion, so tests should now pass.

> This PR addresses #37 as well I believe.

Yes, any evaluation errors can now be included in the prompt.

@codelion codelion merged commit c779ac9 into algorithmicsuperintelligence:main Jun 9, 2025
3 checks passed
@nileshtrivedi

It might be good to document the convention for artifact names, e.g. build_stdout, build_stderr, run_stdout, run_stderr.
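
For instance, something like this (purely illustrative; the keys are only a suggested convention, not something the PR defines):

    # One possible naming convention for artifact keys; not defined by this PR
    artifacts = {
        "build_stdout": "...",  # captured stdout of the build step
        "build_stderr": "...",  # captured stderr of the build step
        "run_stdout": "...",    # captured stdout of the program run
        "run_stderr": "...",    # captured stderr of the program run
    }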
