Open-ended ("free range") research optimization #197

BradKML · 2025-08-11T09:04:34Z

There are many papers similar to Absolute Zero, and I think having the same LLM split into two roles and duel against itself is a good idea. However, these approaches are often stuck on a single task type. Also, not all self-play has to be adversarial: one side can suggest fun things to do, while the other has to discover something, anything.
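
To make the non-adversarial variant concrete, here is a minimal sketch of one model playing both roles. Everything in it is an assumption for illustration: the `Model` callable, the prompt strings, the `toy_model` stub, and the duplicate check used as a crude novelty filter. None of it comes from Absolute Zero or any of the projects listed below.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical stand-in for a single LLM: maps a prompt to a text reply.
Model = Callable[[str], str]


@dataclass
class SelfPlayLog:
    findings: List[str] = field(default_factory=list)


def cooperative_self_play(model: Model, rounds: int) -> SelfPlayLog:
    """Run one model in two roles: a proposer suggests an open-ended
    activity, and an explorer reports whatever it discovers."""
    log = SelfPlayLog()
    for _ in range(rounds):
        seen = "; ".join(log.findings) or "nothing yet"
        task = model(f"ROLE=proposer. Known findings: {seen}. Suggest something to try.")
        finding = model(f"ROLE=explorer. Task: {task}. Report one discovery.")
        # Keep only discoveries not already logged (a crude novelty filter).
        if finding not in log.findings:
            log.findings.append(finding)
    return log


def toy_model(prompt: str) -> str:
    # Deterministic stub so the sketch runs without any LLM backend.
    if prompt.startswith("ROLE=proposer"):
        return f"task-{prompt.count('found:') + 1}"
    return "found:" + prompt.split("Task: ")[1].split(".")[0]


if __name__ == "__main__":
    print(cooperative_self_play(toy_model, 3).findings)
```

The point of the sketch is only that the reward signal here is "did the explorer find something new", not "did one side beat the other". A real setup would need a much better notion of novelty than string equality.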

List of stuff with the same Self-Play theme

Absolute Zero Reasoner (AZR) https://github.com/LeapLabTHU/Absolute-Zero-Reasoner
Self-Play Fine-Tuning (SPIN) https://github.com/uclaml/SPIN
Self-Play Preference Optimization (SPPO) https://github.com/uclaml/SPPO
Self-Play Adversarial Language Game (SPAG) https://github.com/Linear95/SPAG
Self-Play on Zero-Sum Games (SPIRAL) https://github.com/spiral-rl/spiral
Self-Play Reinforcement Learning (SeRL) https://github.com/wantbook-book/SeRL
Self-play Critic (SPC) https://github.com/chen-judge/SPC
Some stuff under peer-review https://openreview.net/forum?id=ed75tWzgt0

In the context of OpenEvolve, there are a few things to consider:

Since AI-Scientist already exists, there is no need to reinvent the wheel multiple times: it should be able to pass existing novel research to something like OpenEvolve to enhance the corresponding algorithms, and then take the results back for further remixing and diversification of goals. Related: "Something new to mix in: Research/Paper Database" (#96) and "Adding PKMs into the system" (SakanaAI/AI-Scientist-v2#48).
DGM and ASI-ARCH exist alongside OpenEvolve, and there should be a way to unify all three under a single meta-architecture. The optimization metric would be acceleration per compute-hour (rather than research per compute-hour, which is more of an ASI-ARCH thing). Related: "[feature] Implement ideas from the Darwin-Godel Machine paper to evolve the openevolve agent itself" (#53) and "AI Scientist 3 with SEAL + DGM/AlphaEvolve?" (SakanaAI/AI-Scientist-v2#53).
Multi-task reinforcement learning through self-play is still a hot problem: given the same skill list and the same environment, there should be ways to accelerate RL itself and make it more effective.
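
One possible reading of "acceleration per compute-hour" is sketched below, to contrast it with the plain "research per compute-hour" rate. This is an assumption, not a definition taken from DGM, ASI-ARCH, or OpenEvolve: the `Window` bookkeeping unit, the idea of a scalar score gain, and the choice to normalize by the later window's compute are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Window:
    """One measurement window (hypothetical bookkeeping unit)."""
    score_gain: float     # improvement in some chosen score during the window
    compute_hours: float  # compute spent during the window


def progress_rate(w: Window) -> float:
    """Score gain per compute-hour: the 'research per compute-hour' reading."""
    if w.compute_hours <= 0:
        raise ValueError("compute_hours must be positive")
    return w.score_gain / w.compute_hours


def acceleration(prev: Window, curr: Window) -> float:
    """Change in progress rate, per compute-hour spent in the later window."""
    return (progress_rate(curr) - progress_rate(prev)) / curr.compute_hours


if __name__ == "__main__":
    before = Window(score_gain=2.0, compute_hours=4.0)
    after = Window(score_gain=6.0, compute_hours=4.0)
    print(progress_rate(before), progress_rate(after), acceleration(before, after))
```

Under this reading, a system that keeps a steady rate of progress scores zero, and only a system that gets better at getting better scores positive, which seems closer to the self-improvement goal than raw output per compute-hour.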

Cross-reference: simulated open-world training with a focus on "interestingness", see jennyzzt/omni#2 (comment).

BradKML · 2025-08-12T01:51:18Z

We kind of need human-supervised research programs, since not all data on Kaggle is clean. GIGO can then happen not at the "bad idea" level but at the "bad data" level. Coding tasks are easier to self-correct than data science in this regard; see SakanaAI/AI-Scientist-v2#2 (comment).

BradKML · 2025-09-15T09:32:13Z

To lower friction, this tool seems useful: https://github.com/SWE-agent/mini-SWE-agent
