A minimal, non-glamorous evolutionary search for better text compression, no pareto fronts, no multi-objective hand-waving, just greedy brutalist improvement.
Given a simple LZ77-like compression algorithm as a starting point, this project applies LLM suggested mutations to its code, generation after generation, searching for a better compression ratio (i mistook ratio = compressed size / original size, lower is better).
Select is straightforward: pick 3 best performers (elites), keep 2 random survivors, and spawn 4 mutated children per parent. The process continues until improvement stalls or your OpenAI credits run dry.
No fancy diversity measures. No crowding distance. No magical fronts. Just evolve, test, keep what works, throw away that doesn't.
Because its fun and I had $100 of OpenAI credits I've been wanting to burn.
- Install dependencies:
uv sync
- Set your API key:
export OPENAI_API_KEY=sk...
- Run:
uv run main.py
- By default, uses
big.txt
as the input to compress. Check or replace it for your own tests.
You can tweak the core evolutionary settings with CLI arguments:
--children
: How many mutated children per parent (default: 4)--top_k
: Elites selected per generation (default: 3)--survivors
: Random survivors per generation (default: 2)--generations
: Maximum generations (default: 100)--stagnation
: Early stop if no improvement for N generations (default: 5)
Example:
uv run main.py --children 5 --top_k 4 --generations 50
In a run with 30 generations, the (incorrectly computed) compression ratio reached 0.54.
Note: Correct ratio is
raw_size / compressed_size
~= 1.85.
Check algorithm.py
for the latest evolved compressor.
- Mutations are proposed and applied by
GPT-4.1
(or your OpenAI model of your choice). - Fitness = compression ratio on fixed text file.
- If round trip decompress fails, candidate is tossed.
- SQLite DB logs all code and scores for reproducibility.
- API cost: This chews through OpenAI credits fast.
- No multi-objective anything: Only single-minded compression optimization.
- No theoretical guarantees: Sometimes you just get junk mutations.
MIT, do whatever.