Canonical Post: AlphaPetri: Automating LLM Safety Testing with Evolutionary Search on Petri. A Proposal and Pilot Study.
https://www.lesswrong.com/posts/S5qadHipGh9G6rKPD/alphapetri-fully-autonomous-llm-safety-testing-using-petri
Warning
This repository contains data from the AlphaPetri project, including seed prompts that are intentionally designed to elicit harmful, deceptive, or otherwise dangerous behaviours from Large Language Models.
These artefacts are shared exclusively for AI safety research, replication, and analysis. They are not intended for any other use. Do not use these prompts for malicious purposes or deploy them against non-research systems.
This repository provides the dataset of autonomously generated seed prompts from the AlphaPetri pilot study, as detailed in our LessWrong article.
AlphaPetri is a system that automates seed generation, a key bottleneck in LLM safety evaluations such as Anthropic’s Petri, using an AlphaEvolve-inspired evolutionary search.
This dataset is shared for full transparency, to allow for replication of our pilot results, and to encourage further research into autonomous safety testing.
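For orientation, the sketch below shows the general shape of an AlphaEvolve-style loop: mutate surviving seeds, score each candidate with an evaluation harness, and keep the strongest. The function names (mutate_seed, petri_score) are hypothetical placeholders rather than the actual AlphaPetri API; see the linked post for the real pipeline.

```python
# Schematic of an AlphaEvolve-style evolutionary loop, for orientation only.
# mutate_seed and petri_score are hypothetical stand-ins supplied by the
# caller; they are not part of any published AlphaPetri interface.
import random


def evolve_seeds(initial_seeds, mutate_seed, petri_score,
                 generations=10, population_size=20, survivors=5):
    """Iteratively mutate, score, and select seed prompts."""
    population = list(initial_seeds)
    for _ in range(generations):
        # Mutate: produce variants of surviving seeds (e.g. via an LLM).
        offspring = [mutate_seed(random.choice(population))
                     for _ in range(population_size - len(population))]
        population.extend(offspring)
        # Score and select: rank seeds by how strongly they elicit the
        # target behaviour in the harness, then keep the top survivors.
        population.sort(key=petri_score, reverse=True)
        population = population[:survivors]
    return population
```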
The seed prompts are provided in two raw text files:
- kimi_deception_seeds.txt — The 15 seeds generated in Experiment 1, targeting deceptive behaviours on Kimi K2 Instruct.
- sonnet_all_seeds.txt — The 43 seeds evaluated in Experiment 2, tested for cross-model generalisation on Claude Sonnet 4.5.
Note: Prompts in these files are comma-and-newline-delimited.
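If it helps, here is a minimal loader sketch. It assumes "comma-and-newline-delimited" means prompts are separated by a comma followed by a newline (",\n"); adjust the delimiter if your copy of the files differs.

```python
# Minimal loader sketch. Assumes prompts are separated by ",\n",
# per the delimiter note above; adjust if the files differ.
from pathlib import Path


def load_seeds(path: str) -> list[str]:
    """Return the list of seed prompts stored in a raw text file."""
    text = Path(path).read_text(encoding="utf-8")
    seeds = [s.strip().rstrip(",") for s in text.split(",\n")]
    return [s for s in seeds if s]


if __name__ == "__main__":
    print(len(load_seeds("kimi_deception_seeds.txt")))  # expected: 15
    print(len(load_seeds("sonnet_all_seeds.txt")))      # expected: 43
```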
If you use this data in your research, please cite the original LessWrong article:
Nav Kumar. (2025). AlphaPetri: Automating LLM Safety Testing with Evolutionary Search on Petri. A Proposal and Pilot Study. LessWrong. https://www.lesswrong.com/posts/S5qadHipGh9G6rKPD/alphapetri-fully-autonomous-llm-safety-testing-using-petri