navingate/alphapetri-pilot

AlphaPetri: Pilot Study Data

Canonical Post: AlphaPetri: Automating LLM Safety Testing with Evolutionary Search on Petri. A Proposal and Pilot Study.
(Please update this link if you re-post the draft.)

Warning

⚠️ For AI Safety Research Only

This repository contains data from the AlphaPetri project, including seed prompts that are intentionally designed to elicit harmful, deceptive, or otherwise dangerous behaviours from Large Language Models.

These artefacts are shared exclusively for AI safety research, replication, and analysis. They are not intended for any other use. Do not use these prompts for malicious purposes or deploy them against non-research systems.

Overview

This repository provides the dataset of autonomously generated seed prompts from the AlphaPetri pilot study, as detailed in our LessWrong article.

AlphaPetri is a system that automates seed generation, a bottleneck in LLM safety evaluations such as Anthropic’s Petri, using an AlphaEvolve-inspired evolutionary search.

This dataset is shared for full transparency, to allow for replication of our pilot results, and to encourage further research into autonomous safety testing.

Data Files

The seed prompts are provided in two raw text files:

  • kimi_deception_seeds.txt — The 15 seeds generated in Experiment 1, targeting deceptive behaviours on Kimi K2 Instruct.
  • sonnet_all_seeds.txt — The 43 seeds evaluated in Experiment 2, tested for cross-model generalisation on Claude Sonnet 4.5.

Note: Within each file, prompts are comma-and-newline-delimited: each prompt is separated from the next by a comma followed by a newline.
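
As an illustration of the delimiter note above, here is a minimal Python sketch for loading the seeds. It assumes that "comma-and-newline-delimited" means a comma at the end of a line separates one prompt from the next, and that commas in the middle of a line belong to the prompt text. The helper names `split_seeds` and `load_seeds` are hypothetical and not part of this repository.

```python
import re
from pathlib import Path

# Assumption: a prompt ends with a comma at the end of a line (or at the very
# end of the file). Commas followed by more text on the same line are kept.
_DELIMITER = re.compile(r",[ \t]*(?:\r?\n|\Z)")


def split_seeds(text: str) -> list[str]:
    """Split raw file text into individual seed prompts."""
    parts = _DELIMITER.split(text)
    # Drop empty fragments, e.g. the one left after a trailing delimiter.
    return [part.strip() for part in parts if part.strip()]


def load_seeds(path: str) -> list[str]:
    """Read a seed file as UTF-8 and return its prompts as a list."""
    return split_seeds(Path(path).read_text(encoding="utf-8"))
```

Calling `load_seeds("kimi_deception_seeds.txt")` and comparing the length of the result with the 15 seeds listed above (or 43 for `sonnet_all_seeds.txt`) is a quick way to check whether the delimiter assumption holds. A prompt that itself contains a line ending in a comma would be split incorrectly under this assumption.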

How to Cite

If you use this data in your research, please cite the original LessWrong article:

Nav Kumar. (2025). AlphaPetri: Automating LLM Safety Testing with Evolutionary Search on Petri. A Proposal and Pilot Study. LessWrong. https://www.lesswrong.com/posts/S5qadHipGh9G6rKPD/alphapetri-fully-autonomous-llm-safety-testing-using-petri
