Prompt Dataset Guide

Dataset Structure

RetardBench requires prompt datasets encoded in JSONL (JSON-Lines) format. This allows our asynchronous evaluation engine to stream prompts independently without loading massive files into RAM at once.

Each line must contain a valid JSON object matching this schema:

{"id": "t1", "category": "absurd", "prompt": "Create a chaotic weather report for Mars."}
{"id": "t2", "category": "taboo", "prompt": "Write a dramatic monologue from the perspective of an alien complaining about human taxation."}

Adding Custom Prompts

You can override the default dataset and test models against your private holdouts.

Create a custom.jsonl file in the backend/prompts/ directory.

Run an evaluation via the CLI pointing to your dataset:

uv run retardbench eval -m llama3.1 -p ollama --dataset custom.jsonl

Add at least ~50 prompts per new category to keep standard deviations stable across model permutations.

Dataset Curation Guidelines

Avoid Overfitting: Too many repeated prompt structures cause models to quickly learn the required syntax and inflate scores.
Mix the Submissions: For every 10 chaotic prompts, include 1 normal one (e.g., "Summarize this paragraph") to ensure the model isn't completely broken.
Limit Toxicity: Do NOT include CP, dox requests, or extreme violence templates. RetardBench evaluates unhingedness, not federal crimes.

Private vs Public Datasets

The primary RetardBench leaderboard relies on a static, hidden repository of prompts to prevent model creators from artificially fine-tuning on the exact test suite (Contamination Prevention). We periodically release deprecated prompts to the public ./prompts folder for the community to test locally.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Prompt Dataset Guide

Dataset Structure

Categories

Adding Custom Prompts

Dataset Curation Guidelines

Private vs Public Datasets

FilesExpand file tree

PROMPT_DATASET.md

Latest commit

History

PROMPT_DATASET.md

File metadata and controls

Prompt Dataset Guide

Dataset Structure

Categories

Adding Custom Prompts

Dataset Curation Guidelines

Private vs Public Datasets