Run the probes yourself with different models or configurations:
```bash
python probes/proof_engine.py probe claude   # Single model
python probes/proof_engine.py probe all      # All models
```

If you get interesting results, open an issue or PR with your findings.
The core questions are in proof_engine.py. If you have questions that might reveal structural limits, propose them via an issue.
The probe engine uses litellm, so adding models is straightforward. See the MODELS dict in proof_engine.py.
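As a rough sketch, adding a model amounts to registering a litellm model string and letting the probe loop call it. The exact shape of the MODELS dict and the `ask` helper below are assumptions for illustration, not the repo's actual code:

```python
# Hypothetical sketch: the real MODELS dict in proof_engine.py may differ.
import litellm

MODELS = {
    "claude": "anthropic/claude-3-5-sonnet-20241022",
    "gpt4o": "openai/gpt-4o",
    # Add a new entry here; any litellm-supported model string works.
    "mistral": "mistral/mistral-large-latest",
}

def ask(model_key: str, question: str) -> str:
    """Send one probe question to the chosen model and return its reply."""
    response = litellm.completion(
        model=MODELS[model_key],
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content
```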
The probe_runs/ folder contains raw JSON responses from 6 models across 57 questions. Analysis is welcome (a rough starting-point sketch follows this list):
- Statistical patterns in responses
- Convergence metrics
- Response length/complexity analysis
- Cross-model comparison
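One possible starting point for length analysis is below. It assumes files in probe_runs/ are named `<model>_<timestamp>.json` and contain a list of `{"question": ..., "response": ...}` records; adjust the parsing to the actual schema before trusting any numbers:

```python
# Rough analysis sketch; the filename pattern and record schema are assumptions.
import json
import statistics
from collections import defaultdict
from pathlib import Path

lengths = defaultdict(list)  # model name -> response lengths in characters

for path in Path("probe_runs").glob("*.json"):
    model = path.stem.split("_")[0]          # crude model name from filename
    records = json.loads(path.read_text())
    for record in records:
        lengths[model].append(len(record.get("response", "")))

for model, values in sorted(lengths.items()):
    print(f"{model:12s}  n={len(values):3d}  "
          f"mean={statistics.mean(values):7.1f}  "
          f"median={statistics.median(values):7.1f}")
```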
The strongest contribution is a genuine counter-argument. If you can find a flaw in BST or the methodology, that's valuable.
- Fork the repo
- Create a branch (`git checkout -b feature/your-idea`)
- Run the probes to verify your changes work
- Submit a PR with a clear description of what you're adding or changing
- Python 3.8+
- Keep probe scripts self-contained
- Store results in the appropriate `*_runs/` directories
- Include timestamps in output filenames (see the sketch below)
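For example, a timestamped output path could be built like this; the filename pattern is illustrative, not a format the repo mandates:

```python
# Illustrative only: the exact filename pattern is not required by the repo.
from datetime import datetime, timezone
from pathlib import Path

def run_output_path(model_key: str, runs_dir: str = "probe_runs") -> Path:
    """Return a timestamped JSON path such as probe_runs/claude_20250101T120000Z.json."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    out_dir = Path(runs_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir / f"{model_key}_{stamp}.json"
```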
Open an issue or reach out: @MoKetchups