Expand scoring evaluation into arXiv preprint

## Summary
The scoring evaluation in `docs/scoring-evaluation.md` has compelling findings (86% concordance between Copeland and Borda, with Weighted Sum as the outlier) that deserve a proper academic write-up. An arXiv preprint would:
1. Establish thinktank as a research-backed tool, not just another CLI
2. Make the work discoverable by researchers working on ensemble code generation
3. Create a citable reference for the social-choice-theory approach to agent selection
4. Attract academic contributors and reviewers

## Proposed paper structure

### Title
"Ensemble AI Coding: Applying Social Choice Theory to Multi-Agent Code Selection"

### Sections
1. **Introduction** — the ensemble coding hypothesis, pass@k evidence
2. **Related Work** — AlphaCode, CodeT, MBR-Exec, SWE-bench, Kambhampati LLM-Modulo
3. **System Design** — thinktank architecture, worktree isolation, convergence analysis
4. **Scoring Methods** — Weighted Sum, Copeland, Borda (formal definitions)
5. **Experimental Setup** — controlled experiments with a fixed N=5 agents per task across diverse tasks
6. **Results** — agreement rates, Friedman test, Kendall's W, effect sizes
7. **Discussion** — why pairwise methods outperform, limitations, when Weighted Sum is appropriate
8. **Conclusion** — Copeland as the default method, future work (LLM-as-judge, cross-project)
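For Section 4, the three methods can be prototyped in a few lines before writing the formal definitions. This is a minimal sketch, not thinktank's implementation. It assumes that each agent's output is summarized as a vector of per-criterion scores (higher is better) and that each criterion acts as a "voter" ranking the agents. The names `weighted_sum`, `copeland`, and `borda` and the toy data are hypothetical.

```python
from itertools import combinations

# Hypothetical input: scores[agent] = per-criterion scores, higher is better.
Scores = dict[str, list[float]]


def weighted_sum(scores: Scores, weights: list[float]) -> dict[str, float]:
    """Weighted Sum: agent i scores sum_k w_k * s_ik, so margins and scales matter."""
    return {a: sum(w * s for w, s in zip(weights, v)) for a, v in scores.items()}


def copeland(scores: Scores) -> dict[str, int]:
    """Copeland: +1 per pairwise majority win, -1 per loss, 0 per tie.

    Agent a beats agent b if a is strictly better on more criteria than b is.
    """
    result = {a: 0 for a in scores}
    for a, b in combinations(scores, 2):
        a_wins = sum(x > y for x, y in zip(scores[a], scores[b]))
        b_wins = sum(y > x for x, y in zip(scores[a], scores[b]))
        if a_wins != b_wins:
            winner, loser = (a, b) if a_wins > b_wins else (b, a)
            result[winner] += 1
            result[loser] -= 1
    return result


def borda(scores: Scores) -> dict[str, float]:
    """Borda: on each criterion an agent earns 1 point per rival it beats (0.5 per tie),
    so with N agents the top agent on a criterion earns N - 1 and the last earns 0."""
    result = {a: 0.0 for a in scores}
    n_criteria = len(next(iter(scores.values())))
    for k in range(n_criteria):
        for a in scores:
            for b in scores:
                if a == b:
                    continue
                if scores[a][k] > scores[b][k]:
                    result[a] += 1.0
                elif scores[a][k] == scores[b][k]:
                    result[a] += 0.5
    return result


if __name__ == "__main__":
    # Toy data: three agents, three criteria (invented for illustration).
    scores = {"a": [1.0, 0.9, 0.2], "b": [0.8, 0.8, 0.9], "c": [0.9, 0.1, 0.3]}
    for name, totals in [
        ("weighted", weighted_sum(scores, [1.0, 1.0, 1.0])),
        ("copeland", copeland(scores)),
        ("borda", borda(scores)),
    ]:
        print(name, max(totals, key=totals.get))
    # prints, one per line: weighted b, copeland a, borda a
```

On this toy data, Weighted Sum prefers `b` because its large margin on the last criterion outweighs two narrow losses. Both rank-based methods prefer `a`, which beats every rival on a majority of criteria. This is only an illustration of how the methods can diverge, not a result.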

### What needs to happen
- [ ] Run controlled experiments: 30+ tasks with fixed N=5 agents each
- [ ] Diverse task set: bug fixes, features, refactors across multiple languages
- [ ] Formal Friedman test with Nemenyi post-hoc comparisons
- [ ] Kendall's W for inter-method concordance
- [ ] Figures: agreement heatmaps, score distributions, convergence vs accuracy plots
- [ ] LLM-as-judge ground truth for a subset of runs
- [ ] LaTeX formatting per arXiv cs.SE guidelines
- [ ] Link the arXiv paper from the repo README
- [ ] Cite the repo in the paper as the reference implementation
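The statistics in the checklist can be prototyped without dependencies before committing to a stats package. This sketch rests on several assumptions. The Friedman test treats tasks as blocks and scoring methods as treatments, measured on some per-task quality score such as an LLM-as-judge grade. Kendall's W is computed over the rankings that the methods assign to agents within one task. "Top-pick agreement" is taken to mean the fraction of tasks where two methods select the same agent. That is one possible reading of "concordance" and not necessarily the definition used in `docs/scoring-evaluation.md`. Neither statistic applies a tie correction, and all function names are hypothetical. `scipy.stats.friedmanchisquare` can cross-check the Friedman statistic (the values should agree when there are no ties) and also supplies the p-value.

```python
import math


def ranks(values: list[float]) -> list[float]:
    """Rank values from 1 (lowest) to n (highest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for idx in order[i : j + 1]:
            result[idx] = (i + j) / 2 + 1
        i = j + 1
    return result


def kendalls_w(rankings: list[list[float]]) -> float:
    """Kendall's W for m raters ranking the same n items (no tie correction).

    rankings[r][i] is the rank rater r gave item i; 1.0 means perfect agreement.
    """
    m, n = len(rankings), len(rankings[0])
    totals = [sum(r[i] for r in rankings) for i in range(n)]
    mean = m * (n + 1) / 2
    s = sum((t - mean) ** 2 for t in totals)
    return 12 * s / (m**2 * (n**3 - n))


def friedman_chi2(blocks: list[list[float]]) -> float:
    """Friedman statistic for k treatments measured in each of n blocks (no tie correction).

    blocks[b][j] is treatment j's measurement in block b and is ranked within its block.
    Under H0 the statistic is approximately chi-square with k - 1 degrees of freedom.
    """
    n, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    for row in blocks:
        for j, r in enumerate(ranks(row)):
            rank_sums[j] += r
    return 12 / (n * k * (k + 1)) * sum(R**2 for R in rank_sums) - 3 * n * (k + 1)


def nemenyi_cd(k: int, n: int, q_alpha: float) -> float:
    """Nemenyi critical difference: mean ranks differing by at least CD are significant.

    q_alpha comes from a studentized-range table (e.g. 2.343 for k=3, alpha=0.05, in Demšar 2006).
    """
    return q_alpha * math.sqrt(k * (k + 1) / (6 * n))


def top_pick_agreement(a: list[dict[str, float]], b: list[dict[str, float]]) -> float:
    """Fraction of tasks on which two methods' highest-scoring agent is the same."""
    same = sum(max(x, key=x.get) == max(y, key=y.get) for x, y in zip(a, b))
    return same / len(a)


if __name__ == "__main__":
    # Toy scores that three methods gave three agents on a single task.
    per_method = [[2.1, 2.5, 1.3], [2, 0, -2], [4, 3, 2]]
    print(round(kendalls_w([ranks(s) for s in per_method]), 3))  # 0.778
    # Toy per-task quality of each method's pick (4 tasks x 3 methods).
    quality = [[0.9, 0.6, 0.8], [0.7, 0.5, 0.6], [1.0, 0.4, 0.9], [0.8, 0.6, 0.7]]
    print(friedman_chi2(quality))  # 8.0
    print(nemenyi_cd(k=3, n=len(quality), q_alpha=2.343))
```

With 30+ tasks the same functions apply unchanged. The Friedman statistic also equals n(k - 1)W when W is computed over the block rankings, which gives a cheap consistency check.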

## References to cite
- Arrow (1951) Social Choice and Individual Values
- Merlin & Valognes (2004) Condorcet-Borda coincidence
- Tetlock & Gardner (2015) Superforecasting
- Kambhampati et al. (2024) LLM-Modulo framework
- Li et al. (2022) AlphaCode
- Chen et al. (2022) CodeT
- Wang et al. (2022) Self-Consistency
