[v0.1] Bench: Router benchmark CLI

# Acceptance

The `run_bench.sh` command provides:
- [ ] Per-category metrics: accuracy, response time, token counts (prompt/completion/total)
- [ ] Per-model metrics: success rate, error distribution, latency distribution
- [ ] Export to CSV/JSON for analysis
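
To make the criteria above concrete, here is a minimal Python sketch of the aggregation and export steps. It is an illustration under assumptions, not the planned implementation: the record fields (`category`, `model`, `correct`, `error`, `latency_s`, `prompt_tokens`, `completion_tokens`), the function names, and the sample data are all hypothetical, and how `run_bench.sh` would collect or invoke any of this is not specified here.

```python
import csv
import io
import json
from collections import Counter, defaultdict
from statistics import mean, median

# Hypothetical record shape: one dict per benchmark request (made-up data).
SAMPLE_RESULTS = [
    {"category": "math", "model": "model-a", "correct": True, "error": None,
     "latency_s": 0.5, "prompt_tokens": 100, "completion_tokens": 20},
    {"category": "math", "model": "model-b", "correct": False, "error": "timeout",
     "latency_s": 2.0, "prompt_tokens": 100, "completion_tokens": 0},
    {"category": "code", "model": "model-a", "correct": True, "error": None,
     "latency_s": 1.0, "prompt_tokens": 200, "completion_tokens": 50},
    {"category": "code", "model": "model-a", "correct": False, "error": None,
     "latency_s": 1.5, "prompt_tokens": 200, "completion_tokens": 60},
]


def group_by(results, key):
    """Group records by the value stored under `key`."""
    groups = defaultdict(list)
    for record in results:
        groups[record[key]].append(record)
    return groups


def per_category(results):
    """Accuracy, mean response time, and token counts for each category."""
    rows = []
    for category, records in sorted(group_by(results, "category").items()):
        prompt = sum(r["prompt_tokens"] for r in records)
        completion = sum(r["completion_tokens"] for r in records)
        rows.append({
            "category": category,
            "accuracy": sum(r["correct"] for r in records) / len(records),
            "mean_response_time_s": mean(r["latency_s"] for r in records),
            "prompt_tokens": prompt,
            "completion_tokens": completion,
            "total_tokens": prompt + completion,
        })
    return rows


def per_model(results):
    """Success rate, error counts, and a latency summary for each model."""
    rows = []
    for model, records in sorted(group_by(results, "model").items()):
        # A request counts as failed when its `error` field is set.
        errors = Counter(r["error"] for r in records if r["error"] is not None)
        latencies = sorted(r["latency_s"] for r in records)
        rows.append({
            "model": model,
            "success_rate": 1 - sum(errors.values()) / len(records),
            "errors": dict(errors),
            "latency_min_s": latencies[0],
            "latency_median_s": median(latencies),
            "latency_max_s": latencies[-1],
        })
    return rows


def to_csv(rows):
    """Serialize a non-empty list of flat dicts to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_json(rows):
    """Serialize rows to JSON text; nested values such as `errors` survive."""
    return json.dumps(rows, indent=2)


if __name__ == "__main__":
    print(to_csv(per_category(SAMPLE_RESULTS)))
    print(to_json(per_model(SAMPLE_RESULTS)))
```

In this sketch the per-model rows carry a nested `errors` mapping, so JSON is the more natural export for them; a CSV export would likely need one column per error type or a separate long-format file. The latency distribution is reduced to min/median/max here, and percentiles or histogram buckets could replace that summary once the desired granularity is decided.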