Merged
4 changes: 2 additions & 2 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -501,8 +501,8 @@ TSWLatexianTemp*
.cursor*/

# Benchmark results (autogenerated)
-reproduce/benchmarks/results/autogenerated/
-reproduce/benchmarks/results/latest.json
+README_IF_YOU_ARE_AN_AI/benchmarks/results/autogenerated/
+README_IF_YOU_ARE_AN_AI/benchmarks/results/latest.json
REMARK

# MyST PDF build artifacts (generated PNG files in tex export directories)
48 changes: 0 additions & 48 deletions BENCHMARKING-PLAN.md

This file was deleted.

2 changes: 1 addition & 1 deletion README_IF_YOU_ARE_AN_AI/CODE_MAP.md
@@ -129,7 +129,7 @@ This document explains what each file in the repository does.
2. Run tests
3. Build HTML only

-### `reproduce/benchmarks/`
+### `README_IF_YOU_ARE_AN_AI/benchmarks/`
**Purpose**: Benchmarking infrastructure

**Files**:
Original file line number Diff line number Diff line change
@@ -10,26 +10,26 @@ The Method of Moderation benchmarking system follows industry-standard practices

```bash
# Benchmark minimal reproduction (~5 minutes)
-./reproduce/benchmarks/benchmark.sh --min
+./README_IF_YOU_ARE_AN_AI/benchmarks/benchmark.sh --min

# Benchmark full reproduction (all tests, paper, notebooks)
-./reproduce/benchmarks/benchmark.sh
+./README_IF_YOU_ARE_AN_AI/benchmarks/benchmark.sh

# With notes
-./reproduce/benchmarks/benchmark.sh --min --notes "Testing M1 Max performance"
+./README_IF_YOU_ARE_AN_AI/benchmarks/benchmark.sh --min --notes "Testing M1 Max performance"
```

### Viewing Results

```bash
# View latest benchmark
-cat reproduce/benchmarks/results/latest.json | jq .
+cat README_IF_YOU_ARE_AN_AI/benchmarks/results/latest.json | jq .

# View system info from latest
-cat reproduce/benchmarks/results/latest.json | jq '.system'
+cat README_IF_YOU_ARE_AN_AI/benchmarks/results/latest.json | jq '.system'

# Check duration
-cat reproduce/benchmarks/results/latest.json | jq '.duration_seconds'
+cat README_IF_YOU_ARE_AN_AI/benchmarks/results/latest.json | jq '.duration_seconds'
```

## What Gets Captured
@@ -96,7 +96,7 @@ Example structure:
## Storage Location

```
-reproduce/benchmarks/
+README_IF_YOU_ARE_AN_AI/benchmarks/
├── README.md # Overview and documentation
├── BENCHMARKING_GUIDE.md # This file
├── schema.json # JSON schema for validation
@@ -134,7 +134,7 @@ jq 'del(.system.hostname, .metadata.user, .environment.virtual_env)' \
results/benchmark.json > benchmark_anonymous.json

# Commit reference benchmark
-git add -f reproduce/benchmarks/results/saved/20250117_reference_m1max.json
+git add -f README_IF_YOU_ARE_AN_AI/benchmarks/results/saved/20250117_reference_m1max.json
git commit -m "Add reference benchmark: M1 Max 2021"
```

@@ -145,10 +145,10 @@ Track reproduction time over different code versions:

```bash
# Before optimization
-./reproduce/benchmarks/benchmark.sh --min --notes "Before optimization"
+./README_IF_YOU_ARE_AN_AI/benchmarks/benchmark.sh --min --notes "Before optimization"

# After optimization
-./reproduce/benchmarks/benchmark.sh --min --notes "After optimization"
+./README_IF_YOU_ARE_AN_AI/benchmarks/benchmark.sh --min --notes "After optimization"

# Compare
jq '.duration_seconds' results/autogenerated/*.json
@@ -159,10 +159,10 @@ Compare performance across different machines:

```bash
# On laptop
-./reproduce/benchmarks/benchmark.sh --min --notes "MacBook Pro M1 2021"
+./README_IF_YOU_ARE_AN_AI/benchmarks/benchmark.sh --min --notes "MacBook Pro M1 2021"

# On desktop
-./reproduce/benchmarks/benchmark.sh --min --notes "AMD Ryzen 9 5900X"
+./README_IF_YOU_ARE_AN_AI/benchmarks/benchmark.sh --min --notes "AMD Ryzen 9 5900X"
```

### 3. Environment Comparison
@@ -171,23 +171,23 @@ Compare UV vs Conda:
```bash
# With UV
source .venv-linux-aarch64/bin/activate
-./reproduce/benchmarks/benchmark.sh --min --notes "UV environment"
+./README_IF_YOU_ARE_AN_AI/benchmarks/benchmark.sh --min --notes "UV environment"

# With Conda
conda activate moderation
-./reproduce/benchmarks/benchmark.sh --min --notes "Conda environment"
+./README_IF_YOU_ARE_AN_AI/benchmarks/benchmark.sh --min --notes "Conda environment"
```

### 4. CI/CD Integration
Automated performance regression detection:

```yaml
- name: Run Benchmark
-run: ./reproduce/benchmarks/benchmark.sh --min
+run: ./README_IF_YOU_ARE_AN_AI/benchmarks/benchmark.sh --min

- name: Upload Results
uses: actions/upload-artifact@v3
with:
name: benchmark-results
-path: reproduce/benchmarks/results/latest.json
+path: README_IF_YOU_ARE_AN_AI/benchmarks/results/latest.json
```
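
The workflow steps in this hunk run the benchmark and upload `latest.json`, but do not themselves compare timings. A minimal sketch of a regression check follows, assuming the two durations have already been read (for example with `jq '.duration_seconds'`, as in the guide). The function name `check_regression`, the baseline value, and the 10% default tolerance are hypothetical and not part of this PR.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the repository): succeeds when the current
# duration is within a tolerance percentage of the baseline duration.
# Usage: check_regression BASELINE_SECONDS CURRENT_SECONDS [TOLERANCE_PCT]
check_regression() {
  # awk performs the floating-point comparison that plain shell arithmetic cannot
  awk -v b="$1" -v c="$2" -v t="${3:-10}" \
    'BEGIN { exit !(c <= b * (1 + t / 100)) }'
}

# Example values only: a 300 s baseline with 10% tolerance allows up to 330 s
if check_regression 300 320 10; then
  echo "OK: within tolerance"
else
  echo "FAIL: performance regression"
fi
```

In a CI step, a non-zero exit status from such a check would fail the job; where the baseline is stored is left open here.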
Original file line number Diff line number Diff line change
@@ -17,26 +17,26 @@ Track and document the time required to reproduce Method of Moderation results a

```bash
# Benchmark minimal reproduction (<5 minutes)
-./reproduce/benchmarks/benchmark.sh --min
+./README_IF_YOU_ARE_AN_AI/benchmarks/benchmark.sh --min

# Benchmark full reproduction (all tests, paper, notebooks)
-./reproduce/benchmarks/benchmark.sh
+./README_IF_YOU_ARE_AN_AI/benchmarks/benchmark.sh

# With notes
-./reproduce/benchmarks/benchmark.sh --min --notes "Testing M1 Max performance"
+./README_IF_YOU_ARE_AN_AI/benchmarks/benchmark.sh --min --notes "Testing M1 Max performance"
```

### Viewing Results

```bash
# View latest benchmark
-cat reproduce/benchmarks/results/latest.json | jq .
+cat README_IF_YOU_ARE_AN_AI/benchmarks/results/latest.json | jq .

# View system info from latest
-cat reproduce/benchmarks/results/latest.json | jq '.system'
+cat README_IF_YOU_ARE_AN_AI/benchmarks/results/latest.json | jq '.system'

# Check duration
-cat reproduce/benchmarks/results/latest.json | jq '.duration_seconds'
+cat README_IF_YOU_ARE_AN_AI/benchmarks/results/latest.json | jq '.duration_seconds'
```

## Benchmark Format
@@ -54,7 +54,7 @@ Each benchmark includes:
## Directory Structure

```
-reproduce/benchmarks/
+README_IF_YOU_ARE_AN_AI/benchmarks/
├── README.md # This file
├── BENCHMARKING_GUIDE.md # Detailed usage guide
├── schema.json # JSON schema for validation
Original file line number Diff line number Diff line change
@@ -6,9 +6,9 @@
# and system information for benchmarking purposes.
#
# Usage:
-# ./reproduce/benchmarks/benchmark.sh # Benchmark full reproduction
-# ./reproduce/benchmarks/benchmark.sh --min # Benchmark minimal reproduction
-# ./reproduce/benchmarks/benchmark.sh --min --notes "Testing M1 Mac"
+# ./README_IF_YOU_ARE_AN_AI/benchmarks/benchmark.sh # Benchmark full reproduction
+# ./README_IF_YOU_ARE_AN_AI/benchmarks/benchmark.sh --min # Benchmark minimal reproduction
+# ./README_IF_YOU_ARE_AN_AI/benchmarks/benchmark.sh --min --notes "Testing M1 Mac"

set -euo pipefail

@@ -38,7 +38,7 @@ while [[ $# -gt 0 ]]; do
;;
*)
echo "Unknown option: $1"
-echo "Usage: $0 [--min] [--notes "note"]"
+echo "Usage: $0 [--min] [--notes \"note\"]"
exit 1
;;
esac
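
The last hunk fixes the quoting of the usage message: with unescaped inner quotes, the shell closes and reopens the string around `note`, so the message prints `[--notes note]` without the quotation marks. A minimal standalone sketch of this kind of option loop follows. Only the `*)` branch and the `--min` / `--notes` flags come from the diff; the function wrapper `parse_args`, the variable names `MODE` and `NOTES`, and the missing-value check are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: parse_args, MODE and NOTES are hypothetical names,
# not taken from benchmark.sh.
parse_args() {
  MODE="full"
  NOTES=""
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --min)
        MODE="min"
        shift
        ;;
      --notes)
        # Require a value so that `shift 2` cannot fail
        if [ "$#" -lt 2 ]; then
          echo "Error: --notes requires a value" >&2
          return 1
        fi
        NOTES="$2"
        shift 2
        ;;
      *)
        echo "Unknown option: $1" >&2
        # Escaped inner quotes keep the literal quotation marks in the output
        echo "Usage: $0 [--min] [--notes \"note\"]" >&2
        return 1
        ;;
    esac
  done
}

parse_args --min --notes "Testing M1 Mac"
echo "mode=$MODE notes=$NOTES"
```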