# Real-World Benchmarking with AFFiNE Dataset

This directory contains a comprehensive benchmark suite that uses real JavaScript/TypeScript code from the [AFFiNE v0.23.2 release](https://github.com/toeverything/AFFiNE/releases/tag/v0.23.2) to evaluate JSON string escaping performance.

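To make the measured workload concrete, here is a minimal scalar JSON string escaper written against RFC 8259 (section 7). It is an illustrative sketch, not this library's implementation: `escape_json_scalar` is a hypothetical name, and the sketch assumes the output should include the surrounding double quotes.

```rust
use std::fmt::Write;

/// Illustrative sketch only (hypothetical name, not the library's code).
/// Escapes `input` as a JSON string literal, including the surrounding quotes.
/// Quotes and backslashes get a backslash prefix; control characters
/// U+0000..=U+001F use the short form where JSON defines one, else `\u00XX`.
pub fn escape_json_scalar(input: &str) -> String {
    // Reserve a little headroom: most text needs only a few escapes.
    let mut out = String::with_capacity(input.len() + input.len() / 8 + 2);
    out.push('"');
    for ch in input.chars() {
        match ch {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            '\u{08}' => out.push_str("\\b"),
            '\u{0C}' => out.push_str("\\f"),
            c if (c as u32) < 0x20 => {
                // Remaining control characters have no short form.
                write!(out, "\\u{:04x}", c as u32).unwrap();
            }
            // Everything else, including non-ASCII text, is copied as-is.
            c => out.push(c),
        }
    }
    out.push('"');
    out
}
```

Only `"`, `\`, and control characters below U+0020 are rewritten, so a faster implementation mainly needs to locate those few bytes quickly and copy everything else in bulk.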
## Why AFFiNE?

AFFiNE is a modern, production TypeScript/JavaScript codebase that provides:

- **Real-world complexity**: 6,448 source files totaling ~22MB
- **Diverse content**: A mix of TypeScript, React JSX, and configuration files
- **Realistic escaping scenarios**: Actual strings, comments, and code patterns found in production
- **Large scale**: Enough data to keep the SIMD code paths busy and produce stable timings

## Dataset Characteristics

- **Source**: AFFiNE v0.23.2 JavaScript/TypeScript files
- **File count**: 6,448 files (.js, .jsx, .ts, .tsx)
- **Total size**: ~22MB of source code
- **Content types**:
  - React components with JSX
  - TypeScript interfaces and types
  - Configuration files
  - Test files
  - Documentation

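To check a local download against the counts above, a recursive directory walk is enough. This is a sketch using only the Rust standard library; `dataset_stats` is a hypothetical helper, and, like `find -type f`, it skips symlinks.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Extensions collected into the benchmark dataset.
const EXTENSIONS: [&str; 4] = ["js", "jsx", "ts", "tsx"];

/// Hypothetical helper (illustrative sketch): counts matching source files
/// under `root` and sums their sizes in bytes.
/// `DirEntry::file_type` does not follow symlinks, so links are skipped and
/// a symlink cycle cannot cause endless recursion.
pub fn dataset_stats(root: &Path) -> io::Result<(usize, u64)> {
    let mut files = 0usize;
    let mut bytes = 0u64;
    // Explicit stack instead of recursion: deep trees cannot overflow it.
    let mut stack = vec![root.to_path_buf()];
    while let Some(dir) = stack.pop() {
        for entry in fs::read_dir(&dir)? {
            let entry = entry?;
            let file_type = entry.file_type()?;
            let path = entry.path();
            if file_type.is_dir() {
                stack.push(path);
            } else if file_type.is_file() {
                // Case-sensitive match, like `find -name "*.ts"`.
                let matches = path
                    .extension()
                    .and_then(|ext| ext.to_str())
                    .map_or(false, |ext| EXTENSIONS.contains(&ext));
                if matches {
                    files += 1;
                    bytes += entry.metadata()?.len();
                }
            }
        }
    }
    Ok((files, bytes))
}
```

Calling `dataset_stats(Path::new("/tmp/affine/AFFiNE-0.23.2"))` on the extracted archive should give a file count and byte total comparable to the figures listed above.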
## Quick Start

### 1. Automatic Setup
```bash
# Run the benchmark script; it will guide you through setup
./benchmark.sh
```

### 2. Manual Setup
```bash
# Run from this directory so that benchmark_data/ is created next to benchmark.sh.
# Download and extract AFFiNE v0.23.2 into /tmp/affine without changing directory.
mkdir -p /tmp/affine
curl -L "https://github.com/toeverything/AFFiNE/archive/refs/tags/v0.23.2.tar.gz" -o /tmp/affine/affine-v0.23.2.tar.gz
tar -xzf /tmp/affine/affine-v0.23.2.tar.gz -C /tmp/affine

# Collect JavaScript/TypeScript files into one concatenated file.
# The escaped parentheses group the -name tests so -type f applies to all of them.
mkdir -p benchmark_data
: > benchmark_data/all_files.js   # truncate so re-runs do not append duplicates
find /tmp/affine/AFFiNE-0.23.2 -type f \( -name "*.ts" -o -name "*.tsx" -o -name "*.js" -o -name "*.jsx" \) | \
  while IFS= read -r file; do
    echo "// File: $file" >> benchmark_data/all_files.js
    cat "$file" >> benchmark_data/all_files.js
    echo -e "\n\n" >> benchmark_data/all_files.js
  done

# Create file list for individual processing
find /tmp/affine/AFFiNE-0.23.2 -type f \( -name "*.ts" -o -name "*.tsx" -o -name "*.js" -o -name "*.jsx" \) > benchmark_data/file_list.txt
```

### 3. Run Benchmarks
```bash
# Quick comparison
./benchmark.sh compare

# Hyperfine benchmark (requires hyperfine)
./benchmark.sh hyperfine

# All benchmarks
./benchmark.sh all
```

## Benchmark Modes

### 1. Quick Comparison (`compare`)
Uses internal timing to compare SIMD vs fallback implementations:
```bash
cargo run --release --bin affine_bench -- compare
# or
./benchmark.sh compare
```

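How the binary's internal timing works is not described here. As a rough sketch of one way to time such a comparison, the helpers below take the median of several runs and compute a speedup ratio. The names and the `FnMut(&str) -> String` shape of the two implementations are assumptions for illustration.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Illustrative sketch (hypothetical helper): runs `f` over `input` `runs`
/// times and returns the median wall-clock time (upper median for even counts).
/// `black_box` stops the optimizer from discarding the unused result.
pub fn median_time<F>(input: &str, runs: usize, mut f: F) -> Duration
where
    F: FnMut(&str) -> String,
{
    assert!(runs > 0, "need at least one run");
    let mut samples: Vec<Duration> = (0..runs)
        .map(|_| {
            let start = Instant::now();
            black_box(f(black_box(input)));
            start.elapsed()
        })
        .collect();
    samples.sort();
    samples[runs / 2]
}

/// Ratio of fallback time to SIMD time: values above 1.0 mean SIMD is faster.
pub fn speedup(simd: Duration, fallback: Duration) -> f64 {
    fallback.as_secs_f64() / simd.as_secs_f64()
}
```

Using the median rather than the mean keeps a single slow run (for example, a cold page cache on the first pass) from skewing the comparison.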
### 2. Hyperfine Benchmark (`hyperfine`)
Uses the `hyperfine` tool for precise, statistical benchmarking. When calling `hyperfine` directly, build the release binary first (`cargo build --release --bin affine_bench`), because the commands invoke it by path:
```bash
hyperfine --warmup 3 --runs 10 \
  './target/release/affine_bench hyperfine simd' \
  './target/release/affine_bench hyperfine fallback'
# or
./benchmark.sh hyperfine
```

### 3. Individual Files (`individual`)
Processes each file separately to measure cumulative performance:
```bash
cargo run --release --bin affine_bench -- individual
# or
./benchmark.sh individual
```

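A sketch of what per-file processing could look like, assuming `benchmark_data/file_list.txt` holds one path per line as produced by the manual setup. `run_individual` and `Totals` are hypothetical names, and the timer wraps only the escape call so disk reads are not counted.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::{Duration, Instant};

/// Hypothetical totals accumulated over every listed file.
#[derive(Debug, Default, PartialEq)]
pub struct Totals {
    pub files: usize,
    pub input_bytes: u64,
    pub output_bytes: u64,
    pub elapsed: Duration,
}

/// Illustrative sketch: escapes each file named in `list_path` separately
/// and sums the time spent inside `escape` (file I/O is excluded).
pub fn run_individual<F>(list_path: &Path, mut escape: F) -> io::Result<Totals>
where
    F: FnMut(&str) -> String,
{
    let list = fs::read_to_string(list_path)?;
    let mut totals = Totals::default();
    for path in list.lines().filter(|line| !line.is_empty()) {
        // read_to_string fails on non-UTF-8 files; a real harness might skip them.
        let source = fs::read_to_string(path)?;
        let start = Instant::now();
        let escaped = escape(&source);
        totals.elapsed += start.elapsed();
        totals.files += 1;
        totals.input_bytes += source.len() as u64;
        totals.output_bytes += escaped.len() as u64;
    }
    Ok(totals)
}
```

Summing many short measurements includes per-call overhead that a single large input hides, which is what makes this mode a useful complement to the concatenated benchmark.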
### 4. Single Implementation Testing
Test specific implementations in isolation:
```bash
# SIMD only
./benchmark.sh simd

# Fallback only
./benchmark.sh fallback
```

## Binary Usage

The `affine_bench` binary provides several modes:

```bash
# Build the binary
cargo build --release --bin affine_bench

# Usage
./target/release/affine_bench <mode> [options]

# Modes:
#   simd        - Benchmark optimized SIMD implementation
#   fallback    - Benchmark fallback implementation
#   compare     - Compare both implementations
#   individual  - Process individual files from AFFiNE
#   hyperfine   - Silent mode for hyperfine benchmarking (followed by simd or fallback)
```

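How the binary actually parses its arguments is not documented here. One minimal way to dispatch on the modes listed above, with `hyperfine` taking the implementation name as its next argument (as in the hyperfine commands earlier), is sketched below; `Mode`, `Impl`, and `parse_args` are hypothetical.

```rust
/// Which implementation a silent hyperfine run exercises (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Impl {
    Simd,
    Fallback,
}

/// The documented modes; this enum is an illustrative sketch, not the binary's source.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    Simd,
    Fallback,
    Compare,
    Individual,
    /// Silent mode: `hyperfine simd` or `hyperfine fallback`.
    Hyperfine(Impl),
}

/// Maps the arguments after the program name to a mode via slice patterns.
pub fn parse_args(args: &[&str]) -> Result<Mode, String> {
    match args {
        ["simd"] => Ok(Mode::Simd),
        ["fallback"] => Ok(Mode::Fallback),
        ["compare"] => Ok(Mode::Compare),
        ["individual"] => Ok(Mode::Individual),
        ["hyperfine", "simd"] => Ok(Mode::Hyperfine(Impl::Simd)),
        ["hyperfine", "fallback"] => Ok(Mode::Hyperfine(Impl::Fallback)),
        _ => Err(format!("usage: affine_bench <mode> [options], got {:?}", args)),
    }
}
```

In a `main`, the arguments could be gathered with `let args: Vec<String> = std::env::args().skip(1).collect();` and passed as `&args.iter().map(String::as_str).collect::<Vec<_>>()`.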
## Installing Hyperfine

### Option 1: Package Manager
```bash
# Debian/Ubuntu
sudo apt install hyperfine

# macOS
brew install hyperfine

# Arch Linux
sudo pacman -S hyperfine
```

### Option 2: Cargo
```bash
cargo install hyperfine
```

### Option 3: Direct Download
```bash
# Linux x86_64
curl -L https://github.com/sharkdp/hyperfine/releases/download/v1.18.0/hyperfine-v1.18.0-x86_64-unknown-linux-gnu.tar.gz | tar xz
sudo mv hyperfine-v1.18.0-x86_64-unknown-linux-gnu/hyperfine /usr/local/bin/
```

## Expected Results

### On x86_64
Both implementations should perform similarly, because the SIMD optimizations are aarch64-specific. Example output:

```
SIMD implementation: 38.5 ms ± 0.5 ms
Fallback implementation: 38.6 ms ± 0.2 ms
Result: Equivalent performance (expected)
```

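One common way to make an optimization architecture-specific in Rust is compile-time `#[cfg(target_arch = ...)]` selection, which would explain why both modes time identically on x86_64. The sketch below assumes this pattern (the crate may instead use runtime feature detection); the aarch64 body only delegates rather than containing real NEON code, and all names are hypothetical.

```rust
/// Portable escaper used on every target (illustrative sketch, hypothetical name).
/// For brevity it writes every control character as `\u00XX` rather than
/// short forms such as `\n`; both are valid JSON.
pub fn escape_fallback(input: &str) -> String {
    let mut out = String::with_capacity(input.len() + 2);
    out.push('"');
    for c in input.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// True when this build compiled the aarch64-only entry point.
pub const SIMD_PATH: bool = cfg!(target_arch = "aarch64");

#[cfg(target_arch = "aarch64")]
pub fn escape_simd(input: &str) -> String {
    // Assumption: a NEON-based scanner would live here. This sketch
    // delegates so it compiles and produces identical output.
    escape_fallback(input)
}

#[cfg(not(target_arch = "aarch64"))]
pub fn escape_simd(input: &str) -> String {
    // No vectorized path on this target: the "SIMD" entry point is the
    // portable code, so both benchmark modes execute the same work.
    escape_fallback(input)
}
```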
### On aarch64 (Apple Silicon, AWS Graviton, etc.)
The SIMD implementation should show significant improvements:

```
SIMD implementation: 25.2 ms ± 0.3 ms
Fallback implementation: 38.6 ms ± 0.2 ms
Result: SIMD is 53% faster
```

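The "53% faster" line expresses a speedup ratio. Worked out from the example timings, the same numbers also mean the SIMD run takes roughly 35% less wall-clock time:

```math
\text{speedup} = \frac{t_{\text{fallback}}}{t_{\text{SIMD}}} = \frac{38.6\,\text{ms}}{25.2\,\text{ms}} \approx 1.53
\qquad
\text{time saved} = 1 - \frac{t_{\text{SIMD}}}{t_{\text{fallback}}} = 1 - \frac{25.2}{38.6} \approx 0.347
```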
## Data File Structure

```
benchmark_data/
├── all_files.js     # All JS/TS files concatenated (22MB)
└── file_list.txt    # List of original file paths (6,448 lines)
```

The `all_files.js` file contains every source file, each preceded by a header comment giving its original path:

```javascript
// File: /tmp/affine/AFFiNE-0.23.2/vitest.config.ts
import { resolve } from 'node:path';
// ... file content ...


// File: /tmp/affine/AFFiNE-0.23.2/packages/common/infra/src/index.ts
export * from './framework';
// ... file content ...
```

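Because each file is introduced by a `// File: <path>` line, the concatenated file can be split back into per-file chunks. The sketch below assumes exactly that header format; `split_dataset` is a hypothetical helper, and a source file that itself contains a line starting with `// File: ` would be split at that point.

```rust
/// Header written before each file by the collection script.
const HEADER: &str = "// File: ";

/// Illustrative sketch (hypothetical helper): splits the concatenated dataset
/// into `(path, content)` pairs borrowed from `all`. Text before the first
/// header is ignored; trailing blank lines from the separator are trimmed
/// (along with any trailing whitespace in the file itself).
pub fn split_dataset(all: &str) -> Vec<(&str, &str)> {
    let mut out = Vec::new();
    // (path, byte offset where that file's content starts)
    let mut current: Option<(&str, usize)> = None;
    let mut offset = 0;
    for line in all.split_inclusive('\n') {
        if let Some(path) = line.strip_prefix(HEADER) {
            if let Some((prev, start)) = current.take() {
                out.push((prev, all[start..offset].trim_end()));
            }
            current = Some((path.trim_end(), offset + line.len()));
        }
        offset += line.len();
    }
    if let Some((prev, start)) = current {
        out.push((prev, all[start..].trim_end()));
    }
    out
}
```

Slicing at line boundaries keeps every index on a UTF-8 character boundary, so the chunks can be borrowed without copying the 22MB input.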
## Performance Insights

This real-world benchmark is designed to show:

1. **Large file handling**: How the library performs with production-scale codebases
2. **Mixed content patterns**: Performance across different JavaScript/TypeScript constructs
3. **Memory efficiency**: Behavior with substantial string processing workloads
4. **SIMD effectiveness**: Real-world impact of vectorized processing

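To illustrate what vectorized processing buys here: most bytes in source code need no escaping, so a fast implementation can test several bytes at once and only fall back to a byte-by-byte loop near a hit. The sketch below substitutes portable SWAR (SIMD within a register) bit tricks on 8-byte words for real NEON instructions; it is not the library's code, and `first_escape_index` is a hypothetical name. `has_less` is the classic "does a word contain a byte less than n" trick from the Bit Twiddling Hacks collection, valid for n up to 128.

```rust
/// 0x01 in every byte, and the high bit (0x80) in every byte.
const ONES: u64 = 0x0101_0101_0101_0101;
const HIGHS: u64 = 0x8080_8080_8080_8080;

/// True if any byte of `v` is less than `n` (valid for n <= 128).
/// A borrow can only start at a byte below `n`, so the yes/no answer is exact.
fn has_less(v: u64, n: u8) -> bool {
    (v.wrapping_sub(ONES * n as u64) & !v & HIGHS) != 0
}

/// True if any byte of `v` equals `b`: XOR turns matching bytes into zeros.
fn has_byte(v: u64, b: u8) -> bool {
    has_less(v ^ (ONES * b as u64), 1)
}

/// JSON requires escaping control bytes (< 0x20), '"' and '\\'.
fn needs_escape(v: u64) -> bool {
    has_less(v, 0x20) || has_byte(v, b'"') || has_byte(v, b'\\')
}

/// Illustrative sketch: index of the first byte that must be escaped,
/// checking 8 bytes per step before a scalar scan pinpoints the hit.
pub fn first_escape_index(bytes: &[u8]) -> Option<usize> {
    let mut i = 0;
    while i + 8 <= bytes.len() {
        let mut word = [0u8; 8];
        word.copy_from_slice(&bytes[i..i + 8]);
        if needs_escape(u64::from_le_bytes(word)) {
            break; // the scalar scan below finds the byte within this word
        }
        i += 8;
    }
    // Scalar scan of the matching word, or of the final tail (< 8 bytes).
    bytes[i..]
        .iter()
        .position(|&b| b < 0x20 || b == b'"' || b == b'\\')
        .map(|p| i + p)
}
```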
The AFFiNE dataset is ideal because it contains the complex, nested string patterns found in modern web applications, making it a much more realistic test than synthetic benchmarks.