Commit 4e86247

feat: add benchmarks
1 parent ddc9832 commit 4e86247

11 files changed: +1904 -12 lines changed


README.md

Lines changed: 11 additions & 4 deletions
@@ -21,7 +21,7 @@ TOON achieves **CSV-like compactness** while adding **explicit structure**, maki

 ### Key Features

-- **Compact**: 30-60% smaller than JSON for structured data
+- **Compact**: **64% smaller** than JSON on average (tested on 50 datasets)
 - **Readable**: Clean, indentation-based syntax
 - **Structured**: Preserves nested objects and arrays
 - **Type-safe**: Supports strings, numbers, booleans, null
@@ -323,11 +323,18 @@ python examples/advanced_features.py

 ## Performance

-TOON typically achieves:
-- **30-60% size reduction** vs JSON for structured data
-- **40-70% token reduction** with tabular data
+**Benchmarked across 50 diverse, real-world datasets:**
+
+- **63.9% average size reduction** vs JSON for structured data
+- **54.1% average token reduction** (directly lowers LLM API costs)
+- **Up to 73.4% savings** for optimal use cases (tabular data, surveys, analytics)
+- **98% of datasets achieve 40%+ savings**
 - **Minimal overhead** in encoding/decoding (<1ms for typical payloads)

+**💰 Cost Impact:** At GPT-4 pricing, TOON saves **$2,147 per million API requests** and **$5,408 per billion tokens**.
+
+**[📊 View Full Benchmark Results →](benchmark/RESULTS.md)**
+
 ## Contributing

 Contributions are welcome! Please:

README.zh-CN.md

Lines changed: 11 additions & 4 deletions
@@ -17,7 +17,7 @@ TOON achieves **CSV-like compactness** while adding **explicit structure**,

 ### Key Features

-- **Compact**: 30-60% smaller than JSON for structured data
+- **Compact**: **64% smaller** than JSON on average (tested on 50 datasets)
 - **Readable**: Clean, indentation-based syntax
 - **Structured**: Preserves nested objects and arrays
 - **Type-safe**: Supports strings, numbers, booleans, null
@@ -319,11 +319,18 @@ python examples/advanced_features.py

 ## Performance

-TOON typically achieves:
-- **30-60% size reduction** vs JSON for structured data
-- **40-70% token reduction** for tabular data
+**Benchmarked across 50 diverse, real-world datasets:**
+
+- **63.9% average size reduction** vs JSON for structured data
+- **54.1% average token reduction** (directly lowers LLM API costs)
+- **Up to 73.4% savings** for optimal use cases (tabular data, surveys, analytics)
+- **98% of datasets achieve savings of 40% or more**
 - **Minimal overhead** in encoding/decoding (<1ms for typical payloads)

+**💰 Cost Impact:** At GPT-4 pricing, TOON **saves $2,147** per million API requests and **$5,408** per billion tokens
+
+**[📊 View Full Benchmark Results →](benchmark/RESULTS.md)**
+
 ## Contributing

 Contributions are welcome! Please:

assets/README.ko.md

Lines changed: 11 additions & 4 deletions
@@ -21,7 +21,7 @@ TOON achieves **CSV-level compactness** while adding **explicit structure**

 ### Key Features

-- **Compact**: 30-60% smaller than JSON for structured data
+- **Compact**: **64% smaller** than JSON on average (tested on 50 datasets)
 - **Readable**: Clean, indentation-based syntax
 - **Structured**: Preserves nested objects and arrays
 - **Type-safe**: Supports strings, numbers, booleans, null
@@ -323,11 +323,18 @@ python examples/advanced_features.py

 ## Performance

-TOON typically achieves:
-- **30-60% size reduction** vs JSON for structured data
-- **40-70% token reduction** for tabular data
+**Benchmarked across 50 diverse, real-world datasets:**
+
+- **63.9% average size reduction** vs JSON for structured data
+- **54.1% average token reduction** (directly lowers LLM API costs)
+- **Up to 73.4% savings** for optimal use cases (tabular data, surveys, analytics)
+- **98% of datasets achieve savings of 40% or more**
 - **Minimal overhead** in encoding/decoding (<1ms for typical payloads)

+**💰 Cost Impact:** At GPT-4 pricing, TOON **saves $2,147** per million API requests and **$5,408** per billion tokens.
+
+**[📊 View Full Benchmark Results →](../benchmark/RESULTS.md)**
+
 ## Contributing

 Contributions are welcome! Please follow these steps:

benchmark/QUICKSTART.md

Lines changed: 95 additions & 0 deletions
@@ -0,0 +1,95 @@
# Benchmark Quick Start

## Run All Benchmarks

The fastest way to see the memory savings:

```bash
python benchmark/run_all.py
```

This will run all benchmarks and provide a comprehensive summary.

## Run Individual Benchmarks

### 1. Compare Sizes and Tokens

```bash
python benchmark/compare_formats.py
```

This shows:
- File size comparison (JSON vs TOON)
- Token count comparison (for LLM APIs)
- Encoding/decoding performance
- Example outputs
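
The kinds of measurement listed above can be approximated with the standard library alone. The sketch below is not the contents of `compare_formats.py`, which this page does not show; the `byte_size` and `mean_seconds` helpers and the sample payload are hypothetical names chosen for illustration. It measures UTF-8 size and average encode/decode time for JSON, and the same helpers would apply to TOON text once an encoder is available.

```python
import json
import timeit


def byte_size(text: str) -> int:
    """Size of the serialized text in UTF-8 bytes."""
    return len(text.encode("utf-8"))


def mean_seconds(func, repeat: int = 1000) -> float:
    """Average wall-clock time of one call to func, over `repeat` calls."""
    return timeit.timeit(func, number=repeat) / repeat


# Hypothetical sample payload, modeled on the example later in this file.
data = {
    "products": [
        {"id": 1, "name": "Laptop", "price": 999},
        {"id": 2, "name": "Mouse", "price": 29},
    ]
}

pretty = json.dumps(data, indent=2)                # human-readable JSON
compact = json.dumps(data, separators=(",", ":"))  # whitespace-free JSON

print("pretty JSON bytes: ", byte_size(pretty))
print("compact JSON bytes:", byte_size(compact))
print("encode seconds:    ", mean_seconds(lambda: json.dumps(data)))
print("decode seconds:    ", mean_seconds(lambda: json.loads(compact)))
```

For token counts, the `tiktoken` package listed under Requirements provides `tiktoken.get_encoding("cl100k_base").encode(text)`, and the length of the returned list is the token count. Which encoding the benchmark scripts use is not stated here, so treat that choice as an assumption.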

### 2. Measure Memory Usage

```bash
python benchmark/memory_benchmark.py
```

This shows:
- Actual memory consumption
- Network bandwidth savings
- Practical cost impact
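
One way to observe memory consumption and on-the-wire size from Python is the standard-library `tracemalloc` module. This is a minimal sketch under that assumption, not a description of what `memory_benchmark.py` does; the `peak_bytes` helper and the generated rows are hypothetical.

```python
import json
import tracemalloc


def peak_bytes(func) -> int:
    """Peak bytes allocated by Python while func() runs, per tracemalloc."""
    tracemalloc.start()
    try:
        func()
        _, peak = tracemalloc.get_traced_memory()  # (current, peak)
    finally:
        tracemalloc.stop()
    return peak


# Hypothetical tabular data: 1000 uniform rows.
rows = [{"id": i, "name": f"item-{i}", "price": i * 10} for i in range(1000)]
payload = {"products": rows}

peak = peak_bytes(lambda: json.dumps(payload))
wire_bytes = len(json.dumps(payload).encode("utf-8"))

print(f"peak allocation while encoding: {peak} bytes")
print(f"bytes sent over the network:    {wire_bytes} bytes")
```

Running the same two measurements on the TOON encoding of `payload` would give the memory and bandwidth comparison this section describes.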

## Expected Results

You should see:
- **~58% average size reduction**
- **~50% average token reduction**
- **Up to 71% savings for tabular data**

## What This Means

If you're sending structured data to LLM APIs:
- **~50% fewer tokens** for that data = roughly **50% lower input-token cost** for it
- **Smaller payloads** = faster network transfers
- **Same data quality** = no loss of information
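
The link between token reduction and cost can be made concrete with a little arithmetic. The sketch below assumes a price of $10 per million input tokens, which is an illustrative figure and not one taken from this file, and uses hypothetical helper names. It also shows that when only part of a prompt is structured data, the overall saving is smaller than the saving on the data itself.

```python
def input_cost_usd(tokens: int, usd_per_million_tokens: float) -> float:
    """Cost of sending `tokens` input tokens at the given price."""
    return tokens / 1_000_000 * usd_per_million_tokens


def overall_reduction(fixed_tokens: int, data_tokens: int,
                      data_reduction: float) -> float:
    """Fraction of total prompt tokens saved when only the data part shrinks."""
    before = fixed_tokens + data_tokens
    after = fixed_tokens + data_tokens * (1 - data_reduction)
    return 1 - after / before


PRICE = 10.0  # ASSUMPTION: USD per million input tokens, for illustration only

json_tokens = 1_000_000_000                  # one billion tokens of JSON data
toon_tokens = int(json_tokens * (1 - 0.50))  # ~50% token reduction
saved = input_cost_usd(json_tokens, PRICE) - input_cost_usd(toon_tokens, PRICE)
print(f"${saved:,.0f} saved per billion JSON tokens")

# A prompt with 200 tokens of instructions and 800 tokens of data:
mixed = overall_reduction(fixed_tokens=200, data_tokens=800, data_reduction=0.50)
print(f"{mixed:.0%} overall reduction for the mixed prompt")
```

Under these assumptions the first line reports $5,000 and the second reports 40%, since the 200 instruction tokens are unaffected by the format change.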

## Examples

### Before (JSON)
```json
{
  "products": [
    {"id": 1, "name": "Laptop", "price": 999},
    {"id": 2, "name": "Mouse", "price": 29}
  ]
}
```
**Size: 134 bytes, 48 tokens**

### After (TOON)
```toon
products[2]{id,name,price}:
  1,Laptop,999
  2,Mouse,29
```
**Size: 52 bytes, 23 tokens**

**Savings: 61.2% size, 52.1% tokens**
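
The transformation and the savings figures above can be sketched in a few lines. This is an illustration only, not the toonify encoder: `encode_table` and `percent_saved` are hypothetical helpers, the two-space row indentation is an assumption about the layout, and values containing commas, booleans, or nested structures are not handled.

```python
def encode_table(key, rows, indent="  "):
    """Render a list of flat dicts with identical keys in the tabular form
    shown above: a header with row count and field names, then one line per row.
    """
    if not rows:
        raise ValueError("rows must be non-empty")
    fields = list(rows[0])
    if any(list(row) != fields for row in rows):
        raise ValueError("all rows must share the same keys in the same order")
    header = "%s[%d]{%s}:" % (key, len(rows), ",".join(fields))
    # No quoting or escaping is done here; this sketch assumes simple values.
    lines = [indent + ",".join(str(row[f]) for f in fields) for row in rows]
    return "\n".join([header] + lines)


def percent_saved(before, after):
    """Reduction from `before` to `after`, as a percentage of `before`."""
    return (before - after) / before * 100


products = [
    {"id": 1, "name": "Laptop", "price": 999},
    {"id": 2, "name": "Mouse", "price": 29},
]
toon_text = encode_table("products", products)
print(toon_text)

# Savings recomputed from the sizes and token counts quoted above.
print(f"size:   {percent_saved(134, 52):.1f}%")   # 61.2%
print(f"tokens: {percent_saved(48, 23):.1f}%")    # 52.1%
```

The field names appear once in the header instead of once per row, which is where the reduction for tabular data comes from.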

## Requirements

```bash
pip install tiktoken  # For token counting
```

## Troubleshooting

If you get import errors, make sure you're in the project root:

```bash
cd /path/to/toonify
python benchmark/run_all.py
```

Or install the package:

```bash
pip install -e .
python benchmark/run_all.py
```
