
Commit d28bbb9

Add Rust compression/decompression benchmark
Adds a benchmark testing compression and decompression performance for two popular algorithms: Gzip (deflate) and Brotli. The benchmark processes ~1.7 MB of mixed data (repeated patterns, structured data, natural text, and random bytes). Both compression and decompression are exercised in a single benchmark cycle, with verification that the decompressed data matches the original.
1 parent 0145307 commit d28bbb9

11 files changed, +2708 -0 lines
Lines changed: 1 addition & 0 deletions
benchmark.wasm
Lines changed: 1 addition & 0 deletions
../Dockerfile.rust
Lines changed: 44 additions & 0 deletions
# Rust Compression/Decompression Benchmark

This benchmark tests compression and decompression performance for multiple algorithms commonly used in web and systems programming.

## What it tests

The benchmark performs compression and decompression using two algorithms:

1. **Gzip (Deflate)** - via `flate2` with the pure Rust backend
   - Classic compression used in HTTP, gzip files, and PNG images
   - Good balance of speed and compression ratio

2. **Brotli** - via the `brotli` crate
   - Modern compression by Google, optimized for web content
   - Better compression ratios than gzip, especially for text/HTML/JSON

Each algorithm (sketched below):
- Compresses the input data
- Decompresses it back
- Verifies the output matches the original
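A minimal sketch of that roundtrip for both algorithms, assuming the `flate2` and `brotli` crates pinned in `Cargo.toml`. This is illustrative rather than the benchmark's actual source (which lives in `rust-benchmark/` and is not shown in this diff), and the Brotli buffer/quality/window parameters are placeholders:

```rust
use std::io::{Read, Write};

use flate2::{read::GzDecoder, write::GzEncoder, Compression};

fn gzip_roundtrip(input: &[u8]) -> Vec<u8> {
    // Compress with the gzip (deflate) format.
    let mut encoder = GzEncoder::new(Vec::new(), Compression::default());
    encoder.write_all(input).unwrap();
    let compressed = encoder.finish().unwrap();

    // Decompress and verify we got the original bytes back.
    let mut decompressed = Vec::new();
    GzDecoder::new(&compressed[..])
        .read_to_end(&mut decompressed)
        .unwrap();
    assert_eq!(decompressed.as_slice(), input);

    compressed
}

fn brotli_roundtrip(input: &[u8]) -> Vec<u8> {
    // Compress; buffer size, quality (0-11) and window size are placeholders.
    let mut compressed = Vec::new();
    {
        let mut encoder = brotli::CompressorWriter::new(&mut compressed, 4096, 9, 22);
        encoder.write_all(input).unwrap();
    } // the stream is finalized when the writer is dropped

    // Decompress and verify.
    let mut decompressed = Vec::new();
    brotli::Decompressor::new(&compressed[..], 4096)
        .read_to_end(&mut decompressed)
        .unwrap();
    assert_eq!(decompressed.as_slice(), input);

    compressed
}
```

Presumably only this compress/decompress/verify work sits inside the single timed benchmark cycle described in the commit message; reading the input and reporting sizes is setup and teardown.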
## Input Data

The `default.input` file (~1.7 MB) contains a mix of:
- Repeated patterns (compress very well)
- Structured JSON-like data (compresses well)
- Natural language text (compresses moderately)
- Random bytes (doesn't compress well)

This mix provides a realistic workload showing how the algorithms perform on different data types.
## Implementation

Uses:
- `flate2` 1.0 with the pure Rust backend for WASM compatibility
- `brotli` 7.0 for Brotli compression
## Performance Notes

Expected compression ratios on the test data (see `benchmark.stderr.expected`: 192,012 and 167,110 bytes from a 1,731,587-byte input):
- Gzip: ~11% of the original size (roughly 9:1 compression)
- Brotli: ~10% of the original size (roughly 10:1 compression)

Brotli achieves better compression but typically requires more CPU time.
Lines changed: 3 additions & 0 deletions
[rust-compression] original size: 1731587 bytes
[rust-compression] gzip compressed: 192012 bytes (11.1%)
[rust-compression] brotli compressed: 167110 bytes (9.7%)
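Each of these lines boils down to a single `eprintln!`. A sketch of the reporting helper (names are illustrative; the benchmark's actual formatting code may differ):

```rust
// Emit one summary line in the format above; stderr is used so that stdout
// (benchmark.stdout.expected) stays empty.
fn report(label: &str, compressed_len: usize, original_len: usize) {
    let pct = compressed_len as f64 / original_len as f64 * 100.0;
    eprintln!("[rust-compression] {label} compressed: {compressed_len} bytes ({pct:.1}%)");
}

// report("gzip", 192_012, 1_731_587) would print the gzip line above.
```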

benchmarks/rust-compression/benchmark.stdout.expected

Whitespace-only changes.

benchmarks/rust-compression/default.input

Lines changed: 2391 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 86 additions & 0 deletions
#!/usr/bin/env python3
"""Generate input data for compression benchmark.

Creates a file with mixed content that compresses reasonably well:
- Repeated text patterns
- JSON-like structured data
- Some random data
"""

import random
import string


def generate_text_block(size):
    """Generate semi-random text that compresses well."""
    words = [
        "the",
        "quick",
        "brown",
        "fox",
        "jumps",
        "over",
        "lazy",
        "dog",
        "hello",
        "world",
        "test",
        "data",
        "benchmark",
        "compression",
        "algorithm",
        "performance",
        "measure",
        "speed",
        "quality",
    ]

    text = []
    while len(" ".join(text)) < size:
        text.append(random.choice(words))

    return " ".join(text)[:size]


def generate_json_like_data():
    """Generate JSON-like structured data."""
    data = []
    for i in range(1000):
        record = f'{{"id": {i}, "name": "user_{i}", "email": "user{i}@example.com", '
        record += f'"status": "active", "score": {random.randint(0, 100)}, '
        record += f'"tags": ["tag1", "tag2", "tag3"]}}\n'
        data.append(record)
    return "".join(data)


def generate_repeated_pattern():
    """Generate data with lots of repetition (compresses very well)."""
    pattern = "ABCDEFGHIJ" * 100
    return (pattern + "\n") * 1000


def main():
    with open("default.input", "wb") as f:
        # Mix of different data types for realistic compression testing

        # 1. Repeated patterns (compress very well)
        f.write(generate_repeated_pattern().encode("utf-8"))

        # 2. Structured data (compresses well)
        f.write(generate_json_like_data().encode("utf-8"))

        # 3. Natural language-like text (compresses moderately)
        f.write(generate_text_block(500000).encode("utf-8"))

        # 4. Some random data (doesn't compress well)
        random_bytes = bytes(random.randint(0, 255) for _ in range(100000))
        f.write(random_bytes)

    import os

    size = os.path.getsize("default.input")
    print(f"Generated input file: {size} bytes ({size / 1024 / 1024:.2f} MB)")


if __name__ == "__main__":
    main()
Lines changed: 1 addition & 0 deletions
target

benchmarks/rust-compression/rust-benchmark/Cargo.lock

Lines changed: 101 additions & 0 deletions
Some generated files are not rendered by default.
Lines changed: 11 additions & 0 deletions
[package]
name = "benchmark"
version = "0.1.0"
edition = "2021"

[dependencies]
flate2 = { version = "1.0", default-features = false, features = ["rust_backend"] }
brotli = "7.0"
sightglass-api = "0.1"

[workspace]
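For orientation, a sketch of how the benchmark crate's entry point might be laid out. The real source is not part of this diff; the `default.input` path and the `sightglass_api` `start`/`end` hook names are assumptions to check against that crate, and the compression work itself is elided (see the roundtrip sketch in the README section above):

```rust
// Assumption: sightglass-api 0.1 exposes start()/end() hooks for the timed region.
use sightglass_api as bench;

fn main() {
    // Assumption: the input file is read from the working directory.
    let input = std::fs::read("default.input").expect("read default.input");
    eprintln!("[rust-compression] original size: {} bytes", input.len());

    // Only the compression/decompression work sits inside the timed region.
    bench::start();
    // ... gzip and Brotli roundtrips over `input` go here (see README sketch) ...
    bench::end();
}
```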
