Skip to content

Commit d6574a5

Browse files
Thomas StrombergThomas Stromberg
authored andcommitted
more benchmark tuning
1 parent 30502c2 commit d6574a5

File tree

5 files changed

+205
-80
lines changed

5 files changed

+205
-80
lines changed

Makefile

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -13,7 +13,7 @@ bench:
1313
go test -bench=. -benchmem
1414

1515
benchmark:
16-
@cd benchmarks && go test -run=TestBenchmarkSuite -v -timeout=120s
16+
@cd benchmarks && go test -run=TestBenchmarkSuite -v -timeout=300s
1717

1818
clean:
1919
go clean -testcache

README.md

Lines changed: 105 additions & 40 deletions
Original file line numberDiff line numberDiff line change
@@ -58,51 +58,116 @@ cache, _ := bdcache.New[string, User](ctx,
5858
- **Graceful degradation** - Cache works even if persistence fails
5959
- **Per-item TTL** - Optional expiration
6060

61-
## Performance
61+
## Performance against the Competition
6262

63-
For performance, bdcache biases toward:
63+
bdcache biases toward being the highest hit-rate for real-world workloads with the lowest read latency. We're not bad either!
6464

65-
* the highest hit-rate in real-world workloads
66-
* the lowest CPU overhead for reads (high ns/op)
65+
* #1 in hit-rate for real-world workloads (zipf)
66+
* #1 in single-threaded read latency (9 ns/op) - half the competition
67+
* #1 for read/write throughput - 10X faster writes than otter!
6768

68-
### CPU Overhead
69-
70-
Benchmarks on MacBook Pro M4 Max comparing memory-only operations:
71-
72-
| Operation | bdcache | LRU | ristretto | otter |
73-
|-----------|-------------|-------------|-------------|-------------|
74-
| Get | 8.63 ns/op | 14.03 ns/op | 30.24 ns/op | 15.25 ns/op |
75-
| Set | 14.03 ns/op | 13.94 ns/op | 96.24 ns/op | 141.3 ns/op |
76-
77-
bdcache is faster than anyone else for Get operations, while still faster than most implementations for Set.
78-
79-
See [benchmarks/](benchmarks/) for detailed methodology and running instructions.
80-
81-
## Hit Rates
82-
83-
For hit rates, bdcache is competitive with otter & tinylfu, often nudging out both depending on the benchmark scenario. Here's an independent benchmark using [scalalang2/go-cache-benchmark](https://github.com/scalalang2/go-cache-benchmark):
69+
Here's the results from an M4 MacBook Pro - run `make bench` to see the results for yourself:
8470

8571
```
86-
itemSize=500000, workloads=7500000, cacheSize=1.00%, zipf's alpha=0.99, concurrency=16
87-
88-
CACHE | HITRATE | QPS | HITS | MISSES
89-
------------------+---------+----------+---------+----------
90-
bdcache | 64.45% | 5572065 | 4833482 | 2666518
91-
tinylfu | 63.94% | 2357008 | 4795685 | 2704315
92-
s3-fifo | 63.57% | 2899111 | 4767672 | 2732328
93-
sieve | 63.40% | 2697842 | 4754699 | 2745301
94-
slru | 62.88% | 2655807 | 4715817 | 2784183
95-
s4lru | 62.67% | 2877974 | 4700060 | 2799940
96-
two-queue | 61.99% | 2362205 | 4649519 | 2850481
97-
otter | 61.86% | 9457755 | 4639781 | 2860219
98-
clock | 56.11% | 2956248 | 4208167 | 3291833
99-
freelru-sharded | 55.45% | 21067416 | 4159005 | 3340995
100-
freelru-synced | 55.38% | 4244482 | 4153156 | 3346844
101-
lru-groupcache | 55.37% | 2463863 | 4153022 | 3346978
102-
lru-hashicorp | 55.36% | 2776749 | 4152099 | 3347901
103-
```
104-
105-
The QPS in this benchmark represents mixed Get/Set - otter in particular shines at concurrency
72+
### Hit Rate (Zipf α=0.99, 1M ops, 1M keyspace)
73+
74+
| Cache | Size=2.5% | Size=5% | Size=10% |
75+
|------------|-----------|---------|----------|
76+
| bdcache | 94.89% | 95.09% | 95.09% |
77+
| otter | 94.69% | 95.09% | 95.09% |
78+
| ristretto | 92.45% | 93.02% | 93.55% |
79+
| tinylfu | 94.87% | 95.09% | 95.09% |
80+
| freecache | 94.15% | 94.75% | 95.09% |
81+
| lru | 94.84% | 95.09% | 95.09% |
82+
83+
### Single-Threaded Latency (sorted by Get)
84+
85+
| Cache | Get ns/op | Get B/op | Get allocs | Set ns/op | Set B/op | Set allocs |
86+
|------------|-----------|----------|------------|-----------|----------|------------|
87+
| bdcache | 9.0 | 0 | 0 | 21.0 | 1 | 0 |
88+
| lru | 23.0 | 0 | 0 | 22.0 | 0 | 0 |
89+
| ristretto | 32.0 | 14 | 0 | 67.0 | 119 | 3 |
90+
| otter | 35.0 | 0 | 0 | 139.0 | 51 | 1 |
91+
| freecache | 71.0 | 15 | 1 | 56.0 | 4 | 0 |
92+
| tinylfu | 83.0 | 3 | 0 | 106.0 | 175 | 3 |
93+
94+
### Single-Threaded Throughput (mixed read/write)
95+
96+
| Cache | Get QPS | Set QPS |
97+
|------------|------------|------------|
98+
| bdcache | 75.69M | 43.02M |
99+
| lru | 36.51M | 36.91M |
100+
| ristretto | 27.79M | 13.96M |
101+
| otter | 25.36M | 7.43M |
102+
| freecache | 13.12M | 16.20M |
103+
| tinylfu | 11.27M | 9.07M |
104+
105+
### Concurrent Throughput (mixed read/write): 4 threads
106+
107+
| Cache | Get QPS | Set QPS |
108+
|------------|------------|------------|
109+
| otter | 29.15M | 4.33M |
110+
| bdcache | 28.75M | 30.41M |
111+
| ristretto | 26.98M | 13.33M |
112+
| freecache | 25.14M | 21.76M |
113+
| lru | 9.22M | 9.32M |
114+
| tinylfu | 5.42M | 4.93M |
115+
116+
### Concurrent Throughput (mixed read/write): 8 threads
117+
118+
| Cache | Get QPS | Set QPS |
119+
|------------|------------|------------|
120+
| bdcache | 21.98M | 18.58M |
121+
| otter | 19.51M | 2.98M |
122+
| ristretto | 18.43M | 11.15M |
123+
| freecache | 16.79M | 15.99M |
124+
| lru | 7.72M | 7.71M |
125+
| tinylfu | 4.88M | 4.18M |
126+
127+
### Concurrent Throughput (mixed read/write): 12 threads
128+
129+
| Cache | Get QPS | Set QPS |
130+
|------------|------------|------------|
131+
| bdcache | 23.59M | 23.84M |
132+
| ristretto | 22.35M | 11.22M |
133+
| otter | 21.69M | 2.83M |
134+
| freecache | 16.95M | 16.43M |
135+
| lru | 7.43M | 7.44M |
136+
| tinylfu | 4.50M | 4.07M |
137+
138+
### Concurrent Throughput (mixed read/write): 16 threads
139+
140+
| Cache | Get QPS | Set QPS |
141+
|------------|------------|------------|
142+
| bdcache | 16.04M | 15.65M |
143+
| otter | 15.76M | 2.82M |
144+
| ristretto | 15.36M | 12.12M |
145+
| freecache | 14.61M | 14.59M |
146+
| lru | 7.36M | 7.46M |
147+
| tinylfu | 4.66M | 3.31M |
148+
149+
### Concurrent Throughput (mixed read/write): 24 threads
150+
151+
| Cache | Get QPS | Set QPS |
152+
|------------|------------|------------|
153+
| otter | 16.04M | 2.84M |
154+
| bdcache | 16.04M | 15.41M |
155+
| ristretto | 16.03M | 12.90M |
156+
| freecache | 14.62M | 14.35M |
157+
| lru | 7.52M | 7.74M |
158+
| tinylfu | 4.95M | 3.34M |
159+
160+
### Concurrent Throughput (mixed read/write): 32 threads
161+
162+
| Cache | Get QPS | Set QPS |
163+
|------------|------------|------------|
164+
| bdcache | 16.45M | 15.37M |
165+
| otter | 15.62M | 2.84M |
166+
| ristretto | 15.47M | 13.35M |
167+
| freecache | 14.58M | 14.29M |
168+
| lru | 7.77M | 7.92M |
169+
| tinylfu | 5.23M | 3.50M |
170+
```
106171

107172
## License
108173

benchmarks/benchmark_test.go

Lines changed: 1 addition & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -7,7 +7,6 @@ import (
77
"fmt"
88
"math"
99
"math/rand/v2"
10-
"runtime"
1110
"strconv"
1211
"sync"
1312
"sync/atomic"
@@ -409,7 +408,7 @@ type concurrentResult struct {
409408
}
410409

411410
func runConcurrentBenchmark() {
412-
threadCounts := []int{1, 4, 8, runtime.NumCPU()}
411+
threadCounts := []int{1, 4, 8, 12, 16, 24, 32}
413412
caches := []string{"bdcache", "otter", "ristretto", "tinylfu", "freecache", "lru"}
414413

415414
for _, threads := range threadCounts {

0 commit comments

Comments
 (0)