Commit 0daaba8 (parent 441ef1f)

Add stratum fastpath benchmarks and benchmark summaries

2 files changed: +216 -0 lines

documentation/TESTING.md (123 additions, 0 deletions)

@@ -72,6 +72,129 @@ go test -race ./...
### Performance / Timing

- **`submit_timing_test.go`** - Measures latency from `handleBlockShare` entry to `submitblock` invocation
- Benchmark suites live alongside the code as `*_bench_test.go` files; run them with `go test -run '^$' -bench . -benchmem ./...`.
- **`miner_decode_bench_test.go`** - Stratum decode microbenchmarks comparing full JSON unmarshal vs fast/manual sniffing for `ping`, `subscribe`, `authorize`, and `submit`.
- **`stratum_fastpath_bench_test.go`** - Stratum encode microbenchmarks comparing normal vs fast-path response encoding (`true`, `pong`, subscribe response in CKPool and expanded modes).
### Stratum Fast-Path Benchmarks

Use these commands to compare normal vs fast decode/encode paths without running unit tests:

```bash
# Decode comparison (full JSON unmarshal vs fast/manual sniff path)
go test -run '^$' -bench 'BenchmarkStratumDecode(FastJSON|Manual)' -benchmem .

# Encode comparison (normal vs fast response encoding)
go test -run '^$' -bench 'BenchmarkStratumEncode' -benchmem .

# Run both together
go test -run '^$' -bench 'BenchmarkStratum(Decode(FastJSON|Manual)|Encode)' -benchmem .
```
For more stable comparisons across changes/machines, run multiple samples and (optionally) compare with `benchstat`:

```bash
# Baseline / candidate example
go test -run '^$' -bench 'BenchmarkStratum(Decode(FastJSON|Manual)|Encode)' -benchmem -count=5 . > before.txt
go test -run '^$' -bench 'BenchmarkStratum(Decode(FastJSON|Manual)|Encode)' -benchmem -count=5 . > after.txt

# Optional (if benchstat is installed)
benchstat before.txt after.txt
```
### Stratum Fast-Path Benchmark Snapshot (example)

Example local run command:

```bash
go test -run '^$' -bench 'BenchmarkStratum(Decode(FastJSON|Manual)|Encode)' -benchmem -benchtime=100ms .
```

Environment for the sample numbers below:

- `goos`: `linux`
- `goarch`: `amd64`
- `cpu`: `AMD Ryzen 9 7950X 16-Core Processor`
- `pkg`: `goPool`
Key results (microbenchmarks):

- **Decode (`mining.submit`)**
  - Full decode (`fastJSONUnmarshal`): `366.6 ns/op`, `461 B/op`, `11 allocs/op`
  - Fast/manual sniff path: `107.3 ns/op`, `0 B/op`, `0 allocs/op`
  - Roughly **3.4x faster** with the fast path in this benchmark
- **Decode (`mining.ping`)**
  - Full decode: `129.8 ns/op`, `106 B/op`, `3 allocs/op`
  - Fast/manual sniff path: `39.22 ns/op`, `0 B/op`, `0 allocs/op`
  - Roughly **3.3x faster**
- **Encode (`true` response)**
  - Normal encode: `157.6 ns/op`, `204 B/op`, `4 allocs/op`
  - Fast encode: `48.34 ns/op`, `0 B/op`, `0 allocs/op`
  - Roughly **3.3x faster**
- **Encode (`pong` response)**
  - Normal encode: `168.9 ns/op`, `205 B/op`, `4 allocs/op`
  - Fast encode: `45.13 ns/op`, `0 B/op`, `0 allocs/op`
  - Roughly **3.7x faster**
- **Encode (`mining.subscribe`, CKPool mode)**
  - Normal encode: `346.7 ns/op`, `501 B/op`, `11 allocs/op`
  - Fast encode: `62.73 ns/op`, `0 B/op`, `0 allocs/op`
  - Roughly **5.5x faster**
- **Encode (`mining.subscribe`, expanded mode)**
  - Normal encode: `630.7 ns/op`, `1063 B/op`, `17 allocs/op`
  - Fast encode: `105.9 ns/op`, `0 B/op`, `0 allocs/op`
  - Roughly **6.0x faster**
Notes:

- These are **microbenchmarks** of parsing/encoding paths, not full end-to-end pool throughput benchmarks.
- Re-run on your target hardware and compare with `benchstat` before using the numbers for capacity planning.
### Hex Fast-Path Benchmarks

Hex encode/decode microbenchmarks live in `job_utils_hex_bench_test.go` and compare LUT-based helpers vs stdlib (`encoding/hex`) and alternate implementations.

Example focused command (decode + encode + uint32 hex parse):

```bash
go test -run '^$' -bench 'Benchmark(DecodeHexToFixedBytesBytes_(32_(PoolPairLUT|Std)|4_(PoolPairLUT|Std))|ParseUint32BEHexBytes_(LUT|Switch)|Encode(BytesToFixedHex_32_Std|32ToHex64Lower_(Unrolled|2ByteLUTLoop|LUTLoop)|ToString_32_(Std|StdStackBuf|Unrolled)))' -benchmem -benchtime=100ms .
```
Environment for the sample numbers below:

- `goos`: `linux`
- `goarch`: `amd64`
- `cpu`: `AMD Ryzen 9 7950X 16-Core Processor`
- `pkg`: `goPool`
Key results (microbenchmarks):

- **Decode 32-byte hex into fixed bytes**
  - stdlib `hex.Decode`: `20.64 ns/op`, `0 allocs/op`
  - goPool pair-LUT helper (`decodeHexToFixedBytesBytes`): `16.37 ns/op`, `0 allocs/op`
  - Roughly **1.26x faster** in this benchmark
- **Decode 4-byte hex into fixed bytes**
  - stdlib `hex.Decode`: `3.450 ns/op`, `0 allocs/op`
  - goPool pair-LUT helper (`decodeHexToFixedBytesBytes`): `3.360 ns/op`, `0 allocs/op`
  - Essentially **similar** performance in this benchmark
- **Parse 8-char uint32 hex (`parseUint32BEHexBytes`)**
  - LUT parser: `2.018 ns/op` (lower), `2.000 ns/op` (upper), `0 allocs/op`
  - switch parser: `4.042 ns/op` (lower), `4.489 ns/op` (upper), `0 allocs/op`
  - LUT path is roughly **2x faster**
- **Encode 32 bytes -> 64 hex bytes (byte buffer output)**
  - stdlib `hex.Encode`: `17.97 ns/op`, `0 allocs/op`
  - LUT loop: `15.03 ns/op`, `0 allocs/op`
  - 2-byte LUT loop: `18.73 ns/op`, `0 allocs/op`
  - Unrolled LUT encode: `8.139 ns/op`, `0 allocs/op`
  - Unrolled path is roughly **2.2x faster** than stdlib in this benchmark
- **Encode 32 bytes -> hex string**
  - `hex.EncodeToString`: `55.35 ns/op`, `128 B/op`, `2 allocs/op`
  - stdlib with stack buffer + `string(out[:])`: `33.65 ns/op`, `64 B/op`, `1 alloc/op`
  - unrolled encode + `string(out[:])`: `20.63 ns/op`, `64 B/op`, `1 alloc/op`
  - The fast path significantly reduces CPU time and cuts one allocation
Notes:

- These are **microbenchmarks** of helper functions (not end-to-end share processing).
- For change comparisons, use `-count` and `benchstat` as shown in the Stratum benchmark section above.

## CPU Profiling with Simulated Miners

stratum_fastpath_bench_test.go (93 additions, 0 deletions)

@@ -0,0 +1,93 @@
```go
package main

import (
	"net"
	"testing"
	"time"
)

// benchDiscardConn is a no-op net.Conn stub so the encode benchmarks
// measure serialization cost only, not network I/O.
type benchDiscardConn struct{}

func (benchDiscardConn) Read([]byte) (int, error)         { return 0, nil }
func (benchDiscardConn) Write(b []byte) (int, error)      { return len(b), nil }
func (benchDiscardConn) Close() error                     { return nil }
func (benchDiscardConn) LocalAddr() net.Addr              { return &net.IPAddr{} }
func (benchDiscardConn) RemoteAddr() net.Addr             { return &net.IPAddr{} }
func (benchDiscardConn) SetDeadline(time.Time) error      { return nil }
func (benchDiscardConn) SetReadDeadline(time.Time) error  { return nil }
func (benchDiscardConn) SetWriteDeadline(time.Time) error { return nil }

// benchmarkEncodeMinerConn builds a MinerConn wired to the discard conn,
// toggling the fast-encode path and CKPool emulation per benchmark.
func benchmarkEncodeMinerConn(fastEncode bool, ckpool bool) *MinerConn {
	return &MinerConn{
		id:   "bench-encode",
		conn: benchDiscardConn{},
		cfg: Config{
			StratumFastEncodeEnabled: fastEncode,
			CKPoolEmulate:            ckpool,
		},
	}
}

func BenchmarkStratumEncodeTrueResponse_Normal(b *testing.B) {
	mc := benchmarkEncodeMinerConn(false, true)
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		mc.writeTrueResponse(1)
	}
}

func BenchmarkStratumEncodeTrueResponse_Fast(b *testing.B) {
	mc := benchmarkEncodeMinerConn(true, true)
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		mc.writeTrueResponse(1)
	}
}

func BenchmarkStratumEncodePongResponse_Normal(b *testing.B) {
	mc := benchmarkEncodeMinerConn(false, true)
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		mc.writePongResponse(7)
	}
}

func BenchmarkStratumEncodePongResponse_Fast(b *testing.B) {
	mc := benchmarkEncodeMinerConn(true, true)
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		mc.writePongResponse(7)
	}
}

func BenchmarkStratumEncodeSubscribeResponse_CKPool_Normal(b *testing.B) {
	mc := benchmarkEncodeMinerConn(false, true)
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		mc.writeSubscribeResponse(2, "01020304", 4, "sid")
	}
}

func BenchmarkStratumEncodeSubscribeResponse_CKPool_Fast(b *testing.B) {
	mc := benchmarkEncodeMinerConn(true, true)
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		mc.writeSubscribeResponse(2, "01020304", 4, "sid")
	}
}

func BenchmarkStratumEncodeSubscribeResponse_Expanded_Normal(b *testing.B) {
	mc := benchmarkEncodeMinerConn(false, false)
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		mc.writeSubscribeResponse(2, "01020304", 4, "sid")
	}
}

func BenchmarkStratumEncodeSubscribeResponse_Expanded_Fast(b *testing.B) {
	mc := benchmarkEncodeMinerConn(true, false)
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		mc.writeSubscribeResponse(2, "01020304", 4, "sid")
	}
}
```
