Skip to content

Commit d34379e

Browse files
committed
docs(madr): add 094 modern error handling strategy
Addresses inconsistency between MADR 073 (stdlib) and codebase (pkg/errors). Recommends migrating to cockroachdb/errors for stack traces + performance. Signed-off-by: Marcin Skalski <skalskimarcin33@gmail.com>
1 parent 64d373a commit d34379e

File tree

1 file changed

+293
-0
lines changed

1 file changed

+293
-0
lines changed
Lines changed: 293 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,293 @@
1+
# Modern Go Error Handling Strategy
2+
3+
* Status: accepted
4+
5+
## Context and Problem Statement
6+
7+
Kuma currently has **inconsistent error handling** across the codebase:
8+
9+
1. **MADR 073** (accepted 2023) recommends sentinel errors + `fmt.Errorf` with `%w`
10+
2. **Codebase reality**: Extensive use of `github.com/pkg/errors` (v0.9.1)
11+
- 30+ usages of `errors.Wrap/Wrapf/Errorf` in `pkg/`
12+
- Dependency in `go.mod`
13+
3. **New code** (e.g., `pkg/core/resources/store/error.go`) follows MADR 073 (stdlib)
14+
15+
**Key developments since MADR 073:**
16+
- `github.com/pkg/errors` **archived in 2024** - no longer maintained
17+
- Security/maintenance risk for frozen dependency
18+
- Modern alternatives emerged with better performance
19+
- Stack traces proven valuable for Kuma debugging
20+
21+
**Requirements:**
22+
- Stack traces for production debugging
23+
- Best performance (high-throughput control plane, many data planes)
24+
- Type-safe error checking (`errors.Is/As`)
25+
- Error wrapping with context
26+
- No migration timeline pressure
27+
28+
## Design
29+
30+
### Option 1: Standard Library Only (`fmt.Errorf` + `%w`)
31+
32+
**Current MADR 073 recommendation**
33+
34+
```go
35+
var ErrNotFound = errors.New("not found")
36+
37+
func ErrorResourceNotFound(rt, name, mesh string) error {
38+
return fmt.Errorf("resource %w: type=%q name=%q mesh=%q", ErrNotFound, rt, name, mesh)
39+
}
40+
41+
func IsNotFound(err error) bool {
42+
return errors.Is(err, ErrNotFound)
43+
}
44+
```
45+
46+
#### Pros
47+
48+
- Zero external dependencies
49+
- Actively maintained by Go team
50+
- Consistent with Go 1.13+ idioms
51+
- Works with `errors.Is/As`
52+
53+
#### Cons
54+
55+
- No stack traces - dealbreaker for debugging requirements
56+
- Limited error context extraction
57+
- Performance overhead with `errors.Is()` (up to 5x slower per benchmarks)
58+
59+
### Option 2: Keep `github.com/pkg/errors`
60+
61+
**Status quo**
62+
63+
```go
64+
return errors.Wrapf(err, "failed to sync: zone=%s", zone)
65+
```
66+
67+
#### Pros
68+
69+
- Already in use - no migration
70+
- Stack traces included
71+
- Familiar to team
72+
73+
#### Cons
74+
75+
- Archived dependency - security/maintenance risk
76+
- Performance: 500% slowdown with `errors.Is()` benchmarks
77+
- Inconsistent with MADR 073
78+
- Dave Cheney (author): "I no longer use this package"
79+
80+
### Option 3: Migrate to `github.com/cockroachdb/errors`
81+
82+
**Modern, actively maintained alternative**
83+
84+
```go
85+
import "github.com/cockroachdb/errors"
86+
87+
// Drop-in replacement
88+
return errors.Wrapf(err, "failed to sync: zone=%s", zone)
89+
90+
// Stack traces automatic
91+
errors.WithStack(ErrNotFound)
92+
93+
// Works with stdlib
94+
errors.Is(err, ErrNotFound) //
95+
errors.As(err, &targetErr) //
96+
```
97+
98+
#### Pros
99+
100+
- Stack traces with full call context
101+
- Better performance than pkg/errors
102+
- Drop-in replacement - minimal code changes
103+
- Actively maintained (CockroachDB production use)
104+
- Stdlib compatible (`errors.Is/As` work correctly)
105+
- Rich features: error formatting, PII redaction, error barriers
106+
- Structured error metadata
107+
- Production-proven at scale
108+
109+
#### Cons
110+
111+
- External dependency (but from reputable source)
112+
- Migration effort for existing `fmt.Errorf` usage
113+
- Slightly larger binary size
114+
115+
### Option 4: Hybrid Approach (Custom Types + Sentinels)
116+
117+
**Enhanced structured errors**
118+
119+
```go
120+
type ResourceError struct {
121+
sentinel error
122+
Type model.ResourceType
123+
Name string
124+
Mesh string
125+
stack []uintptr // manual stack capture
126+
}
127+
128+
func (e *ResourceError) Error() string {
129+
return fmt.Sprintf("resource %s: type=%q name=%q mesh=%q",
130+
e.sentinel, e.Type, e.Name, e.Mesh)
131+
}
132+
```
133+
134+
#### Pros
135+
136+
- Full control over error structure
137+
- Extractable metadata
138+
139+
#### Cons
140+
141+
- Significant implementation effort (stack capture, formatting)
142+
- Reinventing the wheel
143+
- Maintenance burden
144+
- Won't match cockroachdb/errors features
145+
146+
## Security implications and review
147+
148+
**Archived dependency risk:**
149+
- `pkg/errors` receives no security updates
150+
- Future CVEs won't be patched
151+
- Recommendation: Migrate away
152+
153+
**cockroachdb/errors:**
154+
- Active security maintenance
155+
- Used in production by CockroachDB (security-conscious)
156+
- Regular updates and CVE monitoring
157+
158+
## Reliability implications
159+
160+
**Performance (Critical for Kuma CP):**
161+
- Control plane handles many data planes
162+
- Error handling in hot paths (xDS updates)
163+
- Benchmarks show `pkg/errors` + `errors.Is()` = 5x slowdown
164+
- `cockroachdb/errors` optimized for performance
165+
166+
**Debugging:**
167+
- Stack traces essential for multi-zone issues
168+
- Error context helps diagnose cross-component failures
169+
- Structured errors improve observability/telemetry
170+
171+
**Error propagation:**
172+
- KDS (multi-zone sync) propagates errors between CPs
173+
- Stack traces help debug distributed failures
174+
- Consistent error wrapping improves troubleshooting
175+
176+
## Implications for Kong Mesh
177+
178+
Kong Mesh inherits Kuma's error handling:
179+
- Stack traces valuable for enterprise debugging
180+
- Performance improvements benefit enterprise deployments
181+
- Migration can be synchronized with Kuma changes
182+
- Error telemetry integration potential
183+
184+
## Decision
185+
186+
**Adopt Option 3: Migrate to `github.com/cockroachdb/errors`**
187+
188+
### Rationale
189+
190+
Meets all requirements:
191+
1. Stack traces (required)
192+
2. Better performance than pkg/errors (required)
193+
3. Drop-in replacement for existing code (low migration cost)
194+
4. Actively maintained (eliminates security risk)
195+
5. Stdlib compatible (future-proof)
196+
197+
### Migration Strategy
198+
199+
**Phase 1: Add dependency**
200+
```bash
201+
go get github.com/cockroachdb/errors
202+
```
203+
204+
**Phase 2: Update imports (gradual)**
205+
```go
206+
// Old
207+
import "github.com/pkg/errors"
208+
209+
// New
210+
import "github.com/cockroachdb/errors"
211+
```
212+
213+
**Phase 3: Update MADR 073**
214+
- Document cockroachdb/errors as standard
215+
- Update examples
216+
- Keep sentinel error pattern
217+
218+
**Phase 4: Convert fmt.Errorf usage**
219+
```go
220+
// Old (no stack trace)
221+
return fmt.Errorf("sync failed: %w", err)
222+
223+
// New (with stack trace)
224+
return errors.Wrap(err, "sync failed")
225+
```
226+
227+
**No timeline pressure** - can be done incrementally across releases.
228+
229+
### Usage Convention
230+
231+
```go
232+
// Sentinel errors (preserved from MADR 073)
233+
var ErrNotFound = errors.New("not found")
234+
235+
// Error construction with context
236+
func ErrorResourceNotFound(rt model.ResourceType, name, mesh string) error {
237+
return errors.Wrapf(ErrNotFound,
238+
"resource: type=%q name=%q mesh=%q", rt, name, mesh)
239+
}
240+
241+
// Error checking (unchanged)
242+
if errors.Is(err, ErrNotFound) {
243+
// handle
244+
}
245+
246+
// Extract structured errors
247+
var resourceErr *ResourceError
248+
if errors.As(err, &resourceErr) {
249+
log.Error("failed", "type", resourceErr.Type)
250+
}
251+
```
252+
253+
## Notes
254+
255+
**Alternative considered:**
256+
- `gitlab.com/tozd/go/errors` - similar features, less adoption
257+
- CockroachDB chosen for production track record
258+
259+
**Stack trace overhead:**
260+
- Only captured on error creation (not propagation)
261+
- Negligible for error-case performance
262+
- Can be disabled in production if needed (env var)
263+
264+
## References
265+
266+
### Go Error Handling Best Practices
267+
- [Mastering Go Error Handling: A Practical Guide - DEV Community](https://dev.to/leapcell/mastering-go-error-handling-a-practical-guide-3411)
268+
- [Best Practices for Error Handling in Go - JetBrains Guide](https://www.jetbrains.com/guide/go/tutorials/handle_errors_in_go/best_practices/)
269+
- [Mastering Error Handling in Go: Best Practices and Patterns - SkyCode](https://vickychhetri.com/2024/11/19/mastering-error-handling-in-go-best-practices-and-patterns/)
270+
- [A practical guide to error handling in Go - Datadog](https://www.datadoghq.com/blog/go-error-handling/)
271+
272+
### Sentinel Errors vs Custom Types
273+
- [Error Values vs Sentinel Errors in Go - Medium](https://medium.com/@AlexanderObregon/error-values-vs-sentinel-errors-in-go-a377c3d831d1)
274+
- [Go Error Handling Techniques: Exploring Sentinel Errors, Custom Types, and Client-Facing Errors](https://arashtaher.wordpress.com/2024/09/05/go-error-handling-techniques-exploring-sentinel-errors-custom-types-and-client-facing-errors/)
275+
- [Go Error Handling: Sentinel vs Custom Types - alesr](https://alesr.github.io/posts/go-errors/)
276+
277+
### Error Wrapping
278+
- [Working with Errors in Go 1.13 - Official Go Blog](https://go.dev/blog/go1.13-errors)
279+
- [Error Wrapping with Go's Standard Library - Medium](https://medium.com/@AlexanderObregon/error-wrapping-with-gos-standard-library-0a345eeea019)
280+
- [Error wrapping in Go - Bitfield Consulting](https://bitfieldconsulting.com/posts/wrapping-errors)
281+
- [Wrapping errors in Go - barney's tech blog](https://southcla.ws/wrapping-errors-in-go-a-new-approach)
282+
283+
### fmt.Errorf vs pkg/errors
284+
- [What's the difference between errors.Wrapf(), errors.Errorf(), and fmt.Errorf()? - Stack Overflow](https://stackoverflow.com/questions/61933650/whats-the-difference-between-errors-wrapf-errors-errorf-and-fmt-errorf)
285+
- [Can new Go errors wrapper replace pkg/errors? - Dmitry Harnitski's Blog](https://blog.dharnitski.com/2019/09/09/go-errors-are-not-pkg-errors/)
286+
- [decide on the future use of errors, pkg/errors - moby/moby Discussion](https://github.com/moby/moby/discussions/46358)
287+
288+
### Performance
289+
- [Sentinel errors and errors.Is() slow your code down by 500% - DoltHub Blog](https://www.dolthub.com/blog/2024-05-31-benchmarking-go-error-handling/)
290+
291+
### Modern Alternatives
292+
- [Beyond fmt.Errorf() - CockroachDB errors library](https://dr-knz.net/cockroachdb-errors-everyday.html)
293+
- [cockroachdb/errors - Go Packages](https://pkg.go.dev/github.com/cockroachdb/errors)

0 commit comments

Comments
 (0)