Commit 5bd3471

schigh and claude committed
Add functional test scenario with observability stack
Revamp the HTTP functional test to use the new circuit breaker API:
- Server with Run[T], BreakerBox, MetricsCollector, /metrics endpoint
- Client with Run[T], IsExcluded, cascading breaker protection
- Docker Compose with Prometheus and Grafana
- Pre-provisioned Grafana dashboard (state, rates, rejections, totals)
- Thorough README with architecture, what to observe, PromQL queries

Run with: cd _functional_tests/scenarios/http && docker compose up --build

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 0f0c938 commit 5bd3471

10 files changed: +637 -133 lines changed
Lines changed: 150 additions & 0 deletions
@@ -0,0 +1,150 @@
# HTTP Functional Test — Circuit Breaker in Action

This scenario demonstrates the circuit breaker library in a realistic microservice environment: an unreliable server, a client generating load, and a full observability stack to visualize circuit breaker behavior in real time.

## Architecture

```
┌────────┐         HTTP         ┌────────┐
│ Client │ ───────────────────► │ Server │
│        │   7 named breakers   │        │
│        │   100ms intervals    │        │──► /metrics (Prometheus format)
└────────┘                      └────────┘

                                    │ scrape every 5s
                               ┌────┴─────┐
                               │Prometheus│ :9090
                               └────┬─────┘

                               ┌────┴─────┐
                               │ Grafana  │ :3000
                               └──────────┘
```

**Server** — An HTTP service that wraps a simulated unreliable dependency with circuit breakers. The dependency fails ~90% of the time. Each unique breaker name (`foo`, `bar`, `baz`, etc.) gets its own independent circuit breaker via `BreakerBox`. The server exposes a `/metrics` endpoint with Prometheus-format counters.

**Client** — A load generator that sends requests to the server every 100ms, randomly selecting from 7 breaker names. The client also wraps outbound calls with its own circuit breakers, demonstrating **cascading circuit breaker protection** — breakers on both sides of the network boundary.

**Prometheus** — Scrapes the server's `/metrics` endpoint every 5 seconds.

**Grafana** — Pre-provisioned with a dashboard showing all circuit breaker metrics. No setup required.

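The per-name isolation described for the server can be pictured as a registry keyed by breaker name. The following is a minimal sketch under stated assumptions: `registry`, `breaker`, and `loadOrCreate` are hypothetical names used for illustration, and this is not the library's `BreakerBox` implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// breaker is a hypothetical stand-in for a circuit breaker. It only tracks
// its own error count so that isolation between names is visible.
type breaker struct {
	name   string
	errors int
}

// registry is an illustrative name-keyed store of breakers.
// It is NOT the library's BreakerBox; it only mirrors the idea.
type registry struct {
	mu       sync.Mutex
	breakers map[string]*breaker
}

func newRegistry() *registry {
	return &registry{breakers: make(map[string]*breaker)}
}

// loadOrCreate returns the breaker registered under name, creating it on
// first use. The same name always yields the same instance.
func (r *registry) loadOrCreate(name string) *breaker {
	r.mu.Lock()
	defer r.mu.Unlock()
	if b, ok := r.breakers[name]; ok {
		return b
	}
	b := &breaker{name: name}
	r.breakers[name] = b
	return b
}

func main() {
	r := newRegistry()
	r.loadOrCreate("foo").errors += 3 // failures recorded against "foo" only
	fmt.Println("foo:", r.loadOrCreate("foo").errors, "bar:", r.loadOrCreate("bar").errors)
}
```

Because each name maps to its own instance, errors recorded under `foo` never move `bar` toward its threshold.
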
## Running

```bash
cd _functional_tests/scenarios/http
docker compose up --build
```

Wait for the build to complete and services to start. You'll see logs from both the server and client in your terminal.

## What to Observe

### Terminal Logs

Within seconds, you'll see a flurry of activity:

**Server logs** show state transitions as JSON:
```
server - state change: {"name":"foo","state":"open","opened":"2024-...","lockout_ends":"2024-..."}
server - state change: {"name":"foo","state":"throttled","throttled":"2024-...","backoff_ends":"2024-..."}
server - state change: {"name":"bar","state":"open","opened":"2024-...","lockout_ends":"2024-..."}
```

**Client logs** show the client-side breaker reacting to server failures:
```
client - [foo] error: server error 502: error inside circuit breaker 'foo': dependency failure
client - [foo] client breaker OPEN — request rejected
client - [foo] client breaker OPEN — request rejected
client - client breaker 'foo' → throttled
client - [bar] server throttled (excluded from client tracking)
```

### What's Happening

1. **Errors accumulate** — The server's dependency fails 90% of the time. After 5 failures within the 1-minute window, the server-side breaker opens.

2. **Server rejects requests** — While open, the server returns `503 Service Unavailable`. After a 5-second lockout, it enters the throttled state, gradually allowing more requests through using exponential backoff.

3. **Client detects failures** — The client sees 502/503 responses and its own breakers open after 3 failures within 10 seconds.

4. **Client sheds load** — While the client breaker is open, requests are rejected locally without hitting the network. This protects the server from thundering herd during recovery.

5. **Recovery** — After backoff periods expire and error counts drop, breakers transition back to closed. The cycle repeats because the dependency remains 90% unreliable.

6. **Excluded errors** — The client's `IsExcluded` callback treats server-side throttling (`429`) as excluded. These responses don't count against the client's error threshold, preventing the client from opening its own breaker just because the server is recovering.

### Grafana Dashboard

Open [http://localhost:3000](http://localhost:3000) (no login required — anonymous access enabled).

Navigate to **Dashboards → Circuit Breaker Dashboard**, or go directly to:
```
http://localhost:3000/d/circuit-breaker-demo
```

The dashboard has 6 panels:

| Panel | What to Look For |
|-------|-----------------|
| **Circuit Breaker State** | Line chart showing each breaker cycling between 0 (Closed), 1 (Throttled), and 2 (Open). You should see breakers flipping open and recovering independently. |
| **Successes (rate/s)** | Drops to zero when breakers open, gradually recovers during throttle, returns to normal when closed. |
| **Errors (rate/s)** | Spikes that trigger breaker opens. Watch for the rate dropping during open state (errors aren't recorded when the breaker rejects requests). |
| **Rejected Requests (rate/s)** | Stacked bars showing load being shed. High rejection during open state, tapering off during throttle. This is the breaker doing its job. |
| **Timeouts (rate/s)** | Should be near zero in this scenario (the dependency fails fast, not slow). |
| **Total Counters** | Running totals for each breaker — useful for comparing relative activity across breaker names. |

### Prometheus

Open [http://localhost:9090](http://localhost:9090) and try these queries:

```promql
# Current state of all breakers (0=closed, 1=throttled, 2=open)
circuit_breaker_state

# Error rate per breaker over the last 30 seconds
rate(circuit_breaker_errors[30s])

# Ratio of rejected to total requests
rate(circuit_breaker_rejected[30s]) / (rate(circuit_breaker_successes[30s]) + rate(circuit_breaker_errors[30s]) + rate(circuit_breaker_rejected[30s]))

# Which breakers are currently open?
circuit_breaker_state == 2
```

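The arithmetic behind the rejected-to-total query can be mirrored in a few lines of Go. This is a rough sketch with hypothetical helper names (`perSecond`, `rejectedRatio`); it ignores the counter-reset handling and extrapolation that Prometheus applies inside `rate()`, and it returns 0 for an idle breaker where PromQL would yield no useful value.

```go
package main

import "fmt"

// perSecond approximates a per-second rate from two counter samples taken
// `seconds` apart. Counter resets are not handled in this sketch.
func perSecond(prev, curr, seconds float64) float64 {
	if seconds <= 0 || curr < prev {
		return 0
	}
	return (curr - prev) / seconds
}

// rejectedRatio is rejected / (successes + errors + rejected), using rates.
func rejectedRatio(successes, errs, rejected float64) float64 {
	total := successes + errs + rejected
	if total == 0 {
		return 0 // no traffic in the interval
	}
	return rejected / total
}

func main() {
	// Made-up counter samples 30 seconds apart, for illustration only.
	successes := perSecond(100, 130, 30)
	errs := perSecond(40, 100, 30)
	rejected := perSecond(10, 100, 30)
	fmt.Printf("%.2f\n", rejectedRatio(successes, errs, rejected))
}
```
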
### Key Behaviors to Verify

- [ ] **Independent breakers** — Each of the 7 named breakers opens and recovers on its own schedule. One breaker opening doesn't affect the others.
- [ ] **Lockout works** — After a breaker opens, it stays open for 5 seconds (server-side) before transitioning to throttled, even if errors drop below the threshold.
- [ ] **Gradual recovery** — During the throttled state, the rejection rate decreases over the 10-second backoff period, not all at once.
- [ ] **Error exclusion** — Client-side breakers stay closed longer than server-side ones because server throttling responses are excluded from the client's error count.
- [ ] **No goroutine leaks** — The server's pprof endpoint at [http://localhost:6060/debug/pprof/goroutine?debug=1](http://localhost:6060/debug/pprof/goroutine?debug=1) should show a stable goroutine count (circuit breakers create zero background goroutines).

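For the goroutine check in the list above, the `?debug=1` output of the goroutine profile begins with a header line of the form `goroutine profile: total N`. A small helper can extract that number so two samples taken some time apart can be compared. The function name `goroutineTotal` is hypothetical, and the sample text in `main` is made up rather than captured from this server.

```go
package main

import (
	"fmt"
	"strings"
)

// goroutineTotal extracts N from the "goroutine profile: total N" header
// that starts the text form of the goroutine profile.
func goroutineTotal(profile string) (int, error) {
	firstLine, _, _ := strings.Cut(profile, "\n")
	var n int
	if _, err := fmt.Sscanf(firstLine, "goroutine profile: total %d", &n); err != nil {
		return 0, fmt.Errorf("unexpected profile header %q: %w", firstLine, err)
	}
	return n, nil
}

func main() {
	// Illustrative sample; in practice this text would come from an HTTP GET
	// against the pprof URL listed above.
	sample := "goroutine profile: total 7\n1 @ 0x1 0x2\n"
	n, err := goroutineTotal(sample)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("goroutines:", n)
}
```

Note that the client spawns one short-lived goroutine per request, so a stable count is expected on the server side, while small fluctuations from in-flight HTTP handlers are normal.
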
## Tuning

To experiment with different behaviors, edit the circuit breaker options in the server and client `main.go` files:

| Parameter | Server | Client | Effect of Increasing |
|-----------|--------|--------|---------------------|
| `Threshold` | 5 | 3 | More errors tolerated before opening |
| `Window` | 1m | 10s | Longer error memory |
| `LockOut` | 5s | 5s | Longer forced-open period |
| `BackOff` | 10s | 10s | Slower recovery from throttle |
| `errchance` | 9000 | | Lower = more server-side errors (inverted: `> errchance` fails) |

To change the error rate without modifying the server, adjust the `errchance` query parameter the client sends (in `client/main.go`, the line with `errchance=%d`), then rerun `docker compose up --build` so the client picks up the change. With a value of `9000`, the server is described as failing ~90% of the time. Note that the comparison as written here, `rand(10000) > 9000`, would be true for only about 10% of draws, so verify the direction of the comparison in the server's `main.go` before tuning.

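The relationship between `errchance` and the failure rate is simple enough to compute directly. The sketch below assumes a uniform integer draw in `[0, 10000)` and evaluates both possible comparison directions, since the server source is not part of this excerpt; the function names are hypothetical.

```go
package main

import "fmt"

// failsAbove returns the fraction of uniform integer draws in [0, n)
// for which draw > errchance.
func failsAbove(errchance, n int) float64 {
	if errchance < 0 {
		return 1
	}
	if errchance >= n-1 {
		return 0
	}
	return float64(n-1-errchance) / float64(n)
}

// failsAtOrBelow is the complementary case: draw <= errchance.
func failsAtOrBelow(errchance, n int) float64 {
	return 1 - failsAbove(errchance, n)
}

func main() {
	fmt.Printf("draw >  9000: %.4f\n", failsAbove(9000, 10000))
	fmt.Printf("draw <= 9000: %.4f\n", failsAtOrBelow(9000, 10000))
}
```

Under the `>` comparison a value of `9000` corresponds to roughly a 10% failure rate, while the complementary comparison gives roughly 90%.
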
## Stopping

```bash
docker compose down
```

## Ports

| Port | Service |
|------|---------|
| 3000 | Grafana (dashboard) |
| 8080 | Server (API + metrics) |
| 6060 | Server (pprof) |
| 9090 | Prometheus |
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
FROM golang:1.22-alpine AS build
WORKDIR /src
COPY go.mod ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 go build -o /client ./_functional_tests/scenarios/http/client

FROM alpine:3.19
COPY --from=build /client /client
CMD ["/client"]

_functional_tests/scenarios/http/client/main.go

Lines changed: 86 additions & 63 deletions
@@ -4,9 +4,9 @@ import (
 	"context"
 	"errors"
 	"fmt"
-	"io/ioutil"
+	"io"
 	"log"
-	"math/rand"
+	"math/rand/v2"
 	"net/http"
 	"os"
 	"os/signal"
@@ -21,76 +21,99 @@ var theBox *circuit.BreakerBox
 func main() {
 	stopChan := make(chan os.Signal, 1)
 	signal.Notify(stopChan, syscall.SIGINT, syscall.SIGTERM)
-	rand.Seed(time.Now().UnixNano())
 	log.SetPrefix("client - ")
 
+	serverAddr := os.Getenv("SERVER_ADDR")
+	if serverAddr == "" {
+		serverAddr = "http://localhost:8080"
+	}
+
 	theBox = circuit.NewBreakerBox()
 
+	// Log state changes from all client-side breakers
+	go func() {
+		for state := range theBox.StateChange() {
+			log.Printf("client breaker '%s' → %s", state.Name, state.State)
+		}
+	}()
+
+	httpClient := &http.Client{Timeout: 5 * time.Second}
+	breakerNames := []string{"foo", "bar", "baz", "fizz", "buzz", "herp", "derp"}
+
+	ticker := time.NewTicker(100 * time.Millisecond)
+	defer ticker.Stop()
+
+	log.Printf("starting load against %s", serverAddr)
+
 	for {
 		select {
 		case <-stopChan:
-			goto END
+			log.Println("shutting down...")
+			return
+		case <-ticker.C:
+			name := breakerNames[rand.IntN(len(breakerNames))]
+			go makeRequest(httpClient, serverAddr, name)
+		}
+	}
+}
+
+func makeRequest(httpClient *http.Client, serverAddr, name string) {
+	breaker, err := theBox.LoadOrCreate(name,
+		circuit.WithThreshold(3),
+		circuit.WithTimeout(2*time.Second),
+		circuit.WithBackOff(10*time.Second),
+		circuit.WithWindow(10*time.Second),
+		circuit.WithLockOut(5*time.Second),
+		circuit.WithEstimationFunc(circuit.Exponential),
+		circuit.WithIsExcluded(func(err error) bool {
+			// Don't count server-side throttling against the client breaker
+			return errors.Is(err, errServerThrottled)
+		}),
+	)
+	if err != nil {
+		log.Printf("failed to get breaker '%s': %v", name, err)
+		return
+	}
+
+	body, err := circuit.Run(breaker, context.Background(), func(ctx context.Context) (string, error) {
+		uri := fmt.Sprintf("%s?cb=%s&errchance=%d", serverAddr, name, 9000)
+		req, _ := http.NewRequestWithContext(ctx, http.MethodGet, uri, nil)
+		resp, err := httpClient.Do(req)
+		if err != nil {
+			return "", fmt.Errorf("request failed: %w", err)
+		}
+		defer resp.Body.Close()
+		data, _ := io.ReadAll(resp.Body)
+
+		switch resp.StatusCode {
+		case http.StatusOK:
+			return string(data), nil
+		case http.StatusTooManyRequests:
+			return string(data), errServerThrottled
+		case http.StatusServiceUnavailable:
+			return string(data), fmt.Errorf("server circuit open: %s", data)
+		case http.StatusGatewayTimeout:
+			return string(data), fmt.Errorf("server timeout: %s", data)
 		default:
-			<-time.After(100 * time.Millisecond)
-			breakernames := []string{
-				"foo",
-				"bar",
-				"baz",
-				"fizz",
-				"buzz",
-				"herp",
-				"derp",
-			}
-			breakerName := breakernames[rand.Intn(len(breakernames))]
-			breaker, _ := theBox.LoadOrCreate(circuit.BreakerOptions{
-				OpeningWillResetErrors: false,
-				Threshold:              3,
-				Timeout:                10 * time.Millisecond,
-				BaudRate:               0,
-				BackOff:                10 * time.Second,
-				Window:                 10 * time.Second,
-				LockOut:                5 * time.Second,
-				Name:                   breakerName,
-				EstimationFunc:         circuit.Exponential,
-			})
-
-			go func(breaker *circuit.Breaker, name string) {
-				_, err := breaker.Run(context.Background(), func(ctx context.Context) (interface{}, error) {
-					uri := fmt.Sprintf("http://localhost:8080?cb=%s&errchance=%d", name, 9000)
-					resp, err := http.DefaultClient.Get(uri)
-					if err != nil {
-						return []byte(nil), err
-					}
-					defer resp.Body.Close()
-					data, _ := ioutil.ReadAll(resp.Body)
-					switch resp.StatusCode {
-					case http.StatusOK, http.StatusPartialContent:
-						return data, nil
-					case http.StatusForbidden:
-						log.Printf("throttled on server: '%s'", name)
-						return data, errors.New("throttled")
-					case http.StatusInternalServerError:
-						log.Printf("open on server: '%s'", name)
-						return data, errors.New("open")
-					}
-					log.Println("wat")
-					return data, nil
-				})
-
-				if err != nil {
-					if errors.Is(err, circuit.StateOpenError) {
-						log.Printf("open on client: '%s'", name)
-					} else if errors.Is(err, circuit.StateThrottledError) {
-						log.Printf("throttled on client: '%s'", name)
-					} else {
-						log.Printf("error from inside breaker '%s': %v", name, err)
-					}
-				}
-
-			}(breaker, breakerName)
+			return string(data), fmt.Errorf("server error %d: %s", resp.StatusCode, data)
 		}
+	})
+
+	if err != nil {
+		switch {
+		case errors.Is(err, circuit.ErrStateOpen):
+			log.Printf("[%s] client breaker OPEN — request rejected", name)
+		case errors.Is(err, circuit.ErrStateThrottled):
+			log.Printf("[%s] client breaker THROTTLED — request shed", name)
+		case errors.Is(err, errServerThrottled):
+			log.Printf("[%s] server throttled (excluded from client tracking)", name)
+		default:
+			log.Printf("[%s] error: %v", name, err)
+		}
+		return
 	}
 
-END:
-	println("bye")
+	_ = body // success — quiet in logs to reduce noise
 }
+
+var errServerThrottled = errors.New("server throttled")
Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
services:
  server:
    build:
      context: ../../..
      dockerfile: _functional_tests/scenarios/http/server/Dockerfile
    ports:
      - "8080:8080"
      - "6060:6060"
    healthcheck:
      test: ["CMD", "wget", "-q", "--spider", "http://localhost:8080/health"]
      interval: 2s
      timeout: 2s
      retries: 10

  client:
    build:
      context: ../../..
      dockerfile: _functional_tests/scenarios/http/client/Dockerfile
    environment:
      - SERVER_ADDR=http://server:8080
    depends_on:
      server:
        condition: service_healthy

  prometheus:
    image: prom/prometheus:v2.51.0
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
    ports:
      - "9090:9090"
    depends_on:
      server:
        condition: service_healthy

  grafana:
    image: grafana/grafana:10.4.0
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=admin
      - GF_AUTH_ANONYMOUS_ENABLED=true
      - GF_AUTH_ANONYMOUS_ORG_ROLE=Viewer
    volumes:
      - ./grafana/provisioning:/etc/grafana/provisioning:ro
      - ./grafana/dashboards:/var/lib/grafana/dashboards:ro
    ports:
      - "3000:3000"
    depends_on:
      - prometheus
