Commit 3fd6e79

gcmsg and claude committed
docs(internal): add federation multi-node architecture reference
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 36e2230 commit 3fd6e79

1 file changed: 304 additions and 0 deletions

@@ -0,0 +1,304 @@
# Federation Multi-Node Architecture

## Overview

PeerClaw supports a federated multi-node architecture that enables multiple gateway nodes to forward signaling messages, discover agents from remote peers, and relay WebRTC signals across geographically distributed instances.

## Three-Layer Broker Architecture

Signal distribution uses a layered broker model:

```
┌──────────────────────────────────────────────────────┐
│  Agent A <-> WebSocket <-> Hub <-> Broker.Publish()  │
│                       |                              │
│          +------------+------------+                 │
│          v            v            v                 │
│     LocalBroker  RedisBroker  FederationBroker       │
│      (single)     (cluster)    (cross-region)        │
└──────────────────────────────────────────────────────┘
```

- **LocalBroker** -- In-memory forwarding, single-node development
- **RedisBroker** -- Redis Pub/Sub, multi-node within same cluster
- **FederationBroker** -- HTTP forwarding, cross-network/cross-organization relay

### Key Source Files

| File | Purpose |
|------|---------|
| `internal/signaling/broker.go` | Broker interface definition |
| `internal/signaling/local_broker.go` | Single-node in-memory broker |
| `internal/signaling/redis_broker.go` | Multi-node Redis Pub/Sub broker |
| `internal/signaling/federation_broker.go` | Hybrid local + federation broker |
| `internal/signaling/ws.go` | WebSocket hub and agent connection management |
| `internal/federation/federation.go` | Core federation service, peer management, signal forwarding |
| `internal/federation/dns.go` | DNS SRV-based peer discovery |

## Signal Forwarding Flow

```
Agent A (Node 1, Beijing) sends signal to Agent B (Node 2, Shanghai)

1. Agent A sends SignalMessage{To: "agent-b"} via WebSocket
2. Hub.Forward() -> Broker.Publish()
3. FederationBroker checks: hub.HasAgent("agent-b")?
   +-- YES -> Local WebSocket delivery
   +-- NO  -> Iterate all peer nodes
              POST /api/v1/federation/signal -> Node 2
              Header: Authorization: Bearer <peer-token>
4. Node 2 receives request:
   - Validates Bearer token (constant-time comparison)
   - Calls HandleIncomingSignal()
   - Hub.DeliverLocal() -> delivers to Agent B via WebSocket
```

Fallback is automatic: if federation forwarding fails, `Publish` hands the message to the local broker instead.

## Peer Discovery

### Method 1: Static Configuration

```yaml
federation:
  enabled: true
  node_name: "beijing-1"
  auth_token: "${FEDERATION_TOKEN}"
  peers:
    - name: "shanghai-1"
      address: "https://sh.example.com"
      token: "${PEER_SH_TOKEN}"
    - name: "guangzhou-1"
      address: "https://gz.example.com"
      token: "${PEER_GZ_TOKEN}"
```

### Method 2: DNS SRV Auto-Discovery

```yaml
federation:
  enabled: true
  dns_enabled: true
  dns_domain: "peerclaw.example.com"
```

With this configuration the node queries the `_peerclaw._tcp.peerclaw.example.com` SRV records, discovers the host:port pair of every node, and adds each one to the federation.

Implementation in `internal/federation/dns.go`:
- `DiscoverPeers(domain)` calls `net.LookupSRV("peerclaw", "tcp", domain)`
- Constructs HTTP URLs from each SRV target and port
- Called during initialization if `dns_enabled=true`

## Redis Multi-Node Mode (Intra-Cluster)

Unlike federation, Redis mode is for horizontal scaling of multiple PeerClaw instances **within the same cluster**:

```
         +--- Node 1 (peerclawd) ---+
Agent A -|                          |-- Redis Pub/Sub --+
         +--------------------------+  channel:         |
                                       "peerclaw:       |
         +--- Node 2 (peerclawd) ---+   signaling"      |
Agent B -|                          |-------------------+
         +--------------------------+
```

Key design decisions:
- **Echo prevention**: Each node has a unique `NodeID` (UUID). A node skips any received message that it published itself (`env.NodeID == b.nodeID`).
- **256-message buffer**: Received messages go through a 256-entry channel with non-blocking sends, so a slow consumer cannot block the subscriber.
- **Automatic fallback**: If Redis is unavailable, the node falls back to LocalBroker.

### Redis Envelope Format

```go
type redisEnvelope struct {
	NodeID  string                  `json:"node_id"`
	Message signaling.SignalMessage `json:"message"`
}
```

Envelopes are published to the Redis channel `peerclaw:signaling`.

## Federation API Endpoints

| Endpoint | Method | Purpose |
|----------|--------|---------|
| `/api/v1/federation/signal` | POST | Receive signal forwarding from other nodes |
| `/api/v1/federation/discover` | POST | Receive agent discovery queries from other nodes |

Handlers are in `internal/server/http.go`:
- `handleFederationSignal` -- validates Bearer token, parses SignalMessage, calls `HandleIncomingSignal()`
- `handleFederationDiscover` -- queries local registry, returns matching agents

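As a rough picture of what the signal handler does, the sketch below validates the Bearer token with `crypto/subtle.ConstantTimeCompare`, decodes the body, and passes the message to a callback. It is not the handler in `internal/server/http.go`: the names `signalHandler`, `validBearer`, and `post`, the callback shape, and the specific status codes (202, 400, 401, 405) are assumptions.

```go
package main

import (
	"crypto/subtle"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// SignalMessage is trimmed for the sketch; the JSON tag is an assumption.
type SignalMessage struct {
	To string `json:"to"`
}

// validBearer compares the presented token with the expected one in
// constant time. An empty expected token never matches.
func validBearer(r *http.Request, want string) bool {
	const prefix = "Bearer "
	h := r.Header.Get("Authorization")
	if want == "" || !strings.HasPrefix(h, prefix) {
		return false
	}
	got := strings.TrimPrefix(h, prefix)
	return subtle.ConstantTimeCompare([]byte(got), []byte(want)) == 1
}

// signalHandler authenticates, parses, and hands the signal to onSignal.
func signalHandler(token string, onSignal func(SignalMessage)) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		if !validBearer(r, token) {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		var msg SignalMessage
		if err := json.NewDecoder(r.Body).Decode(&msg); err != nil {
			http.Error(w, "bad request", http.StatusBadRequest)
			return
		}
		onSignal(msg)
		w.WriteHeader(http.StatusAccepted)
	}
}

// post sends one in-memory request through h and returns the status code.
func post(h http.Handler, auth, body string) int {
	req := httptest.NewRequest(http.MethodPost, "/api/v1/federation/signal", strings.NewReader(body))
	if auth != "" {
		req.Header.Set("Authorization", auth)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	h := signalHandler("s3cret", func(m SignalMessage) { fmt.Println("deliver to", m.To) })
	fmt.Println(post(h, "Bearer s3cret", `{"to":"agent-b"}`))
	fmt.Println(post(h, "Bearer wrong", `{"to":"agent-b"}`))
}
```

`ConstantTimeCompare` returns immediately when the two slices differ in length, so it hides token contents but not token length.
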
## Cross-Node Agent Discovery

```go
// Node 1 wants to find agents with "translation" capability
agents := fedService.QueryAgents(ctx, []string{"translation"})
// -> Concurrently queries all peers' /api/v1/federation/discover
// -> Aggregates results from all nodes
```

Implementation in `federation.go` `QueryAgents()`:
1. Queries `/api/v1/federation/discover` on each peer
2. Filters by capabilities
3. Aggregates results from all peers
4. Returns combined agent list

## Security Model

| Layer | Mechanism |
|-------|-----------|
| Local signaling | Agent signature-based (Ed25519 public key verification) |
| Federation | Bearer token per peer (`crypto/subtle.ConstantTimeCompare`) |
| Redis | No authentication (internal trusted network) |
| Federation HTTP client | TLS 1.2 minimum, 10-second timeout |

## Core Data Structures

### FederationService

```go
type FederationService struct {
	mu        sync.RWMutex
	nodeName  string
	peers     map[string]*FederationPeer
	authToken string
	logger    *slog.Logger
	client    *http.Client
	onSignal  func(ctx context.Context, msg signaling.SignalMessage)
	stopCh    chan struct{}
}
```

### FederationPeer

```go
type FederationPeer struct {
	Name      string
	Address   string // URL to peer node
	Token     string // Auth token for this peer
	Connected bool
	LastSync  time.Time
}
```

### FederationBroker -- Hybrid Routing Logic

```go
// Publish delivers msg locally when the recipient is connected to this
// node; otherwise it forwards msg to the federation peers.
func (fb *FederationBroker) Publish(ctx context.Context, msg SignalMessage) error {
	if fb.hub.HasAgent(msg.To) {
		return fb.local.Publish(ctx, msg) // Local delivery
	}
	if err := fb.federation.ForwardSignal(ctx, msg); err != nil {
		return fb.local.Publish(ctx, msg) // Fallback to local
	}
	return nil
}
```

## Deployment Topologies

### Topology 1: Load-Balanced Cluster (Redis Mode)

```
      +--- Load Balancer ---+
      |                     |
+-----+-----+        +------+-----+
|  Node 1   |        |   Node 2   |
| peerclawd |        | peerclawd  |
+-----+-----+        +------+-----+
      |    Shared Redis     |
      +----------+----------+
                 |
           +-----+-----+
           |   Redis   |
           +-----+-----+
                 |
           +-----+------+
           | PostgreSQL |
           +------------+
```

### Topology 2: Cross-Region Federation

```
  Beijing DC                    Shanghai DC
+-------------+   HTTP/TLS    +-------------+
|  Node BJ-1  |<------------->|  Node SH-1  |
|  + Redis    |  Federation   |  + Redis    |
|  + PG       |    Signal     |  + PG       |
|  Agents:    |               |  Agents:    |
|  A, B, C    |               |  D, E, F    |
+-------------+               +-------------+
       ^                              ^
       | DNS SRV: _peerclaw._tcp.corp |
       +---------- Auto-discovery ----+
```

### Topology 3: Hybrid (Redis + Federation)

```
Cluster A (3 nodes + Redis)      Cluster B (2 nodes + Redis)
+-------------------------+     +-------------------------+
|  Node1 <-Redis-> Node2  |<--->|  Node4 <-Redis-> Node5  |
|            ^            | Fed |                         |
|          Node3          |     +-------------------------+
+-------------------------+
```

Traffic within a cluster uses Redis broadcast; traffic between clusters uses Federation HTTP forwarding.

## Initialization Flow

```
main.go
+-- Load config
+-- Initialize logger
+-- Initialize OpenTelemetry
+-- Initialize database
+-- Initialize signaling hub (if enabled)
|   +-- SetBroker()
|       +-- LocalBroker (if no Redis)
|       +-- RedisBroker (if Redis available)
+-- Initialize federation (if enabled)
|   +-- Create FederationService
|   +-- Add static peers from config
|   +-- Discover DNS SRV peers (if dns_enabled)
+-- Create HTTP server
|   +-- SetFederation()
|   +-- Register routes:
|       +-- POST /api/v1/federation/signal
|       +-- POST /api/v1/federation/discover
+-- Start servers and wait for shutdown
+-- Clean shutdown: fedService.Close()
```

## Comparison

| Feature | LocalBroker | RedisBroker | FederationBroker |
|---------|------------|-------------|-----------------|
| Node count | 1 | N (same cluster) | N (cross-network) |
| Transport | In-memory | Redis Pub/Sub | HTTPS |
| Latency | ~0 | ~1ms | ~10-100ms |
| Auth | None | None (trusted LAN) | Bearer Token |
| Agent discovery | Local | Shared DB | HTTP query |
| Use case | Development | Production cluster | Multi-region/multi-org |

## Configuration Reference

```yaml
federation:
  enabled: false                 # Enable federation
  node_name: "node-1"            # This node's unique name
  auth_token: "${FED_TOKEN}"     # Auth token (required when enabled)
  dns_enabled: false             # Enable DNS SRV discovery
  dns_domain: ""                 # Domain for SRV lookup
  peers:                         # Static peer list
    - name: "node-2"
      address: "https://node2.example.com"
      token: "${PEER_TOKEN}"
```

Validation: `federation.auth_token` is required when `federation.enabled = true`.
Environment variable substitution (`${VAR}`) is supported for all token fields.

