# Peer Manager Specification for SHREX Protocol

## Abstract

This specification defines the Peer Manager component of the SHREX protocol in the Celestia network. The Peer Manager collects, organizes, validates, and selects peers for data retrieval. Its inputs are data availability notifications and peer discovery.

## Table of Contents

- [Terminology](#terminology)
- [Overview](#overview)
- [Core Components](#core-components)
- [Protocol Specification](#protocol-specification)
  - [Peer Pools](#peer-pools)
  - [Parameters](#parameters)
  - [Result Types](#result-types)
- [Manager Operations](#manager-operations)
  - [Peer Selection](#peer-selection)
  - [Pool Management](#pool-management)
  - [Validation](#validation)
  - [Garbage Collection](#garbage-collection)
  - [Blacklisting](#blacklisting)
  - [Connection Management](#connection-management)
- [References](#references)
- [Requirements Language](#requirements-language)

## Terminology

- **Peer Manager**: The component that collects, organizes, and provides peers for data retrieval operations
- **Peer Pool**: A collection of peer IDs for a single data hash, holding peers known to have that data
- **Sync Pool**: A peer pool with additional metadata: validation status, height, and creation time
- **Validated Pool**: A peer pool whose data hash has been confirmed by a header received through header subscription
- **Cooldown**: A temporary state in which a peer is unavailable for selection but not permanently blocked
- **Blacklist**: A block list that prevents all future communication with specific peers; entries persist until the node restarts
- **Discovered Nodes**: Peers found through the discovery service, independent of specific data hashes
- **Initial Height**: The height of the first header received from header subscription, used as a validation baseline
- **Store From** (`storeFrom`): The lowest height for which pools are retained. It is updated from the latest header received through header subscription, so that only the `storedPoolsAmount` most recent heights are kept

## Overview

The Peer Manager is the central coordination point for peer selection in the SHREX protocol. It aggregates peers from two primary sources:

1. **ShrEx/Sub notifications**: Peers that announce the availability of specific data through the pubsub system
2. **Discovery service**: Peers found through DHT-based discovery

The Peer Manager maintains per-data-hash pools and a general pool of discovered nodes. This enables efficient peer selection for data retrieval. Its validation, cooldown, and blacklisting mechanisms keep unreliable or misbehaving peers out of rotation.

## Core Components

### Header Subscription

1. **Subscription Setup**: The manager MUST subscribe to header updates when it starts
2. **Pool Validation**: Each received header MUST trigger validation of the pool for that header's data hash
3. **Height Tracking**:
   - The first header sets `initialHeight`
   - Every header updates `storeFrom`
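The height-tracking rules can be sketched in Go as follows. This is a minimal illustration, not the celestia-node implementation. `headerTracker`, `onHeader`, and the `validatePool` callback are hypothetical names. The sketch also assumes that `storeFrom` trails the latest header height by `storedPoolsAmount` (default 10).

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// storedPoolsAmount is the number of recent heights whose pools are kept.
const storedPoolsAmount = 10

// headerTracker is a hypothetical holder for the heights derived from
// header subscription.
type headerTracker struct {
	initialHeight atomic.Uint64 // first header height; 0 means "not set yet"
	storeFrom     atomic.Uint64 // lowest height whose pools are retained
}

// onHeader handles one header delivered by header subscription.
func (t *headerTracker) onHeader(dataHash string, height uint64, validatePool func(string, uint64)) {
	// Every header triggers validation of the pool for its data hash.
	validatePool(dataHash, height)

	// Only the first header sets initialHeight; later CompareAndSwap calls fail.
	t.initialHeight.CompareAndSwap(0, height)

	// Retain pools only for the most recent storedPoolsAmount heights.
	var from uint64
	if height > storedPoolsAmount {
		from = height - storedPoolsAmount
	}
	t.storeFrom.Store(from)
}

func main() {
	var t headerTracker
	validated := []string{}
	record := func(hash string, _ uint64) { validated = append(validated, hash) }
	t.onHeader("hash-5", 5, record)
	t.onHeader("hash-25", 25, record)
	fmt.Println(t.initialHeight.Load(), t.storeFrom.Load(), validated) // 5 15 [hash-5 hash-25]
}
```

Using `CompareAndSwap` keeps `initialHeight` write-once without a separate lock.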

### ShrEx/Sub

1. **Validator Registration**: The manager MUST register itself as a ShrEx/Sub message validator
2. **Message Processing**: All ShrEx/Sub notifications MUST pass through the manager's validation
3. **Peer Collection**: Valid notifications MUST add the sending peer to the appropriate pool

### Discovery

The Peer Manager MUST expose `UpdateNodePool` for the discovery service:

1. **Peer Addition**: Discovery MUST call `UpdateNodePool` with `isAdded=true` for newly found peers
2. **Peer Removal**: Discovery MUST call `UpdateNodePool` with `isAdded=false` for removed peers
3. **Blacklist Check**: Blacklisted peers MUST NOT be added to the discovered nodes pool
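The three discovery rules can be sketched as a single method. Only `UpdateNodePool` and `isAdded` come from this spec. The `(peerID, bool)` signature, the `peerID` stand-in for libp2p's `peer.ID`, and the `manager` fields are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// peerID stands in for libp2p's peer.ID.
type peerID string

// manager keeps only the state UpdateNodePool touches (hypothetical layout).
type manager struct {
	mu          sync.Mutex
	blacklisted map[peerID]bool
	nodes       map[peerID]struct{} // discovered nodes pool
}

func newManager() *manager {
	return &manager{blacklisted: map[peerID]bool{}, nodes: map[peerID]struct{}{}}
}

// UpdateNodePool is called by discovery when a peer is found (isAdded=true)
// or lost (isAdded=false).
func (m *manager) UpdateNodePool(id peerID, isAdded bool) {
	m.mu.Lock()
	defer m.mu.Unlock()

	if !isAdded {
		delete(m.nodes, id)
		return
	}
	// Blacklisted peers must never enter the discovered nodes pool.
	if m.blacklisted[id] {
		return
	}
	m.nodes[id] = struct{}{}
}

func main() {
	m := newManager()
	m.blacklisted["bad"] = true
	m.UpdateNodePool("good", true)
	m.UpdateNodePool("bad", true)
	fmt.Println(len(m.nodes)) // 1
}
```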

## Protocol Specification

### Peer Pools

The Peer Manager maintains two distinct types of peer collections:

#### Data Hash Pools

- **Purpose**: Store peers grouped by the data hashes they have announced
- **Structure**: Map from data hash strings to sync pools
- **Validation**: Each pool tracks whether its data hash has been validated
- **Lifecycle**: Pools are created on demand when notifications arrive and removed during garbage collection
- **Capacity**: The manager stores pools only for the most recent heights (controlled by `storedPoolsAmount`, default: 10)

#### Discovered Nodes Pool

- **Purpose**: Store peers found through the discovery service, independent of specific data
- **Usage**: Fallback when no data-hash-specific peers are available
- **Management**: Peers are added by discovery and when a pool they announced is validated. They are removed on disconnection, on removal by discovery, and on blacklisting
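The two kinds of collections can be pictured with the following Go types. This is a sketch under assumptions: `peerPool`, `syncPool`, and `manager` are hypothetical names, and `peerID` stands in for libp2p's `peer.ID`. The fields of `syncPool` follow the metadata listed under Terminology.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// peerID stands in for libp2p's peer.ID.
type peerID string

// peerPool is a hypothetical concurrency-safe set of peers.
type peerPool struct {
	mu    sync.Mutex
	peers map[peerID]struct{}
}

func newPeerPool() *peerPool { return &peerPool{peers: make(map[peerID]struct{})} }

func (p *peerPool) add(id peerID) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.peers[id] = struct{}{}
}

func (p *peerPool) remove(id peerID) {
	p.mu.Lock()
	defer p.mu.Unlock()
	delete(p.peers, id)
}

func (p *peerPool) size() int {
	p.mu.Lock()
	defer p.mu.Unlock()
	return len(p.peers)
}

// syncPool is a data hash pool plus the metadata the manager tracks for it.
type syncPool struct {
	*peerPool
	height    uint64      // height the data hash was announced for
	createdAt time.Time   // start of the validation timeout
	validated atomic.Bool // set once a matching header arrives
}

// manager holds both kinds of peer collections.
type manager struct {
	mu    sync.Mutex
	pools map[string]*syncPool // data hash -> pool
	nodes *peerPool            // discovered nodes, independent of data hash
}

func main() {
	m := &manager{pools: make(map[string]*syncPool), nodes: newPeerPool()}
	sp := &syncPool{peerPool: newPeerPool(), height: 100, createdAt: time.Now()}
	m.pools["hash-100"] = sp
	sp.add("peerA")
	m.nodes.add("peerB")
	fmt.Println(sp.size(), sp.validated.Load(), m.nodes.size()) // 1 false 1
}
```

`validated` is an atomic so that it can be read on the notification path without taking the manager lock.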

### Parameters

The Peer Manager operates with the following configurable parameters:

#### PoolValidationTimeout

- **Type**: Duration
- **Purpose**: Maximum time a pool may wait for validation through header subscription
- **Rationale**: Pools that do not receive a corresponding header within this timeout are considered invalid, and their peers are blacklisted (when `EnableBlackListing` is set)

#### PeerCooldown

- **Type**: Duration
- **Purpose**: How long a peer remains unavailable after being put on cooldown
- **Rationale**: Allows temporary removal of unreliable peers without permanent blacklisting

#### GcInterval

- **Type**: Duration
- **Purpose**: Interval between garbage collection cycles
- **Rationale**: Regular cleanup prevents unbounded memory growth from outdated or invalid pools

#### EnableBlackListing

- **Type**: Boolean
- **Purpose**: Feature flag that enables or disables peer blacklisting
- **Rationale**: Allows testing and gradual rollout of the blacklisting mechanism
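The parameters map naturally onto a configuration struct. The field names below come from this spec. The `Validate` method's positivity checks and the values used in `main` are assumptions for illustration; they are not the implementation's defaults.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Parameters mirrors the configurable values listed above.
type Parameters struct {
	PoolValidationTimeout time.Duration
	PeerCooldown          time.Duration
	GcInterval            time.Duration
	EnableBlackListing    bool
}

// Validate rejects non-positive durations (an assumed sanity check).
func (p Parameters) Validate() error {
	if p.PoolValidationTimeout <= 0 {
		return errors.New("PoolValidationTimeout must be positive")
	}
	if p.PeerCooldown <= 0 {
		return errors.New("PeerCooldown must be positive")
	}
	if p.GcInterval <= 0 {
		return errors.New("GcInterval must be positive")
	}
	return nil
}

func main() {
	// Illustrative values only; blacklisting left disabled.
	p := Parameters{
		PoolValidationTimeout: 2 * time.Minute,
		PeerCooldown:          3 * time.Second,
		GcInterval:            30 * time.Second,
	}
	fmt.Println(p.Validate()) // <nil>
}
```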

### Result Types

#### ResultNoop

- **Value**: `"result_noop"`
- **Meaning**: The operation completed successfully and no further action is required
- **Effect**: No state changes in the Peer Manager

#### ResultCooldownPeer

- **Value**: `"result_cooldown_peer"`
- **Meaning**: The peer should be temporarily unavailable for selection
- **Effect**: The peer is placed on cooldown for `PeerCooldown`
- **Use Case**: Transient issues such as timeouts or temporary errors

#### ResultBlacklistPeer

- **Value**: `"result_blacklist_peer"`
- **Meaning**: The peer has misbehaved and should be permanently blocked
- **Effect**: The peer is added to the blacklist, disconnected, and blocked from future connections
- **Use Case**: Malicious behavior, invalid data, or protocol violations
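One way a caller might report these results is through a callback returned with each selected peer. The string values below come from this spec. `DoneFunc`, `doneFunc`, `available`, and the `manager` fields are hypothetical names, and `peerID` stands in for libp2p's `peer.ID`. The sketch keeps plain maps and is not safe for concurrent use; a real manager would lock.

```go
package main

import (
	"fmt"
	"time"
)

// peerID stands in for libp2p's peer.ID.
type peerID string

// result is the outcome a caller reports after using a peer.
type result string

const (
	ResultNoop          result = "result_noop"
	ResultCooldownPeer  result = "result_cooldown_peer"
	ResultBlacklistPeer result = "result_blacklist_peer"
)

// DoneFunc is a hypothetical callback handed out with each selected peer.
type DoneFunc func(result)

// manager keeps only the state that results act on (single-goroutine sketch).
type manager struct {
	peerCooldown  time.Duration
	cooldownUntil map[peerID]time.Time
	blacklisted   map[peerID]bool
}

func newManager(cooldown time.Duration) *manager {
	return &manager{
		peerCooldown:  cooldown,
		cooldownUntil: make(map[peerID]time.Time),
		blacklisted:   make(map[peerID]bool),
	}
}

// doneFunc returns the callback that applies a result to peer id.
func (m *manager) doneFunc(id peerID) DoneFunc {
	return func(r result) {
		switch r {
		case ResultNoop:
			// No state change.
		case ResultCooldownPeer:
			m.cooldownUntil[id] = time.Now().Add(m.peerCooldown)
		case ResultBlacklistPeer:
			m.blacklisted[id] = true
		}
	}
}

// available reports whether id may currently be selected.
func (m *manager) available(id peerID) bool {
	return !m.blacklisted[id] && !time.Now().Before(m.cooldownUntil[id])
}

func main() {
	m := newManager(time.Minute)
	m.doneFunc("a")(ResultNoop)
	m.doneFunc("b")(ResultCooldownPeer)
	m.doneFunc("c")(ResultBlacklistPeer)
	fmt.Println(m.available("a"), m.available("b"), m.available("c")) // true false false
}
```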

## Manager Operations

### Peer Selection

The Peer Manager implements a prioritized peer selection strategy:

#### Selection Priority

1. **First Priority**: Peers from the validated data hash pool
   - Peers that have announced the specific data hash being requested
   - The pool MUST have been validated through header subscription
   - Provides the highest confidence that the data is available

2. **Second Priority**: Discovered nodes pool
   - General-purpose peers found through discovery
   - Used when no data-hash-specific peers are available
   - May or may not have the requested data

3. **Blocking Wait**: If no peer is available from either source
   - Block until a peer becomes available from either source
   - Return the first peer that becomes available
   - The wait is bounded by the request context's deadline or cancellation

#### Peer Validation Before Return

Before returning a peer, the manager MUST verify that:

- The peer is not blacklisted
- The peer has an active connection

If either check fails, the peer MUST be removed from the pool and selection retried.
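The remove-and-retry rule can be sketched as a loop over candidates. `checker` and `pickValid` are hypothetical names. The two maps stand in for the blacklist lookup and for libp2p's connectedness query, and `peerID` stands in for libp2p's `peer.ID`. For simplicity the sketch retries within a single pool.

```go
package main

import "fmt"

// peerID stands in for libp2p's peer.ID.
type peerID string

// checker bundles the two pre-return checks (hypothetical stand-ins).
type checker struct {
	blacklisted map[peerID]bool
	connected   map[peerID]bool
}

// pickValid takes candidates in order, drops any that fail validation, and
// returns the first valid peer together with the pruned pool.
func (c checker) pickValid(pool []peerID) (peerID, []peerID, bool) {
	for len(pool) > 0 {
		id := pool[0]
		if !c.blacklisted[id] && c.connected[id] {
			return id, pool, true
		}
		// Failed validation: remove the peer from the pool and retry.
		pool = pool[1:]
	}
	return "", pool, false
}

func main() {
	c := checker{
		blacklisted: map[peerID]bool{"evil": true},
		connected:   map[peerID]bool{"evil": true, "ok": true},
	}
	id, rest, ok := c.pickValid([]peerID{"evil", "gone", "ok"})
	fmt.Println(id, rest, ok) // ok [ok] true
}
```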

### Pool Management

#### Pool Creation

- Pools MUST be created lazily when the first notification for a data hash arrives
- Each pool MUST store the associated height and its creation timestamp
- Pools MUST be initialized as unvalidated
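Lazy creation can be sketched as a get-or-create lookup under the manager lock. `getOrCreatePool`, `syncPool`, and `manager` are hypothetical names, and `peerID` stands in for libp2p's `peer.ID`. The zero value of `atomic.Bool` provides the required unvalidated initial state.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// peerID stands in for libp2p's peer.ID.
type peerID string

// syncPool carries the metadata required at creation time.
type syncPool struct {
	height    uint64
	createdAt time.Time
	validated atomic.Bool // zero value is false, i.e. unvalidated
	peers     map[peerID]struct{}
}

type manager struct {
	mu    sync.Mutex
	pools map[string]*syncPool // data hash -> pool
}

// getOrCreatePool returns the pool for dataHash, creating it on first use.
func (m *manager) getOrCreatePool(dataHash string, height uint64) *syncPool {
	m.mu.Lock()
	defer m.mu.Unlock()
	if p, ok := m.pools[dataHash]; ok {
		return p
	}
	p := &syncPool{
		height:    height,
		createdAt: time.Now(),
		peers:     make(map[peerID]struct{}),
	}
	m.pools[dataHash] = p
	return p
}

func main() {
	m := &manager{pools: make(map[string]*syncPool)}
	first := m.getOrCreatePool("hash-a", 42)
	second := m.getOrCreatePool("hash-a", 42)
	fmt.Println(first == second, first.validated.Load(), len(m.pools)) // true false 1
}
```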

#### Pool Validation

- A pool MUST be validated when a corresponding header arrives from header subscription
- Validation MUST compare both the data hash and the height
- Once a pool is validated, all of its peers MUST be added to the discovered nodes pool
- Updates to the validation status MUST be atomic to prevent race conditions
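The validation step can be sketched as follows. `onHeader`, `syncPool`, and `manager` are hypothetical names, and `peerID` stands in for libp2p's `peer.ID`. One behavior here is an assumption not stated in this spec: a header that arrives before any announcement creates its pool already validated, so later announcements for that hash are not treated as invalid. `CompareAndSwap` ensures that peers are promoted only once.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// peerID stands in for libp2p's peer.ID.
type peerID string

type syncPool struct {
	height    uint64
	validated atomic.Bool // also read outside m.mu by the message validator
	peers     []peerID
}

type manager struct {
	mu    sync.Mutex
	pools map[string]*syncPool  // data hash -> pool
	nodes map[peerID]struct{}   // discovered nodes pool
}

// onHeader validates the pool that matches a header's data hash and height.
func (m *manager) onHeader(dataHash string, height uint64) {
	m.mu.Lock()
	defer m.mu.Unlock()
	p, ok := m.pools[dataHash]
	if !ok {
		// Assumption: header-first pools are created so they can be validated now.
		p = &syncPool{height: height}
		m.pools[dataHash] = p
	}
	if p.height != height {
		return // same data hash announced at a different height: no match
	}
	if !p.validated.CompareAndSwap(false, true) {
		return // already validated
	}
	// Promote every peer that announced this hash to the discovered nodes pool.
	for _, id := range p.peers {
		m.nodes[id] = struct{}{}
	}
}

func main() {
	m := &manager{pools: map[string]*syncPool{}, nodes: map[peerID]struct{}{}}
	m.pools["hash-a"] = &syncPool{height: 7, peers: []peerID{"p1", "p2"}}
	m.onHeader("hash-a", 7)
	fmt.Println(m.pools["hash-a"].validated.Load(), len(m.nodes)) // true 2
}
```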

#### Pool Storage Limits

- The manager MUST only store pools for recent heights
- The `storeFrom` threshold MUST be updated from the latest header height
- Pools below the threshold MUST be removed during garbage collection
- The default storage depth is the 10 most recent heights (`storedPoolsAmount`)

### Validation

The Peer Manager implements the `MessageValidator` interface for ShrEx/Sub:

#### Validation Rules

1. **Self Messages**: Messages from the node itself MUST be accepted without further validation

2. **Blacklisted Hash Check**: Messages containing blacklisted data hashes MUST be rejected

3. **Blacklisted Peer Check**: Messages from blacklisted peers MUST be rejected

4. **Height Check**: Messages for heights below the `storeFrom` threshold MUST be ignored

5. **Peer Collection**: A valid message MUST result in the sending peer being added to the corresponding pool

6. **Discovered Nodes Addition**: If the pool is already validated, the peer MUST also be added immediately to the discovered nodes pool

#### Validation Results

- **Accept**: Only for self-originated messages
- **Reject**: For messages from blacklisted peers or carrying blacklisted hashes
- **Ignore**: For all other cases, including valid messages and messages for old heights
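The six rules can be sketched as a single validator function. The Accept/Reject/Ignore outcomes mirror a pubsub validator's results. `validationResult`, `notification`, `manager`, and `validate` are local stand-ins, not the go-libp2p-pubsub types, and `peerID` stands in for libp2p's `peer.ID`. The sketch is single-goroutine, so `validated` is a plain bool.

```go
package main

import "fmt"

// peerID stands in for libp2p's peer.ID.
type peerID string

// validationResult mirrors a pubsub validator's Accept/Reject/Ignore outcomes.
type validationResult int

const (
	validationAccept validationResult = iota
	validationReject
	validationIgnore
)

type notification struct {
	dataHash string
	height   uint64
}

type pool struct {
	height    uint64
	validated bool
	peers     map[peerID]struct{}
}

type manager struct {
	self              peerID
	storeFrom         uint64
	blacklistedHashes map[string]bool
	blacklistedPeers  map[peerID]bool
	pools             map[string]*pool
	nodes             map[peerID]struct{}
}

func newManager(self peerID, storeFrom uint64) *manager {
	return &manager{
		self:              self,
		storeFrom:         storeFrom,
		blacklistedHashes: map[string]bool{},
		blacklistedPeers:  map[peerID]bool{},
		pools:             map[string]*pool{},
		nodes:             map[peerID]struct{}{},
	}
}

// validate applies the rules in the order listed above.
func (m *manager) validate(from peerID, msg notification) validationResult {
	if from == m.self {
		return validationAccept // rule 1: own messages
	}
	if m.blacklistedHashes[msg.dataHash] {
		return validationReject // rule 2
	}
	if m.blacklistedPeers[from] {
		return validationReject // rule 3
	}
	if msg.height < m.storeFrom {
		return validationIgnore // rule 4: too old to store
	}
	p, ok := m.pools[msg.dataHash]
	if !ok { // lazy pool creation
		p = &pool{height: msg.height, peers: map[peerID]struct{}{}}
		m.pools[msg.dataHash] = p
	}
	p.peers[from] = struct{}{} // rule 5
	if p.validated {
		m.nodes[from] = struct{}{} // rule 6
	}
	return validationIgnore
}

func main() {
	m := newManager("me", 100)
	m.blacklistedPeers["bad"] = true
	fmt.Println(
		m.validate("me", notification{"h", 50}),
		m.validate("bad", notification{"h", 150}),
		m.validate("p1", notification{"h", 99}),
		m.validate("p1", notification{"h", 150}),
	) // prints: 0 1 2 2
}
```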

### Garbage Collection

The Peer Manager MUST implement periodic garbage collection:

#### GC Trigger

- Garbage collection MUST run at the interval specified by the `GcInterval` parameter
- GC MUST continue until the manager shuts down

#### GC Operations

1. **Validated Pool Cleanup**:
   - Remove validated pools for heights below the `storeFrom` threshold
   - Keep recently validated pools

2. **Unvalidated Pool Timeout**:
   - Pools below `initialHeight` can never be validated and MUST be removed
   - Remaining unvalidated pools older than `PoolValidationTimeout` are timed out, since they should have been validated but were not

3. **Blacklisting**:
   - The data hashes of timed-out pools MUST be blacklisted
   - All peers from timed-out pools MUST be collected for blacklisting
   - Peer blacklisting MUST occur after the GC cycle completes

#### Initial Height Requirement

- GC MUST NOT blacklist peers until `initialHeight` is set
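A single GC cycle can be sketched as a function that prunes pools and returns the peers to blacklist. The caller then blacklists those peers once the cycle has finished. `cleanUp`, `syncPool`, and `manager` are hypothetical names, and `peerID` stands in for libp2p's `peer.ID`. The ticker loop driven by `GcInterval` is omitted, and `now` is passed in so the sketch is deterministic.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// peerID stands in for libp2p's peer.ID.
type peerID string

type syncPool struct {
	height    uint64
	createdAt time.Time
	validated bool
	peers     []peerID
}

type manager struct {
	initialHeight     uint64 // 0 until the first header arrives
	storeFrom         uint64
	validationTimeout time.Duration // PoolValidationTimeout
	pools             map[string]*syncPool
	blacklistedHashes map[string]bool
}

// cleanUp runs one GC cycle at time now and returns peers to blacklist.
func (m *manager) cleanUp(now time.Time) []peerID {
	if m.initialHeight == 0 {
		return nil // no baseline yet: never blacklist
	}
	toBlacklist := make(map[peerID]struct{})
	for hash, p := range m.pools {
		if p.validated {
			if p.height < m.storeFrom {
				delete(m.pools, hash) // outdated validated pool
			}
			continue
		}
		if p.height < m.initialHeight {
			delete(m.pools, hash) // can never be validated
			continue
		}
		if now.Sub(p.createdAt) > m.validationTimeout {
			delete(m.pools, hash)
			m.blacklistedHashes[hash] = true
			for _, id := range p.peers {
				toBlacklist[id] = struct{}{}
			}
		}
	}
	out := make([]peerID, 0, len(toBlacklist))
	for id := range toBlacklist {
		out = append(out, id)
	}
	sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })
	return out
}

func main() {
	start := time.Now()
	m := &manager{
		initialHeight:     100,
		storeFrom:         95,
		validationTimeout: time.Minute,
		blacklistedHashes: map[string]bool{},
		pools: map[string]*syncPool{
			"old-valid":  {height: 90, validated: true, createdAt: start},
			"pre-start":  {height: 99, createdAt: start, peers: []peerID{"p0"}},
			"never-seen": {height: 101, createdAt: start, peers: []peerID{"p2", "p1"}},
			"fresh":      {height: 102, createdAt: start.Add(90 * time.Second)},
		},
	}
	// Simulate a GC tick two minutes after start.
	fmt.Println(m.cleanUp(start.Add(2*time.Minute)), len(m.pools)) // [p1 p2] 1
}
```

Collecting peers into a set before returning them means that a peer appearing in several timed-out pools is blacklisted only once.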

### Blacklisting

The Peer Manager implements a blacklisting mechanism for misbehaving peers:

#### Blacklist Reasons

- **reasonMisbehave**: The peer was reported as misbehaving through `ResultBlacklistPeer`
- **reasonInvalidHash**: The peer announced a data hash that was not validated within `PoolValidationTimeout`

#### Blacklist Actions

When blacklisting is enabled (`EnableBlackListing`):

1. The peer MUST be removed from the discovered nodes pool
2. The peer MUST be blocked via the connection gater to prevent future connections
3. All existing connections to the peer MUST be closed
4. The peer MUST remain blocked permanently (until node restart)
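The blacklist actions can be sketched against two small interfaces. In go-libp2p, a connection gater's `BlockPeer` and the host network's `ClosePeer` fill these roles. The `connGater` and `network` interfaces below are simplified stand-ins keyed on a local `peerID` type. `blacklistPeers` and the `recorder` fake are hypothetical names.

```go
package main

import "fmt"

// peerID stands in for libp2p's peer.ID.
type peerID string

// Simplified stand-ins for a libp2p connection gater and host network.
type connGater interface{ BlockPeer(peerID) error }
type network interface{ ClosePeer(peerID) error }

type blacklistReason string

const (
	reasonMisbehave   blacklistReason = "misbehave"
	reasonInvalidHash blacklistReason = "invalid_hash"
)

type manager struct {
	enableBlackListing bool
	nodes              map[peerID]struct{} // discovered nodes pool
	gater              connGater
	net                network
}

// blacklistPeers applies the blacklist actions to each peer, if enabled.
func (m *manager) blacklistPeers(reason blacklistReason, ids ...peerID) {
	if !m.enableBlackListing {
		return
	}
	for _, id := range ids {
		delete(m.nodes, id) // 1. drop from discovered nodes
		if err := m.gater.BlockPeer(id); err != nil { // 2. refuse future connections
			fmt.Println("block failed:", id, reason, err)
		}
		if err := m.net.ClosePeer(id); err != nil { // 3. close existing connections
			fmt.Println("close failed:", id, reason, err)
		}
	}
}

// recorder is a fake gater and network that records calls.
type recorder struct{ blocked, closed []peerID }

func (r *recorder) BlockPeer(id peerID) error { r.blocked = append(r.blocked, id); return nil }
func (r *recorder) ClosePeer(id peerID) error { r.closed = append(r.closed, id); return nil }

func main() {
	rec := &recorder{}
	m := &manager{enableBlackListing: true, nodes: map[peerID]struct{}{"p1": {}}, gater: rec, net: rec}
	m.blacklistPeers(reasonMisbehave, "p1")
	fmt.Println(len(m.nodes), rec.blocked, rec.closed) // 0 [p1] [p1]
}
```

Because the gater's block list lives in memory in this sketch, a block lasts only as long as the process. This matches the "until node restart" rule.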

### Connection Management

The Peer Manager MUST monitor peer connectivity:

#### Disconnection Handling

- The manager MUST subscribe to libp2p connectedness events
- When a peer disconnects (its connectedness becomes `NotConnected`):
  - The peer MUST be removed from the discovered nodes pool
  - The peer remains in data hash pools until garbage collection or a validation failure removes those pools

#### Connection Validation

- Before returning a peer, the manager MUST verify that an active connection exists
- Disconnected peers MUST be removed from pools and selection retried
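Disconnection handling can be sketched as a loop over connectedness events. The `connEvent` type only loosely mirrors libp2p's `EvtPeerConnectednessChanged` (a peer plus its new connectedness). `connectedness`, `handleEvents`, and the `manager` fields are hypothetical, and `peerID` stands in for libp2p's `peer.ID`.

```go
package main

import "fmt"

// peerID stands in for libp2p's peer.ID.
type peerID string

type connectedness int

const (
	notConnected connectedness = iota
	connected
)

// connEvent loosely mirrors a libp2p connectedness-changed event.
type connEvent struct {
	peer          peerID
	connectedness connectedness
}

type manager struct {
	nodes     map[peerID]struct{}            // discovered nodes pool
	hashPools map[string]map[peerID]struct{} // data hash -> peers
}

// handleEvents drains connectedness events until the channel is closed.
func (m *manager) handleEvents(events <-chan connEvent) {
	for ev := range events {
		if ev.connectedness == notConnected {
			// Only the discovered nodes pool is pruned; data hash pools keep
			// the peer until GC or a validation failure removes the pool.
			delete(m.nodes, ev.peer)
		}
	}
}

func main() {
	m := &manager{
		nodes:     map[peerID]struct{}{"p1": {}, "p2": {}},
		hashPools: map[string]map[peerID]struct{}{"hash-a": {"p1": {}}},
	}
	events := make(chan connEvent, 2)
	events <- connEvent{"p1", notConnected}
	events <- connEvent{"p2", connected}
	close(events)
	m.handleEvents(events)
	fmt.Println(len(m.nodes), len(m.hashPools["hash-a"])) // 1 1
}
```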

## References

1. **ShrEx/Sub Specification**: (see shrex-sub.md)
2. **Discovery Specification**: (see discovery.md)
3. **libp2p Connection Gater**: <https://github.com/libp2p/go-libp2p/blob/master/core/connmgr/gater.go>

## Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.
