| 1 | +# Peer Manager Specification for SHREX Protocol |
| 2 | + |
| 3 | +## Abstract |
| 4 | + |
| 5 | +This specification defines the Peer Manager component for the SHREX protocol in the Celestia network. The Peer Manager is responsible for collecting, organizing, validating, and selecting peers for efficient data retrieval operations based on data availability notifications and peer discovery. |
| 6 | + |
| 7 | +## Table of Contents |
| 8 | + |
| 9 | +- [Terminology](#terminology) |
| 10 | +- [Overview](#overview) |
| 11 | +- [Core Components](#core-components) |
| 12 | +- [Protocol Specification](#protocol-specification) |
| 13 | + - [Peer Pools](#peer-pools) |
| 14 | + - [Parameters](#parameters) |
| 15 | + - [Result Types](#result-types) |
| 16 | +- [Manager Operations](#manager-operations) |
| 17 | + - [Peer Selection](#peer-selection) |
| 18 | + - [Pool Management](#pool-management) |
| 19 | + - [Validation](#validation) |
| 20 | + - [Garbage Collection](#garbage-collection) |
| 21 | + - [Blacklisting](#blacklisting) |
| 22 | +- [References](#references) |
| 23 | +- [Requirements Language](#requirements-language) |
| 24 | + |
| 25 | +## Terminology |
| 26 | + |
| 27 | +- **Peer Manager**: The component responsible for collecting, organizing, and providing peers for data retrieval operations |
| 28 | +- **Peer Pool**: A collection of peer IDs organized by data hash, storing peers known to have specific data |
| 29 | +- **Sync Pool**: A pool with additional metadata including validation status, height, and creation time |
| 30 | +- **Validated Pool**: A peer pool that has been confirmed through header subscription to contain legitimate data |
| 31 | +- **Cooldown**: A temporary state where a peer is unavailable for selection but not permanently blocked |
| 32 | +- **Blacklist**: A permanent block list preventing all future communication with specific peers |
| 33 | +- **Discovered Nodes**: Peers found through the discovery service, independent of specific data hashes |
| 34 | +- **Initial Height**: The height of the first header received from header subscription, used as a validation baseline |
| 35 | +- **Store From**: The lowest height for which pools are kept; a threshold derived from the highest height received from header subscription |
| 36 | + |
| 37 | +## Overview |
| 38 | + |
| 39 | +The Peer Manager serves as the central coordination point for peer selection in the SHREX protocol. It aggregates peers from two primary sources: |
| 40 | + |
| 41 | +1. **ShrEx/Sub notifications**: Peers that announce specific data availability through the pubsub system |
| 42 | +2. **Discovery service**: Peers found through DHT-based discovery mechanisms |
| 43 | + |
| 44 | +The Peer Manager maintains data-hash-specific pools and a general pool of discovered nodes, enabling efficient peer selection for data retrieval while implementing validation, cooldown, and blacklisting mechanisms to maintain network quality. |
| 45 | + |
| 46 | +## Core Components |
| 47 | + |
| 48 | +### Header Subscription |
| 49 | + |
| 50 | +1. **Subscription Setup**: Manager MUST subscribe to header updates during startup |
| 51 | +2. **Pool Validation**: Each received header MUST trigger validation of corresponding pool |
| 52 | +3. **Height Tracking**: |
| 53 | + - First header sets `initialHeight` |
| 54 | + - Each header updates `storeFrom` |
| 55 | + |
| 56 | +### ShrEx/Sub |
| 57 | + |
| 58 | +1. **Validator Registration**: Manager MUST register as message validator |
| 59 | +2. **Message Processing**: All ShrEx/Sub notifications MUST pass through manager validation |
| 60 | +3. **Peer Collection**: Valid notifications MUST add peers to appropriate pools |
| 61 | + |
| 62 | +### Discovery |
| 63 | + |
| 64 | +The Peer Manager MUST expose `UpdateNodePool` for discovery: |
| 65 | + |
| 66 | +1. **Peer Addition**: Discovery MUST call `UpdateNodePool` with `isAdded=true` for new peers |
| 67 | +2. **Peer Removal**: Discovery MUST call `UpdateNodePool` with `isAdded=false` for removed peers |
| 68 | +3. **Blacklist Check**: Blacklisted peers MUST NOT be added to discovered nodes pool |
| 69 | + |
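The discovery contract above can be sketched as follows. This is a minimal illustration, not the actual implementation: `PeerID` is a hypothetical stand-in for a libp2p peer identifier, and the `Manager` fields, `NewManager`, and `HasNode` are assumptions introduced only for this example. Only the `UpdateNodePool` name and its `isAdded` flag come from this specification.

```go
package main

import "sync"

// PeerID is a hypothetical stand-in for a libp2p peer identifier.
type PeerID string

// Manager holds only the state needed for this sketch.
type Manager struct {
	mu          sync.Mutex
	blacklisted map[PeerID]bool     // peers that must never be re-added
	nodes       map[PeerID]struct{} // discovered nodes pool
}

// NewManager returns a Manager with empty pools (illustrative helper).
func NewManager() *Manager {
	return &Manager{
		blacklisted: make(map[PeerID]bool),
		nodes:       make(map[PeerID]struct{}),
	}
}

// UpdateNodePool is called by discovery when a peer is added or removed.
func (m *Manager) UpdateNodePool(id PeerID, isAdded bool) {
	m.mu.Lock()
	defer m.mu.Unlock()

	if !isAdded {
		delete(m.nodes, id)
		return
	}
	if m.blacklisted[id] {
		return // blacklisted peers MUST NOT enter the discovered nodes pool
	}
	m.nodes[id] = struct{}{}
}

// HasNode reports whether id is currently in the discovered nodes pool.
func (m *Manager) HasNode(id PeerID) bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	_, ok := m.nodes[id]
	return ok
}
```

A single mutex keeps the example short; a real manager may well use finer-grained synchronization.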
| 70 | +## Protocol Specification |
| 71 | + |
| 72 | +### Peer Pools |
| 73 | + |
| 74 | +The Peer Manager maintains two distinct types of peer collections: |
| 75 | + |
| 76 | +#### Data Hash Pools |
| 77 | + |
| 78 | +- **Purpose**: Store peers organized by specific data hashes they have announced |
| 79 | +- **Structure**: Map of data hash strings to sync pools |
| 80 | +- **Validation**: Each pool tracks whether its associated data hash has been validated |
| 81 | +- **Lifecycle**: Pools are created on-demand when notifications arrive and removed during garbage collection |
| 82 | +- **Capacity**: The manager stores pools for the most recent heights only (controlled by `storedPoolsAmount`, default: 10) |
| 83 | + |
| 84 | +#### Discovered Nodes Pool |
| 85 | + |
| 86 | +- **Purpose**: Store peers found through discovery service, independent of specific data |
| 87 | +- **Usage**: Fallback option when no data-hash-specific peers are available |
| 88 | +- **Management**: Peers are added by discovery and from validated data hash pools, and are removed on disconnection or blacklisting |
| 89 | + |
| 90 | +### Parameters |
| 91 | + |
| 92 | +The Peer Manager operates with the following configurable parameters: |
| 93 | + |
| 94 | +#### PoolValidationTimeout |
| 95 | + |
| 96 | +- **Type**: Duration |
| 97 | +- **Purpose**: Maximum time allowed for a pool to receive validation through header subscription |
| 98 | +- **Rationale**: Pools that do not receive corresponding headers within this timeout are considered invalid, and their peers are blacklisted |
| 99 | + |
| 100 | +#### PeerCooldown |
| 101 | + |
| 102 | +- **Type**: Duration |
| 103 | +- **Purpose**: Duration a peer remains unavailable after being marked for cooldown |
| 104 | +- **Rationale**: Allows temporary removal of unreliable peers without permanent blacklisting |
| 105 | + |
| 106 | +#### GcInterval |
| 107 | + |
| 108 | +- **Type**: Duration |
| 109 | +- **Purpose**: Interval between garbage collection cycles |
| 110 | +- **Rationale**: Regular cleanup prevents memory growth from outdated or invalid pools |
| 111 | + |
| 112 | +#### EnableBlackListing |
| 113 | + |
| 114 | +- **Type**: Boolean |
| 115 | +- **Purpose**: Feature flag to enable or disable peer blacklisting functionality |
| 116 | +- **Rationale**: Allows testing and gradual rollout of blacklisting mechanisms |
| 117 | + |
| 118 | +### Result Types |
| 119 | + |
| 120 | +#### ResultNoop |
| 121 | + |
| 122 | +- **Value**: "result_noop" |
| 123 | +- **Meaning**: Operation completed successfully with no additional action required |
| 124 | +- **Effect**: No state changes in the Peer Manager |
| 125 | + |
| 126 | +#### ResultCooldownPeer |
| 127 | + |
| 128 | +- **Value**: "result_cooldown_peer" |
| 129 | +- **Meaning**: Peer should be temporarily unavailable for selection |
| 130 | +- **Effect**: Peer is placed on cooldown for the configured duration |
| 131 | +- **Use Case**: Temporary issues like timeouts or transient errors |
| 132 | + |
| 133 | +#### ResultBlacklistPeer |
| 134 | + |
| 135 | +- **Value**: "result_blacklist_peer" |
| 136 | +- **Meaning**: Peer has misbehaved and should be permanently blocked |
| 137 | +- **Effect**: Peer is added to blacklist, disconnected, and blocked from future connections |
| 138 | +- **Use Case**: Malicious behavior, invalid data, or protocol violations |
| 139 | + |
| 140 | +## Manager Operations |
| 141 | + |
| 142 | +### Peer Selection |
| 143 | + |
| 144 | +The Peer Manager implements a prioritized peer selection strategy: |
| 145 | + |
| 146 | +#### Selection Priority |
| 147 | + |
| 148 | +1. **First Priority**: Peers from validated data hash pool |
| 149 | + - Peers that have announced the specific data hash being requested |
| 150 | + - Must be from a pool validated through header subscription |
| 151 | + - Provides highest confidence of data availability |
| 152 | + |
| 153 | +2. **Second Priority**: Discovered nodes pool |
| 154 | + - General-purpose peers found through discovery |
| 155 | + - Used when no data-hash-specific peers are available |
| 156 | + - May or may not have the requested data |
| 157 | + |
| 158 | +3. **Blocking Wait**: If no peers available from either source |
| 159 | + - Block until a peer becomes available from either source |
| 160 | + - Return first available peer |
| 161 | + - Subject to context timeout |
| 162 | + |
| 163 | +#### Peer Validation Before Return |
| 164 | + |
| 165 | +Before returning a peer, the manager MUST verify: |
| 166 | + |
| 167 | +- Peer is not blacklisted |
| 168 | +- Peer has an active connection |
| 169 | +- If either check fails, the peer is removed from the pool and selection is retried |
| 170 | + |
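The priority order and the pre-return checks can be sketched as below. This is a simplified, non-blocking illustration: the blocking wait of step 3 is left out, pools are plain slices that always yield their first usable peer, and `connected` is a hypothetical stand-in for a libp2p connectedness lookup. All type and function names are assumptions made for this example.

```go
package main

// PeerID is a hypothetical stand-in for a libp2p peer identifier.
type PeerID string

// pool is a simplified peer pool: an ordered list of candidates.
type pool struct {
	validated bool
	peers     []PeerID
}

// Manager holds only the state needed for this sketch.
type Manager struct {
	pools       map[string]*pool // data hash pools, keyed by data hash
	nodes       *pool            // discovered nodes pool
	blacklisted map[PeerID]bool
	connected   map[PeerID]bool // stand-in for a connectedness check
}

// usable applies the two checks required before a peer is returned.
func (m *Manager) usable(id PeerID) bool {
	return !m.blacklisted[id] && m.connected[id]
}

// pick returns the first usable peer in p, dropping unusable ones as it goes.
func (m *Manager) pick(p *pool) (PeerID, bool) {
	for len(p.peers) > 0 {
		id := p.peers[0]
		if m.usable(id) {
			return id, true
		}
		p.peers = p.peers[1:] // failed a check: remove and retry
	}
	return "", false
}

// TryPeer walks the priority order without blocking.
func (m *Manager) TryPeer(dataHash string) (PeerID, bool) {
	// First priority: a validated pool for the requested data hash.
	if p, ok := m.pools[dataHash]; ok && p.validated {
		if id, ok := m.pick(p); ok {
			return id, true
		}
	}
	// Second priority: the discovered nodes pool.
	return m.pick(m.nodes)
}
```

When `TryPeer` reports no peer, a caller following the spec would block until either source yields one or the context expires.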
| 171 | +### Pool Management |
| 172 | + |
| 173 | +#### Pool Creation |
| 174 | + |
| 175 | +- Pools MUST be created lazily when first notification arrives for a data hash |
| 176 | +- Each pool MUST store the associated height and creation timestamp |
| 177 | +- Pools MUST initialize with unvalidated status |
| 178 | + |
| 179 | +#### Pool Validation |
| 180 | + |
| 181 | +- Pools MUST be validated when a corresponding header arrives from header subscription |
| 182 | +- Validation MUST be performed by comparing data hash and height |
| 183 | +- Once validated, all peers in the pool MUST be added to discovered nodes pool |
| 184 | +- Validation status MUST be atomic to prevent race conditions |
| 185 | + |
| 186 | +#### Pool Storage Limits |
| 187 | + |
| 188 | +- The manager MUST only store pools for recent heights |
| 189 | +- The `storeFrom` threshold MUST be updated based on latest header height |
| 190 | +- Pools below the threshold MUST be removed during garbage collection |
| 191 | +- Default storage depth is 10 most recent heights (`storedPoolsAmount`) |
| 192 | + |
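One way to derive the two tracked heights from incoming headers is sketched below. It assumes that `storeFrom` is the latest header height minus `storedPoolsAmount` (the exact offset is an assumption, not something this specification pins down), and that header heights start at 1 so that zero can mean "not set". The `heights` type and its methods are hypothetical.

```go
package main

import "sync/atomic"

// storedPoolsAmount is the default storage depth named in the spec.
const storedPoolsAmount = 10

// heights tracks the two values derived from header subscription.
type heights struct {
	initialHeight atomic.Uint64 // zero until the first header arrives
	storeFrom     atomic.Uint64 // pools below this height are dropped by GC
}

// onHeader records a newly received header height.
func (h *heights) onHeader(height uint64) {
	// Only the first header sets initialHeight; later calls are no-ops.
	h.initialHeight.CompareAndSwap(0, height)

	// Clamp at zero so early heights do not underflow.
	var threshold uint64
	if height > storedPoolsAmount {
		threshold = height - storedPoolsAmount
	}
	h.storeFrom.Store(threshold)
}

// keep reports whether a pool at poolHeight is still inside the window.
func (h *heights) keep(poolHeight uint64) bool {
	return poolHeight >= h.storeFrom.Load()
}
```

Atomic fields let the header subscription goroutine update the values while validation and garbage collection read them without a lock.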
| 193 | +### Validation |
| 194 | + |
| 195 | +The Peer Manager implements the `MessageValidator` interface for ShrEx/Sub: |
| 196 | + |
| 197 | +#### Validation Rules |
| 198 | + |
| 199 | +1. **Self Messages**: Messages from the node itself MUST be accepted without validation |
| 200 | + |
| 201 | +2. **Blacklisted Hash Check**: Messages containing blacklisted hashes MUST be rejected |
| 202 | + |
| 203 | +3. **Blacklisted Peer Check**: Messages from blacklisted peers MUST be rejected |
| 204 | + |
| 205 | +4. **Height Check**: Messages for heights below `storeFrom` threshold MUST be ignored |
| 206 | + |
| 207 | +5. **Peer Collection**: Valid messages MUST result in peer being added to corresponding pool |
| 208 | + |
| 209 | +6. **Discovered Nodes Addition**: If pool is already validated, peer MUST be immediately added to discovered nodes pool |
| 210 | + |
| 211 | +#### Validation Results |
| 212 | + |
| 213 | +- **Accept**: Only for self-originated messages |
| 214 | +- **Reject**: For blacklisted peers or hashes |
| 215 | +- **Ignore**: For all other cases (valid messages and old heights) |
| 216 | + |
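The rules and their results can be read as a single decision function, sketched below. `Accept`, `Reject`, and `Ignore` are stand-ins for the pubsub validation results, and `Notification`, `pool`, and the `Manager` fields are hypothetical types introduced for this example. Pool height and creation time are omitted for brevity.

```go
package main

// PeerID is a hypothetical stand-in for a libp2p peer identifier.
type PeerID string

// ValidationResult stands in for the pubsub validation outcome.
type ValidationResult int

const (
	Accept ValidationResult = iota
	Reject
	Ignore
)

// Notification is a simplified ShrEx/Sub message.
type Notification struct {
	DataHash string
	Height   uint64
}

type pool struct {
	validated bool
	peers     map[PeerID]struct{}
}

// Manager holds only the state needed for this sketch.
type Manager struct {
	self              PeerID
	storeFrom         uint64
	blacklistedHashes map[string]bool
	blacklistedPeers  map[PeerID]bool
	pools             map[string]*pool
	nodes             map[PeerID]struct{} // discovered nodes pool
}

// Validate applies the validation rules in order.
func (m *Manager) Validate(from PeerID, n Notification) ValidationResult {
	if from == m.self {
		return Accept // rule 1: self messages
	}
	if m.blacklistedHashes[n.DataHash] || m.blacklistedPeers[from] {
		return Reject // rules 2 and 3: blacklisted hash or peer
	}
	if n.Height < m.storeFrom {
		return Ignore // rule 4: height below the storeFrom threshold
	}
	p, ok := m.pools[n.DataHash]
	if !ok {
		p = &pool{peers: make(map[PeerID]struct{})} // created lazily
		m.pools[n.DataHash] = p
	}
	p.peers[from] = struct{}{} // rule 5: collect the peer
	if p.validated {
		m.nodes[from] = struct{}{} // rule 6: pool already validated
	}
	return Ignore // valid messages are also ignored, per Validation Results
}
```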
| 217 | +### Garbage Collection |
| 218 | + |
| 219 | +The Peer Manager MUST implement periodic garbage collection: |
| 220 | + |
| 221 | +#### GC Trigger |
| 222 | + |
| 223 | +- Garbage collection MUST run at intervals specified by `GcInterval` parameter |
| 224 | +- GC MUST continue until manager shutdown |
| 225 | + |
| 226 | +#### GC Operations |
| 227 | + |
| 228 | +1. **Validated Pool Cleanup**: |
| 229 | + - Remove pools for heights below `storeFrom` threshold |
| 230 | + - Keep recently validated pools |
| 231 | + |
| 232 | +2. **Unvalidated Pool Timeout**: |
| 233 | + - Identify pools older than `PoolValidationTimeout` |
| 234 | + - Pools below `initialHeight` cannot be validated and MUST be removed |
| 235 | + - Mark as timed out any pool that should have been validated but was not |
| 236 | + |
| 237 | +3. **Blacklisting**: |
| 238 | + - Timed-out pool data hashes MUST be blacklisted |
| 239 | + - All peers from timed-out pools MUST be collected for blacklisting |
| 240 | + - Blacklisting MUST occur after GC cycle completes |
| 241 | + |
| 242 | +#### Initial Height Requirement |
| 243 | + |
| 244 | +- GC MUST NOT blacklist peers until `initialHeight` is set |
| 245 | + |
| 246 | +### Blacklisting |
| 247 | + |
| 248 | +The Peer Manager implements a blacklisting mechanism for misbehaving peers: |
| 249 | + |
| 250 | +#### Blacklist Reasons |
| 251 | + |
| 252 | +- **reasonMisbehave**: Peer reported as misbehaving through `ResultBlacklistPeer` |
| 253 | +- **reasonInvalidHash**: Peer announced data hash that was never validated |
| 254 | + |
| 255 | +#### Blacklist Actions |
| 256 | + |
| 257 | +When blacklisting is enabled: |
| 258 | + |
| 259 | +1. Peer MUST be removed from discovered nodes pool |
| 260 | +2. Peer MUST be blocked via connection gater to prevent future connections |
| 261 | +3. All existing connections to peer MUST be closed |
| 262 | +4. Peer MUST remain blocked until the node restarts; the block is not lifted while the node is running |
| 263 | + |
| 264 | +### Connection Management |
| 265 | + |
| 266 | +The Peer Manager MUST monitor peer connectivity: |
| 267 | + |
| 268 | +#### Disconnection Handling |
| 269 | + |
| 270 | +- Manager MUST subscribe to libp2p connectedness events |
| 271 | +- When peer disconnects (connectedness becomes `NotConnected`): |
| 272 | + - Peer MUST be removed from discovered nodes pool |
| 273 | + - Peer remains in data hash pools until GC or validation failure |
| 274 | + |
| 275 | +#### Connection Validation |
| 276 | + |
| 277 | +- Before returning a peer, manager MUST verify active connection exists |
| 278 | +- Disconnected peers MUST be removed from pools and selection retried |
| 279 | + |
| 280 | +## References |
| 281 | + |
| 282 | +1. **ShrEx/Sub Specification**: (see shrex-sub.md) |
| 283 | +2. **Discovery Specification**: (see discovery.md) |
| 284 | +3. **libp2p Connection Gater**: <https://github.com/libp2p/go-libp2p/blob/master/core/connmgr/gater.go> |
| 285 | + |
| 286 | +## Requirements Language |
| 287 | + |
| 288 | +The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119. |