# Gossip Rate Limiting Configuration Guide

When running a Lightning node, one of the most critical yet often overlooked
aspects is properly configuring the gossip rate limiting system. This guide will
help you understand how LND manages outbound gossip traffic and how to tune
these settings for your specific needs.

## Understanding Gossip Rate Limiting

At its core, LND uses a token bucket algorithm to control how much bandwidth it
dedicates to sending gossip messages to other nodes. Think of it as a bucket
that fills with tokens at a steady rate. Each time your node sends a gossip
message, it consumes tokens equal to the message size. If the bucket runs dry,
messages must wait until enough tokens accumulate.
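
As a rough mental model (LND's actual limiter is written in Go; this is an
illustrative Python sketch using the default rate of 102,400 bytes/s and burst
of 204,800 bytes):

```python
import time

class TokenBucket:
    """Simplified token bucket: holds at most `burst` tokens, refilled at `rate` bytes/sec."""

    def __init__(self, rate, burst):
        self.rate = rate            # refill rate in bytes per second
        self.burst = burst          # maximum tokens the bucket can hold
        self.tokens = burst         # the bucket starts full
        self.last = time.monotonic()

    def _refill(self):
        now = time.monotonic()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_send(self, msg_size):
        """Consume tokens for one message; return True if it can be sent right now."""
        self._refill()
        if msg_size <= self.tokens:
            self.tokens -= msg_size
            return True
        return False

bucket = TokenBucket(rate=102_400, burst=204_800)  # LND's defaults
print(bucket.try_send(150_000))  # True: the bucket starts full
print(bucket.try_send(150_000))  # False: only ~54 KB of tokens remain
```

A real limiter would sleep until enough tokens accumulate instead of returning
`False`, which is exactly the queuing behavior described below when the rate is
set too low.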
| 15 | + |
| 16 | +This system serves an important purpose: it prevents any single peer, or group |
| 17 | +of peers, from overwhelming your node's network resources. Without rate |
| 18 | +limiting, a misbehaving peer could request your entire channel graph repeatedly, |
| 19 | +consuming all your bandwidth and preventing normal operation. |
| 20 | + |
| 21 | +## Core Configuration Options |
| 22 | + |
| 23 | +The gossip rate limiting system has several configuration options that work |
| 24 | +together to control your node's behavior. |
| 25 | + |
| 26 | +### Setting the Sustained Rate: gossip.msg-rate-bytes |
| 27 | + |
| 28 | +The most fundamental setting is `gossip.msg-rate-bytes`, which determines how |
| 29 | +many bytes per second your node will allocate to outbound gossip messages. This |
| 30 | +rate is shared across all connected peers, not per-peer. |
| 31 | + |
| 32 | +The default value of 102,400 bytes per second (100 KB/s) works well for most |
| 33 | +nodes, but you may need to adjust it based on your situation. Setting this value |
| 34 | +too low can cause serious problems. When the rate limit is exhausted, peers |
| 35 | +waiting to synchronize must queue up, potentially waiting minutes between |
| 36 | +messages. Values below 50 KB/s can make initial synchronization fail entirely, |
| 37 | +as peers timeout before receiving the data they need. |
| 38 | + |
| 39 | +### Managing Burst Capacity: gossip.msg-burst-bytes |
| 40 | + |
| 41 | +The burst capacity, configured via `gossip.msg-burst-bytes`, determines the |
| 42 | +initial capacity of your token bucket. This value must be greater than |
| 43 | +`gossip.msg-rate-bytes` for the rate limiter to function properly. The burst |
| 44 | +capacity represents the maximum number of bytes that can be sent immediately |
| 45 | +when the bucket is full. |
| 46 | + |
| 47 | +The default of 204,800 bytes (200 KB) is set to be double the default rate |
| 48 | +(100 KB/s), providing a good balance. This ensures that when the rate limiter |
| 49 | +starts or after a period of inactivity, you can send up to 200 KB worth of |
| 50 | +messages immediately before rate limiting kicks in. Any single message larger |
| 51 | +than this value can never be sent, regardless of how long you wait. |
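
That last constraint is worth guarding against explicitly. The sketch below is
a hypothetical startup check, not part of LND; it assumes the Lightning wire
protocol's cap of 65,535 bytes per message as the largest message you could
ever need to send:

```python
# Largest single message the Lightning wire protocol allows (65,535 bytes),
# used as an upper bound on what the bucket must be able to hold at once.
MAX_WIRE_MSG_BYTES = 65_535

def validate_gossip_limits(rate_bytes, burst_bytes):
    """Reject configurations under which some messages could never be sent."""
    if burst_bytes < MAX_WIRE_MSG_BYTES:
        raise ValueError(
            f"burst ({burst_bytes}) is below the max message size "
            f"({MAX_WIRE_MSG_BYTES}): oversized messages would wait forever")
    if burst_bytes < rate_bytes:
        raise ValueError(
            f"burst ({burst_bytes}) should not be below rate ({rate_bytes})")

validate_gossip_limits(102_400, 204_800)  # the defaults pass silently
```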

### Controlling Concurrent Operations: gossip.filter-concurrency

When peers apply gossip filters to request specific channel updates, these
operations can consume significant resources. The `gossip.filter-concurrency`
setting limits how many of these operations can run simultaneously. The default
value of 5 provides a reasonable balance between resource usage and
responsiveness.

Large routing nodes handling many simultaneous peer connections might benefit
from increasing this value to 10 or 15, while resource-constrained nodes should
keep it at the default or even reduce it slightly.
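
Conceptually this is a counting semaphore around filter handling. A minimal
sketch (the function names here are hypothetical, not LND's):

```python
import threading

FILTER_CONCURRENCY = 5  # mirrors the gossip.filter-concurrency default

filter_sem = threading.Semaphore(FILTER_CONCURRENCY)

def serve_backlog(peer):
    """Placeholder for streaming the requested channel updates to the peer."""
    pass

def apply_gossip_filter(peer):
    """At most FILTER_CONCURRENCY filter applications run at the same time."""
    with filter_sem:  # blocks when five peers are already being served
        serve_backlog(peer)
```

Raising the setting widens the semaphore: more peers are served concurrently,
at the cost of more memory and CPU in flight at once.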

### Understanding Connection Limits: num-restricted-slots

The `num-restricted-slots` configuration deserves special attention because it
directly affects your gossip bandwidth requirements. This setting limits inbound
connections, but not in the way you might expect.

LND maintains a three-tier system for peer connections. Peers you've ever had
channels with enjoy "protected" status and can always connect. Peers currently
opening channels with you have "temporary" status. Everyone else—new peers
without channels—must compete for the limited "restricted" slots.

When a new peer without channels connects inbound, it consumes one restricted
slot. If all slots are full, additional peers are turned away. However, as soon
as a restricted peer begins opening a channel, it is upgraded to temporary
status, freeing its slot. This gives large nodes breathing room to form new
channel relationships without constantly rejecting connections.

The relationship between restricted slots and rate limiting is straightforward:
more allowed connections mean more peers requesting data, requiring more
bandwidth. A reasonable rule of thumb is to allocate at least 1 KB/s of rate
limit per restricted slot.
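
The rule of thumb is easy to check against your own configuration. This helper
is purely illustrative; the 1 KB/s-per-slot constant is this guide's heuristic,
not a limit LND enforces:

```python
def min_rate_for_slots(num_restricted_slots, bytes_per_slot=1024):
    """Suggested floor for gossip.msg-rate-bytes under the 1 KB/s-per-slot rule."""
    return num_restricted_slots * bytes_per_slot

# 100 restricted slots suggest a floor of ~100 KB/s, which lines up with
# the default gossip.msg-rate-bytes of 102,400 bytes per second.
print(min_rate_for_slots(100))  # 102400
```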

## Calculating Appropriate Values

To set these values correctly, you need to understand your node's position in
the network and its typical workload. The fundamental question is: how much
gossip traffic does your node actually need to handle?

Start by considering how many peers typically connect to your node. A hobbyist
node might have 10-20 connections, while a well-connected routing node could
easily exceed 100. Each peer generates gossip traffic when syncing channel
updates, announcing new channels, or requesting historical data.

The calculation itself is straightforward. Take your average message size
(approximately 210 bytes for gossip messages), multiply by your peer count and
expected message frequency, then add a safety factor for traffic spikes. Since
each channel generates approximately 842 bytes of bandwidth (including both
channel announcements and updates), you can also calculate based on your
channel count. Here's the formula:

```
rate = avg_msg_size × peer_count × msgs_per_second × safety_factor
```

Let's walk through some real-world examples to make this concrete.

For a small node with 15 peers, you might see 10 messages per peer per second
during normal operation. With an average message size of 210 bytes and a safety
factor of 1.5, you'd need about 47 KB/s. Rounding up to 50 KB/s provides
comfortable headroom.

A medium-sized node with 75 peers faces different challenges. These nodes often
relay more traffic and handle more frequent updates. At 15 messages per peer
per second (a rate that already builds in the safety margin), the calculation
yields about 237 KB/s. Setting the limit to 250 KB/s ensures smooth operation
without waste.

Large routing nodes require the most careful consideration. With 150 or more
peers and high message frequency, bandwidth requirements can exceed 1 MB/s.
These nodes form the backbone of the Lightning Network and need generous
allocations to serve their peers effectively.

Remember that the relationship between restricted slots and rate limiting is
direct: each additional slot potentially adds another peer requesting data. Plan
for at least 1 KB/s per restricted slot to maintain healthy synchronization.
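
The worked examples above can be checked numerically. The large-node message
rate of 20 msg/s below is an assumed figure for illustration; the other inputs
come straight from the examples:

```python
def gossip_rate(avg_msg_size, peer_count, msgs_per_second, safety_factor=1.0):
    """rate = avg_msg_size * peer_count * msgs_per_second * safety_factor (bytes/s)."""
    return avg_msg_size * peer_count * msgs_per_second * safety_factor

small = gossip_rate(210, 15, 10, safety_factor=1.5)
medium = gossip_rate(210, 75, 15)      # the 15 msg/s figure already carries headroom
large = gossip_rate(210, 150, 20, safety_factor=1.5)  # 20 msg/s is assumed

print(f"small:  {small / 1000:.0f} KB/s")   # ~47 KB/s
print(f"medium: {medium / 1000:.0f} KB/s")  # ~236 KB/s
print(f"large:  {large / 1000:.0f} KB/s")   # ~945 KB/s, near the 1 MB/s mark
```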

## Network Size and Geography

The Lightning Network's growth directly impacts your gossip bandwidth needs.
With over 80,000 public channels at the time of writing, each generating
multiple updates daily, the volume of gossip traffic continues to increase. A
channel update occurs whenever a node adjusts its fees, changes its routing
policy, or goes offline temporarily. During volatile market conditions or fee
market adjustments, update frequency can spike dramatically.

Geographic distribution adds another layer of complexity. If your node connects
to peers across continents, the inherent network latency affects how quickly you
can exchange messages. However, this primarily impacts initial connection
establishment rather than ongoing rate limiting.

## Troubleshooting Common Issues

When rate limiting isn't configured properly, the symptoms are often subtle at
first but can cascade into serious problems.

The most common issue is slow initial synchronization. New peers attempting to
download your channel graph experience long delays between messages. You'll see
entries in your logs like "rate limiting gossip replies, responding in 30s" or
even longer delays. This happens because the rate limiter has exhausted its
tokens and must wait for a refill. The solution is straightforward: increase
your `gossip.msg-rate-bytes` setting.

Peer disconnections present a more serious problem. When peers wait too long for
gossip responses, they may time out and disconnect. This creates a vicious cycle
where peers repeatedly connect, attempt to sync, time out, and reconnect. Look
for "peer timeout" errors in your logs. If you see these, you need to increase
your rate limit.

Sometimes you'll notice unusually high CPU usage from your LND process. This
often indicates that many goroutines are blocked waiting for rate limiter
tokens. The rate limiter must constantly calculate delays and manage waiting
goroutines. Increasing the rate limit reduces this contention and lowers CPU
usage.

To debug these issues, focus on your LND logs rather than high-level commands.
Search for "rate limiting" messages to understand how often delays occur and how
long they last. Look for patterns in peer disconnections that might correlate
with rate limiting delays. The specific commands that matter are:

```bash
# View peer connections and sync state
lncli listpeers | grep -A5 "sync_type"

# Check recent rate limiting events
grep "rate limiting" ~/.lnd/logs/bitcoin/mainnet/lnd.log | tail -20
```

Pay attention to log entries showing "Timestamp range queue full" if your
version uses the queue-based reply approach: this indicates your node is
shedding load because demand exceeds what the rate limiter can serve.

## Best Practices for Configuration

Experience has shown that starting with conservative (higher) rate limits and
reducing them if needed works better than starting too low and debugging
problems. It's much easier to notice excess bandwidth usage than to diagnose
subtle synchronization failures.

Monitor your node's actual bandwidth usage and sync times after making changes.
Most operating systems provide tools to track network usage per process. When
adjusting settings, make gradual changes of 25-50% rather than dramatic shifts.
This helps you understand the impact of each change and find the sweet spot for
your setup.

Keep your burst size at least double the largest message size you expect to
send. While the default 200 KB is usually sufficient, monitor your logs for any
"message too large" errors that would indicate a need to increase this value.

As your node grows and attracts more peers, revisit these settings periodically.
What works for 50 peers may cause problems with 150 peers. Regular review
prevents gradual degradation as conditions change.

## Configuration Examples

For most users running a personal node, conservative settings provide reliable
operation without excessive resource usage:

```
[Application Options]
gossip.msg-rate-bytes=204800
gossip.msg-burst-bytes=409600
gossip.filter-concurrency=5
num-restricted-slots=100
```

Well-connected nodes that route payments regularly need more generous
allocations:

```
[Application Options]
gossip.msg-rate-bytes=524288
gossip.msg-burst-bytes=1048576
gossip.filter-concurrency=10
num-restricted-slots=200
```

Large routing nodes at the heart of the network require the most resources:

```
[Application Options]
gossip.msg-rate-bytes=1048576
gossip.msg-burst-bytes=2097152
gossip.filter-concurrency=15
num-restricted-slots=300
```

## Critical Warning About Low Values

Setting `gossip.msg-rate-bytes` below 50 KB/s creates serious operational
problems that may not be immediately obvious. Initial synchronization, which
typically transfers 10-20 MB of channel graph data, can take hours or fail
entirely. Peers appear to connect but remain stuck in a synchronization loop,
never completing their initial download.
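
The impact on sync time is simple division, keeping in mind that the rate is
shared across every peer syncing at once (graph sizes here are the figures
above; peer counts are illustrative):

```python
def sync_seconds(graph_bytes, rate_bytes_per_sec, concurrent_peers=1):
    """Best-case time to serve the graph when the shared rate splits across peers."""
    return graph_bytes * concurrent_peers / rate_bytes_per_sec

graph = 15 * 1024 * 1024  # ~15 MB, mid-range of the 10-20 MB graph size

print(f"{sync_seconds(graph, 102_400) / 60:.1f} min at 100 KB/s, one peer")
print(f"{sync_seconds(graph, 102_400, 10) / 60:.1f} min when 10 peers sync at once")
print(f"{sync_seconds(graph, 25_600, 10) / 3600:.1f} h at 25 KB/s with 10 peers")
```

At the default 100 KB/s a single peer finishes in minutes, but a sub-50 KB/s
rate divided among several syncing peers quickly stretches into hours, which is
exactly when peer timeouts start to bite.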

Your channel graph remains perpetually outdated, causing routing failures as you
attempt to use channels that have closed or changed their fee policies. The
gossip subsystem appears to work, but operates so slowly that it cannot keep
pace with network changes.

During normal operation, a well-connected node processes hundreds of channel
updates per minute. Each update is small, but they add up quickly. Factor in
occasional bursts during network-wide fee adjustments or major routing node
policy changes, and you need substantial headroom above the theoretical minimum.

The absolute minimum viable configuration requires at least enough bandwidth to
complete initial sync in under an hour and process ongoing updates without
falling behind. This translates to no less than 50 KB/s for even the smallest
nodes.