be specialized per statement by setting :attr:`.Statement.retry_policy`.
Retries are presently attempted on the same coordinator, but this may change in the future.

Please see :class:`.policies.RetryPolicy` for further details.

How to fight connection storms?
-------------------------------

Scylla employs a shard-per-core architecture to efficiently utilize CPU, memory, and cache.
Data and load are effectively sharded not only across hosts but also across individual shards.

Every ScyllaDB driver routes queries to the specific shard responsible for the requested data.
This eliminates the need for routing logic on the server side.
To support this, the driver maintains a separate connection to every shard in the cluster.

The downside of this approach is the total number of connections that each host in the cluster must handle.
Typically, the number of connections on a given node can be calculated as:
number of clients × number of shards.

For example, a node with 64 shards and 1,000 clients would handle 64,000 connections.
In production clusters, the number of clients can reach hundreds of thousands, and nodes may need to handle up to 2 million connections.
Once established, these connections consume very few resources.

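The arithmetic above can be captured in a couple of lines (the numbers are illustrative, and the helper name is ours, not part of the driver):

.. code-block:: python

    # Every client keeps one connection per shard on each node,
    # so a node's connection count grows multiplicatively.
    def connections_per_node(clients: int, shards_per_node: int) -> int:
        return clients * shards_per_node

    print(connections_per_node(1_000, 64))  # 64000, matching the example above
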
However, when nodes or clients are restarted, all these connections must be re-established.
The process of establishing a new connection involves several steps:

1. Establishing a TCP/TLS connection
2. Protocol negotiation
3. Authentication
4. Running discovery queries: ``SELECT * FROM system.local`` and ``SELECT * FROM system.peers``
If any of these steps fails, the connection is dropped, and a new attempt is made to connect to the same shard, restarting the process.

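The restart-on-any-failure behaviour can be sketched as a loop. This is a simplified illustration with hypothetical helper names, not the driver's actual internals:

.. code-block:: python

    import time

    def connect_to_shard(steps, delay=0.0, max_attempts=5):
        """Run every setup step in order; if any step fails, drop the
        connection and restart the whole sequence from the beginning."""
        for _ in range(max_attempts):
            try:
                for step in steps:  # TCP/TLS, negotiation, auth, discovery
                    step()
                return True         # all steps succeeded
            except ConnectionError:
                time.sleep(delay)   # back off, then retry from scratch
        return False

Note that a failure in the last step (discovery) still repeats the earlier steps, which is part of what makes reconnection storms expensive.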
When a large number of clients attempt to open connections to the same node simultaneously, it can lead to:

1. CPU consumption spikes
2. Latency spikes
3. Service unavailability
4. Clients failing to create and initialize connections

To avoid these problems, it is important to limit the rate at which clients establish connections to nodes.

For this purpose, we introduced the ``ShardConnectionBackoffPolicy``, specifically the ``LimitedConcurrencyShardConnectionBackoffPolicy``.
This policy does two things:

1. Introduces a backoff delay between each connection the driver creates to a host
2. Limits the number of pending connection attempts

For example, with a configuration allowing only ``1`` pending connection and a backoff of ``0.1`` seconds:

.. code-block:: python

    cluster = Cluster(
        shard_connection_backoff_policy=LimitedConcurrencyShardConnectionBackoffPolicy(
            backoff_policy=ConstantShardConnectionBackoffSchedule(0.1),
            max_concurrent=1,
        )
    )
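
A rough way to reason about this configuration (illustrative numbers, and assuming each connection attempt completes faster than the backoff interval):

.. code-block:: python

    backoff = 0.1       # seconds between attempts per concurrency slot
    max_concurrent = 1  # pending connection attempts at a time
    shards = 64         # shards on the target node

    rate = max_concurrent / backoff  # at most ~10 new connections per second
    full_pool_time = shards / rate   # ~6.4 seconds to fill the pool to one node

Raising ``max_concurrent`` or shortening the backoff speeds up pool creation at the cost of a sharper load spike on the node.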
137+