Commit cfea9d3

Merge pull request #1018 from szabosteve/cp.selectors
2 parents f49c2ce + fadb815 commit cfea9d3

2 files changed

+124
-79
lines changed

docs/connection-pool.asciidoc

Lines changed: 82 additions & 55 deletions
@@ -1,37 +1,44 @@
11
[[connection_pool]]
22
== Connection Pool
33

4-
The connection pool is an object inside the client that is responsible for maintaining the current list of nodes.
5-
Theoretically, nodes are either dead or alive.
6-
7-
However, in the real world, things are never so clear. Nodes are sometimes in a gray-zone of _"probably dead but not
8-
confirmed"_, _"timed-out but unclear why"_ or _"recently dead but now alive"_. The connection pool's job is to
9-
manage this set of unruly connections and try to provide the best behavior to the client.
10-
11-
If a connection pool is unable to find an alive node to query against, it will return a `NoNodesAvailableException`.
12-
This is distinct from an exception due to maximum retries. For example, your cluster may have 10 nodes. You execute
13-
a request and 9 out of the 10 nodes fail due to connection timeouts. The tenth node succeeds and the query executes.
14-
The first nine nodes will be marked dead (depending on the connection pool being used) and their "dead" timers will begin
4+
The connection pool is an object inside the client that is responsible for
5+
maintaining the current list of nodes. Theoretically, nodes are either dead or
6+
alive. However, in the real world, things are never so clear. Nodes are
7+
sometimes in a gray-zone of _"probably dead but not confirmed"_, _"timed-out but
8+
unclear why"_ or _"recently dead but now alive"_. The job of the connection pool
9+
is to manage this set of unruly connections and try to provide the best behavior
10+
to the client.
11+
12+
If a connection pool is unable to find an alive node to query against, it
13+
returns a `NoNodesAvailableException`. This is distinct from an exception due to
14+
maximum retries. For example, your cluster may have 10 nodes. You execute a
15+
request and 9 out of the 10 nodes fail due to connection timeouts. The tenth
16+
node succeeds and the query executes. The first nine nodes are marked dead
17+
(depending on the connection pool being used) and their "dead" timers begin
1518
ticking.
1619

17-
When the next request is sent to the client, nodes 1-9 are still considered "dead", so they will be skipped. The request
18-
is sent to the only known alive node (#10), and if this node fails, a `NoNodesAvailableException` is returned. You'll note
19-
this is much less than the `retries` value, because `retries` only applies to retries against alive nodes. In this case,
20-
only one node is known to be alive, so `NoNodesAvailableException` is returned.
21-
20+
When the next request is sent to the client, nodes 1-9 are still considered
21+
"dead", so they are skipped. The request is sent to the only known alive node
22+
(#10), and if this node fails, a `NoNodesAvailableException` is returned. You
23+
will note this is much less than the `retries` value, because `retries` only
24+
applies to retries against alive nodes. In this case, only one node is known to
25+
be alive, so `NoNodesAvailableException` is returned.
2226

2327
There are several connection pool implementations that you can choose from:
2428

29+
2530
=== staticNoPingConnectionPool (default)
2631

27-
This connection pool maintains a static list of hosts, which are assumed to be alive when the client initializes. If
28-
a node fails a request, it is marked as `dead` for 60 seconds and the next node is tried. After 60 seconds, the node
29-
is revived and put back into rotation. Each additional failed request will cause the dead timeout to increase exponentially.
32+
This connection pool maintains a static list of hosts which are assumed to be
33+
alive when the client initializes. If a node fails a request, it is marked as
34+
`dead` for 60 seconds and the next node is tried. After 60 seconds, the node is
35+
revived and put back into rotation. Each additional failed request causes the
36+
dead timeout to increase exponentially.
3037

31-
A successful request will reset the "failed ping timeout" counter.
38+
A successful request resets the "failed ping timeout" counter.
3239

33-
If you wish to explicitly set the `StaticNoPingConnectionPool` implementation, you may do so with the `setConnectionPool()`
34-
method of the ClientBuilder object:
40+
If you wish to explicitly set the `StaticNoPingConnectionPool` implementation,
41+
you may do so with the `setConnectionPool()` method of the ClientBuilder object:
3542

3643
[source,php]
3744
----
@@ -42,10 +49,13 @@ $client = ClientBuilder::create()
4249

4350
Note that the implementation is specified via a namespace path to the class.
4451

52+
4553
=== staticConnectionPool
4654

47-
Identical to the `StaticNoPingConnectionPool`, except it pings nodes before they are used to determine if they are alive.
48-
This may be useful for long-running scripts, but tends to be additional overhead that is unnecessary for average PHP scripts.
55+
Identical to the `StaticNoPingConnectionPool`, except it pings nodes before they
56+
are used to determine if they are alive. This may be useful for long-running
57+
scripts but tends to be additional overhead that is unnecessary for average PHP
58+
scripts.
4959

5060
To use the `StaticConnectionPool`:
5161

@@ -58,13 +68,15 @@ $client = ClientBuilder::create()
5868

5969
Note that the implementation is specified via a namespace path to the class.
6070

71+
6172
=== simpleConnectionPool
6273

63-
The `SimpleConnectionPool` simply returns the next node as specified by the Selector; it does not perform track
64-
the "liveness" of nodes. This pool will return nodes whether they are alive or dead. It is just a simple pool of static
65-
hosts.
74+
The `SimpleConnectionPool` returns the next node as specified by the selector;
75+
it does not track node conditions. It returns nodes whether they are dead or
76+
alive. It is a simple pool of static hosts.
6677

67-
The `SimpleConnectionPool` is not recommended for routine use, but it may be a useful debugging tool.
78+
The `SimpleConnectionPool` is not recommended for routine use but it may be a
79+
useful debugging tool.
6880

6981
To use the `SimpleConnectionPool`:
7082

@@ -77,11 +89,13 @@ $client = ClientBuilder::create()
7789

7890
Note that the implementation is specified via a namespace path to the class.
7991

92+
8093
=== sniffingConnectionPool
8194

82-
Unlike the two previous static connection pools, this one is dynamic. The user provides a seed list of hosts, which the
83-
client uses to "sniff" and discover the rest of the cluster. It achieves this through the Cluster State API. As new
84-
nodes are added or removed from the cluster, the client will update it's pool of active connections.
95+
Unlike the two previous static connection pools, this one is dynamic. The user
96+
provides a seed list of hosts, which the client uses to "sniff" and discover the
97+
rest of the cluster by using the Cluster State API. As new nodes are added or
98+
removed from the cluster, the client updates its pool of active connections.
8599

86100
To use the `SniffingConnectionPool`:
87101

@@ -97,7 +111,8 @@ Note that the implementation is specified via a namespace path to the class.
97111

98112
=== Custom Connection Pool
99113

100-
If you wish to implement your own custom Connection Pool, your class must implement `ConnectionPoolInterface`:
114+
If you wish to implement your own custom Connection Pool, your class must
115+
implement `ConnectionPoolInterface`:
101116

102117
[source,php]
103118
----
@@ -124,7 +139,9 @@ class MyCustomConnectionPool implements ConnectionPoolInterface
124139
}
125140
----
126141

127-
You can then instantiate an instance of your ConnectionPool and inject it into the ClientBuilder:
142+
143+
You can then instantiate an instance of your ConnectionPool and inject it into
144+
the ClientBuilder:
128145

129146
[source,php]
130147
----
@@ -135,9 +152,11 @@ $client = ClientBuilder::create()
135152
->build();
136153
----
137154

138-
If your connection pool only makes minor changes, you may consider extending `AbstractConnectionPool`, which provides
139-
some helper concrete methods. If you choose to go down this route, you need to make sure your ConnectionPool's implementation
140-
has a compatible constructor (since it is not defined in the interface):
155+
If your connection pool only makes minor changes, you may consider extending
156+
`AbstractConnectionPool` which provides some helper concrete methods. If you
157+
choose to go down this route, you need to make sure your ConnectionPool
158+
implementation has a compatible constructor (since it is not defined in the
159+
interface):
141160

142161
[source,php]
143162
----
@@ -169,7 +188,9 @@ class MyCustomConnectionPool extends AbstractConnectionPool implements Connectio
169188
}
170189
----
171190

172-
If your constructor matches AbstractConnectionPool, you may use either object injection or namespace instantiation:
191+
192+
If your constructor matches AbstractConnectionPool, you may use either object
193+
injection or namespace instantiation:
173194

174195
[source,php]
175196
----
@@ -184,21 +205,27 @@ $client = ClientBuilder::create()
184205

185206
=== Which connection pool to choose? PHP and connection pooling
186207

187-
At first glance, the `sniffingConnectionPool` implementation seems superior. For many languages, it is. In PHP, the
188-
conversation is a bit more nuanced.
189-
190-
Because PHP is a share-nothing architecture, there is no way to maintain a connection pool across script instances.
191-
This means that every script is responsible for creating, maintaining, and destroying connections everytime the script
192-
is re-run.
193-
194-
Sniffing is a relatively lightweight operation (one API call to `/_cluster/state`, followed by pings to each node) but
195-
it may be a non-negligible overhead for certain PHP applications. The average PHP script will likely load the client,
196-
execute a few queries and then close. Imagine this script being called 1000 times per second: the sniffing connection
197-
pool will perform the sniffing and pinging process 1000 times per second. The sniffing process will add a large
198-
amount of overhead
199-
200-
In reality, if your script only executes a few queries, the sniffing concept is _too_ robust. It tends to be more
201-
useful in long-lived processes which potentially "out-live" a static list.
202-
203-
For this reason the default connection pool is currently the `staticNoPingConnectionPool`. You can, of course, change
204-
this default - but we strongly recommend you load test and verify that it does not negatively impact your performance.
208+
At first glance, the `sniffingConnectionPool` implementation seems superior. For
209+
many languages, it is. In PHP, the conversation is a bit more nuanced.
210+
211+
Because PHP is a share-nothing architecture, there is no way to maintain a
212+
connection pool across script instances. This means that every script is
213+
responsible for creating, maintaining, and destroying connections every time the
214+
script is re-run.
215+
216+
Sniffing is a relatively lightweight operation (one API call to
217+
`/_cluster/state`, followed by pings to each node) but it may be a
218+
non-negligible overhead for certain PHP applications. The average PHP script
219+
likely loads the client, executes a few queries and then closes. Imagine that
220+
this script is called 1000 times per second: the sniffing connection pool
221+
performs the sniffing and pinging process 1000 times per second. The sniffing
222+
process eventually adds a large amount of overhead.
223+
224+
In reality, if your script only executes a few queries, the sniffing concept is
225+
_too_ robust. It tends to be more useful in long-lived processes which
226+
potentially "out-live" a static list.
227+
228+
For this reason the default connection pool is currently the
229+
`staticNoPingConnectionPool`. You can, of course, change this default - but we
230+
strongly recommend that you run load tests and verify that the change does
231+
not negatively impact the performance.
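
The dead-timer behavior described above for the `StaticNoPingConnectionPool` (a failed node is skipped for 60 seconds, repeated failures lengthen the timeout, a success resets it) can be sketched as follows. This is a hypothetical illustration, not the client's actual implementation: the `DeadNodeTracker` class, its method names, the doubling factor, and the one-hour cap are all assumptions made for the example.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch of dead-node tracking with an exponentially
 * growing timeout. Not part of the client library.
 */
final class DeadNodeTracker
{
    /** @var array<string, array{failures: int, deadUntil: int}> */
    private $dead = [];

    /** @var int Base timeout in seconds (the 60 seconds mentioned in the text). */
    private $baseTimeout;

    /** @var int Upper bound for the timeout (assumed, not taken from the docs). */
    private $maxTimeout;

    public function __construct(int $baseTimeout = 60, int $maxTimeout = 3600)
    {
        $this->baseTimeout = $baseTimeout;
        $this->maxTimeout = $maxTimeout;
    }

    /** Record a failed request; each consecutive failure doubles the timeout. */
    public function markDead(string $host, int $now): void
    {
        $failures = ($this->dead[$host]['failures'] ?? 0) + 1;
        $timeout = (int) min($this->maxTimeout, $this->baseTimeout * 2 ** ($failures - 1));
        $this->dead[$host] = ['failures' => $failures, 'deadUntil' => $now + $timeout];
    }

    /** Record a successful request; this resets the failure counter. */
    public function markAlive(string $host): void
    {
        unset($this->dead[$host]);
    }

    /** A node is usable if it never failed or its dead timer has expired. */
    public function isAlive(string $host, int $now): bool
    {
        return !isset($this->dead[$host]) || $now >= $this->dead[$host]['deadUntil'];
    }
}
```

Passing the current time in explicitly (rather than calling `time()` inside the class) keeps the sketch deterministic and easy to exercise in a test.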

docs/selectors.asciidoc

Lines changed: 42 additions & 24 deletions
Original file line numberDiff line numberDiff line change
@@ -1,19 +1,24 @@
11
[[selectors]]
22
== Selectors
33

4-
The connection pool maintains the list of connections, and decides when nodes should transition from alive to dead (and
5-
vice versa). It has no logic to choose connections, however. That job belongs to the Selector class.
4+
The connection pool maintains the list of connections, and decides when nodes
5+
should transition from alive to dead (and vice versa). It has no logic to choose
6+
connections, however. That job belongs to the selector class.
7+
8+
The job of a selector is to return a single connection from a provided array of
9+
connections. Like the connection pool, there are several implementations to
10+
choose from.
611

7-
The selector's job is to return a single connection from a provided array of connections. Like the Connection Pool,
8-
there are several implementations to choose from.
912

1013
=== RoundRobinSelector (Default)
1114

12-
This selector returns connections in a round-robin fashion. Node #1 is selected on the first request, Node #2 on
13-
the second request, etc. This ensures an even load of traffic across your cluster. Round-robin'ing happens on a
14-
per-request basis (e.g. sequential requests go to different nodes).
15+
This selector returns connections in a round-robin fashion. Node #1 is selected
16+
on the first request, Node #2 on the second request, and so on. This ensures an
17+
even load of traffic across your cluster. Round-robining happens on a
18+
per-request basis (for example, sequential requests go to different nodes).
1519

16-
The `RoundRobinSelector` is default, but if you wish to explicitily configure it you can do:
20+
The `RoundRobinSelector` is the default, but if you wish to explicitly configure it
21+
you can do:
1722

1823
[source,php]
1924
----
@@ -24,21 +29,28 @@ $client = ClientBuilder::create()
2429

2530
Note that the implementation is specified via a namespace path to the class.
2631

27-
=== StickyRoundRobinSelector
28-
29-
This selector is "sticky", in that it prefers to reuse the same connection repeatedly. For example, Node #1 is chosen
30-
on the first request. Node #1 will continue to be re-used for each subsequent request until that node fails. Upon failure,
31-
the selector will round-robin to the next available node, then "stick" to that node.
32-
33-
This is an ideal strategy for many PHP scripts. Since PHP scripts are shared-nothing and tend to exit quickly, creating
34-
new connections for each request is often a sub-optimal strategy and introduces a lot of overhead. Instead, it is
35-
better to "stick" to a single connection for the duration of the script.
3632

37-
By default, this selector will randomize the hosts upon initialization, which will still guarantee an even distribution
38-
of load across the cluster. It changes the round-robin dynamics from per-request to per-script.
33+
=== StickyRoundRobinSelector
3934

40-
If you are using <<future_mode>>, the "sticky" behavior of this selector will be non-ideal, since all parallel requests
41-
will go to the same node instead of multiple nodes in your cluster. When using future mode, the default `RoundRobinSelector`
35+
This selector is "sticky", in that it prefers to reuse the same connection
36+
repeatedly. For example, Node #1 is chosen on the first request. Node #1 will
37+
continue to be re-used for each subsequent request until that node fails. Upon
38+
failure, the selector will round-robin to the next available node, then "stick"
39+
to that node.
40+
41+
This is an ideal strategy for many PHP scripts. Since PHP scripts are
42+
shared-nothing and tend to exit quickly, creating new connections for each
43+
request is often a sub-optimal strategy and introduces a lot of overhead.
44+
Instead, it is better to "stick" to a single connection for the duration of the
45+
script.
46+
47+
By default, this selector randomizes the hosts upon initialization which still
48+
guarantees an even load distribution across the cluster. It changes the
49+
round-robin dynamics from per-request to per-script.
50+
51+
If you are using <<future_mode>>, the "sticky" behavior of this selector is
52+
non-ideal, since all parallel requests go to the same node instead of multiple
53+
nodes in your cluster. When using future mode, the default `RoundRobinSelector`
4254
should be preferred.
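
The difference between per-request round-robin and "sticky" selection can be illustrated with a minimal sketch. The two classes below are hypothetical and deliberately do not implement the library's `SelectorInterface`; nodes are plain strings, liveness is supplied through an assumed `$isAlive` callback, and the randomization on initialization mentioned above is omitted for brevity.

```php
<?php
declare(strict_types=1);

/** Hypothetical sketch: advance to the next node on every call. */
final class RoundRobinSketch
{
    /** @var int */
    private $current = 0;

    /** @param string[] $nodes Non-empty list of node names. */
    public function select(array $nodes): string
    {
        $node = $nodes[$this->current % count($nodes)];
        $this->current++;
        return $node;
    }
}

/** Hypothetical sketch: keep returning the same node until it fails. */
final class StickySketch
{
    /** @var int */
    private $current = 0;

    /**
     * @param string[] $nodes   Non-empty list of node names.
     * @param callable $isAlive Assumed callback: fn(string $node): bool.
     */
    public function select(array $nodes, callable $isAlive): string
    {
        $count = count($nodes);
        // Start at the current node and only move on if it is not alive.
        for ($i = 0; $i < $count; $i++) {
            $index = ($this->current + $i) % $count;
            if ($isAlive($nodes[$index])) {
                $this->current = $index; // "stick" to this node
                return $nodes[$index];
            }
        }
        throw new RuntimeException('No alive nodes available');
    }
}
```

With three nodes, the round-robin sketch cycles through all of them on consecutive calls, while the sticky sketch returns the first node repeatedly and moves to the next one only after the callback reports the current node as dead.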
4355

4456
If you wish to use this selector, you may do so with:
@@ -52,9 +64,11 @@ $client = ClientBuilder::create()
5264

5365
Note that the implementation is specified via a namespace path to the class.
5466

67+
5568
=== RandomSelector
5669

57-
This selector simply returns a random node, regardless of state. It is generally just for testing.
70+
This selector returns a random node, regardless of state. It is generally just
71+
for testing.
5872

5973
If you wish to use this selector, you may do so with:
6074

@@ -67,9 +81,11 @@ $client = ClientBuilder::create()
6781

6882
Note that the implementation is specified via a namespace path to the class.
6983

84+
7085
=== Custom Selector
7186

72-
You can implement your own custom selector. Custom selectors must implement `SelectorInterface`
87+
You can implement your own custom selector. Custom selectors must implement
88+
`SelectorInterface`:
7389

7490
[source,php]
7591
----
@@ -97,7 +113,9 @@ class MyCustomSelector implements SelectorInterface
97113
----
98114
{zwsp} +
99115

100-
You can then use your custom selector either via object injection or namespace instantiation:
116+
117+
You can then use your custom selector either via object injection or namespace
118+
instantiation:
101119

102120
[source,php]
103121
----
