I'm not an expert in this domain, so if anyone thinks the speaker is wrong about something, I'm all ears.
1. Closing connections
He mentions that they no longer close connections from the server side. Instead, they do it like this:
The server sends a close message (e.g. {action: "close"}) to the client and starts a timeout.
(A) The client receives the close message, knows how to handle it, and closes the connection.
(B) The client has gone rogue and doesn't close: the server's timeout expires and the server closes the connection itself as a backup mechanism.
Reason: according to the TCP spec, the party that initiates the connection close ends up in the TIME-WAIT state, and according to him, Linux might hold on to a file descriptor for up to 2 minutes in that state. The server needs its file descriptors a lot more than a client does, so it is better to let the client initiate the close.
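A minimal sketch of the close flow above, using only asyncio. `FakeConnection`, `request_close` and the `cooperative` flag are hypothetical names made up for illustration; a real WebSocket library would supply its own connection object, and this is not the code from the talk.

```python
import asyncio
import json


class FakeConnection:
    """Hypothetical stand-in for a real WebSocket connection object."""

    def __init__(self, cooperative: bool):
        self.cooperative = cooperative
        self.closed = asyncio.Event()
        self.closed_by = None  # becomes "client" or "server"

    async def send(self, message: str) -> None:
        # A cooperative client reacts to the close request by closing.
        if self.cooperative and json.loads(message).get("action") == "close":
            self.closed_by = "client"
            self.closed.set()

    async def force_close(self) -> None:
        # Server-initiated close, used only as the fallback.
        if not self.closed.is_set():
            self.closed_by = "server"
            self.closed.set()


async def request_close(conn: FakeConnection, timeout: float) -> str:
    """Ask the client to close; close from the server side after `timeout`."""
    await conn.send(json.dumps({"action": "close"}))
    try:
        # Case (A): the client closes the connection on its own.
        await asyncio.wait_for(conn.closed.wait(), timeout)
    except asyncio.TimeoutError:
        # Case (B): the client did not react, so the server closes.
        await conn.force_close()
    return conn.closed_by


async def demo():
    good = await request_close(FakeConnection(cooperative=True), timeout=0.05)
    rogue = await request_close(FakeConnection(cooperative=False), timeout=0.05)
    return good, rogue
```

Running `asyncio.run(demo())` exercises both branches: the cooperative client closes by itself, and the rogue one is closed by the server once the timeout expires. The 0.05 s timeout is only there to keep the demo fast; a real value would be much longer.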
2. Rolling out updates
If you have a WS cluster, limit the connection lifetime (for example ~30 min) and teach clients to reconnect. Randomise the lifetime per connection (e.g. 28-32 min); otherwise clients' reconnection times might sync up and you get periodic reconnection peaks.
Reason: if clients stick to one server indefinitely, it's hard to update your code/cluster. You could just turn off the old cluster, but then all the clients will try to reconnect at the same time, which can end very badly.
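The randomised lifetime could be picked like this when a connection is accepted. This is a rough sketch: the constant and function names are my own, and the 28-32 min window is just the example from above.

```python
import random

BASE_LIFETIME_S = 30 * 60  # ~30 min, the example value from the text
JITTER_S = 2 * 60          # +/- 2 min gives the 28-32 min window


def pick_connection_lifetime(rng: random.Random) -> float:
    """Return a per-connection lifetime in seconds, uniform over 28-32 min."""
    return rng.uniform(BASE_LIFETIME_S - JITTER_S, BASE_LIFETIME_S + JITTER_S)


rng = random.Random()
# Each new connection gets its own lifetime, so expiries are spread out.
lifetimes = [pick_connection_lifetime(rng) for _ in range(5)]
```

When a connection's lifetime runs out, the server would ask that client to close and reconnect, and the reconnect can then land on an updated server.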
3. Fail without fear
If you have a WS cluster, prefer smaller instances, i.e. servers with less CPU and memory.
Reason: when one of them dies, its clients will try to reconnect. A smaller instance holds fewer connections, so its death causes a smaller reconnection spike. Otherwise a single instance death might bring down your whole cluster.
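A back-of-the-envelope way to see the effect, assuming connections are spread evenly across instances. The function name and the numbers are made up for illustration, not taken from the talk.

```python
def reconnect_spike(total_connections: int, instances: int) -> int:
    """Connections that have to reconnect when a single instance dies,
    assuming connections are spread evenly across instances."""
    return total_connections // instances


# Hypothetical cluster holding 1,000,000 connections in total.
few_big = reconnect_spike(1_000_000, 4)      # 250000 clients reconnect at once
many_small = reconnect_spike(1_000_000, 40)  # 25000 clients reconnect at once
```

Same total capacity, but with ten times as many instances the spike from one death is ten times smaller, and it is also shared among more survivors.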
These notes are from an interesting WebSocket talk by a Netflix engineer:
https://www.youtube.com/watch?v=6w6E_B55p0E