
Commit 22e5bee

MB-67804: Make sure cb_dist:select does not block net_kernel
All Erlang proto modules try to resolve the hostname when select() is called. This can block net_kernel and lead to node start problems (while we are waiting for DNS, the couchdb timeout expires). At the same time, name resolution is not really needed in cb_dist:select(), as it already knows the protocol (address family and encryption) that it will use.

Change-Id: If0c7af5298ed04c872165b4efbf6e7d7a8c21ad4
Reviewed-on: https://review.couchbase.org/c/ns_server/+/231603
Well-Formed: Build Bot <[email protected]>
Tested-by: Timofey Barmin <[email protected]>
Well-Formed: Restriction Checker
Reviewed-by: Peter Searby <[email protected]>
1 parent 87a79eb commit 22e5bee

1 file changed: +12 -6 lines

src/cb_dist.erl

Lines changed: 12 additions & 6 deletions
@@ -143,12 +143,18 @@ accept_connection(_, {ConRef, HandshakeProcPid, Module, ConnectionSocket},
 
 -spec select(Node :: atom()) -> true | false.
 select(Node) ->
-    try get_preferred_dist(Node) of
-        Module -> Module:select(Node)
-    catch
-        _:Error ->
-            error_msg("Select for ~p failed. Couldn't find preferred proto: ~p",
-                      [Node, Error]),
+    case dist_util:split_node(Node) of
+        {node, _Name, _Host} ->
+            %% Not proxying select() to preferred proto to avoid blocking
+            %% net_kernel (select() is called by net_kernel).
+            %% Select in inet_tcp_dist (and other protocols) can block
+            %% because it resolves the hostname.
+            %% It seems like the name resolution is not really needed here
+            %% as we already know the protocol (afamily and encryption) that
+            %% we want to use.
+            true;
+        _ ->
+            error_msg("Select failed. Invalid node name: ~p", [Node]),
             false
     end.
 
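
A minimal sketch contrasting the two select/1 strategies, for illustration only. The module name `select_sketch`, its function names, and the local `split_node/1` helper are hypothetical. The helper is a simplified stand-in for `dist_util:split_node/1` that only reproduces the `{node, Name, Host}` shape matched in the diff. `resolving_select/1` is an assumed approximation of what a proto module such as `inet_tcp_dist` does in `select/1`: it calls `inet:getaddr/2`, so the caller waits on the resolver.

```erlang
%% Hypothetical module contrasting two select/1 strategies.
-module(select_sketch).
-export([resolving_select/1, parse_only_select/1]).

%% Simplified stand-in for dist_util:split_node/1 (assumption): it only
%% reproduces the {node, Name, Host} shape matched in cb_dist:select/1.
-spec split_node(atom()) -> {node, string(), string()} | invalid.
split_node(Node) when is_atom(Node) ->
    case string:split(atom_to_list(Node), "@") of
        [Name, Host] when Name =/= [], Host =/= [] -> {node, Name, Host};
        _ -> invalid
    end.

%% Approximation of a proto module's select/1 (assumed to resemble
%% inet_tcp_dist): it resolves the host, so the caller waits on DNS.
-spec resolving_select(atom()) -> boolean().
resolving_select(Node) ->
    case split_node(Node) of
        {node, _Name, Host} ->
            case inet:getaddr(Host, inet) of
                {ok, _Address} -> true;
                {error, _Reason} -> false
            end;
        invalid ->
            false
    end.

%% Mirrors cb_dist:select/1 after this commit: validate the node name
%% only. No lookup happens, so the caller (net_kernel) is never kept waiting.
-spec parse_only_select(atom()) -> boolean().
parse_only_select(Node) ->
    case split_node(Node) of
        {node, _Name, _Host} -> true;
        invalid -> false
    end.
```

Here `parse_only_select('ns_1@cb.example')` returns `true` without touching the resolver, whereas `resolving_select/1` on the same name can stall until the lookup completes or fails. The trade-off is that an unresolvable host is no longer rejected at select time; presumably the failure surfaces later, when the preferred protocol actually sets up the connection.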
