ZOOKEEPER-3100: ZooKeeper client times out due to random choice of resolved addresses#2338
Conversation
|
This code has been this way as long as I can remember - it seems like it could have a significant impact to existing functionality/expectations. eg how do we know that every dns provider will behave as you expect. I would imagine something like this behind a config option, release notes and documentation would be less risky. |
That's good point. Currently there're two issues with this code.
Now both issues are covered by the patch, but if we want to make this optional, which part shall we introduce a config setting?
|
I think we can iterate every resolved addresses in jdk given order so we obey both -1 to new properties. We probably will run into interference or inconsistence with system properties just like ZOOKEEPER-4955. See: |
I don't think we should iterate over all addresses. Thought about this, but I believe it's better to pick one address from DNS according to java system properties and if it fails, immediately move to next server. If a single server fails, it most likely will fail on all addresses, because it's down. In the next iteration when we come back to this server again, DNS will give another IP address if available. |
Are u saying that patch is fine as it is now? |
Good point! +1 to this pr or |
kezhuw
left a comment
There was a problem hiding this comment.
how do we know that every dns provider will behave as you expect. I would imagine something like this behind a config option, release notes and documentation would be less risky.
Introduce zookeeper.shuffleDnsResponse (true by default) and keep the original behavior completely if set to true.
If we are going to introduce option for this, I would prefer to turn shuffle disabled by default. This way zookeeper probably will function consistent in dns with other softwares on the same host.
|
So, number 2) is ...
which is going to be a feature flag which toggles the old behavior. By default, however, should be turned off meaning using the new behavior implemented in this patch. if (shuffleDnsResponse) {
Collections.shuffle(resolvedAddresses);
}@phunt wdyt? |
|
Looks like I've found a problem: |
No, it seems that I was wrong. It was just some issue with IntelliJ and JVM didn't pick up the |
|
I've another idea folks. Wdyt about the following? private InetSocketAddress resolve(InetSocketAddress address) {
try {
String curHostString = address.getHostString();
List<InetAddress> ra = new ArrayList<>(Arrays.asList(this.resolver.getAllByName(curHostString)));
if (ra.isEmpty()) {
return address;
}
Map<Boolean, List<InetAddress>> grouped = ra.stream()
.collect(Collectors.groupingBy(ia -> ia instanceof Inet6Address));
List<InetAddress> resolvedAddresses;
if (Boolean.parseBoolean(System.getProperty("java.net.preferIPv6Addresses")) && !grouped.get(Boolean.TRUE).isEmpty()) {
resolvedAddresses = grouped.get(Boolean.TRUE); // we prefer v6, and we have at least one v6 address available
} else {
resolvedAddresses = grouped.get(Boolean.FALSE); // we prefer v4 or we only have v4 addresses available
}
Collections.shuffle(resolvedAddresses);
return new InetSocketAddress(resolvedAddresses.get(0), address.getPort());
} catch (UnknownHostException e) {
LOG.error("Unable to resolve address: {}", address.toString(), e);
return address;
}
}This way we keep the random pick behavior, but first we group the IPs based on JVM setting. This might be the most direct solution of the problem and it doesn't rely on I think we don't need feature flag with this approach. |
|
+1 - lgtm and esp appreciate the time/effort to maintain the original behavior as possible. Docs are great to see. thx! |
|
Your most recent comment sounds reasonable - but it seems like having a big "on/off" for the new functionality would be the safest. (which is not inconsistent with this iiuc?) |
|
The alternative approach violates "system" of "java.net.preferIPv6Addresses". |
|
Ok. Thanks for the feedback. Let's merge it as it is. |
049a9d4 to
79e3223
Compare
I ended up removing the random shuffle when resolving DNS names. This way we will honor JVM setting
java.net.preferIPv6Addresses, because the order of returned address are already influenced by this setting. If multiple addresses are returned for the same host in the same address type (v4/v6), we'll try each in every round, because DNS already randomize the addresses within the same class.