[roku] Fix jobs not starting if initial connection fails#20214
[roku] Fix jobs not starting if initial connection fails#20214mlobstein wants to merge 4 commits intoopenhab:mainfrom
Conversation
Signed-off-by: Michael Lobstein <michael.lobstein@gmail.com>
There was a problem hiding this comment.
Pull request overview
Fixes an initialization regression in the Roku binding where scheduled polling jobs could fail to start (or never recover) when initial hostname resolution/connection fails, by moving initialization responsibilities into the periodic refresh flow and retrying hostname resolution.
Changes:
- Start automatic refresh and app-list refresh immediately during handler initialization (independent of initial connection success).
- Add
resolveHostName()and call it from refresh/command paths to retry hostname resolution until successful. - Refactor the initial
getDeviceInfo()property population intorefreshPlayerState()and gate it withdeviceInfoLoaded.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
...penhab.binding.roku/src/main/java/org/openhab/binding/roku/internal/handler/RokuHandler.java
Outdated
Show resolved
Hide resolved
...penhab.binding.roku/src/main/java/org/openhab/binding/roku/internal/handler/RokuHandler.java
Outdated
Show resolved
Hide resolved
...penhab.binding.roku/src/main/java/org/openhab/binding/roku/internal/handler/RokuHandler.java
Outdated
Show resolved
Hide resolved
| try { | ||
| resolvedHost = InetAddress.getByName(resolvedHost).getHostAddress(); | ||
| communicator = new RokuCommunicator(httpClient, resolvedHost, port); | ||
| } catch (UnknownHostException e) { | ||
| logger.debug("Unable to resolve host", e); | ||
| logger.debug("Unable to resolve hostname", e); | ||
| updateStatus(ThingStatus.OFFLINE, ThingStatusDetail.COMMUNICATION_ERROR, "Cannot resolve hostname"); | ||
| } catch (RokuHttpException e) { | ||
| logger.debug("Unable to retrieve Roku device-info. Exception: {}", e.getMessage(), e); | ||
| updateStatus(ThingStatus.OFFLINE, ThingStatusDetail.COMMUNICATION_ERROR, "Cannot connect to device"); | ||
| return false; |
There was a problem hiding this comment.
resolveHostName() updates Thing status to OFFLINE on every failed resolution attempt. Since it is called on each refresh/command, this can generate repeated status events and noisy logs while DNS is down. Consider only calling updateStatus(...) when the status/detail/message actually changes, and/or adding a retry backoff to resolution attempts.
There was a problem hiding this comment.
AFIAK it is not an issue to call updateStatus() with the same status repeatedly.
There was a problem hiding this comment.
It depends on if you see this as a transient or a permanent error.
I guess you could makt it as a configuration error and put the thing offline permanent until the configuration has been updated.
There was a problem hiding this comment.
I was thinking that it would be a transient error that would occur when the thing is started (system reboot in most cases) but the network is down preventing the hostname from being resolved. In that case it should re-try in the hopes that the network comes up. But of course it could also be due to an incorrectly entered host name in which case it would be permanent.
I just tested both scenarios and unfortunately the exception message is the same in each circumstance:
java.net.UnknownHostException: No such host is known (roku)
at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method) ~[?:?]
...
As such I don't think there is a readily available way to determine the nature of the error in this case.
There was a problem hiding this comment.
Ok, another question. What if the address is resolved and cached, but the DNS is changed at some point in time.
This will render the thing Offline, while all it should do is to resolve the host again. Currently the user needs to manually disable/enable the thing to get it resolved.
Fixing this might make it more complex and the current way of handling this might be a good tradeoff, but like to mention this before we move forward.
There was a problem hiding this comment.
That is a good point... I have a made a fix for that. But as I was going to commit, it occurred to me that this logic is getting very complex in the handler and it might be better to move all of this down into the communicator at the point of use. That would help maintain the separation of concerns. WDYT?
Edit: Made additional changes to do the resolution in the RokuCommunicator instead.
Signed-off-by: Michael Lobstein <michael.lobstein@gmail.com>
Signed-off-by: Michael Lobstein <michael.lobstein@gmail.com>
|
@mlobstein I tested this and confirmed everything works as expected on my end. |
Signed-off-by: Michael Lobstein <michael.lobstein@gmail.com>
Follow-up to #20208
The previous changes have a problem where if the hostname resolution or initial connection request fails, then the scheduled jobs that poll the device will not be started. This MR corrects that and improves the hostname resolution so that it will re-try until successful. The redundant getDeviceInfo() call was also refactored to be done inside refreshPlayerState().
@realPyR3X, Please test if you can.