feat: add ability to configure tes error cache expire time#1751
Binary incompatibility detected for commit 1f80692. com.aws.greengrass.tes.CredentialRequestHandler is binary incompatible and is source incompatible because of FIELD_REMOVED. Produced by binaryCompatability.py
src/main/java/com/aws/greengrass/tes/CredentialRequestHandler.java
… avoid multiple restarts
7b3fd9b to 732599d
if (node != null && (node.childOf(PORT_TOPIC)
        || node.childOf(CLOUD_4XX_ERROR_CACHE_TOPIC)
        || node.childOf(CLOUD_5XX_ERROR_CACHE_TOPIC)
        || node.childOf(UNKNOWN_ERROR_CACHE_TOPIC))) {
Nit: We don't need to execute this callback when the WhatHappened event is irrelevant to it. For example, see how we take no action if the WhatHappened event doesn't indicate a change in value.
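A minimal sketch of the filtering suggested in this comment, assuming a simplified event model. `ConfigEvent`, `CacheConfigWatcher`, and the topic name strings are hypothetical stand-ins for illustration, not the Greengrass config API; which events count as relevant is also an assumption.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical stand-in for the config event type; NOT the Greengrass API.
enum ConfigEvent { INITIALIZED, CHILD_CHANGED, CHILD_REMOVED, TIMESTAMP_UPDATED, INTERIOR_ADDED }

final class CacheConfigWatcher {
    // Assumption: only these events can change a configured value.
    private static final Set<ConfigEvent> RELEVANT_EVENTS =
            EnumSet.of(ConfigEvent.INITIALIZED, ConfigEvent.CHILD_CHANGED, ConfigEvent.CHILD_REMOVED);

    // Placeholder topic names; the real constants live in the TES component.
    private static final Set<String> WATCHED_TOPICS =
            Set.of("port", "error4xxCache", "error5xxCache", "unknownErrorCache");

    // Returns true only when the event can change a value AND the topic is one we watch.
    static boolean shouldHandle(ConfigEvent event, String changedTopic) {
        if (!RELEVANT_EVENTS.contains(event)) {
            return false; // e.g. a timestamp-only update: nothing to reconfigure
        }
        return changedTopic != null && WATCHED_TOPICS.contains(changedTopic);
    }

    public static void main(String[] args) {
        System.out.println(shouldHandle(ConfigEvent.TIMESTAMP_UPDATED, "port")); // false
        System.out.println(shouldHandle(ConfigEvent.CHILD_CHANGED, "port"));     // true
    }
}
```

Checking the event type first keeps the callback cheap for updates that cannot affect the port or the cache settings.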
We discussed offline whether the component should enter the errored state when the configuration is invalid, instead of logging an error and proceeding with the minimum value in place of the configured one. We agreed that entering the errored state is preferable: the component tells Greengrass that the configuration is invalid, Greengrass decides how to proceed, and the customer gets immediate feedback about the invalid configuration.
Currently there is only one way for TES config to be updated, which is through deployments. In that case Greengrass can fail the deployment and roll back to a previously working state if configured to do so. If in the future we allow TES config to be updated outside of a deployment, i.e. at runtime (e.g. by calling UpdateConfiguration for TES from some custom component), entering the errored state would leave the device unhealthy with no automatic recovery. That problem is left to be addressed in the future.
Update: this decision was changed, see #1751 (comment)
public static final int CLOUD_5XX_ERROR_CACHE_IN_MIN = 1;
public static final int UNKNOWN_ERROR_CACHE_IN_MIN = 5;
public static final int TIME_BEFORE_CACHE_EXPIRE_IN_SEC = 300;
public static final int CLOUD_4XX_ERROR_CACHE_IN_SEC = 120;
nit: prefix these with DEFAULT_, e.g. DEFAULT_CLOUD_4XX_ERROR_CACHE_IN_SEC, etc.
a870a69 to d36ab58
As a follow-up to #1751 (comment): after further discussion we decided to make the TES component less likely to break, given its importance to Greengrass and deployments (currently, a broken TES component prevents future deployments from working). So we prefer to log the error message and keep running with a valid value of our choosing. Part of the justification is that these cache configs are within our service domain, so customers are unlikely to have a problem if we adjust the value to make it valid. There is still a feedback problem: a customer will not know this is happening unless they check for it or detect it in their logs. That could be improved in the future by introducing a communication channel or mechanism for non-blocking errors like this.
Added the ability to configure the TES error cache expiry time. TES does not restart when an error cache value changes, and if the configured time is less than 10 seconds or greater than 12 hours it falls back to the default time.
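The fallback rule described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `ErrorCacheSettings` and `resolveCacheSeconds` are hypothetical names, the bounds are taken from the description (10 seconds to 12 hours, both treated as valid), and the default of 120 seconds mirrors the 4xx constant shown in the diff.

```java
import java.util.logging.Logger;

// Hypothetical helper for illustration; not the class changed in this PR.
final class ErrorCacheSettings {
    private static final Logger LOGGER = Logger.getLogger(ErrorCacheSettings.class.getName());

    static final int MIN_CACHE_SEC = 10;            // lower bound from the PR description
    static final int MAX_CACHE_SEC = 12 * 60 * 60;  // 12 hours, upper bound from the PR description
    static final int DEFAULT_CLOUD_4XX_ERROR_CACHE_IN_SEC = 120;

    // Returns the configured value if it is within bounds; otherwise logs and returns the default.
    static int resolveCacheSeconds(int configuredSec, int defaultSec) {
        if (configuredSec < MIN_CACHE_SEC || configuredSec > MAX_CACHE_SEC) {
            LOGGER.warning(String.format(
                    "Error cache time %d s is outside [%d, %d] s; using default %d s",
                    configuredSec, MIN_CACHE_SEC, MAX_CACHE_SEC, defaultSec));
            return defaultSec; // keep the component running rather than entering an errored state
        }
        return configuredSec;
    }

    public static void main(String[] args) {
        System.out.println(resolveCacheSeconds(5, DEFAULT_CLOUD_4XX_ERROR_CACHE_IN_SEC));  // 120
        System.out.println(resolveCacheSeconds(60, DEFAULT_CLOUD_4XX_ERROR_CACHE_IN_SEC)); // 60
    }
}
```

Logging and substituting a valid value follows the later decision in the conversation: an invalid cache setting should not break the component.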
Issue #, if available:
Description of changes:
Add ability for users to configure TES error cache expire time
Why is this change necessary:
How was this change tested:
Any additional information or context required to review the change:
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.