-
Notifications
You must be signed in to change notification settings - Fork 3
WIP: introduce AgentExitReason and NodeInstanceConnectivityReason enums and use them in conn update messages
#54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
ExitReason enum and use it in Agent/Node Instance connectivity update messagesAgentExitReason and NodeInstanceConnectivityReason enums and use them in conn update messages
|
|
| enum NodeInstanceConnectivityReason { | ||
| // NODE_INSTANCE_CONNECTIVITY_REASON_UNSPECIFIED acts as default value | ||
| NODE_INSTANCE_CONNECTIVITY_REASON_UNSPECIFIED = 0; | ||
| NODE_INSTANCE_CONNECTIVITY_REASON_ONLINE = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the use case for this reason? We already have liveness property on the message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeap we could remove it.
The thinking behind this was that this enum is about Connectivity Reasons, not disconnection reasons. So apart from the default (unspecified case), I added the Online reason to be used when UpdateNodeInstanceConnection has liveness true.
I think that if we were to remove it we could rename this as NodeInstanceDisconnectionReason and have only:
- Unspecified
- No Retention
and - Agent Update
cases. Wdyt @car12o
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also think about it as Connectivity Reasons which should include connections and disconnections.
I added the Online reason to be used when UpdateNodeInstanceConnection has liveness true.
I think it makes sense but wouldn't we also need Offline when a node disconnects, and it's not No Retention neither Agent Update? Perhaps instead of NODE_INSTANCE_CONNECTIVITY_REASON_ONLINE we could have something like NODE_INSTANCE_CONNECTIVITY_REASON_NORMAL which could be used when a node gracefully connects or disconnects without it being No Retention neither Agent Update.
wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm now this is the point where ConnectionUpdateSource comes into play further complicating things because CONNECTION_UPDATE_SOURCE_AGENT also means that this was a normal connection / disconnection
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't this be mapped to NODE_INSTANCE_CONNECTIVITY_REASON_NORMAL or do we need a specific Connectivity Reason for that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that if we were to remove it we could rename this as
NodeInstanceDisconnectionReasonand have only:
- Unspecified
- No Retention
and- Agent Update
I agree on keeping these three only, I'd initially add only the ones we have use cases for, keeping UNSPECIFIED for other situations since it's the default value anyway.
Regarding NodeInstanceConnectivityReason name, I think it's ok, I understand it can apply for connections or disconnections
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry forgot to answer to this.
Sounds good to me guys I removed the online case and kept the other 3 :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm ok with having just these 3 but personally I would consider UNSPECIFIED != NORMAL.
This PR introduces two new Enums.
1.
AgentExitReasonThe values are taken from EXIT_REASON enum definition.
A new array field called
exit_reasonsis added toUpdateAgentConnectionproto to let the Cloud know about all the reasons that led to the latest Agent shutdown/disconnection.Those values will be added to relevant feed events to facilitate investigations (of course we could also surface them in the UI if it is needed).
2.
NodeInstanceConnectivityReasonThis enum aims to notify Cloud about important node instance disconnection reasons and enable it to make smarter decisions. Note, that those improvements are highly sought by our users.
So apart from the
NODE_INSTANCE_CONNECTIVITY_REASON_UNSPECIFIEDandNODE_INSTANCE_CONNECTIVITY_REASON_ONLINE, the following two values (that are about disconnections) are added:NODE_INSTANCE_CONNECTIVITY_REASON_NO_RETENTION: This will enable Cloud to remove right away node instances that are rotated out of the Agent's DB. Currently those get disconnected, marked as pruned and 60 days need to go by until the Cloud can remove them. In the meantime user's see them as offline.NODE_INSTANCE_CONNECTIVITY_REASON_AGENT_UPDATE: This will enable Cloud to not send reachability notifications when node instances go offline due to Agent updates.This enum is added to
UpdateNodeInstanceConnectionmessages.