Conversation
| - We will limit the support scope to Linux-based nodes and focus on the Ubuntu distro for now.
| This is because Ubuntu is a widely available Linux distribution
| across the target environments.
I think we should consider focusing on Ubuntu and Azure Linux from the start.
for 3p clouds do we want to offer AzLinux?
| * `containerd` w/ 2.0+ version;
| * `runc`
| * TLS bootstrap configurations;
| * Other cloud provider binaries;
| - NFTables / IPtables installed for Kubernetes network policies;
| - Network forward, IP masquerade and bridge settings configured for Kubernetes networking;
Yeah, I think we need to check what kind of VPN components are needed here. I don't have a concrete answer yet, so I left it out for now.
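For illustration only, the "network forward ... and bridge settings" bullet quoted above could be checked on a node with something like the Go sketch below. The sysctl keys in `requiredSysctls` and the names `procReader` and `checkSysctls` are assumptions, not taken from the doc or this repo. IP masquerade is a firewall rule rather than a sysctl, so it is not covered here.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sort"
	"strings"
)

// requiredSysctls is an ASSUMED, illustrative set of kernel settings.
// The design doc under review does not list exact keys.
var requiredSysctls = map[string]string{
	"net.ipv4.ip_forward":                "1", // packet forwarding between interfaces
	"net.bridge.bridge-nf-call-iptables": "1", // bridged traffic is passed to iptables
}

// procReader returns a lookup that reads sysctl values under root
// (normally /proc/sys). The bool is false when the key cannot be read.
func procReader(root string) func(key string) (string, bool) {
	return func(key string) (string, bool) {
		path := filepath.Join(root, strings.ReplaceAll(key, ".", "/"))
		data, err := os.ReadFile(path)
		if err != nil {
			return "", false
		}
		return strings.TrimSpace(string(data)), true
	}
}

// checkSysctls returns a sorted list of problems, one per key whose
// observed value is missing or differs from the wanted value.
func checkSysctls(read func(key string) (string, bool), want map[string]string) []string {
	var problems []string
	for key, expected := range want {
		got, ok := read(key)
		switch {
		case !ok:
			problems = append(problems, fmt.Sprintf("%s: not readable", key))
		case got != expected:
			problems = append(problems, fmt.Sprintf("%s: got %q, want %q", key, got, expected))
		}
	}
	sort.Strings(problems)
	return problems
}

func main() {
	for _, p := range checkSysctls(procReader("/proc/sys"), requiredSysctls) {
		fmt.Println(p)
	}
}
```

Passing the reader as a function keeps the check testable without touching the real `/proc/sys`.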
| - Detailed GPU device plugin requirements and enablement strategies will be addressed in
| a separate document.
|
| ## Baseline Environment Requirements
Maybe it would be possible to more clearly separate (a) binaries and configuration, and (b) stuff that could/should be baked in (if we're baking) and stuff that might not be?
docs/node-env.md
Outdated
| **Expected behaviors**:
|
| - Produced image is **immutable** and **reproducible** given the same inputs.
Immutable? Well, it's an image, and we don't prevent files being modified at runtime?
Reproducible? As far as the contents of the filesystem, maybe -- byte for byte at an image level is probably hard.
Immutable means we don't update a published version after it is released. Reproducible means we can rebuild the same image from the same set of components/binaries at any time. Other files inside the image that are not critical to kubelet functionality are fine to drift.
Will find a way to document this part in the doc. I called them out because I have seen a few incidents on the AgentBaker side where a change in the inputs (an artifact naming change in the package registry, for example) caused an outage. If we can find a good contract to limit and pin the critical components, incidents like that could be avoided. cc @cameronmeissner
added two footnotes to explain the terms here
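As a rough illustration of the "pin the critical components" idea in this thread, a build step could refuse any input whose content hash differs from a recorded digest. The sketch below is hypothetical: `PinnedArtifact`, `digestOf`, and `verifyArtifact` are made-up names, and SHA-256 is an assumed choice, not something the doc specifies.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// PinnedArtifact is a HYPOTHETICAL record of one critical build input
// (for example a kubelet or containerd binary) and its expected digest.
type PinnedArtifact struct {
	Name   string
	SHA256 string // lowercase hex digest recorded when the input was pinned
}

// digestOf returns the lowercase hex SHA-256 digest of content.
func digestOf(content []byte) string {
	sum := sha256.Sum256(content)
	return hex.EncodeToString(sum[:])
}

// verifyArtifact fails when the downloaded content does not match the pin,
// so an upstream change breaks the build instead of silently changing the image.
func verifyArtifact(pin PinnedArtifact, content []byte) error {
	if got := digestOf(content); got != pin.SHA256 {
		return fmt.Errorf("artifact %q: digest %s does not match pinned %s", pin.Name, got, pin.SHA256)
	}
	return nil
}

func main() {
	original := []byte("example artifact bytes")
	pin := PinnedArtifact{Name: "example", SHA256: digestOf(original)}

	fmt.Println(verifyArtifact(pin, original))                  // matching content passes
	fmt.Println(verifyArtifact(pin, []byte("changed upstream"))) // drifted content is rejected
}
```

Pinning by content digest rather than by artifact name means a registry rename or republish shows up as a build failure rather than as a different image under the same version.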
| **Inputs**:
|
| - Cluster endpoint (API server URL, CA bundle)
Perhaps network/VPN configuration/credentials?
Perhaps identity credentials
added two more items to cover them
| - All failure handling mechanisms from both Node VHD Image Baking and Node Bootstrapping
|
| ### Node Rebooting & Repairing
Is any of the stuff below really in scope for this document?
I added these operations to make sure the tool we are building here supports all of these scenarios.
| * `containerd` w/ 2.0+ version;
| * `runc`
| - Kubernetes components:
| * `kubelet` matching with the target worker node version;
Will images be Kubernetes-version specific, or will one image support multiple k8s versions?
This is actually a debatable question. In the current AgentBaker, we bake multiple k8s versions because AKS supports multiple versions. But I think for flex nodes we can do this in a more limited way, so we can reduce the support matrix while providing more stable functionality.
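Whichever way this lands, the "`kubelet` matching with the target worker node version" bullet implies some compatibility check at bootstrap time. A minimal sketch follows, assuming "matching" means the same major.minor; that rule and the names `majorMinor` and `kubeletMatches` are assumptions, and the doc may intend something stricter or looser.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// majorMinor extracts the major and minor numbers from a version string
// such as "v1.30.4" or "1.30". Patch and pre-release parts are ignored.
func majorMinor(version string) (int, int, error) {
	parts := strings.SplitN(strings.TrimPrefix(version, "v"), ".", 3)
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("invalid version %q", version)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return 0, 0, fmt.Errorf("invalid major in %q: %w", version, err)
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return 0, 0, fmt.Errorf("invalid minor in %q: %w", version, err)
	}
	return major, minor, nil
}

// kubeletMatches reports whether the baked kubelet has the same major.minor
// as the target version. ASSUMPTION: "matching" means equal major.minor.
func kubeletMatches(kubelet, target string) (bool, error) {
	kMajor, kMinor, err := majorMinor(kubelet)
	if err != nil {
		return false, err
	}
	tMajor, tMinor, err := majorMinor(target)
	if err != nil {
		return false, err
	}
	return kMajor == tMajor && kMinor == tMinor, nil
}

func main() {
	ok, err := kubeletMatches("v1.30.4", "v1.30.0")
	fmt.Println(ok, err)
}
```

If one image ends up carrying several kubelet versions, the same check could be used to pick the binary for the target cluster instead of rejecting the image.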
docs/node-env.md
Outdated
| - Kubernetes components:
| * `kubelet` matching with the target worker node version;
| * Control plane public CA certificate(s);
| * TLS bootstrap configurations;
for flex nodes, it looks like we plan on using bootstrap tokens (at least as a start) - have we thought about mechanisms by which we could avoid bootstrap tokens that would work across all clouds / on-prem environments? (this is probably a project in and of itself, though just curious)
Not every cloud supports the same set of features, hence I just put "TLS bootstrap configurations" here. It can be a static token, Arc, or something similar to a secure TLS bootstrap setup.
| ### Additional Requirements
|
| - Node identity for identifying and authenticating the node to cluster control plane;
node identity - are you referring to the client certificate obtained through TLS bootstrapping here?
| ### Node VHD Image Baking
|
| **Purpose**: Produce a base node image (VHD or similar) that satisfies baseline
Have we thought about what distribution of said base node image would look like?
Yeah, unfortunately different clouds will have different ways of doing this. I guess we will end up maintaining a couple of supported node images in every cloud.
| ### Node Bootstrapping w/ Baking
|
| **Purpose**: In environments without pre-baked images, the bootstrapping process
Artur and I have talked about embedding the provisioning scripts we have in AgentBaker directly into aks-node-controller to accomplish this - I haven't looked around the rest of this repo yet, but it seems like you're dealing with component installation all natively in Golang from the start? If so you wouldn't need to do something like script embedding, though just thought that would be worth mentioning
I think we will end up having a binary embedded in the node that does things similar to aks-node-controller.