Commit b49be96

feat: add AKS support (#90)

1 parent 3295922 commit b49be96

5 files changed: +131, -18 lines

README.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -6,7 +6,7 @@ Put simply: "identical", "peer" nodes that exist _in duplicate_ for the purpose

 # Status of Project

-The kamino set of tools are currently approaching a v1.0 stable release, tested against Kubernetes running on Azure with VMSS-backed node pools, on clusters built with the [AKS Engine](https://github.com/Azure/aks-engine) tool.
+The kamino set of tools is currently approaching a v1.0 stable release, tested against Kubernetes running on Azure with VMSS-backed node pools, on clusters built with the [AKS Engine](https://github.com/Azure/aks-engine) tool. You may also run it as a proof of concept on your AKS cluster; see [here](docs/AKS.md) for more information.

 More status [here][status].
```

docs/AKS.md

Lines changed: 86 additions & 0 deletions

# Kamino on AKS

You can test-drive kamino on your AKS cluster to evaluate the potential value of an optimized OS disk image, with the following important caveat:

The AKS managed service will eventually overwrite the changes that kamino makes to a node pool's underlying VMSS resource:

- kamino-delivered changes to `virtualMachineProfile.storageProfile.imageReference.id` and `virtualMachineProfile.storageProfile.imageReference.resourceGroup` will be reverted to the standard AKS OS image maintained by the AKS managed service
- kamino updates to the `virtualMachineProfile.extensionProfile.extensions` array will be reverted
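For orientation, the dotted paths in the caveat above refer to locations inside the VMSS resource model. A trimmed, illustrative Python sketch of that shape (field values are made up, and this is not the full ARM schema):

```python
# Trimmed, illustrative shape of a VMSS resource model; all values are made up.
vmss_model = {
    "virtualMachineProfile": {
        "storageProfile": {
            "imageReference": {
                # kamino points these at its captured Shared Image Gallery image;
                # the AKS managed service later resets them to the stock AKS image
                "id": "/subscriptions/.../galleries/.../images/kamino-prototype",
                "resourceGroup": "MC_aks-kamino_aks-kamino_westus2",
            }
        },
        "extensionProfile": {
            # kamino prunes this array; AKS will eventually restore it
            "extensions": [{"name": "vmssCSE"}],
        },
    }
}
```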
## How to run kamino on your (non-production!) AKS cluster

1. Get the name of the resource group that your cluster's VMSS resources are running in. E.g.:

```sh
$ az aks show -n aks-kamino -g aks-kamino | jq -r .nodeResourceGroup
MC_aks-kamino_aks-kamino_westus2
```
2. Get the managed identity resource for your node VMs. E.g.:

```sh
$ az identity list -g MC_aks-kamino_aks-kamino_westus2
[
  {
    "clientId": "<clientId value>",
    "clientSecretUrl": "<clientSecretUrl value>",
    "id": "<id value>",
    "location": "westus2",
    "name": "aks-kamino-agentpool",
    "principalId": "<principalId value>",
    "resourceGroup": "MC_aks-kamino_aks-kamino_westus2",
    "tags": {},
    "tenantId": "<tenantId value>",
    "type": "Microsoft.ManagedIdentity/userAssignedIdentities"
  }
]
```
Now give your cluster's node pool managed identity Contributor access to that resource group, using the actual value of `principalId` from above (substitute `<principalId value>` below with the actual value):

```sh
$ az role assignment create --assignee <principalId value> --role 'Contributor' --scope /subscriptions/<subscription ID that cluster is in>/resourcegroups/MC_aks-kamino_aks-kamino_westus2
{
  "canDelegate": null,
  "condition": null,
  "conditionVersion": null,
  "description": null,
  "id": "<id value>",
  "name": "<name value>",
  "principalId": "<principalId value>",
  "principalName": "<principalName value>",
  "principalType": "ServicePrincipal",
  "resourceGroup": "MC_aks-kamino_aks-kamino_westus2",
  "roleDefinitionId": "<roleDefinitionId value>",
  "roleDefinitionName": "Contributor",
  "scope": "/subscriptions/<subscription ID that cluster is in>/resourceGroups/MC_aks-kamino_aks-kamino_westus2",
  "type": "Microsoft.Authorization/roleAssignments"
}
```

This additional grant gives the kamino runtime the access it needs to create the necessary infrastructure in your cluster's resource group.
Now you can target a particular node running in your cluster, snapshot its OS disk, and then use that snapshot as a Shared Image Gallery image from which new VMSS VMs are built. This replicates any pre-pulled container images onto newly scaled-out nodes and removes the need to run any startup scripts, which can demonstrably improve the reliability and responsiveness of new node scale-out operations.

```sh
$ kubectl get nodes
NAME                                STATUS   ROLES   AGE     VERSION
aks-nodepool1-68550425-vmss000000   Ready    agent   5h9m    v1.21.7
aks-nodepool2-35877414-vmss000000   Ready    agent   5h      v1.21.7
aks-nodepool2-35877414-vmss000002   Ready    agent   4h42m   v1.21.7
```
From the above set of nodes, let's choose `aks-nodepool2-35877414-vmss000000` from nodepool2 to build a new image from, and use it as the base for any new nodes in nodepool2:

```sh
$ helm install --repo https://jackfrancis.github.io/kamino/ \
    update-nodepool2-os-image \
    vmss-prototype --namespace default \
    --set kamino.targetNode=aks-nodepool2-35877414-vmss000000
```

The above command schedules the kamino runtime as a pod on any schedulable node other than the target node, where it performs the image capture and VMSS update work.
Again, at present this solution is not designed for production AKS clusters, as the managed service will overwrite the changes. But have fun testing!

A more detailed walkthrough of how kamino works is [here](../helm/vmss-prototype/walkthrough.md).

helm/vmss-prototype/Chart.yaml

Lines changed: 1 addition & 1 deletion

```diff
@@ -1,7 +1,7 @@
 apiVersion: v1
 description: A Helm chart for the Kamino vmss-prototype pattern image generator
 name: vmss-prototype
-version: 0.0.16
+version: 0.0.17
 maintainers:
 - name: Michael Sinz
   email: msinz@microsoft.com
```

helm/vmss-prototype/templates/vmss-prototype.yaml

Lines changed: 40 additions & 13 deletions

```diff
@@ -1,6 +1,5 @@
 {{- if hasKey .Values "kamino" -}}

-# Pick our job name here
 {{- $jobName := printf "%s-%s" .Values.kamino.name "status" -}}
 {{- if hasKey .Values.kamino "targetVMSS" -}}
 {{- $jobName = printf "%s-%s" .Values.kamino.name "autoupdate" -}}
@@ -9,7 +8,46 @@
 {{- $jobName = printf "%s-%s" .Values.kamino.name (substr 0 (int (sub (len .Values.kamino.targetNode) 6)) .Values.kamino.targetNode) -}}
 {{- end -}}
 {{- end -}}
+{{- $rbacResourceName := printf "kamino-%s" $jobName -}}

+apiVersion: v1
+kind: ServiceAccount
+metadata:
+  name: {{ $rbacResourceName }}
+  namespace: {{ .Release.Namespace }}
+---
+apiVersion: rbac.authorization.k8s.io/v1
+kind: ClusterRole
+metadata:
+  name: {{ $rbacResourceName }}
+rules:
+- apiGroups: [""]
+  resources: ["pods/eviction"]
+  verbs: ["create"]
+- apiGroups: [""]
+  resources: ["pods"]
+  verbs: ["get", "list", "delete", "create"]
+- apiGroups: [""]
+  resources: ["nodes"]
+  verbs: ["get", "patch"]
+- apiGroups: ["apps"]
+  resources: ["statefulsets", "namespaces", "daemonsets", "replicasets"]
+  verbs: ["get", "list"]
+---
+apiVersion: rbac.authorization.k8s.io/v1
+kind: ClusterRoleBinding
+metadata:
+  name: {{ $rbacResourceName }}
+  namespace: {{ .Release.Namespace }}
+subjects:
+- kind: ServiceAccount
+  name: {{ $rbacResourceName }}
+  namespace: {{ .Release.Namespace }}
+roleRef:
+  kind: ClusterRole
+  name: {{ $rbacResourceName }}
+  apiGroup: rbac.authorization.k8s.io
+---
 # If cronjob is enabled and we must have a targetVMSS...
 {{- if .Values.kamino.auto.cronjob.enabled }}
 apiVersion: batch/v1beta1
```
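The chart's `targetNode` branch of the job-name logic trims the trailing 6-character VMSS instance suffix (e.g. `000000`) from the node name via `substr 0 (sub (len .Values.kamino.targetNode) 6)`. A Python sketch of the same computation (the function name is mine, and `kamino.name` is treated here as the release name):

```python
def kamino_job_name(kamino_name: str, target_node: str) -> str:
    """Mirror of the chart's substr call: drop the node's 6-char VMSS instance suffix."""
    return f"{kamino_name}-{target_node[:len(target_node) - 6]}"

# 'aks-nodepool2-35877414-vmss000000' loses its '000000' instance suffix:
print(kamino_job_name("update-nodepool2-os-image", "aks-nodepool2-35877414-vmss000000"))
# -> update-nodepool2-os-image-aks-nodepool2-35877414-vmss
```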
```diff
@@ -63,6 +101,7 @@ spec:
 {{- end }}
 {{- end }}
     spec:
+      serviceAccountName: {{ $rbacResourceName }}
      restartPolicy: Never

 {{- if hasKey .Values.kamino.container "pullSecret" }}
@@ -146,10 +185,6 @@ spec:
 {{- end }}

          env:
-            # We use the in-cluster kubeconfig
-            - name: KUBECONFIG
-              value: /.kubeconfig
-
            # This gets mapped here since the node has cloud local CA bundles we need
            - name: REQUESTS_CA_BUNDLE
              value: /etc/ssl/certs/ca-certificates.crt
@@ -178,9 +213,6 @@ spec:
          - name: kubectl
            mountPath: /usr/bin/kubectl
            readOnly: true
-          - name: kubeconfig
-            mountPath: /.kubeconfig
-            readOnly: true
          - name: host-crt
            mountPath: /etc/ssl/certs/ca-certificates.crt
            readOnly: true
@@ -197,11 +229,6 @@ spec:
            path: /usr/local/bin/kubectl
            type: File

-        - name: kubeconfig
-          hostPath:
-            path: /var/lib/kubelet/kubeconfig
-            type: File
-
        - name: host-crt
          hostPath:
            path: /etc/ssl/certs/ca-certificates.crt
```

vmss-prototype/vmss-prototype

Lines changed: 3 additions & 3 deletions

```diff
@@ -92,7 +92,7 @@ def log_stdout_stderr(stdout, stderr, exit_code, masq=None):

 def run(command, timeout=None, env=None, shell=False, check=True, quiet=False, dry_run=False, show_cmd=True, cwd=None,
         echo=False, masq=None, stdin='', comment=None, log_timing=False, log_stdout_on_error=True, log_stderr_on_error=True,
-        retries=1, retry_wait_duration_func=lambda num_retry: 2 ** (num_retry + 2), log_final_error=False, log_retry_errors=False,
+        retries=1, retry_wait_duration_func=lambda num_retry: 2 ** (num_retry + 2), log_final_error=True, log_retry_errors=False,
         retry_func=lambda stdout, stderr, exit_code: True):
     """
     Run the given command in the shell
```
```diff
@@ -116,7 +116,7 @@ def run(command, timeout=None, env=None, shell=False, check=True, quiet=False, d
     :param log_stderr_on_error: If true (default), on error, the stderr of the command is logged
     :param retries: Number of retries before giving up due to errors
     :param retry_wait_duration_func: A function to get the wait in seconds for a particular retry attempt. Default delay is exponential backoff with initial delay of 8 seconds
-    :param log_final_error: Log the final retry error even if check==False
+    :param log_final_error: Log the final retry error even if check==False - defaults to True
     :param log_retry_errors: Log the errors for retries
     :param retry_func: Function to call after an error. Passed in 3-tuple (stdin, stdout, exit_code), return is if retry should happen (False if no retry)
     :return: 3-tuple (stdout as string, stderr as string and exit code as int)
```
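The default `retry_wait_duration_func` in `run()`'s signature yields the exponential backoff the docstring describes. A quick sketch of the schedule it produces (counting retries from 1 is an inference from the docstring's "initial delay of 8 seconds"):

```python
# Default from run()'s signature in vmss-prototype
retry_wait_duration_func = lambda num_retry: 2 ** (num_retry + 2)

# Assuming retries are numbered from 1, the per-retry wait in seconds doubles
# each attempt, starting at 8 (matching the docstring):
schedule = [retry_wait_duration_func(n) for n in range(1, 5)]
print(schedule)  # -> [8, 16, 32, 64]
```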
```diff
@@ -1085,7 +1085,7 @@ def vmss_prototype_update(sub_args):
             # provisioning bits on it (prototype) we can remove all extensions
             # But... We want to keep the AKS Engine-identifying extension for telemetry reasons
             # It is a no-op code wise but gives counts
-            if 'vmss-computeAksLinuxBilling' not in extension['name']:
+            if 'AKSLinuxBilling' not in extension['name']:
                 delete_extensions = True
                 run(az(['vmss', 'extension', 'delete'], subscription) + [
                     '--resource-group', resource_group,
```
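The revised substring check can be exercised in isolation. Note that the match is case-sensitive, so it behaves differently for the two naming styles shown below; the AKS-style extension name here is hypothetical (the real name on an AKS node pool may differ):

```python
# Mirror of the check in vmss_prototype_update: extensions whose name does NOT
# contain the substring are marked for deletion (case-sensitive match).
def marked_for_deletion(name: str) -> bool:
    return 'AKSLinuxBilling' not in name

# A hypothetical AKS-style billing extension name contains the substring, so it is kept:
print(marked_for_deletion('Microsoft.AKS.Compute.AKSLinuxBilling'))  # -> False
# The old AKS Engine name uses 'Aks', not 'AKS', so it no longer matches:
print(marked_for_deletion('vmss-computeAksLinuxBilling'))  # -> True
```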
