# Flows

Flows are a way to designate part of the AppController dependency graph (i.e. a subgraph) and then use
it as a vertex in other parts of the graph. With flows, it becomes possible to compose complex deployment
topologies from smaller reusable units that can be designed and maintained separately from each other.

Examples where flows are especially useful:
* A flow may correspond to a single application deployment. Complex software systems can then be designed
  as a dependency graph of applications (higher-level abstractions) rather than of the internal parts of those
  applications (pods, services, volumes, etc.). Such graphs are not only much easier to understand and visualize,
  but also make later maintenance of the system much easier.
* Flows may represent units within applications: nodes in a cluster, worker nodes in a backend system, frontend
  servers in a web farm. This is especially helpful if such units are made of several Kubernetes resources.
  Encapsulating such deployment blocks in a flow makes it possible to build multi-node systems by repeating
  (replicating) the flow.
* Moreover, flows can represent an operation that can be performed on an application: scale, migrate, heal, backup
  and so on, if such operations can be expressed by Kubernetes resources. Considering that a number of resource
  types can contain arbitrary bash scripts, this is often the case.

The most important properties of flows are:
1. **Name**. Each flow is a named part of the dependency graph. To use the flow in another part of the graph,
   one just makes a dependency on the flow vertex.
2. **Scope**. This is a label selector that can be applied to dependencies. All graph vertices reachable from the
   flow vertex using only edges (dependencies) that match the selector are said to belong to the flow. A flow can
   have one or two scopes: a construction scope and an (optional) destruction scope. The construction scope defines
   what is going to be deployed for each flow replica. The destruction scope defines what needs to be performed
   before the replica resources are deleted.
3. **Parameters**. Flows can have parameters, which can be used to generalize the topology the flow stands for.
   Resource definitions get a special syntax for substituting parameter values into various string fields of
   resources. The flow consumer can provide parameter values.
4. **Replication**. Strictly speaking, this is not a property of flows, but rather of how they get deployed. Flow
   replication means that there can be several copies (replicas) of the flow subgraph merged together into a single
   graph. Before flows this did not make any sense, because several copies of the same graph would still produce the
   same resources in Kubernetes. But with parametrization, parameter values may be used as part of resource names, so
   with different arguments the same flow can produce different resources. Replication creates the specified number
   of flow copies, each of which gets a unique name that can be used as if it were a parameter.

## Flow definition

In the AppController dependency graph there are resource definitions and dependencies between them. Flows do not
bring new entities into this picture. Instead, a flow is implemented as yet another resource type that can be used as
a graph vertex. The Flow resource is where all flow properties are specified.

To use a flow, i.e. deploy the resources that make up the flow, one just needs to place it in the dependency graph:
creation of the flow vertex triggers creation of the subgraph.

### Flow resource

Below is a sample Flow resource that has all possible attributes:

```YAML
apiVersion: appcontroller.k8s/v1alpha1
kind: Flow
metadata:
  name: flow-name

construction:
  key1: value1
  key2: value2
destruction:
  key1: value1
  key2: value2

replicaSpace: optional-name
exported: true

parameters:
  parameterName1:
    description: optional parameter description
    default: optional parameter default value
  parameterName2:
```

A Flow resource must have the `Flow` kind and the `appcontroller.k8s/v1alpha1` API version. However, despite looking
similar to other Kubernetes resources, Flows are not real resources. One cannot upload them to Kubernetes as is.
Instead, they must be wrapped inside `Definition` resources. This is consistent with how other graph vertices
are represented. The difference is that other resource types can still be created in Kubernetes directly with
`kubectl create -f resource.yaml`, while for flows it is always `cat flow.yaml | kubeac wrap | kubectl create -f-`.

The sections below explain each of the flow properties and how they affect graph deployment.

Note: Since it is possible to run `kubeac` both inside and outside of the cluster, here and below I refer to the
`kubeac` binary by its simple name. However, in most cases this is going to be something like
`kubectl exec k8s-appcontroller kubeac` to run the binary remotely, in the AppController pod.

### Flow name

Each flow must have a `name` in its `metadata`. This name is used to refer to the flow in order to run it.
There are two methods to run a flow:
1) Call it from within a dependency graph. This is done by placing a dependency on the `flow/flowName` resource,
   where `flowName` is the name of the flow.
2) Explicitly call the flow from the command line: `kubeac run flowName`.

The latter method is only possible for exported flows. Exported flows are the flows that have `exported: true`
in their definition. Such flows are explicitly designed to be used by the user, rather than for internal use.

When running a flow from the command line, the flow name may be omitted. In this case `DEFAULT` is used, which is
the name of the default flow: the main dependency graph.

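The lookup rules above can be sketched in Python. This is a hypothetical model, not AppController's code: the `Flow` dataclass and the `resolve_flow` helper are illustrative names, and flows are assumed to be held in a dictionary keyed by flow name.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    name: str
    exported: bool = False


def resolve_flow(flows: dict, name: str = "", from_cli: bool = True) -> Flow:
    """Return the flow to run; an empty name selects the DEFAULT flow."""
    name = name or "DEFAULT"
    if name not in flows:
        raise KeyError(f"flow {name!r} is not defined")
    flow = flows[name]
    # `kubeac run` may only start exported flows; a dependency on
    # flow/<name> inside the graph is not subject to this check.
    if from_cli and not flow.exported:
        raise PermissionError(f"flow {name!r} is not exported")
    return flow


flows = {
    "DEFAULT": Flow("DEFAULT", exported=True),  # exported by default
    "backup": Flow("backup", exported=True),
    "internal": Flow("internal"),               # only usable from the graph
}
print(resolve_flow(flows).name)  # DEFAULT
```

Here `from_cli=False` stands for a flow reached through a `flow/flowName` dependency, which is not limited to exported flows.
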
### Flow scope

Since a flow is a dependency graph on its own, having the flow resource alone is not enough. There must be a way to
identify which resources belong to the flow. This is achieved by traversing the graph starting from the flow
vertex. All vertices that are reachable from this vertex using dependencies with labels that match the `construction`
or `destruction` selectors of the flow are said to belong to the flow.

Both the `construction` and the optional `destruction` fields of the `Flow` resource are dictionaries that state
what labels dependencies must have in order for the child resources of those dependencies to be included in the flow.

Flows may consume other flows by making dependencies on the consumed flow resource. Moreover, there might be
other resources in the consuming flow that depend on the flow being consumed. In this case the consumed flow vertex
is going to be a parent both in the dependencies that form the consumed flow and in those that are part of the
consuming flow. So it is important to choose label selectors that do not overlap. If, during graph traversal from a
flow vertex, AppController encounters another flow vertex, it will not follow the dependencies that match the second
flow's selector, even if they also match the original flow's selector. This is especially important for the
`DEFAULT` flow, which has an empty selector that matches any dependency.

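A minimal Python sketch of this traversal, assuming dependencies are `(parent, child, labels)` tuples and selectors use equality matching only (Kubernetes selectors can also be set-based). `matches` and `flow_scope` are hypothetical helpers; the example uses the empty `DEFAULT` selector to show why a nested flow's own dependencies are skipped.

```python
from collections import deque


def matches(selector: dict, labels: dict) -> bool:
    """Equality-only label selector; an empty selector matches anything."""
    return all(labels.get(key) == value for key, value in selector.items())


def flow_scope(flow: str, selectors: dict, deps: list) -> set:
    """Vertices reachable from flow/<flow> through matching dependencies.

    selectors maps a flow name to its selector; deps holds
    (parent, child, labels) tuples.
    """
    own = selectors[flow]
    start = f"flow/{flow}"
    seen, queue = {start}, deque([start])
    while queue:
        vertex = queue.popleft()
        for parent, child, labels in deps:
            if parent != vertex or not matches(own, labels):
                continue
            # Dependencies that match a nested flow's own selector belong to
            # that flow, so the outer traversal does not follow them.
            if vertex != start and vertex.startswith("flow/"):
                if matches(selectors[vertex[len("flow/"):]], labels):
                    continue
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen - {start}


selectors = {"DEFAULT": {}, "db": {"flow": "db"}}
deps = [
    ("flow/DEFAULT", "flow/db", {}),        # the main graph consumes the db flow
    ("flow/db", "pod/db", {"flow": "db"}),  # belongs to the db flow itself
    ("flow/db", "pod/web", {}),             # main-graph resource that needs db
]
print(sorted(flow_scope("DEFAULT", selectors, deps)))  # ['flow/db', 'pod/web']
print(sorted(flow_scope("db", selectors, deps)))       # ['pod/db']
```

Even though the empty `DEFAULT` selector matches the `flow: db` dependency, `pod/db` stays out of the `DEFAULT` scope because that dependency matches the `db` flow's own selector.
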
The `DEFAULT` flow is the one that corresponds to the main dependency graph. It is created implicitly by
AppController. All non-flow resources that do not depend on anything automatically become dependent on the
`flow/DEFAULT` flow. `kubeac run` is a shortcut for `kubeac run DEFAULT`. By default, the `DEFAULT` flow is exported,
has no parameters and has an empty `construction` selector that matches any label. But it can also be explicitly
declared with different settings. Moreover, if there is a dependency that has `flow/DEFAULT` either as a parent or
as a child, the flow must be declared explicitly (by creating a flow resource definition with the name `DEFAULT`).

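A hypothetical Python sketch of the implicit wiring described above, reusing a `(parent, child, labels)` tuple representation for dependencies; `attach_to_default` is an illustrative name, not part of AppController.

```python
def attach_to_default(resources: list, deps: list) -> list:
    """Return deps plus a flow/DEFAULT edge for every parentless non-flow vertex.

    deps holds (parent, child, labels) tuples; the added edges carry no
    labels, so the empty DEFAULT selector matches them.
    """
    children = {child for _, child, _ in deps}
    extra = [
        ("flow/DEFAULT", name, {})
        for name in resources
        if not name.startswith("flow/") and name not in children
    ]
    return deps + extra


resources = ["pod/a", "pod/b", "service/s", "flow/backup"]
deps = [("pod/a", "pod/b", {})]
for dep in attach_to_default(resources, deps):
    print(dep)
```

`pod/b` already has a parent and `flow/backup` is a flow, so only `pod/a` and `service/s` get attached to `flow/DEFAULT`.
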
For most flows, only the `construction` scope is required, because this is how AppController knows what to deploy
for the flow. `destruction` is only required to specify actions that must be performed before flow resources get
deleted. If both `construction` and `destruction` scopes are present, the latter has priority over the former for
dependencies that match both selectors at the same time.

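The priority rule can be sketched in Python as follows, assuming equality-only selectors; `scope_of` and the `phase: cleanup` label are hypothetical.

```python
from typing import Optional


def matches(selector: dict, labels: dict) -> bool:
    """Equality-only label selector; an empty selector matches anything."""
    return all(labels.get(key) == value for key, value in selector.items())


def scope_of(labels: dict, construction: dict,
             destruction: Optional[dict] = None) -> Optional[str]:
    """Name the flow scope a dependency falls into, or None if neither."""
    # destruction is tested first, so it wins when both selectors match
    if destruction is not None and matches(destruction, labels):
        return "destruction"
    if matches(construction, labels):
        return "construction"
    return None


construction = {"flow": "app"}
destruction = {"flow": "app", "phase": "cleanup"}
print(scope_of({"flow": "app"}, construction, destruction))                      # construction
print(scope_of({"flow": "app", "phase": "cleanup"}, construction, destruction))  # destruction
```
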
### Flow parameters

In order for flows to be reusable, there should be a way to generalize them. This is exactly what flow parameters
are for. Parameters are arbitrary key-value pairs that can be provided by flow consumers and then used somewhere in
resource definitions.

Only declared parameters can be used in the flow, unless `kubeac run` is followed by the `--undeclared-args` switch.
The advantages of declared parameters are:
* Such a declaration serves as documentation. It is easy to see what parameters can be passed to the flow. Each
  parameter can be accompanied by a description text.
* A declaration may have a default value for the parameter. If there is a default value, the parameter becomes
  optional. Values for parameters that do not have a default must be provided by the flow consumer.

Parameters are declared in the `parameters` section of the `Flow` resource. It is a dictionary where keys are
parameter names and values are structures with two optional fields: `description` and `default`. If neither of
them is present, the value may just remain empty, as shown for `parameterName2` in the example above. Parameter
names may be any combination of alphanumeric characters and underscores (`[0-9A-Za-z_]+` regexp).

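A minimal Python sketch of how declarations and consumer-supplied values could be combined. `resolve_args` is hypothetical, `undeclared_args=True` stands in for the `--undeclared-args` switch, and raising `ValueError` on problems is an assumption of the sketch.

```python
import re

NAME_RE = re.compile(r"[0-9A-Za-z_]+")


def resolve_args(declared: dict, given: dict, undeclared_args: bool = False) -> dict:
    """Combine the Flow `parameters` section with consumer-supplied values.

    declared maps a parameter name to None (empty declaration) or to a dict
    with optional "description" and "default" keys.
    """
    for name in [*declared, *given]:
        if not NAME_RE.fullmatch(name):
            raise ValueError(f"invalid parameter name {name!r}")
    unknown = set(given) - set(declared)
    if unknown and not undeclared_args:
        raise ValueError(f"undeclared parameters: {sorted(unknown)}")
    result = {}
    for name, spec in declared.items():
        spec = spec or {}
        if name in given:
            result[name] = given[name]
        elif "default" in spec:
            result[name] = spec["default"]  # a default makes the parameter optional
        else:
            raise ValueError(f"no value for required parameter {name!r}")
    # Undeclared values only get here when undeclared_args is True.
    result.update({name: given[name] for name in unknown})
    return result


declared = {
    "image": None,  # like parameterName2: no description, no default
    "port": {"description": "container port", "default": "80"},
}
print(resolve_args(declared, {"image": "nginx"}))  # {'image': 'nginx', 'port': '80'}
```
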
When creating a resource from its `Definition`, AppController looks for `$parameterName` strings in selected fields
of the definition and substitutes them with parameter values. Each resource type has its own list of fields where
such substitution can take place. Usually it contains all the fields where parametrization is relevant, with the
notable exception of fields that contain shell scripts. In order to access parameter values in scripts,
they must be propagated through environment variables, which are parametrized. Parameter references may be used
in any portion of a field. For example, `a$arg$yet_another_arg-d` is a valid field value that will be turned
into `abc-d` if the flow gets `b` as the `arg` parameter value and `c` for `yet_another_arg`.

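A minimal Python sketch of the substitution, assuming a reference ends at the first character outside `[0-9A-Za-z_]` and that references with no value are left untouched (the text above does not say what happens in that case). `substitute` is a hypothetical helper:

```python
import re

REF_RE = re.compile(r"\$([0-9A-Za-z_]+)")


def substitute(value: str, args: dict) -> str:
    """Replace every $name reference in value with args[name].

    The name is the longest run of [0-9A-Za-z_] after "$". References
    without a value are left untouched (an assumption of this sketch).
    """
    def replace(match):
        name = match.group(1)
        return str(args[name]) if name in args else match.group(0)

    return REF_RE.sub(replace, value)


print(substitute("a$arg$yet_another_arg-d", {"arg": "b", "yet_another_arg": "c"}))  # abc-d
```

Because names are matched greedily in this sketch, `$argument` refers to a parameter named `argument`, not to `$arg` followed by `ument`; a character such as `-` or another `$` ends the reference.
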
Note that in the dependency graph, resources are identified by their names before substitution takes place. For
example, in dependencies, the pod with name `$arg` will still be `pod/$arg`.

There are two ways to pass arguments to a flow:
1. Through the CLI
2. Through dependencies

With the first method it is possible to provide parameter values to the flow being started:\
`kubeac run flowName --arg arg1=value1 --arg arg2=value2`

The second method is used when one flow calls another and needs to pass parameters to it. It can also be used
to pass parameters between two arbitrary resources. To do this, the `args` field of dependencies is used:
```YAML
apiVersion: appcontroller.k8s/v1alpha1
kind: Dependency
metadata:
  generateName: dependency-
  labels:
    flow: my-flow
parent: pod/my-pod
child: flow/another-flow
args:
  arg1: $myFlowParameter
  arg2: hardcoded value
```

If a resource depends on several other resources, each of those dependencies may pass its own arguments. In this
case they all get merged into a single dictionary. If an argument is passed twice with different values, the resulting
value of this parameter is undefined. However, if this parameter is used in a resource name, as many resources are
going to be created as there are values passed for the parameter.
For example, the graph
```
            {arg=a}
           ---------
[parent]  /         \   [child]
my-flow ->           -> pod/pod-$arg
          \         /
           ---------
            {arg=b}
```
will create two pods: `pod-a` and `pod-b`.

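A rough Python model of the example above, assuming each incoming dependency's arguments are applied to the child name separately and every distinct result becomes its own resource. `instantiate` and `substitute` are hypothetical helpers, and the sketch only covers the resource-name case, not the undefined value seen by other fields:

```python
import re

REF_RE = re.compile(r"\$([0-9A-Za-z_]+)")


def substitute(value: str, args: dict) -> str:
    """Replace $name references; names without a value stay as they are."""
    return REF_RE.sub(
        lambda m: str(args[m.group(1)]) if m.group(1) in args else m.group(0),
        value)


def instantiate(child: str, incoming_args: list) -> list:
    """Resolve a child vertex name once per incoming dependency.

    Each distinct resulting name stands for a separate Kubernetes resource.
    """
    names = []
    for args in incoming_args:
        name = substitute(child, args)
        if name not in names:
            names.append(name)
    return names


# my-flow -> pod/pod-$arg via two dependencies, as in the graph above
print(instantiate("pod/pod-$arg", [{"arg": "a"}, {"arg": "b"}]))  # ['pod/pod-a', 'pod/pod-b']
```

If both dependencies passed the same value, only one pod name would result.
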
## Replication of flows

Flow replication is an AppController feature that makes a specified number of copies of the flow graph, each one
with a unique name, and then merges them into a single graph. Because each replica name may be used in some of the
resource names, there can be resources that belong just to that replica, but there can also be resources that are
shared across replicas.

Replication is very handy when there is a need to deploy several mostly identical entities. For example, these may
be cluster nodes or servers in a web farm. Kubernetes has built-in replication capabilities for pods. With
AppController flows, the entire topology can be replicated.

Replication works as follows:
1. The user specifies the desired number of flow replicas, either as an absolute number or relative to the current
   replica count.
2. AppController allocates the requested number of replicas. For each replica, a special third-party Kubernetes
   resource of type `appcontroller.k8s.Replica` is created. The resource name has an auto-generated part which
   becomes the replica name.
3. If new replicas were created by the adjustment:
   1. For each new replica, build the flow dependency graph with the special **`AC_NAME`** parameter set to the
      replica name (alongside other flow arguments).
   2. All generated replica graphs are merged into a single graph. Vertices that have the same name are merged
      into one vertex.
4. If the new replica count is less than before (i.e. there are replicas that should be deleted):
   1. For each extra replica, build the flow dependency graph using the flow `destruction` scope. Pass the
      **`AC_NAME`** argument to each replica.
   2. Merge all such replicas into one graph.
   3. After deployment, delete all resources that belong exclusively to the replicas being deleted, including
      the `Replica` object and resources from both the `construction` and `destruction` flow scopes.
5. If the number of replicas does not change, build (and merge) the dependency graph for the existing replicas.

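Steps 3.1 and 3.2 can be sketched in Python as follows. This is a hypothetical model, not AppController's code: graphs are sets of `(parent, child)` edges, and "the same name" is assumed to mean the name after `AC_NAME` substitution, so a resource without `$AC_NAME` in its name ends up shared by all replicas. `service/frontend`, `pod/web-$AC_NAME` and `job/register-$AC_NAME` are made-up resources.

```python
import re

REF_RE = re.compile(r"\$([0-9A-Za-z_]+)")


def substitute(value: str, args: dict) -> str:
    """Replace $name references; names without a value stay as they are."""
    return REF_RE.sub(
        lambda m: str(args[m.group(1)]) if m.group(1) in args else m.group(0),
        value)


def merged_graph(edges: list, replicas: list, args: dict) -> set:
    """Build one copy of the flow graph per replica and merge them.

    edges are (parent, child) name templates. Each copy gets AC_NAME set
    to its replica name; storing edges in a set merges equal vertices.
    """
    graph = set()
    for replica in replicas:
        replica_args = {**args, "AC_NAME": replica}
        for parent, child in edges:
            graph.add((substitute(parent, replica_args),
                       substitute(child, replica_args)))
    return graph


edges = [
    ("service/frontend", "pod/web-$AC_NAME"),       # service/frontend is shared
    ("pod/web-$AC_NAME", "job/register-$AC_NAME"),  # per-replica resources
]
graph = merged_graph(edges, ["x1", "x2"], {})
vertices = {vertex for edge in graph for vertex in edge}
print(sorted(vertices))
```

With two replicas the merged graph has five vertices: one shared `service/frontend` plus a pod and a job for each replica.
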
The replica count is specified using the `-n` (or longer `--replicas`) command-line switch:

`kubeac run my-flow -n3` deploys 3 replicas of `my-flow`. If there was 1, 2 replicas would be created.
If there were 7 of them, 4 replicas would be deleted.\
`kubeac run my-flow -n+1` increases the replica count by 1\
`kubeac run my-flow -n-2` decreases the replica count by 2\
`kubeac run my-flow`: if no replicas exist, creates one; otherwise validates the status of the resources of
existing replicas.

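A small Python sketch of the count arithmetic, where `flag` is the text after `-n` (for example `3`, `+1` or `-2`). `parse_n` and `plan_replicas` are hypothetical helpers, and clamping the target at zero is an assumption:

```python
def parse_n(flag: str) -> tuple:
    """Split a -n value into (count, fixed): "3" is absolute, "+1"/"-2" relative."""
    return int(flag), flag[:1] not in ("+", "-")


def plan_replicas(current: int, flag: str = "") -> tuple:
    """Return (replicas_to_create, replicas_to_delete) for `kubeac run`."""
    if not flag:
        # no -n: create one replica if none exist, otherwise just re-deploy
        return (1, 0) if current == 0 else (0, 0)
    count, fixed = parse_n(flag)
    target = count if fixed else current + count
    target = max(target, 0)  # assumption: the count never goes below zero
    return max(target - current, 0), max(current - target, 0)


print(plan_replicas(1, "3"))  # (2, 0)
print(plan_replicas(7, "3"))  # (0, 4)
```
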
### Replica-spaces and contexts

A replica-space is a tag that all replicas of the flow share. When a new `Replica` object for the flow is created,
it gets the `replicaspace=name` label. When AppController needs to adjust the replica count, the first thing it does
is select all `Replica` objects in the flow replica-space.

Usually, the replica-space name is the same as the flow name, so that replicas from different flows do not
interfere. However, the replica-space name of the flow can be specified explicitly using the `replicaSpace`
attribute of the `Flow`. If different flows put the same name in their `replicaSpace` fields, they will share a
replica-space. This is useful for cases where there are several alternative ways to create the same kind of entities.

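A minimal Python sketch of replica-space selection, modeling `Replica` objects as plain dictionaries; `replica_space`, `select_replicas` and the `web-apache`/`web-nginx` flows are hypothetical:

```python
def replica_space(flow: dict) -> str:
    """replicaSpace if the Flow sets it, otherwise the flow name."""
    return flow.get("replicaSpace") or flow["metadata"]["name"]


def select_replicas(replicas: list, space: str) -> list:
    """Names of Replica objects labelled replicaspace=<space>."""
    return [r["name"] for r in replicas
            if r["labels"].get("replicaspace") == space]


# two alternative flows that create the same kind of entity share a space
apache = {"metadata": {"name": "web-apache"}, "replicaSpace": "web"}
nginx = {"metadata": {"name": "web-nginx"}, "replicaSpace": "web"}
db = {"metadata": {"name": "db"}}

replicas = [
    {"name": "r1", "labels": {"replicaspace": "web"}},
    {"name": "r2", "labels": {"replicaspace": "db"}},
    {"name": "r3", "labels": {"replicaspace": "web"}},
]
print(select_replicas(replicas, replica_space(nginx)))  # ['r1', 'r3']
```

Both web flows count `r1` and `r3` as their replicas, regardless of which of them created each one.
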
However, there are cases when counting replicas based on their replica-space alone is not enough. Suppose there
are two flows, `A` and `B`, and flow `A` calls `B` as part of its dependency graph. The user has deployed 5 replicas
of `B` and then wants to have 2 replicas of `A`. How many replicas of `B` should be created? What should happen if
the user then opts to delete one `A` replica?

The problem here is that replicas of flows created in the context of another flow are indistinguishable from
replicas created explicitly. This is where `contexts` come into play. When a flow is run from another flow, it gets a
unique name that is a combination of the flow name, the replica name and the name of the dependency that triggered
the flow. In other words, this name is unique for each usage of the flow, but it is not random and thus remains the
same on subsequent deployments. This name is called the `context`. When AppController looks for replicas of a flow
that has such a context, it queries Kubernetes for `Replica` objects that have the composite label
`replicaspace=name;context=context`. When new replicas need to be created, they get this composite label as well.
As a result, each flow occurrence within another flow will "see" only its own replicas, so the `Flow` resource can
always adjust the replica count to 1. However, when the flow is run independently, it will not have any context and
thus will query replicas based on the replica-space alone, which means it will get the replicas from all contexts.

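The `A`/`B` scenario can be modeled in Python as follows. This sketch stores the context as a separate `context` label and builds it with a `.` separator; the actual encoding of the composite label and of the context name is not specified here, so treat both as assumptions. `context_name`, `select_replicas` and the `dependency-b` dependency are hypothetical.

```python
from typing import Optional


def context_name(flow: str, replica: str, dependency: str) -> str:
    """Deterministic per-usage name; the "." separator is hypothetical."""
    return f"{flow}.{replica}.{dependency}"


def select_replicas(replicas: list, space: str,
                    context: Optional[str] = None) -> list:
    """Replica names in a replica-space, optionally narrowed to one context."""
    selected = []
    for replica in replicas:
        labels = replica["labels"]
        if labels.get("replicaspace") != space:
            continue
        # Run from another flow: only that context's replicas are visible.
        # Run independently (no context): replicas of every context count.
        if context is not None and labels.get("context") != context:
            continue
        selected.append(replica["name"])
    return selected


# 5 replicas of B deployed explicitly, then 2 replicas of A, each calling B
replicas = [{"name": f"b{i}", "labels": {"replicaspace": "B"}} for i in range(1, 6)]
for a_replica, b_replica in (("a1", "b6"), ("a2", "b7")):
    ctx = context_name("B", a_replica, "dependency-b")
    replicas.append({"name": b_replica,
                     "labels": {"replicaspace": "B", "context": ctx}})

print(select_replicas(replicas, "B", context_name("B", "a1", "dependency-b")))  # ['b6']
print(len(select_replicas(replicas, "B")))  # 7
```

Each `A` replica sees exactly one `B` replica in its own context, while an independent `kubeac run B` counts all seven.
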
## Scheduling flow deployments

When the user runs `kubeac run something`, the deployment does not happen immediately (unless there is also a
`--deploy` switch present on the command line). Instead, it schedules a flow deployment that another process will
pick up and deploy. This other process is also `kubeac`, run as `kubeac deploy`. Usually that process runs in a
restartable Kubernetes pod. If the deployment process fails for some reason, the pod will be rescheduled and the
deployment process will be restarted. It will then pick up the same flow and continue the deployment. This is what
makes AppController highly available.

Flow deployments are scheduled by creating a `ConfigMap` object with the special label `AppController=FlowDeployment`.
The deployment process watches for create and delete events of config-maps with this label, sorts new config-maps
in the order they were created and deploys them one by one. Thus it is possible to schedule a flow deployment using
`kubectl` or even the Kubernetes API alone. This is especially useful from within containers where `kubeac` is not
available.

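A minimal in-memory Python sketch of the scheduling order described above. It is not AppController's watch loop: creation timestamps are plain numbers, config-maps are identified by name, and `DeploymentQueue` is a hypothetical class.

```python
import heapq
from typing import Optional

MAGIC_LABEL = ("AppController", "FlowDeployment")


class DeploymentQueue:
    """Pending flow deployments, handed out oldest first."""

    def __init__(self) -> None:
        self._heap: list = []
        self._deleted: set = set()

    def on_create(self, created_at: float, name: str, labels: dict) -> None:
        key, value = MAGIC_LABEL
        if labels.get(key) == value:  # ignore unrelated config-maps
            heapq.heappush(self._heap, (created_at, name))

    def on_delete(self, name: str) -> None:
        self._deleted.add(name)  # dropped lazily when it reaches the front

    def pop_next(self) -> Optional[str]:
        while self._heap:
            _, name = heapq.heappop(self._heap)
            if name not in self._deleted:
                return name
        return None


queue = DeploymentQueue()
queue.on_create(20.0, "flow-deployment-b", {"AppController": "FlowDeployment"})
queue.on_create(10.0, "flow-deployment-a", {"AppController": "FlowDeployment"})
queue.on_create(15.0, "flow-deployment-c", {"AppController": "FlowDeployment"})
queue.on_create(5.0, "unrelated", {})
queue.on_delete("flow-deployment-c")  # cancelled before it was picked up
print(queue.pop_next(), queue.pop_next(), queue.pop_next())
```

The deployments come out as `flow-deployment-a`, then `flow-deployment-b`, then `None`: the unlabelled config-map is never queued and the deleted one is skipped.
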
Below is the config-map format:
```YAML
apiVersion: v1
kind: ConfigMap
metadata:
  generateName: flow-deployment-  # Any name will work. With generateName it is easier to generate unique names
  labels:
    AppController: FlowDeployment  # Magic label

data:
  selector: ""                     # AppController will only look for definitions and dependencies
                                   # matching this label selector

  concurrency: "0"                 # How many resources can be deployed in parallel. "0" = no limit
  flowName: ""                     # Flow name. Empty string is the same as DEFAULT
  exportedOnly: "false"            # When set to "false", any flow can be run.
                                   # Otherwise, only flows marked as exported can be run

  allowUndeclaredArgs: "false"     # Allow the flow to use parameters that were not declared
  replicaCount: "0"                # Replica count
  fixedNumberOfReplicas: "false"   # "true" means that `replicaCount` is an absolute number,
                                   # otherwise it is relative to the current count

  minReplicaCount: "0"             # Minimum number of replicas, "0" = no minimum
  maxReplicaCount: "0"             # Maximum number of replicas, "0" = no maximum
  allowDeleteExternalResources: "false"  # By default, when a replica is deleted, AppController does not
                                         # delete resources that it did not create. This setting overrides that

  arg.arg1: "value1"               # Argument values. "arg." is prepended to each parameter name
  arg.arg2: "value2"
```

All settings in this config-map are optional. The example above is filled with default values.
Once the deployment is done, AppController automatically deletes its config-map.
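
For illustration, a Python sketch that reads the config-map `data` using the defaults listed above. `parse_deployment` is hypothetical, and comparing `"true"` case-insensitively is an assumption:

```python
DEFAULTS = {
    "selector": "",
    "concurrency": "0",
    "flowName": "",
    "exportedOnly": "false",
    "allowUndeclaredArgs": "false",
    "replicaCount": "0",
    "fixedNumberOfReplicas": "false",
    "minReplicaCount": "0",
    "maxReplicaCount": "0",
    "allowDeleteExternalResources": "false",
}
INT_KEYS = ("concurrency", "replicaCount", "minReplicaCount", "maxReplicaCount")
BOOL_KEYS = ("exportedOnly", "allowUndeclaredArgs", "fixedNumberOfReplicas",
             "allowDeleteExternalResources")


def parse_deployment(data: dict) -> dict:
    """Turn flow-deployment ConfigMap data (all strings) into typed settings."""
    raw = {**DEFAULTS, **data}  # every key is optional
    settings = {
        "selector": raw["selector"],
        "flowName": raw["flowName"] or "DEFAULT",  # "" means DEFAULT
        # "arg.<name>" keys carry flow arguments
        "args": {key[len("arg."):]: value
                 for key, value in data.items() if key.startswith("arg.")},
    }
    for key in INT_KEYS:
        settings[key] = int(raw[key])
    for key in BOOL_KEYS:
        settings[key] = raw[key].lower() == "true"
    return settings


settings = parse_deployment({"flowName": "my-flow", "replicaCount": "3",
                             "fixedNumberOfReplicas": "true", "arg.arg1": "value1"})
print(settings["flowName"], settings["replicaCount"], settings["args"])
```

An empty `data` section yields the `DEFAULT` flow with no arguments, no concurrency limit and a relative replica change of 0.
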