Commit 9d77376

Merge pull request #256 from istalker2/flows-doc
Flows documentation
2 parents 6790d9f + f9d7582 commit 9d77376

2 files changed

+312
-1
lines changed

README.md

Lines changed: 1 addition & 1 deletion
@@ -91,7 +91,7 @@ Load it to k8s:

 Start the graph deployment:

-`kubectl exec k8s-appcontroller run`
+`kubectl exec k8s-appcontroller kubeac -- run`

 ## Reporting


docs/flows.md

Lines changed: 311 additions & 0 deletions
@@ -0,0 +1,311 @@
1+
# Flows
2+
3+
Flows are a way to designate a part of the AppController dependency graph (i.e. a subgraph) and then use
it as a vertex in other parts of the graph. With flows it becomes possible to compose complex deployment
topologies from smaller reusable units that may be designed and maintained separately from each other.
6+
7+
Examples where flows are especially useful:
* A flow may correspond to a single application deployment. Then we can design complex software systems
  as a dependency graph of applications (higher-level abstractions) rather than of the internal parts of those applications
  (pods, services, volumes, etc.). Such graphs are not only much easier to understand and visualize, but also
  make later maintenance of the system much easier.
* Flows may represent units within applications: nodes in a cluster, worker nodes in a backend system, frontend
  servers in a web farm. This is especially helpful if such units are made of several Kubernetes resources.
  Encapsulating such deployment blocks in a flow makes it possible to build multi-node systems by repeating (replicating)
  the flow.
* Moreover, flows can represent an operation that can be performed on an application: scale, migrate, heal, back up
  and so on, if such operations can be expressed by Kubernetes resources. Considering that a number of them can
  contain arbitrary bash scripts, this is often the case.
19+
20+
The most important properties of flows are:
1. **Name**. Since a flow is a named part of the dependency graph, it has a name. To use the flow in another
   part of the graph, one just makes a dependency on the flow vertex.
2. **Scope**. This is a label selector that can be applied to dependencies. All graph vertices reachable from the
   flow vertex using only edges (dependencies) that match the selector are said to belong to the flow. There can
   be one or two scopes for a flow: a construction scope and an (optional) destruction scope. The construction scope defines
   what is going to be deployed for each flow replica. The destruction scope defines what needs to be performed before
   the replica's resources are deleted.
3. **Parameters**. Flows can have parameters, which can be used to generalize the topology that the flow stands for.
   Resource definitions get a special syntax for substituting parameter values into various string fields of
   resources. The flow consumer can provide parameter values.
4. **Replication**. Strictly speaking, this is not a property of flows, but rather of how they get deployed. Flow replication
   means that there can be several copies (replicas) of the flow subgraph merged together into a single graph. Before
   flows this did not make any sense, because several copies of the same graph would still produce the same resources in
   Kubernetes. But with parametrization, a parameter value may be used as part of resource names, so with different
   arguments the same flow can produce different resources. Replication creates the specified number of flow copies, each
   of which gets a unique name that can be used as if it were a parameter.
37+
38+
## Flow definition
39+
40+
In the AppController dependency graph there are resource definitions and dependencies between them. Flows do not bring
new entities into this picture. Instead, a flow is implemented as yet another resource type that can be used as a graph
vertex. The Flow resource is where all flow properties are specified.

To use a flow, i.e. to deploy the resources that make up the flow, one just needs to place it in the dependency graph, and the
creation of the flow vertex will trigger creation of the subgraph.
46+
47+
### Flow resource
48+
49+
Below is a sample Flow resource that has all possible attributes:
50+
51+
```YAML
apiVersion: appcontroller.k8s/v1alpha1
kind: Flow
metadata:
  name: flow-name

construction:
  key1: value1
  key2: value2
destruction:
  key1: value1
  key2: value2

replicaSpace: optional-name
exported: true

parameters:
  parameterName1:
    description: optional parameter description
    default: optional parameter default value
  parameterName2:
```
73+
74+
A Flow resource must have the `Flow` kind and the `appcontroller.k8s/v1alpha1` API version. However, despite looking similar
to other Kubernetes resources, flows are not real resources. One cannot upload them to Kubernetes as is.
Instead, they must be wrapped inside `Definition` resources. This is consistent with how other graph vertices
are represented. However, for other resource types it is still possible to create them in Kubernetes with
`kubectl create -f resource.yaml`. But for flows it is always `cat flow.yaml | kubeac wrap | kubectl create -f-`.
79+
80+
Sections below explain each of the flow properties and how they affect graph deployment.
81+
82+
Note: Since it is possible to run `kubeac` both inside and outside of the cluster, here and below I refer to
83+
`kubeac` binary by its simple name. However, in most cases this is going to be something like
84+
`kubectl exec k8s-appcontroller kubeac` to run the binary remotely, in AppController pod.
85+
86+
### Flow name
87+
88+
Each flow must have a `name` in its `metadata`. This name is used to refer to the flow in order to run it.
There are two methods to run a flow:
1) Call it from within a dependency graph. This is done by placing a dependency on the `flow/flowName` resource,
   where `flowName` is the name of the flow.
2) Explicitly call the flow from the command line: `kubeac run flowName`.
93+
94+
The latter method is only possible for exported flows. Exported flows are the flows that have `exported: true`
in their definition. Such flows are explicitly designed to be run by the user, rather than being for internal use.

When running a flow from the command line, the flow name may be omitted. In this case `DEFAULT` is used, which is the name of
the default flow - the main dependency graph.
99+
100+
### Flow scope
101+
102+
Since a flow is a dependency graph in its own right, having the flow resource alone is not enough. There must be a way to
identify which resources belong to the flow. This is achieved by traversing the graph starting from the flow
vertex. All vertices that are reachable from this vertex using dependencies with labels that match the `construction`
or `destruction` selectors of the flow are said to belong to the flow.
106+
107+
Both `construction` and optional `destruction` fields of the `Flow` resource are dictionaries that state what
108+
labels dependencies must have in order for child resources of those dependencies to be included in the flow.
109+
110+
Flows may consume other flows by making dependencies on the consumed flow resource. Moreover, there might be
other resources in the consuming flow that depend on the flow being consumed. In this case the consumed flow is
going to be a parent both in the dependencies that form that flow and in those that are part of the consuming flow. So it is
important to choose label selectors that do not overlap. If during graph traversal from a flow vertex AppController
encounters another flow vertex, it will not follow dependencies that match the second flow's selector even if they match
the original flow selector. This is especially important for the `DEFAULT` flow, which has an empty selector that matches
any dependency.
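
The traversal rule can be sketched in a few lines of Python. This is only an illustration of the rule as described above, not AppController's actual code: the function names, the tuple layout of dependencies and the example flow names are all assumptions made for the sketch.

```python
from collections import deque

def matches(selector, labels):
    """A selector matches when all of its key/value pairs appear in the labels.
    An empty selector therefore matches any dependency."""
    return all(labels.get(k) == v for k, v in selector.items())

def flow_members(start, dependencies, selectors):
    """Collect the vertices that belong to the flow rooted at `start`.

    dependencies: list of (parent, child, labels) tuples (hypothetical layout).
    selectors: maps each flow vertex name to its construction selector.
    """
    own = selectors[start]
    seen, queue = set(), deque([start])
    while queue:
        vertex = queue.popleft()
        for parent, child, labels in dependencies:
            if parent != vertex or child in seen or not matches(own, labels):
                continue
            # Edges that form a nested flow belong to that flow, not to ours.
            if vertex != start and vertex in selectors and matches(selectors[vertex], labels):
                continue
            seen.add(child)
            queue.append(child)
    return seen

deps = [
    ("flow/outer", "flow/inner", {"flow": "outer"}),
    ("flow/inner", "pod/inner-pod", {"flow": "inner"}),
    ("flow/inner", "pod/after-inner", {"flow": "outer"}),
]
selectors = {"flow/outer": {"flow": "outer"}, "flow/inner": {"flow": "inner"}}
members = flow_members("flow/outer", deps, selectors)
```

Here `pod/inner-pod` is left to `flow/inner`, while `pod/after-inner` depends on the consumed flow but still belongs to `flow/outer`. The same check is what keeps a flow with an empty selector from absorbing the contents of every flow it calls.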
117+
118+
The `DEFAULT` flow is the one that corresponds to the main dependency graph. It is created implicitly by AppController.
All non-flow resources that do not depend on anything automatically become dependent on the `flow/DEFAULT` flow.
`kubeac run` is a shortcut for `kubeac run DEFAULT`. By default, the `DEFAULT` flow is exported, has no arguments and
has an empty selector for the `construction` scope that matches any label. But it can also be explicitly declared with
different settings. Moreover, if there is a dependency that has `flow/DEFAULT` either as a parent or as a child,
the flow must be created explicitly (by creating a flow resource definition with the name `DEFAULT`).
124+
125+
For most flows, only the `construction` scope is required, because this is how AppController knows what to deploy
126+
for the flow. `destruction` is only required to specify actions that must be performed before flow resources get
127+
deleted. However, if both `construction` and `destruction` scopes are present the latter will have priority over
128+
the former for dependencies that match both selectors at the same time.
129+
130+
### Flow parameters
131+
132+
In order for flows to be reusable, there should be a way to generalize them. This is exactly what flow parameters
133+
are for. Parameters are arbitrary key-value pairs that can be provided by flow consumers and then used somewhere in
134+
resource definitions.
135+
136+
Only declared parameters can be used in the flow, unless `kubeac run` is followed by the `--undeclared-args` switch.
The advantages of declared parameters are:
* The declaration serves as documentation. It is easy to see what parameters can be passed to the flow. Each
  parameter can be accompanied by a description text.
* The declaration may have a default value for the parameter. If there is a default value, the parameter becomes optional.
  Values for parameters that do not have a default must be provided by the flow consumer.
142+
143+
Parameters are declared in the `parameters` section of the `Flow` resource. It is a dictionary where keys are
parameter names and values are structures with two optional fields: `description` and `default`. If neither of them is
present, the value may just remain empty, as shown for `parameterName2` in the example above. Parameter names
may be any combination of alphanumeric characters and underscores (`[0-9A-Za-z_]+` regexp).
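
The declaration rules can be summarized with a small Python sketch. It is an assumption-laden illustration rather than AppController's implementation: the function name `resolve_arguments`, its signature and the choice to report problems as `ValueError` are all invented for the example.

```python
import re

NAME_RE = re.compile(r"[0-9A-Za-z_]+")

def resolve_arguments(declared, passed, allow_undeclared=False):
    """Combine declared parameters with the values a consumer passed.

    declared: the `parameters` dictionary of a Flow; each value is either
              None or a dict with optional `description` and `default`.
    passed:   values supplied by the flow consumer.
    """
    result = {}
    for name, spec in declared.items():
        if not NAME_RE.fullmatch(name):
            raise ValueError(f"invalid parameter name: {name!r}")
        spec = spec or {}  # an empty declaration, like parameterName2
        if name in passed:
            result[name] = passed[name]
        elif "default" in spec:
            result[name] = spec["default"]  # optional parameter
        else:
            raise ValueError(f"missing value for parameter {name!r}")
    for name, value in passed.items():
        if name not in declared:
            if not allow_undeclared:  # mirrors the --undeclared-args switch
                raise ValueError(f"undeclared parameter: {name!r}")
            result[name] = value
    return result

declared = {"size": {"description": "node count", "default": "3"}, "name": None}
resolved = resolve_arguments(declared, {"name": "db"})
```

In this sketch `size` falls back to its default while `name`, which has no default, must be supplied by the caller.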
147+
148+
When creating a resource from its `Definition`, AppController looks for `$parameterName` strings in selected fields
of the definition and substitutes them with parameter values. Each resource has its own list of fields where
such substitution can take place. Usually it contains all the fields where parametrization is relevant, with the
notable exception of fields that contain shell scripts. In order to access parameter values in scripts,
they must be propagated through environment variables, which are parametrized. Parameter references may be used
in any portion of a field. For example, `a$arg$yet_another_arg-d` is a valid field value that will be turned
into `abc-d` if the flow gets `b` as the `arg` parameter value and `c` for `yet_another_arg`.
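
A minimal Python sketch of this substitution, using the parameter-name pattern given above. The helper name `substitute` is hypothetical, and leaving unknown references untouched is an assumption of the sketch, not documented AppController behavior.

```python
import re

# A reference is "$" followed by the longest run of name characters.
PARAM_REF = re.compile(r"\$([0-9A-Za-z_]+)")

def substitute(value, args):
    """Replace each $name reference in a field value with its argument value.
    References to unknown parameters are left as they are (an assumption)."""
    return PARAM_REF.sub(lambda m: str(args.get(m.group(1), m.group(0))), value)

result = substitute("a$arg$yet_another_arg-d", {"arg": "b", "yet_another_arg": "c"})
print(result)  # abc-d
```

Because `$` and `-` are not name characters, `$arg` ends where `$yet_another_arg` begins, and `-d` stays literal.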
155+
156+
Note that in dependency graph resources are identified by their names before substitution takes place. For example,
157+
in dependencies, the pod with name `$arg` will still be `pod/$arg`.
158+
159+
There are two ways to pass arguments to a flow:
160+
1. Through the CLI
161+
2. Through dependencies
162+
163+
With the first method it is possible to provide parameter values to the flow being started:\
164+
`kubeac run flowName --arg arg1=value1 --arg arg2=value2`
165+
166+
The second method is used when one flow calls another, to pass parameters between them. It can also be used
to pass parameters between two arbitrary resources. To do this, the `args` field of dependencies is used:
168+
```YAML
apiVersion: appcontroller.k8s/v1alpha1
kind: Dependency
metadata:
  generateName: dependency-
  labels:
    flow: my-flow
parent: pod/my-pod
child: flow/another-flow
args:
  arg1: $myFlowParameter
  arg2: hardcoded value
```
181+
182+
If a resource depends on several other resources, each of those dependencies may pass its own arguments. In this
case they all get merged into a single dictionary. If an argument is passed twice with different values, the resulting
value of this parameter is undefined. However, if this parameter is used in a resource name, as many resources
are created as there are values passed for the parameter.
186+
For example, the graph
187+
```
            {arg=a}
           ---------
[parent]  /         \  [child]
my-flow ->           -> pod/pod-$arg
          \         /
           ---------
            {arg=b}
```
196+
will create two pods: `pod-a` and `pod-b`.
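
The fan-out can be illustrated with a short Python sketch. The helper `expand_child` is hypothetical and only models the naming effect: each incoming dependency contributes its own `args`, and every distinct resulting name stands for a separate resource.

```python
import re

def expand_child(child, incoming_args):
    """Return the distinct resource names produced when `child` is reached
    through several dependencies, each passing its own `args` dictionary."""
    names = []
    for args in incoming_args:
        name = re.sub(r"\$([0-9A-Za-z_]+)",
                      lambda m: str(args.get(m.group(1), m.group(0))), child)
        if name not in names:  # identical names refer to the same resource
            names.append(name)
    return names

pods = expand_child("pod/pod-$arg", [{"arg": "a"}, {"arg": "b"}])
print(pods)  # ['pod/pod-a', 'pod/pod-b']
```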
197+
198+
## Replication of flows
199+
200+
Flow replication is an AppController feature that makes a specified number of copies of the flow graph, each one with
a unique name, and then merges them into a single graph. Because each replica name may be used in some resource
names, there can be resources that belong just to that replica, but there can also be resources that are shared
across replicas.
204+
205+
Replication is very handy when there is a need to deploy several mostly identical entities. For example, they may
be cluster nodes or servers in a web farm. Kubernetes has built-in replication capabilities for pods. With
AppController flows the entire topology can be replicated.
208+
209+
Replication works as follows:
1. The user specifies the desired number of flow replicas, either as an absolute number or relative to the current
   replica count.
2. AppController allocates the requested number of replicas. For each replica, a special third-party Kubernetes resource
   of type `appcontroller.k8s.Replica` is created. The resource name has an auto-generated part, which becomes the
   replica name.
3. If new replicas were created by the adjustment:
   1. For each new replica, build the flow dependency graph with the special **`AC_NAME`** parameter set to the
      replica name (alongside other flow arguments).
   2. Merge all generated replica graphs into a single graph. Vertices that have the same name are merged
      into one vertex.
4. If the new replica count is less than it was before (i.e. there are replicas that should be deleted):
   1. For each extra replica, build the flow dependency graph using the flow `destruction` scope. Pass the **`AC_NAME`**
      argument to each replica.
   2. Merge all such replicas into one graph.
   3. After deployment, delete all resources that belong exclusively to the replicas being deleted, including
      the `Replica` object and resources from both the `construction` and `destruction` flow scopes.
5. If the number of replicas does not change, build (and merge) the dependency graph for the existing replicas.
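
The build-and-merge step can be sketched in Python as follows. This is a simplified model, not AppController code: graphs are reduced to sets of (parent, child) name pairs, the replica names `r1` and `r2` are made up, and plain string replacement stands in for the real parameter substitution.

```python
def build_replica_graph(template_edges, replica_name):
    """Instantiate the flow's edges for one replica by filling in $AC_NAME."""
    return {(parent.replace("$AC_NAME", replica_name),
             child.replace("$AC_NAME", replica_name))
            for parent, child in template_edges}

def merge_graphs(graphs):
    """Union the edge sets. Vertices with equal names collapse into one
    because they are represented by the same string."""
    merged = set()
    for graph in graphs:
        merged |= graph
    vertices = {vertex for edge in merged for vertex in edge}
    return vertices, merged

# One shared vertex and one per-replica vertex (hypothetical resources).
template = [("service/shared", "pod/node-$AC_NAME")]
vertices, edges = merge_graphs(build_replica_graph(template, name) for name in ["r1", "r2"])
```

The merged graph has a single `service/shared` vertex with two children, `pod/node-r1` and `pod/node-r2`, which shows how some resources end up per-replica while others are shared.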
227+
228+
The replica count is specified using the `-n` (or longer `--replicas`) command-line switch:

`kubeac run my-flow -n3` deploys 3 replicas of `my-flow`. If there was 1, 2 more replicas would be created.
If there were 7 of them, 4 replicas would be deleted.\
`kubeac run my-flow -n+1` increments the replica count by 1\
`kubeac run my-flow -n-2` decreases the replica count by 2\
`kubeac run my-flow` creates one replica if no replicas exist; otherwise it validates the status of
the resources of existing replicas.
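
The arithmetic behind these switches can be sketched in Python. Both helpers are hypothetical: treating a leading `+` or `-` as the marker of a relative count and clamping the target at zero are assumptions of the sketch, inferred from the examples above.

```python
def parse_replicas(text):
    """'3' -> (3, True); '+1' -> (1, False); '-2' -> (-2, False).
    The boolean says whether the number is absolute."""
    return int(text), not text.startswith(("+", "-"))

def replica_delta(current, requested=0, absolute=False):
    """Return how many replicas to create (positive) or delete (negative)."""
    target = requested if absolute else current + requested
    if requested == 0 and not absolute and current == 0:
        target = 1  # a plain run with no existing replicas creates one
    return max(target, 0) - current

print(replica_delta(1, *parse_replicas("3")))   # 2
print(replica_delta(7, *parse_replicas("3")))   # -4
print(replica_delta(5, *parse_replicas("-2")))  # -2
```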
236+
237+
### Replica-spaces and contexts
238+
239+
A replica-space is a tag that all replicas of the flow share. When a new `Replica` object for the flow is created,
it gets the `replicaspace=name` label. When AppController needs to adjust the replica count, the first thing it does is select
all `Replica` objects in the flow's replica-space.
242+
243+
Usually, the replica-space name is the same as the flow name so that replicas from different flows do not interfere.
However, the replica-space name of a flow can be specified explicitly using the `replicaSpace` attribute of the `Flow`.
If different flows put the same name in their `replicaSpace` fields, they will share a replica-space. This is
useful for cases where there are several alternative ways to create entities.
247+
248+
However, there is a case when counting replicas based on their replica-space alone is not enough. Suppose there
are two flows, `A` and `B`, and flow `A` calls `B` as part of its dependency graph. The user has deployed 5 replicas of `B`
and then wants to have 2 replicas of `A`. How many replicas of `B` should be created? What should happen if the user
then opts to delete one `A` replica?
252+
253+
The problem here is that replicas of flows created in the context of another flow are indistinguishable from replicas
created explicitly. This is where `contexts` come into play. When a flow is run from another flow, it gets a unique
name that is a combination of the flow name, the replica name and the name of the dependency that triggered the flow. In other
words, this name is unique for each usage of the flow, but it is not random and thus remains the same on subsequent
deployments. This name is called the `context`. When AppController looks for replicas of a flow that has such a context,
it queries Kubernetes for `Replica` objects that have the composite label `replicaspace=name;context=context`. When
new replicas need to be created, they get this composite label as well. As a result, each flow occurrence within
another flow will "see" only its own replicas, so the `Flow` resource can always adjust the replica count to 1.
However, when the flow is run independently, it will not have any context and thus will query replicas based on
replica-space alone, which means it will get all the replicas from all contexts.
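
The effect of contexts on replica selection can be modeled with a short Python sketch. Everything here is illustrative: the helper names, the dictionary layout of a `Replica` and the context strings such as `A-r1-dep1` are assumptions, since the exact format of a context name is not spelled out above.

```python
def replica_labels(replica_space, context=None):
    """Labels used both to tag new replicas and to query existing ones."""
    labels = {"replicaspace": replica_space}
    if context is not None:
        labels["context"] = context
    return labels

def select_replicas(replicas, replica_space, context=None):
    """Return names of replicas whose labels contain all wanted labels."""
    wanted = replica_labels(replica_space, context)
    return [r["name"] for r in replicas
            if all(r["labels"].get(k) == v for k, v in wanted.items())]

replicas = [
    {"name": "b-1", "labels": {"replicaspace": "B"}},                          # run explicitly
    {"name": "b-2", "labels": {"replicaspace": "B", "context": "A-r1-dep1"}},  # created by A, replica r1
    {"name": "b-3", "labels": {"replicaspace": "B", "context": "A-r2-dep1"}},  # created by A, replica r2
]
own = select_replicas(replicas, "B", context="A-r1-dep1")
everything = select_replicas(replicas, "B")
```

With a context, the occurrence of `B` inside the first replica of `A` sees only `b-2`. Without one, an independent run of `B` sees all three replicas.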
263+
264+
## Scheduling flow deployments
265+
266+
When the user runs `kubeac run something`, the deployment does not happen immediately (unless there is also a `--deploy`
switch present on the command line). Instead, the command schedules a flow deployment that another process will pick up and deploy.
This other process is also `kubeac`, run as `kubeac deploy`. Usually that process runs in a restartable
Kubernetes pod. If the deployment process fails for some reason, the pod will be rescheduled and the deployment
process will be restarted. Then it will pick up the same flow and continue the deployment. This is what makes AppController
highly available.
272+
273+
Flow deployments are scheduled by creating a `ConfigMap` object with the special label `AppController=FlowDeployment`.
The deployment process watches for create and delete events of config-maps with this label, sorts new config-maps in the order
they were created and deploys them one by one. Thus it is possible to schedule a flow deployment using `kubectl` or even
the Kubernetes API alone. This is especially useful from within containers where `kubeac` is not available.
277+
278+
Below is a config-map format:
279+
```YAML
apiVersion: v1
kind: ConfigMap
metadata:
  generateName: flow-deployment-  # Any name will work. With generateName it is easier to generate unique names
  labels:
    AppController: FlowDeployment  # Magic label

data:
  selector: ""                    # AppController will only look for definitions and dependencies
                                  # matching this label selector

  concurrency: "0"                # How many resources can be deployed in parallel. "0" = no limit
  flowName: ""                    # Flow name. Empty string is the same as DEFAULT
  exportedOnly: "false"           # When set to "false" any flow can be run.
                                  # Otherwise, only the flows that are marked as exported

  allowUndeclaredArgs: "false"    # Allow flow to use parameters that were not declared
  replicaCount: "0"               # replica count
  fixedNumberOfReplicas: "false"  # "true" means that `replicaCount` is an absolute number,
                                  # otherwise it is relative to the current count

  minReplicaCount: "0"            # Minimum number of replicas, "0" = no minimum
  maxReplicaCount: "0"            # Maximum number of replicas, "0" = no maximum
  allowDeleteExternalResources: "false"  # By default, when replica is deleted, AppController will not delete
                                         # resources that it did not create. This can be overridden by this setting

  arg.arg1: "value1"              # argument values. "arg." is prepended to each parameter name
  arg.arg2: "value2"
```
309+
310+
All settings in this config-map are optional. The example above is filled with default values.
311+
Once the deployment is done, AppController automatically deletes the config-map for it.
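
For scripts that schedule deployments without `kubeac`, the config-map can be generated programmatically. The Python sketch below only builds the manifest from the keys documented above. The function name and its parameters are hypothetical, and since all settings are optional only a few of them are filled in.

```python
import json

def flow_deployment_config_map(flow_name="", args=None, replica_count=0, fixed=False):
    """Build a FlowDeployment config-map manifest as a plain dictionary.
    All values in `data` must be strings, as in the format above."""
    data = {
        "flowName": flow_name,
        "replicaCount": str(replica_count),
        "fixedNumberOfReplicas": "true" if fixed else "false",
    }
    for name, value in (args or {}).items():
        data[f"arg.{name}"] = str(value)  # "arg." is prepended to each parameter name
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {
            "generateName": "flow-deployment-",
            "labels": {"AppController": "FlowDeployment"},
        },
        "data": data,
    }

config_map = flow_deployment_config_map("my-flow", {"arg1": "value1"}, replica_count=3, fixed=True)
manifest = json.dumps(config_map, indent=2)
```

Since `kubectl` accepts JSON as well as YAML, the resulting `manifest` string could be piped to `kubectl create -f -`, which would roughly correspond to `kubeac run my-flow -n3 --arg arg1=value1`.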
