
Commit 331ac47

schongloogyfora authored and committed
[FLINK-37515] Basic support for Blue/Green deployments
1 parent 525b459 commit 331ac47


40 files changed, +15394 -5 lines changed


docs/content/docs/custom-resource/reference.md

Lines changed: 59 additions & 0 deletions
@@ -73,6 +73,24 @@ This serves as a full reference for FlinkDeployment and FlinkSessionJob custom r
 | Parameter | Type | Docs |
 | ----------| ---- | ---- |
 
+### FlinkBlueGreenDeploymentConfigOptions
+**Class**: org.apache.flink.kubernetes.operator.api.spec.FlinkBlueGreenDeploymentConfigOptions
+
+**Description**: Configuration options to be used by the Flink Blue/Green Deployments.
+
+| Parameter | Type | Docs |
+| ----------| ---- | ---- |
+
+### FlinkBlueGreenDeploymentSpec
+**Class**: org.apache.flink.kubernetes.operator.api.spec.FlinkBlueGreenDeploymentSpec
+
+**Description**: Spec that describes a Flink application with blue/green deployment capabilities.
+
+| Parameter | Type | Docs |
+| ----------| ---- | ---- |
+| configuration | java.util.Map<java.lang.String,java.lang.String> | |
+| template | org.apache.flink.kubernetes.operator.api.spec.FlinkDeploymentTemplateSpec | |
+
 ### FlinkDeploymentSpec
 **Class**: org.apache.flink.kubernetes.operator.api.spec.FlinkDeploymentSpec
 
@@ -94,6 +112,16 @@ This serves as a full reference for FlinkDeployment and FlinkSessionJob custom r
 | logConfiguration | java.util.Map<java.lang.String,java.lang.String> | Log configuration overrides for the Flink deployment. Format logConfigFileName -> configContent. |
 | mode | org.apache.flink.kubernetes.operator.api.spec.KubernetesDeploymentMode | Deployment mode of the Flink cluster, native or standalone. |
 
+### FlinkDeploymentTemplateSpec
+**Class**: org.apache.flink.kubernetes.operator.api.spec.FlinkDeploymentTemplateSpec
+
+**Description**: Template Spec that describes a Flink application managed by the blue/green controller.
+
+| Parameter | Type | Docs |
+| ----------| ---- | ---- |
+| metadata | io.fabric8.kubernetes.api.model.ObjectMeta | |
+| spec | org.apache.flink.kubernetes.operator.api.spec.FlinkDeploymentSpec | |
+
 ### FlinkSessionJobSpec
 **Class**: org.apache.flink.kubernetes.operator.api.spec.FlinkSessionJobSpec
 
@@ -308,6 +336,37 @@ This serves as a full reference for FlinkDeployment and FlinkSessionJob custom r
 | UNKNOWN | Checkpoint format unknown, if the checkpoint was not triggered by the operator. |
 | description | org.apache.flink.configuration.description.InlineElement | |
 
+### FlinkBlueGreenDeploymentState
+**Class**: org.apache.flink.kubernetes.operator.api.status.FlinkBlueGreenDeploymentState
+
+**Description**: Enumeration of the possible states of the blue/green transition.
+
+| Value | Docs |
+| ----- | ---- |
+| INITIALIZING_BLUE | We use this state while initializing for the first time, always with a "Blue" deployment type. |
+| ACTIVE_BLUE | Identifies the system is running normally with a "Blue" deployment type. |
+| ACTIVE_GREEN | Identifies the system is running normally with a "Green" deployment type. |
+| TRANSITIONING_TO_BLUE | Identifies the system is transitioning from "Green" to "Blue". |
+| TRANSITIONING_TO_GREEN | Identifies the system is transitioning from "Blue" to "Green". |
+| SAVEPOINTING_BLUE | Identifies the system is savepointing "Blue" before it transitions to "Green". |
+| SAVEPOINTING_GREEN | Identifies the system is savepointing "Green" before it transitions to "Blue". |
+
+### FlinkBlueGreenDeploymentStatus
+**Class**: org.apache.flink.kubernetes.operator.api.status.FlinkBlueGreenDeploymentStatus
+
+**Description**: Last observed status of the Flink Blue/Green deployment.
+
+| Parameter | Type | Docs |
+| ----------| ---- | ---- |
+| jobStatus | org.apache.flink.kubernetes.operator.api.status.JobStatus | |
+| blueGreenState | org.apache.flink.kubernetes.operator.api.status.FlinkBlueGreenDeploymentState | The state of the blue/green transition. |
+| lastReconciledSpec | java.lang.String | Last reconciled (serialized) deployment spec. |
+| lastReconciledTimestamp | java.lang.String | Timestamp of last reconciliation. |
+| abortTimestamp | java.lang.String | Computed from abortGracePeriodMs, timestamp after which the deployment should be aborted. |
+| deploymentReadyTimestamp | java.lang.String | Timestamp when the deployment became READY/STABLE. Used to determine when to delete it. |
+| savepointTriggerId | java.lang.String | Persisted triggerId to track transition with savepoint. Only used with UpgradeMode.SAVEPOINT |
+| error | java.lang.String | Error information about the FlinkBlueGreenDeployment. |
+
 ### FlinkDeploymentReconciliationStatus
 **Class**: org.apache.flink.kubernetes.operator.api.status.FlinkDeploymentReconciliationStatus
 
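The descriptions of the `FlinkBlueGreenDeploymentState` values imply a cycle in which the active deployment alternates between Blue and Green. The bash sketch below encodes one reading of those descriptions as a transition table. The table and the `bg_next_states` / `bg_is_valid_transition` helpers are hypothetical and are not the operator's controller logic; in particular, it assumes an `ACTIVE_*` state may skip `SAVEPOINTING_*` when no savepoint is taken, and it does not model any abort or rollback path hinted at by the `abortTimestamp` status field. It needs bash 4 or later for associative arrays.

```shell
#!/usr/bin/env bash
# Hypothetical transition table inferred from the FlinkBlueGreenDeploymentState
# descriptions; this is NOT taken from the operator's controller code.
# Assumption: an ACTIVE_* state may go directly to TRANSITIONING_* when no
# savepoint is taken, or through SAVEPOINTING_* when one is.
# Abort/rollback paths are not modeled. Requires bash 4+.
declare -A BG_NEXT_STATES=(
  [INITIALIZING_BLUE]="ACTIVE_BLUE"
  [ACTIVE_BLUE]="SAVEPOINTING_BLUE TRANSITIONING_TO_GREEN"
  [SAVEPOINTING_BLUE]="TRANSITIONING_TO_GREEN"
  [TRANSITIONING_TO_GREEN]="ACTIVE_GREEN"
  [ACTIVE_GREEN]="SAVEPOINTING_GREEN TRANSITIONING_TO_BLUE"
  [SAVEPOINTING_GREEN]="TRANSITIONING_TO_BLUE"
  [TRANSITIONING_TO_BLUE]="ACTIVE_BLUE"
)

# Print the space-separated states reachable in one step from $1
# (prints an empty line for unknown states).
bg_next_states() {
  echo "${BG_NEXT_STATES[$1]:-}"
}

# Return 0 if the table allows moving from state $1 to state $2, else 1.
bg_is_valid_transition() {
  local from="$1" to="$2" candidate
  # Word splitting on the table entry is intentional here.
  for candidate in ${BG_NEXT_STATES[$from]:-}; do
    if [[ "$candidate" == "$to" ]]; then
      return 0
    fi
  done
  return 1
}
```

Under this reading, `bg_is_valid_transition ACTIVE_BLUE ACTIVE_GREEN` returns 1, because a switch always passes through `TRANSITIONING_TO_GREEN`; the e2e tests in this commit only observe the `ACTIVE_*` endpoints.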
Lines changed: 99 additions & 0 deletions
@@ -0,0 +1,99 @@
################################################################################
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
################################################################################

apiVersion: flink.apache.org/v1beta1
kind: FlinkBlueGreenDeployment
metadata:
  name: basic-bg-laststate-example
spec:
  configuration:
    kubernetes.operator.bluegreen.deployment-deletion.delay: "1s"
  template:
    spec:
      image: flink:1.20
      flinkVersion: v1_20
      flinkConfiguration:
        rest.port: "8081"
        execution.checkpointing.interval: "10s"
        execution.checkpointing.storage: "filesystem"
        state.backend.incremental: "true"
        state.checkpoints.dir: file:///opt/flink/volume/flink-cp
        state.savepoints.dir: file:///opt/flink/volume/flink-sp
        state.checkpoints.num-retained: "5"
        taskmanager.numberOfTaskSlots: "1"
      serviceAccount: flink
      jobManager:
        resource:
          memory: 1G
          cpu: 1
      podTemplate:
        spec:
          containers:
            - name: flink-main-container
              resources:
                requests:
                  ephemeral-storage: 2048Mi
                limits:
                  ephemeral-storage: 2048Mi
              volumeMounts:
                - mountPath: /opt/flink/volume
                  name: flink-volume
          volumes:
            - name: flink-volume
              persistentVolumeClaim:
                claimName: flink-bg-laststate
      taskManager:
        resource:
          memory: 2G
          cpu: 1
      job:
        jarURI: local:///opt/flink/examples/streaming/StateMachineExample.jar
        parallelism: 1
        entryClass: org.apache.flink.streaming.examples.statemachine.StateMachineExample
        args:
          - "--error-rate"
          - "0.15"
          - "--sleep"
          - "30"
        upgradeMode: last-state
      mode: native

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: flink-bg-laststate
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 1Gi

---
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  annotations:
    ingressclass.kubernetes.io/is-default-class: "true"
  labels:
    app.kubernetes.io/component: controller
  name: nginx
spec:
  controller: k8s.io/ingress-nginx
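The `kubernetes.operator.bluegreen.deployment-deletion.delay` option, read together with the `deploymentReadyTimestamp` status field ("Used to determine when to delete it"), suggests that the previous deployment is removed only once the new one has been ready for at least the configured delay. A minimal bash sketch of that check follows. It assumes both timestamps are available as epoch milliseconds and handles only the `ms`, `s`, and `min` suffixes (Flink's own duration parsing accepts more forms); `duration_to_ms` and `deletion_due` are hypothetical helpers, not part of the operator or its e2e utilities.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of the operator or its e2e utils.

# Convert a simple duration such as "1s", "500ms" or "2min" to milliseconds.
# Assumption: only the ms/s/min suffixes are handled here.
# The 10# prefix forces base 10 so values like "08" are not read as octal.
duration_to_ms() {
  local value="$1"
  if [[ "$value" =~ ^([0-9]+)ms$ ]]; then
    echo $(( 10#${BASH_REMATCH[1]} ))
  elif [[ "$value" =~ ^([0-9]+)s$ ]]; then
    echo $(( 10#${BASH_REMATCH[1]} * 1000 ))
  elif [[ "$value" =~ ^([0-9]+)min$ ]]; then
    echo $(( 10#${BASH_REMATCH[1]} * 60000 ))
  else
    echo "unsupported duration: $value" >&2
    return 1
  fi
}

# Succeed once now_ms >= ready_ms + delay, i.e. when the old deployment
# may be deleted under the assumption described above.
# Returns 2 if the delay cannot be parsed.
deletion_due() {
  local ready_ms="$1" delay="$2" now_ms="$3" delay_ms
  delay_ms=$(duration_to_ms "$delay") || return 2
  (( now_ms >= ready_ms + delay_ms ))
}
```

With the "1s" delay used in this manifest, Blue would become eligible for deletion one second after Green's ready timestamp, which keeps the e2e run short.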
Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
################################################################################
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
################################################################################

apiVersion: flink.apache.org/v1beta1
kind: FlinkBlueGreenDeployment
metadata:
  name: basic-bg-stateless-example
spec:
  configuration:
    kubernetes.operator.bluegreen.deployment-deletion.delay: "2s"
  template:
    spec:
      image: flink:1.20
      flinkVersion: v1_20
      flinkConfiguration:
        rest.port: "8081"
        taskmanager.numberOfTaskSlots: "1"
      serviceAccount: flink
      jobManager:
        resource:
          memory: 1G
          cpu: 1
      taskManager:
        resource:
          memory: 2G
          cpu: 1
      job:
        jarURI: local:///opt/flink/examples/streaming/StateMachineExample.jar
        parallelism: 1
        entryClass: org.apache.flink.streaming.examples.statemachine.StateMachineExample
        args:
          - "--error-rate"
          - "0.15"
          - "--sleep"
          - "30"
        upgradeMode: stateless
      mode: native
Lines changed: 83 additions & 0 deletions
@@ -0,0 +1,83 @@
#!/usr/bin/env bash
################################################################################
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
################################################################################

# This script tests the Flink Blue/Green Deployments support as follows:
# - Create a FlinkBlueGreenDeployment which automatically starts a "Blue" FlinkDeployment
# - Once this setup is stable, we trigger a transition which will create the "Green" FlinkDeployment
# - Once it's stable, verify the "Blue" FlinkDeployment is torn down
# - Perform additional validation(s) before exiting

SCRIPT_DIR=$(dirname "$(readlink -f "$0")")
source "${SCRIPT_DIR}/utils.sh"

CLUSTER_ID="basic-bg-laststate-example"
BG_CLUSTER_ID=$CLUSTER_ID
BLUE_CLUSTER_ID=$CLUSTER_ID"-blue"
GREEN_CLUSTER_ID=$CLUSTER_ID"-green"

APPLICATION_YAML="${SCRIPT_DIR}/data/bluegreen-laststate.yaml"
APPLICATION_IDENTIFIER="flinkbgdep/$CLUSTER_ID"
BLUE_APPLICATION_IDENTIFIER="flinkdep/$BLUE_CLUSTER_ID"
GREEN_APPLICATION_IDENTIFIER="flinkdep/$GREEN_CLUSTER_ID"
TIMEOUT=300

#echo "BG_CLUSTER_ID " $BG_CLUSTER_ID
#echo "BLUE_CLUSTER_ID " $BLUE_CLUSTER_ID
#echo "APPLICATION_IDENTIFIER " $APPLICATION_IDENTIFIER
#echo "BLUE_APPLICATION_IDENTIFIER " $BLUE_APPLICATION_IDENTIFIER

retry_times 5 30 "kubectl apply -f $APPLICATION_YAML" || exit 1

sleep 1
wait_for_jobmanager_running $BLUE_CLUSTER_ID $TIMEOUT
wait_for_status $BLUE_APPLICATION_IDENTIFIER '.status.lifecycleState' STABLE ${TIMEOUT} || exit 1
wait_for_status $APPLICATION_IDENTIFIER '.status.jobStatus.state' RUNNING ${TIMEOUT} || exit 1
wait_for_status $APPLICATION_IDENTIFIER '.status.blueGreenState' ACTIVE_BLUE ${TIMEOUT} || exit 1

#blue_job_id=$(kubectl get -oyaml flinkdep/basic-bluegreen-example-blue | yq '.status.jobStatus.jobId')

#kubectl patch flinkbgdep ${BG_CLUSTER_ID} --type merge --patch '{"spec":{"template":{"spec":{"flinkConfiguration":{"rest.port":"8082","state.checkpoints.num-retained":"6"}}}}}'
kubectl patch flinkbgdep ${BG_CLUSTER_ID} --type merge --patch '{"spec":{"template":{"spec":{"flinkConfiguration":{"state.checkpoints.num-retained":"6"}}}}}'
echo "Resource patched, giving a chance for the savepoint to be taken..."
sleep 10

jm_pod_name=$(get_jm_pod_name $BLUE_CLUSTER_ID)
echo "Inspecting savepoint directory..."
kubectl exec -it $jm_pod_name -- bash -c "ls -lt /opt/flink/volume/flink-sp/"

wait_for_status $GREEN_APPLICATION_IDENTIFIER '.status.lifecycleState' STABLE ${TIMEOUT} || exit 1
kubectl wait --for=delete deployment --timeout=${TIMEOUT}s --selector="app=${BLUE_CLUSTER_ID}"
wait_for_status $APPLICATION_IDENTIFIER '.status.jobStatus.state' RUNNING ${TIMEOUT} || exit 1
wait_for_status $APPLICATION_IDENTIFIER '.status.blueGreenState' ACTIVE_GREEN ${TIMEOUT} || exit 1

green_initialSavepointPath=$(kubectl get -oyaml $GREEN_APPLICATION_IDENTIFIER | yq '.spec.job.initialSavepointPath')

echo "Deleting test B/G resources" $BG_CLUSTER_ID
kubectl delete flinkbluegreendeployments/$BG_CLUSTER_ID &
echo "Waiting for deployment to be deleted..."
kubectl wait --for=delete flinkbluegreendeployments/$BG_CLUSTER_ID

if [[ $green_initialSavepointPath == '/opt/flink/volume/flink-sp/savepoint-'* ]]; then
    echo 'Green deployment started from the expected initialSavepointPath:' $green_initialSavepointPath
else
    echo 'Unexpected initialSavepointPath:' $green_initialSavepointPath
    exit 1
fi;

echo "Successfully ran the Flink Blue/Green Deployments test"
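Both tests rely on `wait_for_status` and `wait_for_jobmanager_running` from `utils.sh`, which is not part of this excerpt. The bash sketch below shows one way a poll-until-match helper of that kind could work. `poll_until_equals` is a hypothetical name, and the real `utils.sh` implementation may differ; it takes the value-reading command as arguments so it can be exercised without a cluster. Against a live cluster, that command could be a `kubectl get ... -o jsonpath=...` call, as in the commented example.

```shell
#!/usr/bin/env bash
# Hypothetical polling helper; the real utils.sh implementation may differ.
# Usage: poll_until_equals <expected> <timeout_s> <interval_s> <cmd> [args...]
# Runs <cmd> every <interval_s> seconds until its stdout equals <expected>,
# or fails once roughly <timeout_s> seconds have elapsed.
poll_until_equals() {
  local expected="$1" timeout_s="$2" interval_s="$3"
  shift 3
  # SECONDS is bash's built-in count of seconds since the shell started.
  local deadline=$(( SECONDS + timeout_s )) actual
  while true; do
    actual="$("$@" 2>/dev/null)"
    if [[ "$actual" == "$expected" ]]; then
      echo "Reached expected value: $expected"
      return 0
    fi
    if (( SECONDS >= deadline )); then
      echo "Timed out after ${timeout_s}s; last value: '${actual}'" >&2
      return 1
    fi
    sleep "$interval_s"
  done
}

# Example (requires a cluster with the resources from this commit):
# poll_until_equals ACTIVE_GREEN 300 5 \
#   kubectl get flinkbgdep basic-bg-laststate-example -o jsonpath='{.status.blueGreenState}'
```

Passing the command as separate arguments (`"$@"`) rather than as one string avoids a second round of shell parsing, unlike the string passed to `retry_times` in the scripts.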
Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
#!/usr/bin/env bash
################################################################################
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
################################################################################

# This script tests the Flink Blue/Green Deployments support as follows:
# - Create a FlinkBlueGreenDeployment which automatically starts a "Blue" FlinkDeployment
# - Once this setup is stable, we trigger a transition which will create the "Green" FlinkDeployment
# - Once it's stable, verify the "Blue" FlinkDeployment is torn down
# - Perform additional validation(s) before exiting

SCRIPT_DIR=$(dirname "$(readlink -f "$0")")
source "${SCRIPT_DIR}/utils.sh"

CLUSTER_ID="basic-bg-stateless-example"
BG_CLUSTER_ID=$CLUSTER_ID
BLUE_CLUSTER_ID=$CLUSTER_ID"-blue"
GREEN_CLUSTER_ID=$CLUSTER_ID"-green"

APPLICATION_YAML="${SCRIPT_DIR}/data/bluegreen-stateless.yaml"
APPLICATION_IDENTIFIER="flinkbgdep/$CLUSTER_ID"
BLUE_APPLICATION_IDENTIFIER="flinkdep/$BLUE_CLUSTER_ID"
GREEN_APPLICATION_IDENTIFIER="flinkdep/$GREEN_CLUSTER_ID"
TIMEOUT=300

retry_times 5 30 "kubectl apply -f $APPLICATION_YAML" || exit 1

sleep 1
wait_for_jobmanager_running $BLUE_CLUSTER_ID $TIMEOUT
wait_for_status $BLUE_APPLICATION_IDENTIFIER '.status.lifecycleState' STABLE ${TIMEOUT} || exit 1
wait_for_status $APPLICATION_IDENTIFIER '.status.jobStatus.state' RUNNING ${TIMEOUT} || exit 1
wait_for_status $APPLICATION_IDENTIFIER '.status.blueGreenState' ACTIVE_BLUE ${TIMEOUT} || exit 1

echo "PATCHING B/G deployment..."
#kubectl patch flinkbgdep ${BG_CLUSTER_ID} --type merge --patch '{"spec":{"template":{"spec":{"flinkConfiguration":{"rest.port":"8082","taskmanager.numberOfTaskSlots":"2"}}}}}'
kubectl patch flinkbgdep ${BG_CLUSTER_ID} --type merge --patch '{"spec":{"template":{"spec":{"flinkConfiguration":{"taskmanager.numberOfTaskSlots":"2"}}}}}'

wait_for_status $GREEN_APPLICATION_IDENTIFIER '.status.lifecycleState' STABLE ${TIMEOUT} || exit 1
kubectl wait --for=delete deployment --timeout=${TIMEOUT}s --selector="app=${BLUE_CLUSTER_ID}"
wait_for_status $APPLICATION_IDENTIFIER '.status.jobStatus.state' RUNNING ${TIMEOUT} || exit 1
wait_for_status $APPLICATION_IDENTIFIER '.status.blueGreenState' ACTIVE_GREEN ${TIMEOUT} || exit 1

echo "Deleting test B/G resources" $BG_CLUSTER_ID
kubectl delete flinkbluegreendeployments/$BG_CLUSTER_ID &
echo "Waiting for deployment to be deleted..."
kubectl wait --for=delete flinkbluegreendeployments/$BG_CLUSTER_ID

echo "Successfully ran the Flink Blue/Green Deployments test"
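Both scripts trigger the Blue-to-Green switch by merge-patching a single `flinkConfiguration` entry under `spec.template.spec`. The bash sketch below builds such a patch body for an arbitrary key/value pair. `bg_config_patch` is a hypothetical helper, and it assumes neither the key nor the value contains double quotes, backslashes, or other characters that would need JSON escaping, since it performs no escaping.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a JSON merge patch that sets one Flink
# configuration entry in a FlinkBlueGreenDeployment's template.
# Assumption: key and value contain no characters that need JSON escaping.
bg_config_patch() {
  local key="$1" value="$2"
  printf '{"spec":{"template":{"spec":{"flinkConfiguration":{"%s":"%s"}}}}}' "$key" "$value"
}

# Example (requires a cluster), equivalent to the patch in the stateless test:
# kubectl patch flinkbgdep basic-bg-stateless-example --type merge \
#   --patch "$(bg_config_patch taskmanager.numberOfTaskSlots 2)"
```

Because a JSON merge patch merges nested objects key by key, the other `flinkConfiguration` entries from the applied manifest (such as `rest.port`) are left in place; only the named key changes.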
