Changes from 2 commits
23 commits
f6349bf
[FLINK-37515] FLIP-503: Basic support for Blue/Green deployments
schongloo Jul 10, 2025
ebab279
Fixing configOption default value management and log message formatting
schongloo Jul 15, 2025
3d91ba4
Optimized/simplified the reconciliation logic for first deployments. …
schongloo Jul 19, 2025
9f099ae
Added Blue/Green Deployments E2E test
schongloo Jul 21, 2025
853547f
- Refactoring (splitting) the Blue/Green controller logic from the Co…
schongloo Aug 3, 2025
ee47067
Refactoring for clarity (added BlueGreenTransitionContext)
schongloo Aug 4, 2025
05e9425
Introducing a Blue/Green State Machine
schongloo Aug 5, 2025
79d75bb
Optimizing the State Handling
schongloo Aug 6, 2025
d1098bb
Triggering a full transition only when needed, otherwise just patch t…
schongloo Aug 10, 2025
5a3b802
Optimized the B/G unit tests
schongloo Aug 11, 2025
c178eae
Adding support for Savepointing before transItion in the case of Upgr…
schongloo Aug 14, 2025
a3ab6d6
Updated unit test to assert Savepointing. Checkstyle fixes
schongloo Aug 15, 2025
ec1382f
Improving/adding E2E tests for blue/green deployments. Checkstyle fixes.
schongloo Aug 18, 2025
f712125
Addressing PR comments. Corrected abort/delay logic. Added the e2e te…
schongloo Aug 19, 2025
ec120f4
Removing redundant BlueGreenDiffType cases
schongloo Aug 19, 2025
f46f248
If a spec change comes in mid transition, we apply it right away.
schongloo Aug 20, 2025
929f3d0
Adjusted the semantics of the Diff class. PR comments.
schongloo Aug 21, 2025
287b1c4
Taking savepoint also for LAST-STATE, removed the last checkpoint usage.
schongloo Aug 21, 2025
9abfd87
Consolidated and organized the B/G utility methods.
schongloo Aug 22, 2025
afb13ec
Clearer comments
schongloo Aug 22, 2025
a731b09
More consistent reconcile result (UpdateControl) handling.
schongloo Aug 23, 2025
3d55610
Addressing PR comments
schongloo Aug 28, 2025
347117d
Addressing edge error cases when patching an existing FlinkDeployment…
schongloo Aug 29, 2025
55 changes: 55 additions & 0 deletions docs/content/docs/custom-resource/reference.md
Original file line number Diff line number Diff line change
@@ -57,6 +57,23 @@ This serves as a full reference for FlinkDeployment and FlinkSessionJob custom r
| Parameter | Type | Docs |
| ----------| ---- | ---- |

### FlinkBlueGreenDeploymentConfigOptions

I think we should define the Blue/Green deployment concept somewhere in the docs, ideally with diagrams.

Author

@schongloo schongloo Aug 28, 2025

I've (temporarily) added a diagram of the actual state machine to the PR description; we can find a final location for it later.

Also, the pages for both FLIP-503 and FLIP-504 have higher-level (simplified) concept diagrams.

Does this help?

**Class**: org.apache.flink.kubernetes.operator.api.spec.FlinkBlueGreenDeploymentConfigOptions

**Description**: Configuration options to be used by the Flink Blue/Green Deployments.

| Parameter | Type | Docs |
| ----------| ---- | ---- |

### FlinkBlueGreenDeploymentSpec
**Class**: org.apache.flink.kubernetes.operator.api.spec.FlinkBlueGreenDeploymentSpec

**Description**: Spec that describes a Flink application with blue/green deployment capabilities.

| Parameter | Type | Docs |
| ----------| ---- | ---- |
| template | org.apache.flink.kubernetes.operator.api.spec.FlinkDeploymentTemplateSpec | |

### FlinkDeploymentSpec
**Class**: org.apache.flink.kubernetes.operator.api.spec.FlinkDeploymentSpec

@@ -78,6 +95,17 @@ This serves as a full reference for FlinkDeployment and FlinkSessionJob custom r
| logConfiguration | java.util.Map<java.lang.String,java.lang.String> | Log configuration overrides for the Flink deployment. Format logConfigFileName -> configContent. |
| mode | org.apache.flink.kubernetes.operator.api.spec.KubernetesDeploymentMode | Deployment mode of the Flink cluster, native or standalone. |

### FlinkDeploymentTemplateSpec
**Class**: org.apache.flink.kubernetes.operator.api.spec.FlinkDeploymentTemplateSpec

**Description**: Template Spec that describes a Flink application managed by the blue/green controller.

| Parameter | Type | Docs |
| ----------| ---- | ---- |
| metadata | io.fabric8.kubernetes.api.model.ObjectMeta | |
| configuration | java.util.Map<java.lang.String,java.lang.String> | |
| spec | org.apache.flink.kubernetes.operator.api.spec.FlinkDeploymentSpec | |

### FlinkSessionJobSpec
**Class**: org.apache.flink.kubernetes.operator.api.spec.FlinkSessionJobSpec

@@ -290,6 +318,33 @@ This serves as a full reference for FlinkDeployment and FlinkSessionJob custom r
| UNKNOWN | Checkpoint format unknown, if the checkpoint was not triggered by the operator. |
| description | org.apache.flink.configuration.description.InlineElement | |

### FlinkBlueGreenDeploymentState
**Class**: org.apache.flink.kubernetes.operator.api.status.FlinkBlueGreenDeploymentState

**Description**: Enumeration of the possible states of the blue/green transition.

| Value | Docs |
| ----- | ---- |
| INITIALIZING_BLUE | We use this state while initializing for the first time, always with a "Blue" deployment type. |
| ACTIVE_BLUE | Indicates that the system is running normally with a "Blue" deployment type. |
| ACTIVE_GREEN | Indicates that the system is running normally with a "Green" deployment type. |
| TRANSITIONING_TO_BLUE | Indicates that the system is transitioning from "Green" to "Blue". |
| TRANSITIONING_TO_GREEN | Indicates that the system is transitioning from "Blue" to "Green". |
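
As a reading aid for the state table, here is a minimal sketch of how these five states might relate to each other. The transition map is an assumption inferred only from the state descriptions (including the guess that an aborted transition falls back to the previously active color); it is not taken from the operator's actual state machine, and `BlueGreenStateSketch` and `canTransition` are hypothetical names.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch; not the operator's actual state machine. */
final class BlueGreenStateSketch {

    /** Mirrors the values documented in the reference table. */
    enum State {
        INITIALIZING_BLUE,
        ACTIVE_BLUE,
        ACTIVE_GREEN,
        TRANSITIONING_TO_BLUE,
        TRANSITIONING_TO_GREEN
    }

    // Assumed transitions, inferred from the state descriptions only.
    private static final Map<State, Set<State>> ALLOWED = new EnumMap<>(State.class);

    static {
        ALLOWED.put(State.INITIALIZING_BLUE, EnumSet.of(State.ACTIVE_BLUE));
        ALLOWED.put(State.ACTIVE_BLUE, EnumSet.of(State.TRANSITIONING_TO_GREEN));
        // Assumption: a successful transition ends in ACTIVE_GREEN, an aborted one in ACTIVE_BLUE.
        ALLOWED.put(
                State.TRANSITIONING_TO_GREEN, EnumSet.of(State.ACTIVE_GREEN, State.ACTIVE_BLUE));
        ALLOWED.put(State.ACTIVE_GREEN, EnumSet.of(State.TRANSITIONING_TO_BLUE));
        // Assumption: symmetric to the Blue-to-Green direction.
        ALLOWED.put(
                State.TRANSITIONING_TO_BLUE, EnumSet.of(State.ACTIVE_BLUE, State.ACTIVE_GREEN));
    }

    /** Returns true if the assumed transition map allows moving from one state to the other. */
    static boolean canTransition(State from, State to) {
        return ALLOWED.get(from).contains(to);
    }

    public static void main(String[] args) {
        System.out.println(canTransition(State.ACTIVE_BLUE, State.TRANSITIONING_TO_GREEN));
        System.out.println(canTransition(State.ACTIVE_BLUE, State.ACTIVE_GREEN));
    }
}
```

In this sketch an active deployment never jumps directly to the other color; it always passes through the matching `TRANSITIONING_TO_*` state first.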

### FlinkBlueGreenDeploymentStatus
**Class**: org.apache.flink.kubernetes.operator.api.status.FlinkBlueGreenDeploymentStatus

**Description**: Last observed status of the Flink Blue/Green deployment.

| Parameter | Type | Docs |
| ----------| ---- | ---- |
| jobStatus | org.apache.flink.kubernetes.operator.api.status.JobStatus | |
| blueGreenState | org.apache.flink.kubernetes.operator.api.status.FlinkBlueGreenDeploymentState | The state of the blue/green transition. |
| lastReconciledSpec | java.lang.String | Last reconciled (serialized) deployment spec. |
| lastReconciledTimestamp | java.lang.String | Timestamp of last reconciliation. |
| abortTimestamp | java.lang.String | Computed from abortGracePeriodMs, timestamp after which the deployment should be aborted. |
| deploymentReadyTimestamp | java.lang.String | Timestamp when the deployment became READY/STABLE. Used to determine when to delete it. |
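
The two timestamp fields above imply some simple time arithmetic. The following sketch shows one way it could look, assuming the timestamps can be handled as `java.time.Instant` values (the status stores them as strings, and their serialized format is not shown here). `BlueGreenTimestampSketch` and its methods are hypothetical helpers, not operator code.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper; the operator's real timestamp handling may differ. */
final class BlueGreenTimestampSketch {

    /** Deadline after which a deployment that is still not ready would be aborted. */
    static Instant abortTimestamp(Instant transitionStart, Duration abortGracePeriod) {
        return transitionStart.plus(abortGracePeriod);
    }

    /** True once the abort deadline has passed. */
    static boolean shouldAbort(Instant now, Instant abortTimestamp) {
        return now.isAfter(abortTimestamp);
    }

    /** True once the deletion delay has elapsed since the ready timestamp. */
    static boolean deletionDue(Instant now, Instant deploymentReadyTimestamp, Duration deletionDelay) {
        return !now.isBefore(deploymentReadyTimestamp.plus(deletionDelay));
    }

    public static void main(String[] args) {
        Instant start = Instant.parse("2025-08-29T10:00:00Z");
        Instant deadline = abortTimestamp(start, Duration.ofMinutes(2));
        System.out.println(deadline);
        System.out.println(shouldAbort(start.plusSeconds(60), deadline));
        System.out.println(deletionDue(start, start, Duration.ZERO));
    }
}
```

With a zero deletion delay, `deletionDue` is true as soon as the ready timestamp is reached.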

### FlinkDeploymentReconciliationStatus
**Class**: org.apache.flink.kubernetes.operator.api.status.FlinkDeploymentReconciliationStatus

21 changes: 21 additions & 0 deletions flink-kubernetes-operator-api/pom.xml
Original file line number Diff line number Diff line change
@@ -212,6 +212,7 @@ under the License.
<include>flinkdeployments.flink.apache.org-v1.yml</include>
<include>flinksessionjobs.flink.apache.org-v1.yml</include>
<include>flinkstatesnapshots.flink.apache.org-v1.yml</include>
<include>flinkbluegreendeployments.flink.apache.org-v1.yml</include>
</includes>
<filtering>false</filtering>
</resource>
@@ -236,6 +237,8 @@ under the License.
<classpath refid="maven.compile.classpath"/>
<arg value="file://${rootDir}/helm/flink-kubernetes-operator/crds/flinkdeployments.flink.apache.org-v1.yml"/>
<arg value="https://raw.githubusercontent.com/apache/flink-kubernetes-operator/release-1.9.0/helm/flink-kubernetes-operator/crds/flinkdeployments.flink.apache.org-v1.yml"/>
<arg value="https://raw.githubusercontent.com/apache/flink-kubernetes-operator/release-1.10.0/helm/flink-kubernetes-operator/crds/flinkdeployments.flink.apache.org-v1.yml"/>
<arg value="https://raw.githubusercontent.com/apache/flink-kubernetes-operator/release-1.11.0/helm/flink-kubernetes-operator/crds/flinkdeployments.flink.apache.org-v1.yml"/>
</java>
</target>
</configuration>
@@ -253,6 +256,24 @@ under the License.
<classpath refid="maven.compile.classpath"/>
<arg value="file://${rootDir}/helm/flink-kubernetes-operator/crds/flinksessionjobs.flink.apache.org-v1.yml"/>
<arg value="https://raw.githubusercontent.com/apache/flink-kubernetes-operator/release-1.9.0/helm/flink-kubernetes-operator/crds/flinksessionjobs.flink.apache.org-v1.yml"/>
<arg value="https://raw.githubusercontent.com/apache/flink-kubernetes-operator/release-1.10.0/helm/flink-kubernetes-operator/crds/flinksessionjobs.flink.apache.org-v1.yml"/>
<arg value="https://raw.githubusercontent.com/apache/flink-kubernetes-operator/release-1.11.0/helm/flink-kubernetes-operator/crds/flinksessionjobs.flink.apache.org-v1.yml"/>
</java>
</target>
</configuration>
</execution>
<execution>
<id>flinkbgdeployments-remove-scale-subresource</id>
<phase>package</phase>
<goals>
<goal>run</goal>
</goals>
<configuration>
<target>
<java classname="org.apache.flink.kubernetes.operator.api.utils.RemoveScaleSubResource"
fork="true" failonerror="true">
<classpath refid="maven.compile.classpath"/>
<arg value="${rootDir}/helm/flink-kubernetes-operator/crds/flinkbluegreendeployments.flink.apache.org-v1.yml"/>
</java>
</target>
</configuration>
Original file line number Diff line number Diff line change
@@ -0,0 +1,41 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package org.apache.flink.kubernetes.operator.api;

import org.apache.flink.annotation.Experimental;
import org.apache.flink.kubernetes.operator.api.spec.FlinkBlueGreenDeploymentSpec;
import org.apache.flink.kubernetes.operator.api.status.FlinkBlueGreenDeploymentStatus;

import com.fasterxml.jackson.annotation.JsonInclude;
import com.fasterxml.jackson.databind.annotation.JsonDeserialize;
import io.fabric8.kubernetes.api.model.Namespaced;
import io.fabric8.kubernetes.client.CustomResource;
import io.fabric8.kubernetes.model.annotation.Group;
import io.fabric8.kubernetes.model.annotation.ShortNames;
import io.fabric8.kubernetes.model.annotation.Version;

/** Custom resource definition that represents a deployment with Blue/Green rollout capability. */
@Experimental
@JsonInclude(JsonInclude.Include.NON_NULL)
@JsonDeserialize()
@Group(CrdConstants.API_GROUP)
@Version(CrdConstants.API_VERSION)
@ShortNames({"flinkbgdep"})
public class FlinkBlueGreenDeployment
extends CustomResource<FlinkBlueGreenDeploymentSpec, FlinkBlueGreenDeploymentStatus>
implements Namespaced {}
Original file line number Diff line number Diff line change
@@ -0,0 +1,39 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package org.apache.flink.kubernetes.operator.api.bluegreen;

import org.apache.flink.kubernetes.operator.api.FlinkDeployment;

/**
* Enumeration of the two possible Flink Blue/Green deployment types. At most one deployment of
* each type is present at any time for a particular job.
*/
public enum DeploymentType {
/** Identifier for the first or "Blue" deployment type. */
BLUE,

/** Identifier for the second or "Green" deployment type. */
GREEN;

public static final String LABEL_KEY = "flink/blue-green-deployment-type";

/** Resolves the deployment type from the {@link #LABEL_KEY} label of the given deployment. */
public static DeploymentType fromDeployment(FlinkDeployment flinkDeployment) {
String typeLabel = flinkDeployment.getMetadata().getLabels().get(LABEL_KEY);
return DeploymentType.valueOf(typeLabel);
}
}
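
`fromDeployment` above passes the label value straight to `valueOf`, which throws if the label is missing (a `null` argument) or holds an unknown value. A more defensive lookup could look like the sketch below. It is an illustration only: it works on a plain label map instead of a `FlinkDeployment`, and `DeploymentTypeLabelSketch` and `fromLabels` are hypothetical names that are not part of this PR.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical, standalone variant of the label lookup; not part of the PR. */
final class DeploymentTypeLabelSketch {

    /** Local copy of the enum so that the sketch compiles on its own. */
    enum DeploymentType {
        BLUE,
        GREEN
    }

    static final String LABEL_KEY = "flink/blue-green-deployment-type";

    /** Returns the deployment type, or empty if the label is missing or not a known value. */
    static Optional<DeploymentType> fromLabels(Map<String, String> labels) {
        if (labels == null) {
            return Optional.empty();
        }
        String value = labels.get(LABEL_KEY);
        if (value == null) {
            return Optional.empty();
        }
        try {
            return Optional.of(DeploymentType.valueOf(value));
        } catch (IllegalArgumentException e) {
            // The label is present but is neither BLUE nor GREEN.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(fromLabels(Map.of(LABEL_KEY, "GREEN")));
        System.out.println(fromLabels(Map.of()));
    }
}
```

Returning an `Optional` leaves it to the caller to decide whether an unlabeled deployment is an error or simply not managed by the blue/green controller.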
Original file line number Diff line number Diff line change
@@ -0,0 +1,58 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package org.apache.flink.kubernetes.operator.api.spec;

import org.apache.flink.configuration.ConfigOption;
import org.apache.flink.configuration.ConfigOptions;

import java.time.Duration;

/** Configuration options to be used by the Flink Blue/Green Deployments. */
public class FlinkBlueGreenDeploymentConfigOptions {

public static final String K8S_OP_CONF_PREFIX = "kubernetes.operator.";

public static final String BLUE_GREEN_CONF_PREFIX = K8S_OP_CONF_PREFIX + "bluegreen.";

public static final int MIN_ABORT_GRACE_PERIOD_MS = 120000; // 2 mins

public static ConfigOptions.OptionBuilder operatorConfig(String key) {
return ConfigOptions.key(BLUE_GREEN_CONF_PREFIX + key);
}

public static final ConfigOption<Duration> ABORT_GRACE_PERIOD =
operatorConfig("abort.grace-period")
.durationType()
.defaultValue(Duration.ofMillis(MIN_ABORT_GRACE_PERIOD_MS))
.withDescription(
"The max time to wait in milliseconds for a deployment to become ready before aborting it. Cannot be smaller than 2 minutes.");

public static final ConfigOption<Duration> RECONCILIATION_RESCHEDULING_INTERVAL =
operatorConfig("reconciliation.reschedule-interval")
.durationType()
.defaultValue(Duration.ofMillis(15000)) // 15 seconds
.withDescription(
"Configurable delay in milliseconds to use when the operator reschedules a reconciliation.");

I suggest including some advice as to what to set this to in different circumstances. Same for deployment-deletion.delay


public static final ConfigOption<Duration> DEPLOYMENT_DELETION_DELAY =
These configs are not being used. Is there an intent to use them later?

operatorConfig("deployment-deletion.delay")
.durationType()
.defaultValue(Duration.ofMillis(0))
.withDescription(
"Configurable delay in milliseconds before deleting a deployment after being marked done.");
}
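
The option descriptions above state that the abort grace period cannot be smaller than 2 minutes, but the class itself does not show how that limit is enforced. The sketch below illustrates one possible approach, assuming the value arrives as a plain millisecond string in a configuration map and that too-small values are raised to the minimum rather than rejected. Both points are assumptions, and `BlueGreenConfigSketch` is a hypothetical name; only the key prefix and the 120000 ms minimum come from the class above.

```java
import java.time.Duration;
import java.util.Map;

/** Hypothetical sketch of reading and bounding the abort grace period. */
final class BlueGreenConfigSketch {

    // Key built the same way as in FlinkBlueGreenDeploymentConfigOptions.
    static final String PREFIX = "kubernetes.operator." + "bluegreen.";
    static final String ABORT_GRACE_PERIOD_KEY = PREFIX + "abort.grace-period";

    static final Duration MIN_ABORT_GRACE_PERIOD = Duration.ofMillis(120000); // 2 mins

    /**
     * Reads the grace period as plain milliseconds (assumption) and raises it to the minimum
     * when it is missing or too small (assumption: clamp instead of reject).
     */
    static Duration effectiveAbortGracePeriod(Map<String, String> conf) {
        String raw = conf.get(ABORT_GRACE_PERIOD_KEY);
        Duration configured =
                raw == null
                        ? MIN_ABORT_GRACE_PERIOD
                        : Duration.ofMillis(Long.parseLong(raw.trim()));
        return configured.compareTo(MIN_ABORT_GRACE_PERIOD) < 0
                ? MIN_ABORT_GRACE_PERIOD
                : configured;
    }

    public static void main(String[] args) {
        System.out.println(effectiveAbortGracePeriod(Map.of(ABORT_GRACE_PERIOD_KEY, "30000")));
        System.out.println(effectiveAbortGracePeriod(Map.of(ABORT_GRACE_PERIOD_KEY, "300000")));
    }
}
```

Clamping keeps a misconfigured resource deployable; rejecting the value during validation would be the stricter alternative.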
Original file line number Diff line number Diff line change
@@ -0,0 +1,36 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package org.apache.flink.kubernetes.operator.api.spec;

import org.apache.flink.annotation.Experimental;

import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

/** Spec that describes a Flink application with blue/green deployment capabilities. */
@Experimental
@Data
@NoArgsConstructor
@AllArgsConstructor
@JsonIgnoreProperties(ignoreUnknown = true)
public class FlinkBlueGreenDeploymentSpec {

private FlinkDeploymentTemplateSpec template;
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,49 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package org.apache.flink.kubernetes.operator.api.spec;

import com.fasterxml.jackson.annotation.JsonIgnore;
import com.fasterxml.jackson.annotation.JsonProperty;
import io.fabric8.kubernetes.api.model.ObjectMeta;
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;
import lombok.experimental.SuperBuilder;

import java.util.LinkedHashMap;
import java.util.Map;

/** Template Spec that describes a Flink application managed by the blue/green controller. */
@AllArgsConstructor
@NoArgsConstructor
@Data
@SuperBuilder
public class FlinkDeploymentTemplateSpec {

@JsonProperty("metadata")
private ObjectMeta metadata;

@JsonProperty("configuration")
private Map<String, String> configuration;

@JsonProperty("spec")
private FlinkDeploymentSpec spec;

@JsonIgnore
private Map<String, Object> additionalProperties = new LinkedHashMap<>();
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,40 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package org.apache.flink.kubernetes.operator.api.status;

/** Enumeration of the possible states of the blue/green transition. */
public enum FlinkBlueGreenDeploymentState {

/**
* We use this state while initializing for the first time, always with a "Blue" deployment
* type.
*/
INITIALIZING_BLUE,

/** Indicates that the system is running normally with a "Blue" deployment type. */
ACTIVE_BLUE,

/** Indicates that the system is running normally with a "Green" deployment type. */
ACTIVE_GREEN,

/** Indicates that the system is transitioning from "Green" to "Blue". */
TRANSITIONING_TO_BLUE,

/** Indicates that the system is transitioning from "Blue" to "Green". */
TRANSITIONING_TO_GREEN,

What state are we in during shutdown?

Author

The (old) deployments are deleted once the new ones are RUNNING and STABLE; this is the last step of the TRANSITIONING_TO_* state.

}