Merged
2 changes: 1 addition & 1 deletion docs/content/docs/concepts/overview.md
@@ -94,7 +94,7 @@ The examples are maintained as part of the operator repo and can be found [here]
## Known Issues & Limitations

### JobManager High-availability
-The Operator supports both [Kubernetes HA Services](https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/ha/kubernetes_ha/) and [Zookeeper HA Services](https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/ha/zookeeper_ha/) for providing High-availability for Flink jobs. The HA solution can benefit form using additional [Standby replicas](https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/ha/overview/), it will result in a faster recovery time, but Flink jobs will still restart when the Leader JobManager goes down.
+The Operator supports both [Kubernetes HA Services](https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/ha/kubernetes_ha/) and [Zookeeper HA Services](https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/ha/zookeeper_ha/) for providing High-availability for Flink jobs. The HA solution can benefit from using additional [Standby replicas](https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/ha/overview/), it will result in a faster recovery time, but Flink jobs will still restart when the Leader JobManager goes down.

### JobResultStore Resource Leak
To mitigate the impact of [FLINK-27569](https://issues.apache.org/jira/browse/FLINK-27569) the operator introduced a workaround [FLINK-27573](https://issues.apache.org/jira/browse/FLINK-27573) by setting `job-result-store.delete-on-commit=false` and a unique value for `job-result-store.storage-path` for every cluster launch. The storage path for older runs must be cleaned up manually, keeping the latest directory always:
4 changes: 2 additions & 2 deletions docs/content/docs/operations/helm.md
@@ -32,7 +32,7 @@ The operator installation is managed by a helm chart. To install with the chart
helm install flink-kubernetes-operator helm/flink-kubernetes-operator
```

-To install from our Helm Chart Reporsitory run:
+To install from our Helm Chart Repository run:

```
helm repo add flink-operator-repo https://downloads.apache.org/flink/flink-kubernetes-operator-<OPERATOR-VERSION>/
@@ -112,7 +112,7 @@ The configurable parameters of the Helm chart and which default values as detail
| defaultConfiguration.create | Whether to enable default configuration to create for flink-kubernetes-operator. | true |
| defaultConfiguration.append | Whether to append configuration files with configs. | true |
| defaultConfiguration.flink-conf.yaml | The default configuration of flink-conf.yaml. | kubernetes.operator.metrics.reporter.slf4j.factory.class: org.apache.flink.metrics.slf4j.Slf4jReporterFactory<br/>kubernetes.operator.metrics.reporter.slf4j.interval: 5 MINUTE<br/>kubernetes.operator.reconcile.interval: 15 s<br/>kubernetes.operator.observer.progress-check.interval: 5 s |
-| defaultConfiguration.config.yaml | The newer configuration file format for flink that will enforced in Flink 2.0. Note this was introudced in flink 1.19. | kubernetes.operator.metrics.reporter.slf4j.factory.class: org.apache.flink.metrics.slf4j.Slf4jReporterFactory<br/>kubernetes.operator.metrics.reporter.slf4j.interval: 5 MINUTE<br/>kubernetes.operator.reconcile.interval: 15 s<br/>kubernetes.operator.observer.progress-check.interval: 5 s |
+| defaultConfiguration.config.yaml | The newer configuration file format for flink that will enforced in Flink 2.0. Note this was introduced in flink 1.19. | kubernetes.operator.metrics.reporter.slf4j.factory.class: org.apache.flink.metrics.slf4j.Slf4jReporterFactory<br/>kubernetes.operator.metrics.reporter.slf4j.interval: 5 MINUTE<br/>kubernetes.operator.reconcile.interval: 15 s<br/>kubernetes.operator.observer.progress-check.interval: 5 s |

| defaultConfiguration.log4j-operator.properties | The default configuration of log4j-operator.properties. | |
| defaultConfiguration.log4j-console.properties | The default configuration of log4j-console.properties. | |
2 changes: 1 addition & 1 deletion docs/content/docs/operations/metrics-logging.md
@@ -42,7 +42,7 @@ The Operator gathers aggregates metrics about managed resources.
| Namespace | FlinkDeployment.JmDeploymentStatus.&lt;Status&gt;.Count | Number of managed FlinkDeployment resources per &lt;Status&gt; per namespace. &lt;Status&gt; can take values from: READY, DEPLOYED_NOT_READY, DEPLOYING, MISSING, ERROR | Gauge |
| Namespace | FlinkDeployment.FlinkVersion.&lt;FlinkVersion&gt;.Count | Number of managed FlinkDeployment resources per &lt;FlinkVersion&gt; per namespace. &lt;FlinkVersion&gt; is retrieved via REST API from Flink JM. | Gauge |
| Namespace | FlinkDeployment/FlinkSessionJob.Lifecycle.State.&lt;State&gt;.Count | Number of managed resources currently in state &lt;State&gt; per namespace. &lt;State&gt; can take values from: CREATED, SUSPENDED, UPGRADING, DEPLOYED, STABLE, ROLLING_BACK, ROLLED_BACK, FAILED | Gauge |
-| System/Namespace | FlinkDeployment/FlinkSessionJob.Lifecycle.State.&lt;State&gt;.TimeSeconds | Time spent in state &lt;State$gt for a given resource. &lt;State&gt; can take values from: CREATED, SUSPENDED, UPGRADING, DEPLOYED, STABLE, ROLLING_BACK, ROLLED_BACK, FAILED | Histogram |
+| System/Namespace | FlinkDeployment/FlinkSessionJob.Lifecycle.State.&lt;State&gt;.TimeSeconds | Time spent in state &lt;State&gt; for a given resource. &lt;State&gt; can take values from: CREATED, SUSPENDED, UPGRADING, DEPLOYED, STABLE, ROLLING_BACK, ROLLED_BACK, FAILED | Histogram |
| System/Namespace | FlinkDeployment/FlinkSessionJob.Lifecycle.Transition.&lt;Transition&gt;.TimeSeconds | Time statistics for selected lifecycle state transitions. &lt;Transition&gt; can take values from: Resume, Upgrade, Suspend, Stabilization, Rollback, Submission | Histogram |

#### Lifecycle metrics
2 changes: 1 addition & 1 deletion docs/content/docs/operations/plugins.md
@@ -127,7 +127,7 @@ That folder is added to classpath upon initialization.

## Custom Flink Resource Mutators

-`FlinkResourceMutator`, an interface for ,mutating the resources of `FlinkDeployment` and `FlinkSessionJob`, is a pluggable component based on the [Plugins](https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/filesystems/plugins) mechanism. During development, we can customize the implementation of `FlinkResourceMutator` and make sure to retain the service definition in `META-INF/services`.
+`FlinkResourceMutator`, an interface for mutating the resources of `FlinkDeployment` and `FlinkSessionJob`, is a pluggable component based on the [Plugins](https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/filesystems/plugins) mechanism. During development, we can customize the implementation of `FlinkResourceMutator` and make sure to retain the service definition in `META-INF/services`.
The following steps demonstrate how to develop and use a custom mutator.

1. Implement `FlinkResourceMutator` interface:
7 changes: 4 additions & 3 deletions docs/content/docs/operations/upgrade.md
@@ -38,8 +38,9 @@ Please check the [related section](#upgrading-from-v1alpha1---v1beta1).
## Normal Upgrade Process

If you are upgrading from `kubernetes-operator-1.0.0` or later, please refer to the following two steps:
-1. Upgrading the CRDs
-2. Upgrading the Helm deployment
+1. Upgrading the Java client library
+2. Upgrading the CRDs
+3. Upgrading the Helm deployment

We will cover these steps in detail in the next sections.

@@ -150,7 +151,7 @@ Here is a reference example of upgrading a `basic-checkpoint-ha-example` deploym
```
5. Restore the job:

-Deploy the previously deleted job using this [FlinkDeployemnt](https://raw.githubusercontent.com/apache/flink-kubernetes-operator/main/examples/basic-checkpoint-ha.yaml) with `v1beta1` and explicitly set the `job.initialSavepointPath` to the savepoint location obtained from the step 1.
+Deploy the previously deleted job using this [FlinkDeployment](https://raw.githubusercontent.com/apache/flink-kubernetes-operator/main/examples/basic-checkpoint-ha.yaml) with `v1beta1` and explicitly set the `job.initialSavepointPath` to the savepoint location obtained from the step 1.

```
spec: