diff --git a/docs/content/docs/concepts/overview.md b/docs/content/docs/concepts/overview.md
index a96d8fa831..e19fec4cac 100644
--- a/docs/content/docs/concepts/overview.md
+++ b/docs/content/docs/concepts/overview.md
@@ -94,7 +94,7 @@ The examples are maintained as part of the operator repo and can be found [here]
 ## Known Issues & Limitations
 
 ### JobManager High-availability
-The Operator supports both [Kubernetes HA Services](https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/ha/kubernetes_ha/) and [Zookeeper HA Services](https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/ha/zookeeper_ha/) for providing High-availability for Flink jobs. The HA solution can benefit form using additional [Standby replicas](https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/ha/overview/), it will result in a faster recovery time, but Flink jobs will still restart when the Leader JobManager goes down.
+The Operator supports both [Kubernetes HA Services](https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/ha/kubernetes_ha/) and [Zookeeper HA Services](https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/ha/zookeeper_ha/) for providing High-availability for Flink jobs. The HA solution can benefit from using additional [Standby replicas](https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/ha/overview/), which will result in a faster recovery time, but Flink jobs will still restart when the Leader JobManager goes down.
 
 ### JobResultStore Resource Leak
 To mitigate the impact of [FLINK-27569](https://issues.apache.org/jira/browse/FLINK-27569) the operator introduced a workaround [FLINK-27573](https://issues.apache.org/jira/browse/FLINK-27573) by setting `job-result-store.delete-on-commit=false` and a unique value for `job-result-store.storage-path` for every cluster launch. The storage path for older runs must be cleaned up manually, keeping the latest directory always:
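The manual cleanup described above can be scripted. A minimal sketch, assuming `job-result-store.storage-path` resolves to a directory on a mounted filesystem; the base path and deployment name below are hypothetical:

```bash
# Sketch only: point BASE at the configured job-result-store.storage-path parent.
# Keeps the most recent per-launch directory and removes the older ones.
BASE=/flink-data/job-result-store/my-deployment
cd "$BASE" || exit 1
ls -1t | tail -n +2 | xargs -r rm -rf --
```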
diff --git a/docs/content/docs/operations/helm.md b/docs/content/docs/operations/helm.md
index 657d078840..00ea1f4e22 100644
--- a/docs/content/docs/operations/helm.md
+++ b/docs/content/docs/operations/helm.md
@@ -32,7 +32,7 @@ The operator installation is managed by a helm chart. To install with the chart
 helm install flink-kubernetes-operator helm/flink-kubernetes-operator
 ```
 
-To install from our Helm Chart Reporsitory run:
+To install from our Helm Chart Repository run:
 
 ```
 helm repo add flink-operator-repo https://downloads.apache.org/flink/flink-kubernetes-operator-/
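Once the repository entry has been added as above, the chart can be installed directly from it. A short sketch; the release and chart names mirror the local-chart command earlier in this file and are assumptions here:

```bash
# Assumes the flink-operator-repo entry added above; refresh the local index first.
helm repo update
helm install flink-kubernetes-operator flink-operator-repo/flink-kubernetes-operator
```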
@@ -112,7 +112,7 @@ The configurable parameters of the Helm chart and which default values as detail
 | defaultConfiguration.create | Whether to enable default configuration to create for flink-kubernetes-operator. | true |
 | defaultConfiguration.append | Whether to append configuration files with configs. | true |
 | defaultConfiguration.flink-conf.yaml | The default configuration of flink-conf.yaml. | kubernetes.operator.metrics.reporter.slf4j.factory.class: org.apache.flink.metrics.slf4j.Slf4jReporterFactory<br/>kubernetes.operator.metrics.reporter.slf4j.interval: 5 MINUTE<br/>kubernetes.operator.reconcile.interval: 15 s<br/>kubernetes.operator.observer.progress-check.interval: 5 s |
-| defaultConfiguration.config.yaml | The newer configuration file format for flink that will enforced in Flink 2.0. Note this was introudced in flink 1.19. | kubernetes.operator.metrics.reporter.slf4j.factory.class: org.apache.flink.metrics.slf4j.Slf4jReporterFactory<br/>kubernetes.operator.metrics.reporter.slf4j.interval: 5 MINUTE<br/>kubernetes.operator.reconcile.interval: 15 s<br/>kubernetes.operator.observer.progress-check.interval: 5 s |
+| defaultConfiguration.config.yaml | The newer configuration file format for Flink that will be enforced in Flink 2.0. Note this was introduced in Flink 1.19. | kubernetes.operator.metrics.reporter.slf4j.factory.class: org.apache.flink.metrics.slf4j.Slf4jReporterFactory<br/>kubernetes.operator.metrics.reporter.slf4j.interval: 5 MINUTE<br/>kubernetes.operator.reconcile.interval: 15 s<br/>kubernetes.operator.observer.progress-check.interval: 5 s |
 | defaultConfiguration.log4j-operator.properties | The default configuration of log4j-operator.properties. | |
 | defaultConfiguration.log4j-console.properties | The default configuration of log4j-console.properties. | |
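The `defaultConfiguration.*` parameters in this table can be overridden at install time. A minimal sketch, assuming the values-file nesting mirrors the parameter names listed above; the file name and the chosen config keys are only examples:

```bash
# Sketch only: key nesting is assumed from the defaultConfiguration.* parameter names above.
cat > operator-values.yaml <<'EOF'
defaultConfiguration:
  create: true
  append: true
  flink-conf.yaml: |+
    kubernetes.operator.reconcile.interval: 15 s
    kubernetes.operator.observer.progress-check.interval: 5 s
EOF
helm upgrade --install flink-kubernetes-operator helm/flink-kubernetes-operator -f operator-values.yaml
```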
diff --git a/docs/content/docs/operations/metrics-logging.md b/docs/content/docs/operations/metrics-logging.md
index f40c8af937..d1e194efee 100644
--- a/docs/content/docs/operations/metrics-logging.md
+++ b/docs/content/docs/operations/metrics-logging.md
@@ -42,7 +42,7 @@ The Operator gathers aggregates metrics about managed resources.
 | Namespace | FlinkDeployment.JmDeploymentStatus.<Status>.Count | Number of managed FlinkDeployment resources per <Status> per namespace. <Status> can take values from: READY, DEPLOYED_NOT_READY, DEPLOYING, MISSING, ERROR | Gauge |
 | Namespace | FlinkDeployment.FlinkVersion.<FlinkVersion>.Count | Number of managed FlinkDeployment resources per <FlinkVersion> per namespace. <FlinkVersion> is retrieved via REST API from Flink JM. | Gauge |
 | Namespace | FlinkDeployment/FlinkSessionJob.Lifecycle.State.<State>.Count | Number of managed resources currently in state <State> per namespace. <State> can take values from: CREATED, SUSPENDED, UPGRADING, DEPLOYED, STABLE, ROLLING_BACK, ROLLED_BACK, FAILED | Gauge |
-| System/Namespace | FlinkDeployment/FlinkSessionJob.Lifecycle.State.<State>.TimeSeconds | Time spent in state <State$gt for a given resource. <State> can take values from: CREATED, SUSPENDED, UPGRADING, DEPLOYED, STABLE, ROLLING_BACK, ROLLED_BACK, FAILED | Histogram |
+| System/Namespace | FlinkDeployment/FlinkSessionJob.Lifecycle.State.<State>.TimeSeconds | Time spent in state <State> for a given resource. <State> can take values from: CREATED, SUSPENDED, UPGRADING, DEPLOYED, STABLE, ROLLING_BACK, ROLLED_BACK, FAILED | Histogram |
 | System/Namespace | FlinkDeployment/FlinkSessionJob.Lifecycle.Transition.<Transition>.TimeSeconds | Time statistics for selected lifecycle state transitions. <Transition> can take values from: Resume, Upgrade, Suspend, Stabilization, Rollback, Submission | Histogram |
 
 #### Lifecycle metrics
diff --git a/docs/content/docs/operations/plugins.md b/docs/content/docs/operations/plugins.md
index 3abd2c3a76..6503f57fdb 100644
--- a/docs/content/docs/operations/plugins.md
+++ b/docs/content/docs/operations/plugins.md
@@ -127,7 +127,7 @@ That folder is added to classpath upon initialization.
 
 ## Custom Flink Resource Mutators
 
-`FlinkResourceMutator`, an interface for ,mutating the resources of `FlinkDeployment` and `FlinkSessionJob`, is a pluggable component based on the [Plugins](https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/filesystems/plugins) mechanism. During development, we can customize the implementation of `FlinkResourceMutator` and make sure to retain the service definition in `META-INF/services`.
+`FlinkResourceMutator`, an interface for mutating the resources of `FlinkDeployment` and `FlinkSessionJob`, is a pluggable component based on the [Plugins](https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/filesystems/plugins) mechanism. During development, we can customize the implementation of `FlinkResourceMutator` and make sure to retain the service definition in `META-INF/services`.
 
 The following steps demonstrate how to develop and use a custom mutator.
 1. Implement `FlinkResourceMutator` interface:
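The implementation in step 1 is Java, but the packaging side can be sketched in shell: the jar needs a `META-INF/services` entry naming the implementation class and must end up in the plugin folder mentioned earlier on this page. The fully qualified interface name and the implementation class below are assumptions; check the operator API module for the exact values:

```bash
# Sketch only: the interface package (org.apache.flink.kubernetes.operator.api.* here)
# and the implementation class name are assumptions; verify against your operator version.
mkdir -p classes/META-INF/services
echo "com.example.MyMutator" \
  > classes/META-INF/services/org.apache.flink.kubernetes.operator.api.FlinkResourceMutator
jar cf my-mutator.jar -C classes .
# Place my-mutator.jar in the operator's plugin folder described earlier on this page.
```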
diff --git a/docs/content/docs/operations/upgrade.md b/docs/content/docs/operations/upgrade.md
index 70987d81e5..1e34b8db08 100644
--- a/docs/content/docs/operations/upgrade.md
+++ b/docs/content/docs/operations/upgrade.md
@@ -38,8 +38,9 @@ Please check the [related section](#upgrading-from-v1alpha1---v1beta1).
 
 ## Normal Upgrade Process
 
-If you are upgrading from `kubernetes-operator-1.0.0` or later, please refer to the following two steps:
-1. Upgrading the CRDs
-2. Upgrading the Helm deployment
+If you are upgrading from `kubernetes-operator-1.0.0` or later, please refer to the following three steps:
+1. Upgrading the Java client library
+2. Upgrading the CRDs
+3. Upgrading the Helm deployment
 
 We will cover these steps in detail in the next sections.
@@ -150,7 +151,7 @@ Here is a reference example of upgrading a `basic-checkpoint-ha-example` deploym
    ```
 
 5. Restore the job:
-   Deploy the previously deleted job using this [FlinkDeployemnt](https://raw.githubusercontent.com/apache/flink-kubernetes-operator/main/examples/basic-checkpoint-ha.yaml) with `v1beta1` and explicitly set the `job.initialSavepointPath` to the savepoint location obtained from the step 1.
+   Deploy the previously deleted job using this [FlinkDeployment](https://raw.githubusercontent.com/apache/flink-kubernetes-operator/main/examples/basic-checkpoint-ha.yaml) with `v1beta1` and explicitly set the `job.initialSavepointPath` to the savepoint location obtained in step 1.
 
    ```
    spec:
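For the CRD upgrade step listed above, the usual mechanism is replacing the CRD manifests that ship with the chart. A sketch; the file names under `helm/flink-kubernetes-operator/crds/` are assumptions and should be checked against the chart:

```bash
# Sketch only: confirm the CRD file names by listing the chart's crds/ folder first.
ls helm/flink-kubernetes-operator/crds/
kubectl replace -f helm/flink-kubernetes-operator/crds/flinkdeployments.flink.apache.org-v1.yml
kubectl replace -f helm/flink-kubernetes-operator/crds/flinksessionjobs.flink.apache.org-v1.yml
```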