diff --git a/.github/workflows/skywalking.yaml b/.github/workflows/skywalking.yaml index d41e8cf4fd78..3158ca364bac 100644 --- a/.github/workflows/skywalking.yaml +++ b/.github/workflows/skywalking.yaml @@ -664,6 +664,8 @@ jobs: config: test/e2e-v2/cases/activemq/e2e.yaml - name: Kong config: test/e2e-v2/cases/kong/e2e.yaml + - name: Flink + config: test/e2e-v2/cases/flink/e2e.yaml - name: UI Menu BanyanDB config: test/e2e-v2/cases/menu/banyandb/e2e.yaml diff --git a/docs/en/changes/changes.md b/docs/en/changes/changes.md index 1b67cebe59b9..2ffc7cd7bef9 100644 --- a/docs/en/changes/changes.md +++ b/docs/en/changes/changes.md @@ -11,6 +11,7 @@ * Support `hot/warm/cold` stages TTL query in the status API. * PromQL Service: traffic query support `limit` and regex match. * Fix an edge case of HashCodeSelector(Integer#MIN_VALUE causes ArrayIndexOutOfBoundsException). +* Support Flink monitoring. #### UI diff --git a/docs/en/setup/backend/backend-clickhouse-monitoring.md b/docs/en/setup/backend/backend-clickhouse-monitoring.md index 88208b97db4b..65f8bba73d4f 100644 --- a/docs/en/setup/backend/backend-clickhouse-monitoring.md +++ b/docs/en/setup/backend/backend-clickhouse-monitoring.md @@ -21,7 +21,7 @@ the metrics to . 2. Set up [OpenTelemetry Collector ](https://opentelemetry.io/docs/collector/getting-started/#docker). For details on Prometheus Receiver in OpenTelemetry Collector, refer - to [here](../../../../test/e2e-v2/cases/mysql/prometheus-mysql-exporter/otel-collector-config.yaml). + to [here](../../../../test/e2e-v2/cases/clickhouse/clickhouse-prometheus-endpoint/otel-collector-config.yaml). 3. Config SkyWalking [OpenTelemetry receiver](opentelemetry-receiver.md). ### ClickHouse Monitoring diff --git a/docs/en/setup/backend/backend-flink-monitoring.md b/docs/en/setup/backend/backend-flink-monitoring.md new file mode 100644 index 000000000000..8d53b42f15df --- /dev/null +++ b/docs/en/setup/backend/backend-flink-monitoring.md @@ -0,0 +1,105 @@ +# Flink monitoring + +## Flink server performance from built-in metrics data +SkyWalking leverages OpenTelemetry Collector to transfer the flink metrics to +[OpenTelemetry receiver](opentelemetry-receiver.md) and into the [Meter System](./../../concepts-and-designs/mal.md). + +## Data flow + +1. Configure Flink jobManager and TaskManager to expose metrics data for scraping through Prometheus endpoint. +2. OpenTelemetry Collector fetches metrics from Flink jobManager and TaskManager through Prometheus endpoint, and pushes metrics to SkyWalking OAP Server via + OpenTelemetry gRPC exporter. +3. The SkyWalking OAP Server parses the expression with [MAL](../../concepts-and-designs/mal.md) to + filter/calculate/aggregate and store the results. + +## Setup + +1. Set up [built-in prometheus endpoint](https://nightlies.apache.org/flink/flink-docs-release-2.0-preview1/docs/deployment/metric_reporters/#prometheus). +2. Set up [OpenTelemetry Collector ](https://opentelemetry.io/docs/collector/getting-started/#docker). + Please note that the OpenTelemetry Collector uses the job_name label by default, which may conflict with the job_name label in Flink. + Please modify the Flink label name in the configuration to avoid this conflict, you can refer to [here](../../../../test/e2e-v2/cases/flink/otel-collector-config.yaml) + for details on Prometheus Receiver in OpenTelemetry Collector. +3. Config SkyWalking [OpenTelemetry receiver](opentelemetry-receiver.md). + +## Flink Monitoring + +Flink monitoring provides multidimensional metrics monitoring of Flink cluster as `Layer: Flink` `Service` in +the OAP. In each cluster, the taskManager is represented as `Instance` and the job is represented as `Endpoint`. + +### Flink service Supported Metrics + +| Monitoring Panel | Unit | Metric Name | Description | Data Source | +|-------------------------------|-------|-------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------|------------------| +| Running Jobs | Count | meter_flink_jobManager_running_job_number | The number of running jobs. | Flink JobManager | +| TaskManagers | Count | meter_flink_jobManager_taskManagers_registered_number | The number of taskManagers. | Flink JobManager | +| JVM CPU Load | % | meter_flink_jobManager_jvm_cpu_load | The number of the jobManager JVM CPU load. | Flink JobManager | +| JVM thread count | Count | meter_flink_jobManager_jvm_thread_count | The total number of the jobManager JVM live threads. | Flink JobManager | +| JVM Memory Heap Used | MB | meter_flink_jobManager_jvm_memory_heap_used | The amount of the jobManager JVM memory heap used. | Flink JobManager | +| JVM Memory NonHeap Used | MB | meter_flink_jobManager_jvm_memory_NonHeap_used | The amount of the jobManager JVM nonHeap memory used. | Flink JobManager | +| Task Managers Slots Total | Count | meter_flink_jobManager_taskManagers_slots_total | The number of total slots. | Flink JobManager | +| Task Managers Slots Available | Count | meter_flink_jobManager_taskManagers_slots_available | The number of available slots. | Flink JobManager | +| JVM CPU Time | ms | meter_flink_jobManager_jvm_cpu_time | The jobManager CPU time used by the JVM increase per minute. | Flink JobManager | +| JVM Memory Heap Available | MB | meter_flink_jobManager_jvm_memory_heap_available | The amount of the jobManager available JVM memory Heap. | Flink JobManager | +| JVM Memory NoHeap Available | MB | meter_flink_jobManager_jvm_memory_nonHeap_available | The amount of the jobManager available JVM memory noHeap. | Flink JobManager | +| JVM Memory Metaspace Used | MB | meter_flink_jobManager_jvm_memory_metaspace_used | The amount of the jobManager Used JVM metaspace memory. | Flink JobManager | +| JVM Metaspace Available | MB | meter_flink_jobManager_jvm_memory_metaspace_available | The amount of the jobManager available JVM Metaspace Memory. | Flink JobManager | +| JVM G1 Young Generation Count | Count | meter_flink_jobManager_jvm_g1_young_generation_count | The incremental number of the jobManager JVM G1 young generation count per minute. | Flink JobManager | +| JVM G1 Old Generation Count | Count | meter_flink_jobManager_jvm_g1_old_generation_count | The incremental number of the jobManager JVM G1 old generation count per minute. | Flink JobManager | +| JVM G1 Young Generation Time | Count | meter_flink_jobManager_jvm_g1_young_generation_time | The incremental time of the jobManager JVM G1 young generation per minute. | Flink JobManager | +| JVM G1 Old Generation Time | ms | meter_flink_jobManager_jvm_g1_old_generation_time | The incremental time of JVM G1 old generation increase per minute. | Flink JobManager | +| JVM G1 Old Generation Count | Count | meter_flink_jobManager_jvm_all_garbageCollector_count | The incremental number of the jobManager JVM all garbageCollector count per minute. | Flink JobManager | +| JVM All GarbageCollector Time | ms | meter_flink_jobManager_jvm_all_garbageCollector_time | The incremental time spent performing garbage collection for the given (or all) collector for the jobManager per minute. | Flink JobManager | + +### Flink instance Supported Metrics + +| Monitoring Panel | Unit | Metric Name | Description | Data Source | +|----------------------------------|---------|----------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------| +| JVM CPU Load | % | meter_flink_taskManager_jvm_cpu_load | The number of the JVM CPU load. | Flink TaskManager | +| JVM Thread Count | Count | meter_flink_taskManager_jvm_thread_count | The total number of JVM threads. | Flink TaskManager | +| JVM Memory Heap Used | MB | meter_flink_taskManager_jvm_memory_heap_used | The amount of JVM memory heap used. | Flink TaskManager | +| JVM Memory NonHeap Used | MB | meter_flink_taskManager_jvm_memory_nonHeap_used | The amount of JVM nonHeap memory used. | Flink TaskManager | +| JVM CPU Time | ms | meter_flink_taskManager_jvm_cpu_time | The CPU time used by the JVM. | Flink TaskManager | +| JVM Memory Heap Available | MB | meter_flink_taskManager_jvm_memory_heap_available | The amount of available JVM memory Heap. | Flink TaskManager | +| JVM Memory NonHeap Available | MB | meter_flink_taskManager_jvm_memory_nonHeap_available | The amount of available JVM memory nonHeap. | Flink TaskManager | +| JVM Memory Metaspace Used | MB | meter_flink_taskManager_jvm_memory_metaspace_used | The amount of Used JVM metaspace memory. | Flink TaskManager | +| JVM Metaspace Available | MB | meter_flink_taskManager_jvm_memory_metaspace_available | The amount of Available JVM Metaspace Memory. | Flink TaskManager | +| NumRecordsIn | Count | meter_flink_taskManager_numRecordsIn | The total number of records this task has received. | Flink TaskManager | +| NumRecordsOut | Count | meter_flink_taskManager_numRecordsOut | The total number of records this task has emitted. | Flink TaskManager | +| NumBytesInPerSecond | Bytes/s | meter_flink_taskManager_numBytesInPerSecond | The number of bytes received per second. | Flink TaskManager | +| NumBytesOutPerSecond | Bytes/s | meter_flink_taskManager_numBytesOutPerSecond | The number of bytes this task emits per second. | Flink TaskManager | +| Netty UsedMemory | MB | meter_flink_taskManager_netty_usedMemory | The amount of used netty memory. | Flink TaskManager | +| Netty AvailableMemory | MB | meter_flink_taskManager_netty_availableMemory | The amount of available netty memory. | Flink TaskManager | +| IsBackPressured | Count | meter_flink_taskManager_isBackPressured | Whether the task is back-pressured. | Flink TaskManager | +| InPoolUsage | % | meter_flink_taskManager_inPoolUsage | An estimate of the input buffers usage. (ignores LocalInputChannels). | Flink TaskManager | +| OutPoolUsage | % | meter_flink_taskManager_outPoolUsage | An estimate of the output buffers usage. The pool usage can be > 100% if overdraft buffers are being used. | Flink TaskManager | +| SoftBackPressuredTimeMsPerSecond | ms | meter_flink_taskManager_softBackPressuredTimeMsPerSecond | The time this task is softly back pressured per second.Softly back pressured task will be still responsive and capable of for example triggering unaligned checkpoints. | Flink TaskManager | +| HardBackPressuredTimeMsPerSecond | ms | meter_flink_taskManager_hardBackPressuredTimeMsPerSecond | The time this task is back pressured in a hard way per second.During hard back pressured task is completely blocked and unresponsive preventing for example unaligned checkpoints from triggering. | Flink TaskManager | +| IdleTimeMsPerSecond | ms | meter_flink_taskManager_idleTimeMsPerSecond | The time this task is idle (has no data to process) per second. Idle time excludes back pressured time, so if the task is back pressured it is not idle. | Flink TaskManager | +| BusyTimeMsPerSecond | ms | meter_flink_taskManager_busyTimeMsPerSecond | The time this task is busy (neither idle nor back pressured) per second. Can be NaN, if the value could not be calculated. | Flink TaskManager | + +### Flink Endpoint Supported Metrics + +| Monitoring Panel | Unit | Metric Name | Description | Data Source | +|-------------------------|---------|-----------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------| +| Job RunningTime | min | meter_flink_job_runningTime | The job running time. | Flink JobManager | +| Job Restart Number | Count | meter_flink_job_restart_number | The number of job restart. | Flink JobManager | +| Job RestartingTime | min | meter_flink_job_restartingTime | The job restarting Time. | Flink JobManager | +| Job CancellingTime | min | meter_flink_job_cancellingTime | The job cancelling time. | Flink JobManager | +| Checkpoints Total | Count | meter_flink_job_checkpoints_total | The total number of checkpoints. | Flink JobManager | +| Checkpoints Failed | Count | meter_flink_job_checkpoints_failed | The number of failed checkpoints. | Flink JobManager | +| Checkpoints Completed | Count | meter_flink_job_checkpoints_completed | The number of completed checkpoints. | Flink JobManager | +| Checkpoints InProgress | Count | meter_flink_job_checkpoints_inProgress | The number of inProgress checkpoints. | Flink JobManager | +| CurrentEmitEventTimeLag | ms | meter_flink_job_currentEmitEventTimeLag | The latency between a data record's event time and its emission time from the source. | Flink TaskManager | +| NumRecordsIn | Count | meter_flink_job_numRecordsIn | The total number of records this operator/task has received. | Flink TaskManager | +| NumRecordsOut | Count | meter_flink_job_numRecordsOut | The total number of records this operator/task has emitted. | Flink TaskManager | +| NumBytesInPerSecond | Bytes/s | meter_flink_job_numBytesInPerSecond | The number of bytes this task received per second. | Flink TaskManager | +| NumBytesOutPerSecond | Bytes/s | meter_flink_job_numBytesOutPerSecond | The number of bytes this task emits per second. | Flink TaskManager | +| LastCheckpointSize | Bytes | meter_flink_job_lastCheckpointSize | The checkPointed size of the last checkpoint (in bytes), this metric could be different from lastCheckpointFullSize if incremental checkpoint or changelog is enabled. | Flink JobManager | +| LastCheckpointDuration | ms | meter_flink_job_lastCheckpointDuration | The time it took to complete the last checkpoint. | Flink JobManager | + +## Customizations + +You can customize your own metrics/expression/dashboard panel. +The metrics definition and expression rules are found +in `otel-rules/flink/flink-jobManager.yaml, otel-rules/flink/flink-taskManager.yaml, otel-rules/flink/flink-job.yaml`. +The Flink dashboard panel configurations are found in `ui-initialized-templates/flink`. \ No newline at end of file diff --git a/docs/en/setup/backend/opentelemetry-receiver.md b/docs/en/setup/backend/opentelemetry-receiver.md index 9f4287a96e1b..6e60c866b2e0 100644 --- a/docs/en/setup/backend/opentelemetry-receiver.md +++ b/docs/en/setup/backend/opentelemetry-receiver.md @@ -60,4 +60,7 @@ for identification of the metric data. | Metrics of ClickHouse | otel-rules/clickhouse/clickhouse-service.yaml | ClickHouse(embedded prometheus endpoint) -> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | | Metrics of RocketMQ | otel-rules/rocketmq/rocketmq-cluster.yaml | rocketmq-exporter -> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | | Metrics of RocketMQ | otel-rules/rocketmq/rocketmq-broker.yaml | rocketmq-exporter -> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | -| Metrics of RocketMQ | otel-rules/rocketmq/rocketmq-topic.yaml | rocketmq-exporter -> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | \ No newline at end of file +| Metrics of RocketMQ | otel-rules/rocketmq/rocketmq-topic.yaml | rocketmq-exporter -> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | +| Metrics of Flink | otel-rules/flink/flink-jobManager.yaml | flink jobManager -> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | +| Metrics of Flink | otel-rules/flink/flink-taskManager.yaml | flink taskManager -> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | +| Metrics of Flink | otel-rules/flink/flink-job.yaml | flink jobManager & flink taskManager-> OpenTelemetry Collector -- OTLP exporter --> SkyWalking OAP Server | diff --git a/docs/en/swip/SWIP-9.md b/docs/en/swip/SWIP-9.md index 53766e17a1fd..558602369065 100644 --- a/docs/en/swip/SWIP-9.md +++ b/docs/en/swip/SWIP-9.md @@ -12,27 +12,27 @@ Provide cluster, instance, and endpoint dimensions monitoring. ### Flink Cluster Supported Metrics -| Monitoring Panel | Unit | Metric Name | Description | Data Source | -|-------------------------------|-------|-------------------------------------------------------|---------------------------------------------------------------------------------------------------|------------------| -| Running Jobs | Count | meter_flink_jobManager_running_job_number | The number of running jobs. | Flink JobManager | -| TaskManagers | Count | meter_flink_jobManager_taskManagers_registered_number | The number of taskManagers. | Flink JobManager | -| JVM CPU Load | % | meter_flink_jobManager_jvm_cpu_load | The number of the jobManager JVM CPU load. | Flink JobManager | -| JVM thread count | Count | meter_flink_jobManager_jvm_thread_count | The total number of the jobManager JVM threads. | Flink JobManager | -| JVM Memory Heap Used | MB | meter_flink_jobManager_jvm_memory_heap_used | The amount of the jobManager JVM memory heap used. | Flink JobManager | -| JVM Memory NonHeap Used | MB | meter_flink_jobManager_jvm_memory_NonHeap_used | The amount of the jobManager JVM nonHeap memory used. | Flink JobManager | -| Task Managers Slots Total | Count | meter_flink_jobManager_taskManagers_slots_total | The number of total slots. | Flink JobManager | -| Task Managers Slots Available | Count | meter_flink_jobManager_taskManagers_slots_available | The number of available slots. | Flink JobManager | -| JVM CPU Time | ms | meter_flink_jobManager_jvm_cpu_time | The jobManager CPU time used by the JVM. | Flink JobManager | -| JVM Memory Heap Available | MB | meter_flink_jobManager_jvm_memory_heap_available | The amount of the jobManager available JVM memory Heap. | Flink JobManager | -| JVM Memory NoHeap Available | MB | meter_flink_jobManager_jvm_memory_nonHeap_available | The amount of the jobManager available JVM memory noHeap. | Flink JobManager | -| JVM Memory Metaspace Used | MB | meter_flink_jobManager_jvm_memory_metaspace_used | The amount of the jobManager Used JVM metaspace memory. | Flink JobManager | -| JVM Metaspace Available | MB | meter_flink_jobManager_jvm_memory_metaspace_available | The amount of the jobManager available JVM Metaspace Memory. | Flink JobManager | -| JVM G1 Young Generation Count | Count | meter_flink_jobManager_jvm_g1_young_generation_count | The number of the jobManager JVM g1 young generation count. | Flink JobManager | -| JVM G1 Old Generation Count | Count | meter_flink_jobManager_jvm_g1_old_generation_count | The number of the jobManager JVM g1 old generation count. | Flink JobManager | -| JVM G1 Young Generation Time | Count | meter_flink_jobManager_jvm_g1_young_generation_time | The time of the jobManager JVM g1 young generation. | Flink JobManager | -| JVM G1 Old Generation Time | ms | meter_flink_jobManager_jvm_g1_old_generation_time | The time of JVM g1 old generation. | Flink JobManager | -| JVM G1 Old Generation Count | Count | meter_flink_jobManager_jvm_all_garbageCollector_count | The number of the jobManager JVM all garbageCollector count. | Flink JobManager | -| JVM All GarbageCollector Time | ms | meter_flink_jobManager_jvm_all_garbageCollector_time | The time spent performing garbage collection for the given (or all) collector for the jobManager. | Flink JobManager | +| Monitoring Panel | Unit | Metric Name | Description | Data Source | +|-------------------------------|-------|-------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------|------------------| +| Running Jobs | Count | meter_flink_jobManager_running_job_number | The number of running jobs. | Flink JobManager | +| TaskManagers | Count | meter_flink_jobManager_taskManagers_registered_number | The number of taskManagers. | Flink JobManager | +| JVM CPU Load | % | meter_flink_jobManager_jvm_cpu_load | The number of the jobManager JVM CPU load. | Flink JobManager | +| JVM thread count | Count | meter_flink_jobManager_jvm_thread_count | The total number of the jobManager JVM live threads. | Flink JobManager | +| JVM Memory Heap Used | MB | meter_flink_jobManager_jvm_memory_heap_used | The amount of the jobManager JVM memory heap used. | Flink JobManager | +| JVM Memory NonHeap Used | MB | meter_flink_jobManager_jvm_memory_NonHeap_used | The amount of the jobManager JVM nonHeap memory used. | Flink JobManager | +| Task Managers Slots Total | Count | meter_flink_jobManager_taskManagers_slots_total | The number of total slots. | Flink JobManager | +| Task Managers Slots Available | Count | meter_flink_jobManager_taskManagers_slots_available | The number of available slots. | Flink JobManager | +| JVM CPU Time | ms | meter_flink_jobManager_jvm_cpu_time | The jobManager CPU time used by the JVM increase per minute. | Flink JobManager | +| JVM Memory Heap Available | MB | meter_flink_jobManager_jvm_memory_heap_available | The amount of the jobManager available JVM memory Heap. | Flink JobManager | +| JVM Memory NoHeap Available | MB | meter_flink_jobManager_jvm_memory_nonHeap_available | The amount of the jobManager available JVM memory noHeap. | Flink JobManager | +| JVM Memory Metaspace Used | MB | meter_flink_jobManager_jvm_memory_metaspace_used | The amount of the jobManager Used JVM metaspace memory. | Flink JobManager | +| JVM Metaspace Available | MB | meter_flink_jobManager_jvm_memory_metaspace_available | The amount of the jobManager available JVM Metaspace Memory. | Flink JobManager | +| JVM G1 Young Generation Count | Count | meter_flink_jobManager_jvm_g1_young_generation_count | The incremental number of the jobManager JVM G1 young generation count per minute. | Flink JobManager | +| JVM G1 Old Generation Count | Count | meter_flink_jobManager_jvm_g1_old_generation_count | The incremental number of the jobManager JVM G1 old generation count per minute. | Flink JobManager | +| JVM G1 Young Generation Time | Count | meter_flink_jobManager_jvm_g1_young_generation_time | The incremental time of the jobManager JVM G1 young generation per minute. | Flink JobManager | +| JVM G1 Old Generation Time | ms | meter_flink_jobManager_jvm_g1_old_generation_time | The incremental time of JVM G1 old generation increase per minute. | Flink JobManager | +| JVM G1 Old Generation Count | Count | meter_flink_jobManager_jvm_all_garbageCollector_count | The incremental number of the jobManager JVM all garbageCollector count per minute. | Flink JobManager | +| JVM All GarbageCollector Time | ms | meter_flink_jobManager_jvm_all_garbageCollector_time | The incremental time spent performing garbage collection for the given (or all) collector for the jobManager per minute. | Flink JobManager | ### Flink taskManager Supported Metrics @@ -40,16 +40,16 @@ Provide cluster, instance, and endpoint dimensions monitoring. | Monitoring Panel | Unit | Metric Name | Description | Data Source | |----------------------------------|---------|----------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------| | JVM CPU Load | % | meter_flink_taskManager_jvm_cpu_load | The number of the JVM CPU load. | Flink TaskManager | -| JVM Thread Count | Count | meter_flink_taskManager_jvm_thread_count | The total number of JVM threads. | Flink TaskManager | +| JVM Thread Count | Count | meter_flink_taskManager_jvm_thread_count | The total number of JVM live threads. | Flink TaskManager | | JVM Memory Heap Used | MB | meter_flink_taskManager_jvm_memory_heap_used | The amount of JVM memory heap used. | Flink TaskManager | | JVM Memory NonHeap Used | MB | meter_flink_taskManager_jvm_memory_nonHeap_used | The amount of JVM nonHeap memory used. | Flink TaskManager | -| JVM CPU Time | ms | meter_flink_taskManager_jvm_cpu_time | The CPU time used by the JVM. | Flink TaskManager | +| JVM CPU Time | ms | meter_flink_taskManager_jvm_cpu_time | The CPU time used by the JVM increase per minute. | Flink TaskManager | | JVM Memory Heap Available | MB | meter_flink_taskManager_jvm_memory_heap_available | The amount of available JVM memory Heap. | Flink TaskManager | | JVM Memory NonHeap Available | MB | meter_flink_taskManager_jvm_memory_nonHeap_available | The amount of available JVM memory nonHeap. | Flink TaskManager | | JVM Memory Metaspace Used | MB | meter_flink_taskManager_jvm_memory_metaspace_used | The amount of Used JVM metaspace memory. | Flink TaskManager | | JVM Metaspace Available | MB | meter_flink_taskManager_jvm_memory_metaspace_available | The amount of Available JVM Metaspace Memory. | Flink TaskManager | -| NumRecordsIn | Count | meter_flink_taskManager_numRecordsIn | The total number of records this task has received. | Flink TaskManager | -| NumRecordsOut | Count | meter_flink_taskManager_numRecordsOut | The total number of records this task has emitted. | Flink TaskManager | +| NumRecordsIn | Count | meter_flink_taskManager_numRecordsIn | The incremental number of records this task has received per minute.. | Flink TaskManager | +| NumRecordsOut | Count | meter_flink_taskManager_numRecordsOut | The incremental number of records this task has emitted per minute. | Flink TaskManager | | NumBytesInPerSecond | Bytes/s | meter_flink_taskManager_numBytesInPerSecond | The number of bytes received per second. | Flink TaskManager | | NumBytesOutPerSecond | Bytes/s | meter_flink_taskManager_numBytesOutPerSecond | The number of bytes this task emits per second. | Flink TaskManager | | Netty UsedMemory | MB | meter_flink_taskManager_netty_usedMemory | The amount of used netty memory. | Flink TaskManager | diff --git a/docs/menu.yml b/docs/menu.yml index 7479b6fa8206..80b061fd3656 100644 --- a/docs/menu.yml +++ b/docs/menu.yml @@ -152,6 +152,10 @@ catalog: path: "/en/setup/backend/dashboards-so11y-java-agent" - name: "SkyWalking Go Agent self telemetry" path: "/en/setup/backend/dashboards-so11y-go-agent" + - name: "Data Processing Engine" + catalog: + - name: "Flink" + path: "/en/setup/backend/backend-flink-monitoring.md" - name: "Configuration Vocabulary" path: "/en/setup/backend/configuration-vocabulary" - name: "Advanced Setup" diff --git a/oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/analysis/Layer.java b/oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/analysis/Layer.java index ccf46a9ec114..6e7380d409a5 100644 --- a/oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/analysis/Layer.java +++ b/oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/analysis/Layer.java @@ -251,7 +251,12 @@ public enum Layer { * The self observability of SkyWalking Go Agent, * which provides the abilities to measure the tracing performance and error statistics of plugins. */ - SO11Y_GO_AGENT(41, true); + SO11Y_GO_AGENT(41, true), + + /** + * Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams + */ + FLINK(42, true); private final int value; /** diff --git a/oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/management/ui/template/UITemplateInitializer.java b/oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/management/ui/template/UITemplateInitializer.java index 706146a96155..c859c0de2133 100644 --- a/oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/management/ui/template/UITemplateInitializer.java +++ b/oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/management/ui/template/UITemplateInitializer.java @@ -79,6 +79,7 @@ public class UITemplateInitializer { Layer.SO11Y_JAVA_AGENT.name(), Layer.KONG.name(), Layer.SO11Y_GO_AGENT.name(), + Layer.FLINK.name(), "custom" }; private final UITemplateManagementService uiTemplateManagementService; diff --git a/oap-server/server-starter/src/main/resources/application.yml b/oap-server/server-starter/src/main/resources/application.yml index 925c268a9c4a..3db4859c80b2 100644 --- a/oap-server/server-starter/src/main/resources/application.yml +++ b/oap-server/server-starter/src/main/resources/application.yml @@ -370,7 +370,7 @@ receiver-otel: selector: ${SW_OTEL_RECEIVER:default} default: enabledHandlers: ${SW_OTEL_RECEIVER_ENABLED_HANDLERS:"otlp-metrics,otlp-logs"} - enabledOtelMetricsRules: ${SW_OTEL_RECEIVER_ENABLED_OTEL_METRICS_RULES:"apisix,nginx/*,k8s/*,istio-controlplane,vm,mysql/*,postgresql/*,oap,aws-eks/*,windows,aws-s3/*,aws-dynamodb/*,aws-gateway/*,redis/*,elasticsearch/*,rabbitmq/*,mongodb/*,kafka/*,pulsar/*,bookkeeper/*,rocketmq/*,clickhouse/*,activemq/*,kong/*"} + enabledOtelMetricsRules: ${SW_OTEL_RECEIVER_ENABLED_OTEL_METRICS_RULES:"apisix,nginx/*,k8s/*,istio-controlplane,vm,mysql/*,postgresql/*,oap,aws-eks/*,windows,aws-s3/*,aws-dynamodb/*,aws-gateway/*,redis/*,elasticsearch/*,rabbitmq/*,mongodb/*,kafka/*,pulsar/*,bookkeeper/*,rocketmq/*,clickhouse/*,activemq/*,kong/*,flink/*"} receiver-zipkin: selector: ${SW_RECEIVER_ZIPKIN:-} diff --git a/oap-server/server-starter/src/main/resources/otel-rules/flink/flink-job.yaml b/oap-server/server-starter/src/main/resources/otel-rules/flink/flink-job.yaml new file mode 100644 index 000000000000..b69d321b7a3e --- /dev/null +++ b/oap-server/server-starter/src/main/resources/otel-rules/flink/flink-job.yaml @@ -0,0 +1,72 @@ +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# This will parse a textual representation of a duration. The formats +# accepted are based on the ISO-8601 duration format {@code PnDTnHnMn.nS} +# with days considered to be exactly 24 hours. +#
+# Examples: +#
+# "PT20.345S" -- parses as "20.345 seconds" +# "PT15M" -- parses as "15 minutes" (where a minute is 60 seconds) +# "PT10H" -- parses as "10 hours" (where an hour is 3600 seconds) +# "P2D" -- parses as "2 days" (where a day is 24 hours or 86400 seconds) +# "P2DT3H4M" -- parses as "2 days, 3 hours and 4 minutes" +# "P-6H3M" -- parses as "-6 hours and +3 minutes" +# "-P6H3M" -- parses as "-6 hours and -3 minutes" +# "-P-6H+3M" -- parses as "+6 hours and -3 minutes" +#+filter: "{ tags -> tags.job_name == 'flink-jobManager-monitoring' || tags.job_name == 'flink-taskManager-monitoring' }" # The OpenTelemetry job name +expSuffix: tag({tags -> tags.cluster = 'flink::' + tags.cluster}).endpoint(['cluster'], ['flink_job_name'], Layer.FLINK) +metricPrefix: meter_flink_job + +metricsRules: + + - name: restart_number + exp: flink_jobmanager_job_numRestarts.sum(['cluster','flink_job_name']) + - name: runningTime + exp: flink_jobmanager_job_runningTime.sum(['cluster','flink_job_name']) + - name: restartingTime + exp: flink_jobmanager_job_restartingTime.sum(['cluster','flink_job_name']) + - name: cancellingTime + exp: flink_jobmanager_job_cancellingTime.sum(['cluster','flink_job_name']) + +# checkpoints + - name: checkpoints_total + exp: flink_jobmanager_job_totalNumberOfCheckpoints.sum(['cluster','flink_job_name']) + - name: checkpoints_failed + exp: flink_jobmanager_job_numberOfFailedCheckpoints.sum(['cluster','flink_job_name']) + - name: checkpoints_completed + exp: flink_jobmanager_job_numberOfCompletedCheckpoints.sum(['cluster','flink_job_name']) + - name: checkpoints_inProgress + exp: flink_jobmanager_job_numberOfInProgressCheckpoints.sum(['cluster','flink_job_name']) + - name: lastCheckpointSize + exp: flink_jobmanager_job_lastCheckpointSize.sum(['cluster','flink_job_name']) + - name: lastCheckpointDuration + exp: flink_jobmanager_job_lastCheckpointDuration.sum(['cluster','flink_job_name']) + + - name: currentEmitEventTimeLag + exp: flink_taskmanager_job_task_operator_currentEmitEventTimeLag.sum(['cluster','flink_job_name','operator_name']) + + - name: numRecordsIn + exp: flink_taskmanager_job_task_operator_numRecordsIn.sum(['cluster','flink_job_name','operator_name']) + - name: numRecordsOut + exp: flink_taskmanager_job_task_operator_numRecordsOut.sum(['cluster','flink_job_name','operator_name']) + - name: numBytesInPerSecond + exp: flink_taskmanager_job_task_operator_numBytesInPerSecond.sum(['cluster','flink_job_name','operator_name']) + - name: numBytesOutPerSecond + exp: flink_taskmanager_job_task_operator_numBytesOutPerSecond.sum(['cluster','flink_job_name','operator_name']) + + diff --git a/oap-server/server-starter/src/main/resources/otel-rules/flink/flink-jobManager.yaml b/oap-server/server-starter/src/main/resources/otel-rules/flink/flink-jobManager.yaml new file mode 100644 index 000000000000..7a01653ed930 --- /dev/null +++ b/oap-server/server-starter/src/main/resources/otel-rules/flink/flink-jobManager.yaml @@ -0,0 +1,80 @@ +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# This will parse a textual representation of a duration. The formats +# accepted are based on the ISO-8601 duration format {@code PnDTnHnMn.nS} +# with days considered to be exactly 24 hours. +#
+# Examples: +#
+# "PT20.345S" -- parses as "20.345 seconds" +# "PT15M" -- parses as "15 minutes" (where a minute is 60 seconds) +# "PT10H" -- parses as "10 hours" (where an hour is 3600 seconds) +# "P2D" -- parses as "2 days" (where a day is 24 hours or 86400 seconds) +# "P2DT3H4M" -- parses as "2 days, 3 hours and 4 minutes" +# "P-6H3M" -- parses as "-6 hours and +3 minutes" +# "-P6H3M" -- parses as "-6 hours and -3 minutes" +# "-P-6H+3M" -- parses as "+6 hours and -3 minutes" +#+filter: "{ tags -> tags.job_name == 'flink-jobManager-monitoring' }" # The OpenTelemetry job name +expSuffix: tag({tags -> tags.cluster = 'flink::' + tags.cluster}).service(['cluster'], Layer.FLINK) +metricPrefix: meter_flink_jobManager +metricsRules: + + # job + - name: running_job_number + exp: flink_jobmanager_numRunningJobs.sum(['cluster']) + + # task + - name: taskManagers_registered_number + exp: flink_jobmanager_numRegisteredTaskManagers.sum(['cluster']) + - name: taskManagers_slots_total + exp: flink_jobmanager_taskSlotsTotal.sum(['cluster']) + - name: taskManagers_slots_available + exp: flink_jobmanager_taskSlotsAvailable.sum(['cluster']) + + #jvm + - name: jvm_cpu_load + exp: flink_jobmanager_Status_JVM_CPU_Load.sum(['cluster','jobManager_node'])*1000 + - name: jvm_cpu_time + exp: flink_jobmanager_Status_JVM_CPU_Time.sum(['cluster','jobManager_node']).increase('PT1M') + - name: jvm_memory_heap_used + exp: flink_jobmanager_Status_JVM_Memory_Heap_Used.sum(['cluster','jobManager_node']) + - name: jvm_memory_heap_available + exp: flink_jobmanager_Status_JVM_Memory_Heap_Max.sum(['cluster','jobManager_node'])-flink_jobmanager_Status_JVM_Memory_Heap_Used.sum(['cluster','jobManager_node']) + - name: jvm_memory_nonHeap_used + exp: flink_jobmanager_Status_JVM_Memory_NonHeap_Used.sum(['cluster','jobManager_node']) + - name: jvm_memory_nonHeap_available + exp: flink_jobmanager_Status_JVM_Memory_NonHeap_Max.sum(['cluster','jobManager_node'])-flink_jobmanager_Status_JVM_Memory_NonHeap_Used.sum(['cluster','jobManager_node']) + - name: jvm_thread_count + exp: flink_jobmanager_Status_JVM_Threads_Count.sum(['cluster','jobManager_node']) + - name: jvm_memory_metaspace_used + exp: flink_jobmanager_Status_JVM_Memory_Metaspace_Used.sum(['cluster','jobManager_node']) + - name: jvm_memory_metaspace_available + exp: flink_jobmanager_Status_JVM_Memory_Metaspace_Max.sum(['cluster','jobManager_node'])-flink_jobmanager_Status_JVM_Memory_Metaspace_Used.sum(['cluster','jobManager_node']) + + - name: jvm_g1_young_generation_count + exp: flink_jobmanager_Status_JVM_GarbageCollector_G1_Young_Generation_Count.sum(['cluster','jobManager_node']).increase('PT1M') + - name: jvm_g1_old_generation_count + exp: flink_jobmanager_Status_JVM_GarbageCollector_G1_Old_Generation_Count.sum(['cluster','jobManager_node']).increase('PT1M') + - name: jvm_g1_young_generation_time + exp: flink_jobmanager_Status_JVM_GarbageCollector_G1_Young_Generation_Time.sum(['cluster','jobManager_node']).increase('PT1M') + - name: jvm_g1_old_generation_time + exp: flink_jobmanager_Status_JVM_GarbageCollector_G1_Old_Generation_Time.sum(['cluster','jobManager_node']).increase('PT1M') + + - name: jvm_all_garbageCollector_count + exp: flink_jobmanager_Status_JVM_GarbageCollector_All_Count.sum(['cluster','jobManager_node']).increase('PT1M') + - name: jvm_all_garbageCollector_time + exp: flink_jobmanager_Status_JVM_GarbageCollector_All_Time.sum(['cluster','jobManager_node']).increase('PT1M') diff --git a/oap-server/server-starter/src/main/resources/otel-rules/flink/flink-taskManager.yaml b/oap-server/server-starter/src/main/resources/otel-rules/flink/flink-taskManager.yaml new file mode 100644 index 000000000000..900594a2b1a7 --- /dev/null +++ b/oap-server/server-starter/src/main/resources/otel-rules/flink/flink-taskManager.yaml @@ -0,0 +1,89 @@ +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# This will parse a textual representation of a duration. The formats +# accepted are based on the ISO-8601 duration format {@code PnDTnHnMn.nS} +# with days considered to be exactly 24 hours. +#
+# Examples: +#
+# "PT20.345S" -- parses as "20.345 seconds" +# "PT15M" -- parses as "15 minutes" (where a minute is 60 seconds) +# "PT10H" -- parses as "10 hours" (where an hour is 3600 seconds) +# "P2D" -- parses as "2 days" (where a day is 24 hours or 86400 seconds) +# "P2DT3H4M" -- parses as "2 days, 3 hours and 4 minutes" +# "P-6H3M" -- parses as "-6 hours and +3 minutes" +# "-P6H3M" -- parses as "-6 hours and -3 minutes" +# "-P-6H+3M" -- parses as "+6 hours and -3 minutes" +#+filter: "{ tags -> tags.job_name == 'flink-taskManager-monitoring' }" # The OpenTelemetry job name +expSuffix: tag({tags -> tags.cluster = 'flink::' + tags.cluster}).instance(['cluster'], ['taskManager_node'], Layer.FLINK) +metricPrefix: meter_flink_taskManager +metricsRules: + + # jvm + - name: jvm_cpu_load + exp: flink_taskmanager_Status_JVM_CPU_Load.sum(['cluster','taskManager_node'])*1000 + - name: jvm_cpu_time + exp: flink_taskmanager_Status_JVM_CPU_Time.sum(['cluster','taskManager_node']).increase('PT1M') + - name: jvm_memory_heap_used + exp: flink_taskmanager_Status_JVM_Memory_Heap_Used.sum(['cluster','taskManager_node']) + - name: jvm_memory_heap_available + exp: flink_taskmanager_Status_JVM_Memory_Heap_Max.sum(['cluster','taskManager_node'])-flink_taskmanager_Status_JVM_Memory_Heap_Used.sum(['cluster','taskManager_node']) + - name: jvm_thread_count + exp: flink_taskmanager_Status_JVM_Threads_Count.sum(['cluster','taskManager_node']) + - name: jvm_memory_metaspace_available + exp: flink_taskmanager_Status_JVM_Memory_Metaspace_Max.sum(['cluster','taskManager_node'])-flink_taskmanager_Status_JVM_Memory_Metaspace_Used.sum(['cluster','taskManager_node']) + - name: jvm_memory_metaspace_used + exp: flink_taskmanager_Status_JVM_Memory_Metaspace_Used.sum(['cluster','taskManager_node']) + + - name: jvm_memory_nonHeap_used + exp: flink_taskmanager_Status_JVM_Memory_NonHeap_Used.sum(['cluster','taskManager_node']) + - name: jvm_memory_nonHeap_available + exp: flink_taskmanager_Status_JVM_Memory_NonHeap_Max.sum(['cluster','taskManager_node'])-flink_taskmanager_Status_JVM_Memory_NonHeap_Used.sum(['cluster','taskManager_node']) + +# # records + - name: numRecordsIn + exp: flink_taskmanager_job_task_numRecordsIn.sum(['cluster','taskManager_node','flink_job_name','task_name']).increase('PT1M') + - name: numRecordsOut + exp: flink_taskmanager_job_task_numRecordsOut.sum(['cluster','taskManager_node','flink_job_name','task_name']).increase('PT1M') + - name: numBytesInPerSecond + exp: flink_taskmanager_job_task_numBytesInPerSecond.sum(['cluster','taskManager_node','flink_job_name','task_name']) + - name: numBytesOutPerSecond + exp: flink_taskmanager_job_task_numBytesOutPerSecond.sum(['cluster','taskManager_node','flink_job_name','task_name']) +# +# # network + - name: netty_usedMemory + exp: flink_taskmanager_Status_Shuffle_Netty_UsedMemory.sum(['cluster','taskManager_node']) + - name: netty_availableMemory + exp: flink_taskmanager_Status_Shuffle_Netty_AvailableMemory.sum(['cluster','taskManager_node']) + - name: inPoolUsage + exp: flink_taskmanager_job_task_Shuffle_Netty_Input_Buffers_inPoolUsage.sum(['cluster','taskManager_node','flink_job_name','task_name'])*100 + - name: outPoolUsage + exp: flink_taskmanager_job_task_Shuffle_Netty_Output_Buffers_outPoolUsage.sum(['cluster','taskManager_node','flink_job_name','task_name'])*100 + + # backPressured + - name: isBackPressured + exp: flink_taskmanager_job_task_isBackPressured.sum(['cluster','taskManager_node','flink_job_name','task_name']) + - name: idleTimeMsPerSecond + exp: flink_taskmanager_job_task_idleTimeMsPerSecond.sum(['cluster','taskManager_node','flink_job_name','task_name']) + - name: busyTimeMsPerSecond + exp: flink_taskmanager_job_task_busyTimeMsPerSecond.sum(['cluster','taskManager_node','flink_job_name','task_name']) + - name: softBackPressuredTimeMsPerSecond + exp: flink_taskmanager_job_task_softBackPressuredTimeMsPerSecond.sum(['cluster','taskManager_node','flink_job_name','task_name']) + - name: hardBackPressuredTimeMsPerSecond + exp: flink_taskmanager_job_task_hardBackPressuredTimeMsPerSecond.sum(['cluster','taskManager_node','flink_job_name','task_name']) + + diff --git a/oap-server/server-starter/src/main/resources/ui-initialized-templates/flink/flink-job.json b/oap-server/server-starter/src/main/resources/ui-initialized-templates/flink/flink-job.json new file mode 100644 index 000000000000..53f2571ba3f4 --- /dev/null +++ b/oap-server/server-starter/src/main/resources/ui-initialized-templates/flink/flink-job.json @@ -0,0 +1,410 @@ +[ + { + "id": "Flink-Job", + "configuration": { + "children": [ + { + "x": 0, + "y": 0, + "w": 4, + "h": 10, + "i": "2", + "type": "Widget", + "expressions": [ + "latest(meter_flink_job_runningTime)/1000/60" + ], + "widget": { + "name": "Job RunningTime", + "title": "Job RunningTime", + "tips": "The job running time" + }, + "metricConfig": [ + { + "unit": "min" + } + ], + "graph": { + "type": "Card", + "fontSize": 20, + "textAlign": "center", + "showUnit": true + }, + "id": "2", + "moved": false + }, + { + "x": 4, + "y": 10, + "w": 4, + "h": 10, + "i": "3", + "type": "Widget", + "expressions": [ + "latest(meter_flink_job_restart_number)" + ], + "widget": { + "name": "Job Restart Number", + "title": "Job Restart Number", + "tips": "The number of job restart" + }, + "graph": { + "type": "Card", + "fontSize": 20, + "textAlign": "center", + "showUnit": true + }, + "id": "3", + "moved": false + }, + { + "x": 0, + "y": 10, + "w": 4, + "h": 10, + "i": "4", + "type": "Widget", + "id": "4", + "moved": false, + "graph": { + "type": "Card", + "fontSize": 20, + "textAlign": "center", + "showUnit": true + }, + "expressions": [ + "latest(meter_flink_job_restartingTime)/1000/60" + ], + "metricConfig": [ + { + "unit": "min" + } + ], + "widget": { + "name": "Job RestartingTime", + "title": "Job RestartingTime", + "tips": "The job restarting Time" + } + }, + { + "x": 4, + "y": 0, + "w": 4, + "h": 10, + "i": "5", + "type": "Widget", + "id": "5", + "moved": false, + "expressions": [ + "latest(meter_flink_job_cancellingTime)/1000/60" + ], + "metricConfig": [ + { + "unit": "min" + } + ], + "graph": { + "type": "Card", + "fontSize": 20, + "textAlign": "center", + "showUnit": true + }, + "widget": { + "name": "Job CancellingTime", + "title": "Job CancellingTime", + "tips": "The job cancelling time" + } + }, + { + "x": 0, + "y": 36, + "w": 8, + "h": 16, + "i": "6", + "type": "Widget", + "id": "6", + "moved": false, + "graph": { + "type": "Line", + "step": false, + "smooth": false, + "showSymbol": true, + "showXAxis": true, + "showYAxis": true + }, + "expressions": [ + "meter_flink_job_checkpoints_total" + ], + "widget": { + "name": "Job Checkpoints Total", + "title": "Job Checkpoints Total", + "tips": "The total number of checkpoints" + } + }, + { + "x": 0, + "y": 20, + "w": 8, + "h": 16, + "i": "7", + "type": "Widget", + "id": "7", + "moved": false, + "graph": { + "type": "Line", + "step": false, + "smooth": false, + "showSymbol": true, + "showXAxis": true, + "showYAxis": true + }, + "expressions": [ + "meter_flink_job_checkpoints_failed" + ], + "widget": { + "name": "Checkpoints Failed", + "title": "Checkpoints Failed", + "tips": "The number of failed checkpoints" + } + }, + { + "x": 8, + "y": 20, + "w": 8, + "h": 16, + "i": "8", + "type": "Widget", + "id": "8", + "moved": false, + "graph": { + "type": "Line", + "step": false, + "smooth": false, + "showSymbol": true, + "showXAxis": true, + "showYAxis": true + }, + "expressions": [ + "meter_flink_job_checkpoints_completed" + ], + "widget": { + "name": "Checkpoints Completed", + "title": "Checkpoints Completed", + "tips": "The number of completed checkpoints" + } + }, + { + "x": 16, + "y": 20, + "w": 8, + "h": 16, + "i": "9", + "type": "Widget", + "id": "9", + "moved": false, + "graph": { + "type": "Line", + "step": false, + "smooth": false, + "showSymbol": true, + "showXAxis": true, + "showYAxis": true + }, + "expressions": [ + "meter_flink_job_checkpoints_inProgress" + ], + "widget": { + "name": "Checkpoints InProgress", + "title": "Checkpoints InProgress", + "tips": "The number of inProgress checkpoints" + } + }, + { + "x": 8, + "y": 0, + "w": 16, + "h": 20, + "i": "10", + "type": "Widget", + "id": "10", + "moved": false, + "graph": { + "type": "Line", + "step": false, + "smooth": false, + "showSymbol": true, + "showXAxis": true, + "showYAxis": true + }, + "expressions": [ + "meter_flink_job_currentEmitEventTimeLag" + ], + "widget": { + "name": "CurrentEmitEventTimeLag", + "title": "CurrentEmitEventTimeLag(ms)", + "tips": "The latency between a data record's event time and its emission time from the source." + } + }, + { + "x": 0, + "y": 52, + "w": 12, + "h": 18, + "i": "11", + "type": "Widget", + "id": "11", + "moved": false, + "expressions": [ + "meter_flink_job_numRecordsIn" + ], + "graph": { + "type": "Line", + "step": false, + "smooth": false, + "showSymbol": true, + "showXAxis": true, + "showYAxis": true + }, + "widget": { + "name": "NumRecordsIn", + "title": "NumRecordsIn", + "tips": "The total number of records this operator/task has received." + } + }, + { + "x": 12, + "y": 52, + "w": 12, + "h": 18, + "i": "12", + "type": "Widget", + "id": "12", + "moved": false, + "graph": { + "type": "Line", + "step": false, + "smooth": false, + "showSymbol": true, + "showXAxis": true, + "showYAxis": true + }, + "expressions": [ + "meter_flink_job_numRecordsOut" + ], + "widget": { + "name": "NumRecordsOut", + "title": "NumRecordsOut", + "tips": "The total number of records this operator/task has emitted." + } + }, + { + "x": 0, + "y": 70, + "w": 12, + "h": 16, + "i": "13", + "type": "Widget", + "id": "13", + "moved": false, + "expressions": [ + "meter_flink_job_numBytesInPerSecond" + ], + "graph": { + "type": "Line", + "step": false, + "smooth": false, + "showSymbol": true, + "showXAxis": true, + "showYAxis": true + }, + "widget": { + "name": "NumBytesInPerSecond", + "title": "NumBytesInPerSecond", + "tips": "The number of bytes this task received per second." + } + }, + { + "x": 12, + "y": 70, + "w": 12, + "h": 16, + "i": "14", + "type": "Widget", + "id": "14", + "moved": false, + "graph": { + "type": "Line", + "step": false, + "smooth": false, + "showSymbol": true, + "showXAxis": true, + "showYAxis": true + }, + "expressions": [ + "meter_flink_job_numBytesOutPerSecond" + ], + "widget": { + "name": "NumBytesOutPerSecond", + "title": "NumBytesOutPerSecond", + "tips": "The number of bytes this task emits per second." + } + }, + { + "x": 16, + "y": 36, + "w": 8, + "h": 16, + "i": "15", + "type": "Widget", + "id": "15", + "moved": false, + "graph": { + "type": "Line", + "step": false, + "smooth": false, + "showSymbol": true, + "showXAxis": true, + "showYAxis": true + }, + "expressions": [ + "meter_flink_job_lastCheckpointSize" + ], + "widget": { + "name": "LastCheckpointSize", + "title": "LastCheckpointSize(Bytes)", + "tips": "The checkPointed size of the last checkpoint (in bytes), this metric could be different from lastCheckpointFullSize if incremental checkpoint or changelog is enabled." + } + }, + { + "x": 8, + "y": 36, + "w": 8, + "h": 16, + "i": "16", + "type": "Widget", + "id": "16", + "moved": false, + "expressions": [ + "meter_flink_job_lastCheckpointDuration" + ], + "widget": { + "name": "LastCheckpointDuration", + "title": "LastCheckpointDuration(ms)", + "tips": "The time it took to complete the last checkpoint." + }, + "graph": { + "type": "Line", + "step": false, + "smooth": false, + "showSymbol": true, + "showXAxis": true, + "showYAxis": true + } + } + ], + "layer": "FLINK", + "entity": "Endpoint", + "name": "Flink-Job", + "id": "Flink-Job" + } + } +] diff --git a/oap-server/server-starter/src/main/resources/ui-initialized-templates/flink/flink-jobManager.json b/oap-server/server-starter/src/main/resources/ui-initialized-templates/flink/flink-jobManager.json new file mode 100644 index 000000000000..052fba205bc8 --- /dev/null +++ b/oap-server/server-starter/src/main/resources/ui-initialized-templates/flink/flink-jobManager.json @@ -0,0 +1,639 @@ +[ + { + "id": "Flink-JobManager", + "configuration":{ + "children": [ + { + "x": 0, + "y": 0, + "w": 24, + "h": 77, + "i": "0", + "type": "Tab", + "children": [ + { + "name": "Overview", + "children": [ + { + "x": 0, + "y": 0, + "w": 3, + "h": 8, + "i": "1", + "type": "Widget", + "id": "0-0-1", + "moved": false, + "expressions": [ + "latest(meter_flink_jobManager_running_job_number)" + ], + "typesOfMQE": [ + "SINGLE_VALUE" + ], + "graph": { + "type": "Card", + "fontSize": 19, + "textAlign": "center", + "showUnit": true + }, + "widget": { + "name": "Running Jobs", + "title": "Running Jobs", + "tips": "The number of running jobs" + } + }, + { + "x": 3, + "y": 8, + "w": 3, + "h": 8, + "i": "2", + "type": "Widget", + "id": "0-0-2", + "moved": false, + "expressions": [ + "latest(meter_flink_jobManager_taskManagers_registered_number)" + ], + "typesOfMQE": [ + "SINGLE_VALUE" + ], + "graph": { + "type": "Card", + "fontSize": 20, + "textAlign": "center", + "showUnit": true + }, + "widget": { + "title": "TaskManagers", + "tips": "The number of taskManagers", + "name": "TaskManagers" + } + }, + { + "x": 6, + "y": 0, + "w": 9, + "h": 16, + "i": "3", + "type": "Widget", + "id": "0-0-3", + "moved": false, + "expressions": [ + "meter_flink_jobManager_jvm_cpu_load/10" + ], + "typesOfMQE": [ + "TIME_SERIES_VALUES" + ], + "graph": { + "type": "Line", + "step": false, + "smooth": false, + "showSymbol": true, + "showXAxis": true, + "showYAxis": true + }, + "widget": { + "name": "JVM CPU Load", + "title": "JVM CPU load(%)", + "tips": "The number of the JVM CPU load" + } + }, + { + "x": 16, + "y": 71, + "w": 8, + "h": 18, + "i": "4", + "type": "Widget", + "id": "0-0-4", + "moved": false, + "expressions": [ + "meter_flink_jobManager_jvm_thread_count" + ], + "typesOfMQE": [ + "TIME_SERIES_VALUES" + ], + "graph": { + "type": "Line", + "step": false, + "smooth": false, + "showSymbol": true, + "showXAxis": true, + "showYAxis": true + }, + "widget": { + "name": "JVM Thread Count", + "title": "JVM Thread Count", + "tips": "The total number of the jobManager JVM live threads" + } + }, + { + "x": 0, + "y": 71, + "w": 8, + "h": 18, + "i": "5", + "type": "Widget", + "id": "0-0-5", + "moved": false, + "expressions": [ + "meter_flink_jobManager_jvm_memory_heap_used/1024/1024" + ], + "typesOfMQE": [ + "TIME_SERIES_VALUES" + ], + "graph": { + "type": "Area", + "opacity": 0.4, + "showXAxis": true, + "showYAxis": true + }, + "widget": { + "name": "JVM Memory Heap Used", + "title": "JVM Memory Heap Used(MB)", + "tips": "The amount of the jobManager JVM memory heap used" + } + }, + { + "x": 0, + "y": 89, + "w": 12, + "h": 21, + "i": "7", + "type": "Widget", + "id": "0-0-7", + "moved": false, + "expressions": [ + "(meter_flink_jobManager_jvm_memory_nonHeap_used)/1024/1024" + ], + "typesOfMQE": [ + "TIME_SERIES_VALUES" + ], + "graph": { + "type": "Area", + "opacity": 0.4, + "showXAxis": true, + "showYAxis": true + }, + "widget": { + "name": "JVM Memory NonHeap Used", + "title": "JVM Memory NonHeap Used(MB)", + "tips": "The amount of the jobManager JVM nonHeap memory used" + } + }, + { + "x": 0, + "y": 8, + "w": 3, + "h": 8, + "i": "8", + "type": "Widget", + "id": "0-0-8", + "moved": false, + "expressions": [ + "latest(meter_flink_jobManager_taskManagers_slots_total)" + ], + "typesOfMQE": [ + "SINGLE_VALUE" + ], + "graph": { + "type": "Card", + "fontSize": 20, + "textAlign": "center", + "showUnit": true + }, + "widget": { + "name": "Task Managers Slots Total", + "title": "Total Slots", + "tips": "The number of total slots" + } + }, + { + "x": 3, + "y": 0, + "w": 3, + "h": 8, + "i": "9", + "type": "Widget", + "id": "0-0-9", + "moved": false, + "expressions": [ + "latest(meter_flink_jobManager_taskManagers_slots_available)" + ], + "typesOfMQE": [ + "SINGLE_VALUE" + ], + "graph": { + "type": "Card", + "fontSize": 20, + "textAlign": "center", + "showUnit": true + }, + "widget": { + "name": "Task Managers Slots Available", + "tips": "The number of available slots", + "title": "Available Slots" + } + }, + { + "x": 15, + "y": 0, + "w": 9, + "h": 16, + "i": "13", + "type": "Widget", + "id": "0-0-13", + "moved": false, + "expressions": [ + "meter_flink_jobManager_jvm_cpu_time/1000/1000" + ], + "typesOfMQE": [ + "TIME_SERIES_VALUES" + ], + "graph": { + "type": "Line", + "step": false, + "smooth": false, + "showSymbol": true, + "showXAxis": true, + "showYAxis": true + }, + "widget": { + "title": "JVM CPU Time(ms) Increase (Per Minute)", + "name": "JVM CPU Time", + "tips": "The jobManager CPU time used by the JVM increase per minute" + } + }, + { + "x": 8, + "y": 71, + "w": 8, + "h": 18, + "i": "14", + "type": "Widget", + "id": "0-0-14", + "moved": false, + "metricConfig": [ + { + "unit": "MB" + } + ], + "expressions": [ + "(meter_flink_jobManager_jvm_memory_heap_available)/1024/1024" + ], + "typesOfMQE": [ + "TIME_SERIES_VALUES" + ], + "graph": { + "type": "Area", + "opacity": 0.4, + "showXAxis": true, + "showYAxis": true, + "showUnit": true + }, + "widget": { + "name": "JVM Memory Heap Available", + "title": "JVM Memory Heap Available(MB)", + "tips": "The amount of the jobManager available JVM memory Heap" + } + }, + { + "x": 12, + "y": 89, + "w": 12, + "h": 21, + "i": "15", + "type": "Widget", + "id": "0-0-15", + "moved": false, + "graph": { + "type": "Area", + "opacity": 0.4, + "showXAxis": true, + "showYAxis": true + }, + "expressions": [ + "(meter_flink_jobManager_jvm_memory_nonHeap_available)/1024/1024)" + ], + "typesOfMQE": [ + "TIME_SERIES_VALUES" + ], + "widget": { + "name": "JVM Memory NoHeap Available", + "title": "JVM Memory NoHeap Available(MB)", + "tips": "The amount of the jobManager available JVM memory noHeap" + } + }, + { + "x": 0, + "y": 110, + "w": 12, + "h": 19, + "i": "16", + "type": "Widget", + "id": "0-0-16", + "moved": false, + "expressions": [ + "meter_flink_jobManager_jvm_memory_metaspace_used/1024/1024" + ], + "typesOfMQE": [ + "TIME_SERIES_VALUES" + ], + "graph": { + "type": "Area", + "opacity": 0.4, + "showXAxis": true, + "showYAxis": true + }, + "widget": { + "name": "JVM Memory Metaspace Used", + "title": "JVM Memory Metaspace Used(MB)", + "tips": "The amount of the jobManager Used JVM metaspace memory" + } + }, + { + "x": 12, + "y": 110, + "w": 12, + "h": 19, + "i": "17", + "type": "Widget", + "id": "0-0-17", + "moved": false, + "expressions": [ + "(meter_flink_jobManager_jvm_memory_metaspace_available)/1024/1024" + ], + "typesOfMQE": [ + "UNKNOWN" + ], + "graph": { + "type": "Area", + "opacity": 0.4, + "showXAxis": true, + "showYAxis": true + }, + "widget": { + "title": "JVM Metaspace Available(MB)", + "tips": "The amount of the jobManager available JVM Metaspace Memory", + "name": "JVM Metaspace Available" + } + }, + { + "x": 0, + "y": 35, + "w": 12, + "h": 18, + "i": "18", + "type": "Widget", + "id": "0-0-18", + "moved": false, + "expressions": [ + "meter_flink_jobManager_jvm_g1_young_generation_count" + ], + "graph": { + "type": "Line", + "step": false, + "smooth": false, + "showSymbol": true, + "showXAxis": true, + "showYAxis": true + }, + "typesOfMQE": [ + "TIME_SERIES_VALUES" + ], + "widget": { + "name": "JVM G1 Young Generation Count", + "title": "JVM G1 Young Generation Count Increase (Per Minute)", + "tips": "The incremental number of the jobManager JVM G1 young generation count per minute" + } + }, + { + "x": 0, + "y": 53, + "w": 12, + "h": 18, + "i": "19", + "type": "Widget", + "id": "0-0-19", + "moved": false, + "expressions": [ + "meter_flink_jobManager_jvm_g1_old_generation_count" + ], + "typesOfMQE": [ + "TIME_SERIES_VALUES" + ], + "graph": { + "type": "Line", + "step": false, + "smooth": false, + "showSymbol": true, + "showXAxis": true, + "showYAxis": true + }, + "widget": { + "name": "JVM G1 Old Generation Count", + "title": "JVM G1 Old Generation Count Increase (Per Minute)", + "tips": "The incremental number of the jobManager JVM G1 old generation count per minute" + } + }, + { + "x": 12, + "y": 35, + "w": 12, + "h": 18, + "i": "20", + "type": "Widget", + "id": "0-0-20", + "moved": false, + "expressions": [ + "meter_flink_jobManager_jvm_g1_young_generation_time" + ], + "typesOfMQE": [ + "TIME_SERIES_VALUES" + ], + "widget": { + "name": "JVM G1 Young Generation Time", + "title": "JVM G1 Young Generation Time(ms) Increase (Per Minute)", + "tips": "The incremental time of the jobManager JVM G1 young generation per minute" + }, + "graph": { + "type": "Line", + "step": false, + "smooth": false, + "showSymbol": true, + "showXAxis": true, + "showYAxis": true + } + }, + { + "x": 12, + "y": 53, + "w": 12, + "h": 18, + "i": "21", + "type": "Widget", + "id": "0-0-21", + "moved": false, + "expressions": [ + "meter_flink_jobManager_jvm_g1_old_generation_time" + ], + "typesOfMQE": [ + "TIME_SERIES_VALUES" + ], + "graph": { + "type": "Line", + "step": false, + "smooth": false, + "showSymbol": true, + "showXAxis": true, + "showYAxis": true + }, + "widget": { + "name": "JVM G1 Old Generation Time", + "title": "JVM G1 Old Generation Time(ms) Increase (Per Minute)", + "tips": "The incremental time of the jobManager JVM G1 old generation per minute" + } + }, + { + "x": 0, + "y": 16, + "w": 12, + "h": 19, + "i": "22", + "type": "Widget", + "id": "0-0-22", + "moved": false, + "expressions": [ + "meter_flink_jobManager_jvm_all_garbageCollector_count" + ], + "typesOfMQE": [ + "UNKNOWN" + ], + "graph": { + "type": "Line", + "step": false, + "smooth": false, + "showSymbol": true, + "showXAxis": true, + "showYAxis": true + }, + "widget": { + "name": "JVM All GarbageCollector Count", + "title": "JVM All GarbageCollector Count Increase (Per Minute)", + "tips": "The number of the jobManager JVM all garbageCollector count increase per minute" + } + }, + { + "x": 12, + "y": 16, + "w": 12, + "h": 19, + "i": "23", + "type": "Widget", + "id": "0-0-23", + "moved": false, + "expressions": [ + "meter_flink_jobManager_jvm_all_garbageCollector_time" + ], + "graph": { + "type": "Line", + "step": false, + "smooth": false, + "showSymbol": true, + "showXAxis": true, + "showYAxis": true + }, + "typesOfMQE": [ + "UNKNOWN" + ], + "widget": { + "name": "JVM All GarbageCollector Time", + "title": "JVM All GarbageCollector Time(ms) Increase (Per Minute)", + "tips": "The incremental time spent performing garbage collection for the given (or all) collector for the jobManager per minute" + } + } + ] + }, + { + "name": "TaskManager", + "children": [ + { + "x": 0, + "y": 0, + "w": 24, + "h": 30, + "i": "0", + "type": "Widget", + "graph": { + "type": "InstanceList", + "dashboardName": "Flink-TaskManager", + "fontSize": 12 + }, + "id": "0-1-0", + "moved": false + } + ] + }, + { + "name": "Job", + "children": [ + { + "x": 0, + "y": 0, + "w": 15, + "h": 32, + "i": "0", + "type": "Widget", + "graph": { + "type": "EndpointList", + "dashboardName": "Flink-Job", + "fontSize": 12 + }, + "widget": { + "name": "Job List", + "title": "Job List" + }, + "expressions": [], + "id": "0-2-0", + "moved": false + }, + { + "x": 15, + "y": 0, + "w": 9, + "h": 32, + "i": "1", + "type": "Widget", + "expressions": [ + "top_n(meter_flink_job_runningTime,10,des)" + ], + "valueRelatedDashboard": "Flink-Job", + "relatedTrace": { + "refIdType": "none" + }, + "graph": { + "type": "TopList", + "color": "purple" + }, + "widget": { + "title": "Top10 runningTime job", + "tips": "Top10 runningTime job" + }, + "id": "0-2-1", + "moved": false + } + ] + } + ], + "id": "0", + "activedTabIndex": 0, + "moved": false + } + ], + "layer": "FLINK", + "entity": "Service", + "name": "Flink-JobManager", + "id": "Flink-JobManager", + "isRoot": false + } + } +] diff --git a/oap-server/server-starter/src/main/resources/ui-initialized-templates/flink/flink-root.json b/oap-server/server-starter/src/main/resources/ui-initialized-templates/flink/flink-root.json new file mode 100644 index 000000000000..34d157c4c31b --- /dev/null +++ b/oap-server/server-starter/src/main/resources/ui-initialized-templates/flink/flink-root.json @@ -0,0 +1,46 @@ +[ + { + "id": "Flink-Root", + "configuration": { + "children": [ + { + "x": 0, + "y": 3, + "w": 24, + "h": 29, + "i": "0", + "type": "Widget", + "graph": { + "type": "ServiceList", + "dashboardName": "Flink-JobManager", + "fontSize": 12, + "showXAxis": false, + "showYAxis": false, + "showGroup": true + } + }, + { + "x": 0, + "y": 0, + "w": 24, + "h": 3, + "i": "1", + "type": "Text", + "graph": { + "fontColor": "theme", + "backgroundColor": "theme", + "content": "Provide Flink monitoring through OpenTelemetry's Prometheus Receiver", + "fontSize": 14, + "textAlign": "left", + "url": "https://skywalking.apache.org/docs/main/next/en/setup/backend/backend-Rocketmq-monitoring/" + } + } + ], + "id": "Flink-Root", + "layer": "FLINK", + "entity": "All", + "name": "Flink-Root", + "isRoot": true + } + } +] diff --git a/oap-server/server-starter/src/main/resources/ui-initialized-templates/flink/flink-taskManager.json b/oap-server/server-starter/src/main/resources/ui-initialized-templates/flink/flink-taskManager.json new file mode 100644 index 000000000000..79d32083dd2c --- /dev/null +++ b/oap-server/server-starter/src/main/resources/ui-initialized-templates/flink/flink-taskManager.json @@ -0,0 +1,641 @@ +[ + { + "id": "Flink-TaskManager", + "configuration": { + "children": [ + { + "x": 0, + "y": 0, + "w": 8, + "h": 15, + "i": "3", + "type": "Widget", + "id": "3", + "moved": false, + "expressions": [ + "meter_flink_taskManager_jvm_cpu_load/10" + ], + "typesOfMQE": [ + "TIME_SERIES_VALUES" + ], + "graph": { + "type": "Line", + "step": false, + "smooth": false, + "showSymbol": true, + "showXAxis": true, + "showYAxis": true + }, + "widget": { + "name": "JVM CPU Load", + "title": "JVM CPU load(%)", + "tips": "The number of the JVM CPU load" + } + }, + { + "x": 16, + "y": 0, + "w": 8, + "h": 15, + "i": "4", + "type": "Widget", + "id": "4", + "moved": false, + "expressions": [ + "meter_flink_taskManager_jvm_thread_count" + ], + "typesOfMQE": [ + "TIME_SERIES_VALUES" + ], + "graph": { + "type": "Line", + "step": false, + "smooth": false, + "showSymbol": true, + "showXAxis": true, + "showYAxis": true + }, + "widget": { + "name": "JVM thread count", + "title": "JVM Thread Count", + "tips": "The total number of JVM live threads" + } + }, + { + "x": 0, + "y": 15, + "w": 8, + "h": 17, + "i": "5", + "type": "Widget", + "id": "5", + "moved": false, + "expressions": [ + "meter_flink_taskManager_jvm_memory_heap_used/1024/1024" + ], + "typesOfMQE": [ + "TIME_SERIES_VALUES" + ], + "graph": { + "type": "Area", + "opacity": 0.4, + "showXAxis": true, + "showYAxis": true + }, + "widget": { + "name": "JVM Memory Heap Used", + "title": "JVM Memory Heap Used(MB)", + "tips": "The amount of jvm memory heap used" + } + }, + { + "x": 0, + "y": 32, + "w": 8, + "h": 16, + "i": "7", + "type": "Widget", + "id": "7", + "moved": false, + "expressions": [ + "(meter_flink_taskManager_jvm_memory_nonHeap_used)/1024/1024" + ], + "typesOfMQE": [ + "TIME_SERIES_VALUES" + ], + "graph": { + "type": "Area", + "opacity": 0.4, + "showXAxis": true, + "showYAxis": true + }, + "widget": { + "name": "JVM Memory NonHeap Used", + "title": "JVM Memory NonHeap Used(MB)", + "tips": "The amount of JVM nonHeap memory used" + } + }, + { + "x": 8, + "y": 0, + "w": 8, + "h": 15, + "i": "13", + "type": "Widget", + "id": "13", + "moved": false, + "expressions": [ + "meter_flink_taskManager_jvm_cpu_time/1000/1000" + ], + "typesOfMQE": [ + "TIME_SERIES_VALUES" + ], + "graph": { + "type": "Line", + "step": false, + "smooth": false, + "showSymbol": true, + "showXAxis": true, + "showYAxis": true + }, + "widget": { + "title": "JVM CPU Time(ms) Increase (Per Minute)", + "name": "JVM CPU Time", + "tips": "The CPU time used by the JVM increase per minute" + } + }, + { + "x": 8, + "y": 15, + "w": 8, + "h": 17, + "i": "14", + "type": "Widget", + "id": "14", + "moved": false, + "metricConfig": [ + { + "unit": "MB" + } + ], + "expressions": [ + "(meter_flink_taskManager_jvm_memory_heap_available)/1024/1024" + ], + "typesOfMQE": [ + "TIME_SERIES_VALUES" + ], + "graph": { + "type": "Area", + "opacity": 0.4, + "showXAxis": true, + "showYAxis": true, + "showUnit": true + }, + "widget": { + "name": "JVM Memory Heap Available", + "title": "JVM Memory Heap Available(MB)", + "tips": "The amount of available JVM memory Heap" + } + }, + { + "x": 8, + "y": 32, + "w": 8, + "h": 16, + "i": "15", + "type": "Widget", + "id": "15", + "moved": false, + "graph": { + "type": "Area", + "opacity": 0.4, + "showXAxis": true, + "showYAxis": true + }, + "expressions": [ + "(meter_flink_taskManager_jvm_memory_nonHeap_available)/1024/1024" + ], + "typesOfMQE": [ + "TIME_SERIES_VALUES" + ], + "widget": { + "name": "JVM Memory NonHeap Available", + "title": "JVM Memory NonHeap Available(MB)", + "tips": "The amount of available JVM memory nonHeap" + } + }, + { + "x": 16, + "y": 32, + "w": 8, + "h": 16, + "i": "16", + "type": "Widget", + "id": "16", + "moved": false, + "expressions": [ + "meter_flink_taskManager_jvm_memory_metaspace_used/1024/1024" + ], + "typesOfMQE": [ + "TIME_SERIES_VALUES" + ], + "graph": { + "type": "Area", + "opacity": 0.4, + "showXAxis": true, + "showYAxis": true + }, + "widget": { + "name": "JVM Memory Metaspace Used", + "title": "JVM Memory Metaspace Used(MB)", + "tips": "The amount of Used JVM metaspace memory" + } + }, + { + "x": 16, + "y": 15, + "w": 8, + "h": 17, + "i": "17", + "type": "Widget", + "id": "17", + "moved": false, + "expressions": [ + "(meter_flink_taskManager_jvm_memory_metaspace_available)/1024/1024" + ], + "typesOfMQE": [ + "TIME_SERIES_VALUES" + ], + "graph": { + "type": "Area", + "opacity": 0.4, + "showXAxis": true, + "showYAxis": true + }, + "widget": { + "title": "JVM Metaspace Available(MB)", + "tips": "The amount of Available JVM Metaspace Memory", + "name": "JVM Metaspace Available" + } + }, + { + "x": 0, + "y": 120, + "w": 12, + "h": 18, + "i": "18", + "type": "Widget", + "id": "18", + "moved": false, + "expressions": [ + "meter_flink_taskManager_numRecordsIn" + ], + "typesOfMQE": [ + "TIME_SERIES_VALUES" + ], + "graph": { + "type": "Line", + "step": false, + "smooth": false, + "showSymbol": true, + "showXAxis": true, + "showYAxis": true + }, + "widget": { + "name": "NumRecordsIn", + "title": "NumRecordsIn Increase (Per Minute)", + "tips": "The incremental number of records this task has received per minute." + } + }, + { + "x": 12, + "y": 120, + "w": 12, + "h": 18, + "i": "19", + "type": "Widget", + "id": "19", + "moved": false, + "graph": { + "type": "Line", + "step": false, + "smooth": false, + "showSymbol": true, + "showXAxis": true, + "showYAxis": true + }, + "expressions": [ + "meter_flink_taskManager_numRecordsOut" + ], + "typesOfMQE": [ + "TIME_SERIES_VALUES" + ], + "widget": { + "name": "NumRecordsOut", + "title": "NumRecordsOut Increase (Per Minute)", + "tips": "The incremental number of records this task has emitted per minute." + } + }, + { + "x": 0, + "y": 138, + "w": 12, + "h": 18, + "i": "20", + "type": "Widget", + "id": "20", + "moved": false, + "expressions": [ + "meter_flink_taskManager_numBytesInPerSecond" + ], + "typesOfMQE": [ + "TIME_SERIES_VALUES" + ], + "graph": { + "type": "Line", + "step": false, + "smooth": false, + "showSymbol": true, + "showXAxis": true, + "showYAxis": true + }, + "widget": { + "name": "NumBytesInPerSecond", + "title": "NumBytesInPerSecond", + "tips": "The number of bytes received per second" + } + }, + { + "x": 12, + "y": 138, + "w": 12, + "h": 18, + "i": "21", + "type": "Widget", + "id": "21", + "moved": false, + "expressions": [ + "meter_flink_taskManager_numBytesOutPerSecond" + ], + "typesOfMQE": [ + "TIME_SERIES_VALUES" + ], + "widget": { + "name": "NumBytesOutPerSecond", + "title": "NumBytesOutPerSecond", + "tips": "The number of bytes this task emits per second" + }, + "graph": { + "type": "Line", + "step": false, + "smooth": false, + "showSymbol": true, + "showXAxis": true, + "showYAxis": true + } + }, + { + "x": 8, + "y": 48, + "w": 8, + "h": 18, + "i": "22", + "type": "Widget", + "id": "22", + "moved": false, + "expressions": [ + "meter_flink_taskManager_netty_usedMemory/1024/1024" + ], + "graph": { + "type": "Area", + "opacity": 0.4, + "showXAxis": true, + "showYAxis": true + }, + "typesOfMQE": [ + "TIME_SERIES_VALUES" + ], + "widget": { + "name": "Netty UsedMemory", + "title": "Netty UsedMemory(MB)", + "tips": "The amount of used netty memory" + } + }, + { + "x": 0, + "y": 48, + "w": 8, + "h": 18, + "i": "23", + "type": "Widget", + "id": "23", + "moved": false, + "expressions": [ + "meter_flink_taskManager_netty_availableMemory/1024/1024" + ], + "graph": { + "type": "Area", + "opacity": 0.4, + "showXAxis": true, + "showYAxis": true + }, + "typesOfMQE": [ + "TIME_SERIES_VALUES" + ], + "widget": { + "name": "Netty AvailableMemory", + "title": "Netty AvailableMemory(MB)", + "tips": "The amount of available netty memory" + } + }, + { + "x": 16, + "y": 48, + "w": 8, + "h": 18, + "i": "24", + "type": "Widget", + "id": "24", + "moved": false, + "expressions": [ + "meter_flink_taskManager_isBackPressured" + ], + "graph": { + "type": "Line", + "step": false, + "smooth": false, + "showSymbol": true, + "showXAxis": true, + "showYAxis": true + }, + "typesOfMQE": [ + "TIME_SERIES_VALUES" + ], + "widget": { + "name": "IsBackPressured", + "title": "IsBackPressured", + "tips": "Whether the task is back-pressured." + } + }, + { + "x": 0, + "y": 102, + "w": 12, + "h": 18, + "i": "25", + "type": "Widget", + "id": "25", + "moved": false, + "expressions": [ + "meter_flink_taskManager_inPoolUsage*10" + ], + "graph": { + "type": "Line", + "step": false, + "smooth": false, + "showSymbol": true, + "showXAxis": true, + "showYAxis": true + }, + "typesOfMQE": [ + "TIME_SERIES_VALUES" + ], + "widget": { + "name": "InPoolUsage", + "title": "InPoolUsage(%)", + "tips": "An estimate of the input buffers usage. (ignores LocalInputChannels)" + } + }, + { + "x": 12, + "y": 102, + "w": 12, + "h": 18, + "i": "26", + "type": "Widget", + "id": "26", + "moved": false, + "graph": { + "type": "Line", + "step": false, + "smooth": false, + "showSymbol": true, + "showXAxis": true, + "showYAxis": true + }, + "expressions": [ + "meter_flink_taskManager_outPoolUsage" + ], + "typesOfMQE": [ + "TIME_SERIES_VALUES" + ], + "widget": { + "name": "OutPoolUsage", + "title": "OutPoolUsage(%)", + "tips": "An estimate of the output buffers usage. The pool usage can be > 100% if overdraft buffers are being used." + } + }, + { + "x": 12, + "y": 84, + "w": 12, + "h": 18, + "i": "27", + "type": "Widget", + "id": "27", + "moved": false, + "expressions": [ + "meter_flink_taskManager_softBackPressuredTimeMsPerSecond" + ], + "graph": { + "type": "Line", + "step": false, + "smooth": false, + "showSymbol": true, + "showXAxis": true, + "showYAxis": true + }, + "typesOfMQE": [ + "TIME_SERIES_VALUES" + ], + "widget": { + "name": "SoftBackPressuredTimeMsPerSecond", + "title": "SoftBackPressuredTimeMsPerSecond(ms)", + "tips": "The time this task is softly back pressured per second.Softly back pressured task will be still responsive and capable of for example triggering unaligned checkpoints." + } + }, + { + "x": 0, + "y": 84, + "w": 12, + "h": 18, + "i": "28", + "type": "Widget", + "id": "28", + "moved": false, + "graph": { + "type": "Line", + "step": false, + "smooth": false, + "showSymbol": true, + "showXAxis": true, + "showYAxis": true + }, + "expressions": [ + "meter_flink_taskManager_hardBackPressuredTimeMsPerSecond" + ], + "typesOfMQE": [ + "UNKNOWN" + ], + "widget": { + "name": "HardBackPressuredTimeMsPerSecond", + "title": "HardBackPressuredTimeMsPerSecond(ms)", + "tips": "The time this task is back pressured in a hard way per second.During hard back pressured task is completely blocked and unresponsive preventing for example unaligned checkpoints from triggering." + } + }, + { + "x": 0, + "y": 66, + "w": 12, + "h": 18, + "i": "29", + "type": "Widget", + "id": "29", + "moved": false, + "expressions": [ + "meter_flink_taskManager_idleTimeMsPerSecond" + ], + "graph": { + "type": "Line", + "step": false, + "smooth": false, + "showSymbol": true, + "showXAxis": true, + "showYAxis": true + }, + "typesOfMQE": [ + "TIME_SERIES_VALUES" + ], + "widget": { + "name": "IdleTimeMsPerSecond", + "title": "IdleTimeMsPerSecond(ms)", + "tips": "The time this task is idle (has no data to process) per second. Idle time excludes back pressured time, so if the task is back pressured it is not idle." + } + }, + { + "x": 12, + "y": 66, + "w": 12, + "h": 18, + "i": "30", + "type": "Widget", + "id": "30", + "moved": false, + "graph": { + "type": "Line", + "step": false, + "smooth": false, + "showSymbol": true, + "showXAxis": true, + "showYAxis": true + }, + "expressions": [ + "meter_flink_taskManager_busyTimeMsPerSecond" + ], + "typesOfMQE": [ + "TIME_SERIES_VALUES" + ], + "widget": { + "name": "BusyTimeMsPerSecond", + "title": "BusyTimeMsPerSecond(ms)", + "tips": "The time this task is busy (neither idle nor back pressured) per second. Can be NaN, if the value could not be calculated." + } + } + ], + "layer": "FLINK", + "entity": "ServiceInstance", + "name": "Flink-TaskManager", + "id": "Flink-TaskManager" + } + } +] diff --git a/oap-server/server-starter/src/main/resources/ui-initialized-templates/menu.yaml b/oap-server/server-starter/src/main/resources/ui-initialized-templates/menu.yaml index ee51388981fd..0002973d2271 100644 --- a/oap-server/server-starter/src/main/resources/ui-initialized-templates/menu.yaml +++ b/oap-server/server-starter/src/main/resources/ui-initialized-templates/menu.yaml @@ -257,3 +257,13 @@ menus: description: The Go Agent for Apache SkyWalking, which provides the native tracing/metrics/logging abilities for Golang projects. documentLink: https://skywalking.apache.org/docs/main/next/en/setup/backend/dashboards-so11y-go-agent/ i18nKey: self_observability_go_agent + - title: Data Processing Engine + icon: data_processing_engine + description: A data processing engine is a system designed to efficiently process, transform, and analyze large-scale data in real time or batch mode. + i18nKey: data_processing_engine + menus: + - title: Flink + layer: FLINK + description: Provide Flink monitoring through OpenTelemetry's Prometheus Receiver. + documentLink: https://skywalking.apache.org/docs/main/next/en/setup/backend/backend-flink-monitoring/ + i18nKey: data_processing_engine_flink \ No newline at end of file diff --git a/skywalking-ui b/skywalking-ui index 687ae07bb0fd..0ef6b57caecd 160000 --- a/skywalking-ui +++ b/skywalking-ui @@ -1 +1 @@ -Subproject commit 687ae07bb0fda6a0828823243709d0df9310bc33 +Subproject commit 0ef6b57caecd1a9c585384afc247c0eb76f4fc76 diff --git a/test/e2e-v2/cases/flink/docker-compose.yml b/test/e2e-v2/cases/flink/docker-compose.yml new file mode 100644 index 000000000000..7a8f1b102a80 --- /dev/null +++ b/test/e2e-v2/cases/flink/docker-compose.yml @@ -0,0 +1,102 @@ +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +version: "3" + +services: + oap: + extends: + file: ../../script/docker-compose/base-compose.yml + service: oap + ports: + - "12800:12800" + networks: + - e2e + + banyandb: + extends: + file: ../../script/docker-compose/base-compose.yml + service: banyandb + ports: + - 17912 + + jobmanager: + image: flink:2.0-preview1 + environment: + - | + FLINK_PROPERTIES= + jobmanager.rpc.address: jobmanager + metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory + metrics.reporter.prom.port: 9260 + ports: + - "8081:8081" + - "9260:9260" + command: jobmanager + healthcheck: + test: ["CMD", "curl", "-f", "http://localhost:8081"] + interval: 30s + timeout: 10s + retries: 3 + networks: + - e2e + + taskmanager: + image: flink:2.0-preview1 + environment: + - | + FLINK_PROPERTIES= + jobmanager.rpc.address: jobmanager + metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory + metrics.reporter.prom.port: 9261 + depends_on: + jobmanager: + condition: service_healthy + ports: + - "9261:9261" + command: taskmanager + healthcheck: + test: ["CMD", "curl", "-f", "http://localhost:9261/metrics"] + interval: 30s + timeout: 10s + retries: 3 + networks: + - e2e + + executeJob: + image: flink:2.0-preview1 + depends_on: + taskmanager: + condition: service_healthy + command: > + bash -c " + ./bin/flink run -m jobmanager:8081 examples/streaming/WindowJoin.jar" + networks: + - e2e + + otel-collector: + image: otel/opentelemetry-collector:${OTEL_COLLECTOR_VERSION} + networks: + - e2e + command: [ "--config=/etc/otel-collector-config.yaml" ] + volumes: + - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml + expose: + - 55678 + depends_on: + oap: + condition: service_healthy + +networks: + e2e: \ No newline at end of file diff --git a/test/e2e-v2/cases/flink/e2e.yaml b/test/e2e-v2/cases/flink/e2e.yaml new file mode 100644 index 000000000000..2c523e2d9bf3 --- /dev/null +++ b/test/e2e-v2/cases/flink/e2e.yaml @@ -0,0 +1,45 @@ +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# This file is used to show how to write configuration files and can be used to test. + +setup: + env: compose + file: docker-compose.yml + timeout: 20m + init-system-environment: ../../script/env + steps: + - name: set PATH + command: export PATH=/tmp/skywalking-infra-e2e/bin:$PATH + - name: install yq + command: bash test/e2e-v2/script/prepare/setup-e2e-shell/install.sh yq + - name: install swctl + command: bash test/e2e-v2/script/prepare/setup-e2e-shell/install.sh swctl + + +trigger: + action: http + interval: 3s + times: 10 + url: http://${jobmanager_host}:${jobmanager_9260}/metrics + method: GET + +verify: + retry: + count: 60 + interval: 3s + cases: + - includes: + - ./flink-cases.yaml diff --git a/test/e2e-v2/cases/flink/expected/endpoint.yml b/test/e2e-v2/cases/flink/expected/endpoint.yml new file mode 100644 index 000000000000..347f8c0dd988 --- /dev/null +++ b/test/e2e-v2/cases/flink/expected/endpoint.yml @@ -0,0 +1,19 @@ +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +{{- contains .}} +- id: {{ b64enc "flink::flink-cluster" }}.1_{{ b64enc "Windowed_Join_Example" }} + name: Windowed_Join_Example +{{- end}} diff --git a/test/e2e-v2/cases/flink/expected/instance.yml b/test/e2e-v2/cases/flink/expected/instance.yml new file mode 100644 index 000000000000..c2041deb7a15 --- /dev/null +++ b/test/e2e-v2/cases/flink/expected/instance.yml @@ -0,0 +1,22 @@ +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +{{- contains . }} +- id: {{ notEmpty .id }} + name: taskmanager:9261 + instanceuuid: {{ notEmpty .instanceuuid }} + attributes: [] + language: UNKNOWN +{{- end }} diff --git a/test/e2e-v2/cases/flink/expected/metrics-has-value-job-task-label.yml b/test/e2e-v2/cases/flink/expected/metrics-has-value-job-task-label.yml new file mode 100644 index 000000000000..fc9bdda9e331 --- /dev/null +++ b/test/e2e-v2/cases/flink/expected/metrics-has-value-job-task-label.yml @@ -0,0 +1,38 @@ +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +debuggingtrace: null +type: TIME_SERIES_VALUES +results: + {{- contains .results }} + - metric: + labels: + - key: flink_job_name + value: Windowed_Join_Example + - key: task_name + value: Source:_Grades_Data_Generator____Map + values: + {{- contains .values }} + - id: {{ notEmpty .id }} + value: {{ .value }} + traceid: null + owner: null + - id: {{ notEmpty .id }} + value: null + traceid: null + owner: null + {{- end}} + {{- end}} +error: null diff --git a/test/e2e-v2/cases/flink/expected/metrics-has-value-jobManager-node-label.yml b/test/e2e-v2/cases/flink/expected/metrics-has-value-jobManager-node-label.yml new file mode 100644 index 000000000000..aef844ec5419 --- /dev/null +++ b/test/e2e-v2/cases/flink/expected/metrics-has-value-jobManager-node-label.yml @@ -0,0 +1,36 @@ +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +debuggingtrace: null +type: TIME_SERIES_VALUES +results: + {{- contains .results }} + - metric: + labels: + - key: jobManager_node + value: jobmanager:9260 + values: + {{- contains .values }} + - id: {{ notEmpty .id }} + value: {{ .value }} + traceid: null + owner: null + - id: {{ notEmpty .id }} + value: null + traceid: null + owner: null + {{- end}} + {{- end}} +error: null diff --git a/test/e2e-v2/cases/flink/expected/metrics-has-value-operator-name-label.yml b/test/e2e-v2/cases/flink/expected/metrics-has-value-operator-name-label.yml new file mode 100644 index 000000000000..e563f9388320 --- /dev/null +++ b/test/e2e-v2/cases/flink/expected/metrics-has-value-operator-name-label.yml @@ -0,0 +1,36 @@ +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +debuggingtrace: null +type: TIME_SERIES_VALUES +results: + {{- contains .results }} + - metric: + labels: + - key: operator_name + value: Source:_Grades_Data_Generator + values: + {{- contains .values }} + - id: {{ notEmpty .id }} + value: {{ .value }} + traceid: null + owner: null + - id: {{ notEmpty .id }} + value: null + traceid: null + owner: null + {{- end}} + {{- end}} +error: null diff --git a/test/e2e-v2/cases/flink/expected/metrics-has-value.yml b/test/e2e-v2/cases/flink/expected/metrics-has-value.yml new file mode 100644 index 000000000000..7b1676057120 --- /dev/null +++ b/test/e2e-v2/cases/flink/expected/metrics-has-value.yml @@ -0,0 +1,34 @@ +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +debuggingtrace: null +type: TIME_SERIES_VALUES +results: + {{- contains .results }} + - metric: + labels: [] + values: + {{- contains .values }} + - id: {{ notEmpty .id }} + value: {{ .value }} + traceid: null + owner: null + - id: {{ notEmpty .id }} + value: null + traceid: null + owner: null + {{- end}} + {{- end}} +error: null \ No newline at end of file diff --git a/test/e2e-v2/cases/flink/expected/service.yml b/test/e2e-v2/cases/flink/expected/service.yml new file mode 100644 index 000000000000..a521c6daea80 --- /dev/null +++ b/test/e2e-v2/cases/flink/expected/service.yml @@ -0,0 +1,24 @@ +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +{{- contains . }} +- id: {{ b64enc "flink::flink-cluster" }}.1 + name: flink::flink-cluster + group: flink + shortname: flink-cluster + layers: + - FLINK + normal: true +{{- end }} \ No newline at end of file diff --git a/test/e2e-v2/cases/flink/flink-cases.yaml b/test/e2e-v2/cases/flink/flink-cases.yaml new file mode 100644 index 000000000000..a31017db2ef2 --- /dev/null +++ b/test/e2e-v2/cases/flink/flink-cases.yaml @@ -0,0 +1,152 @@ +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# This file is used to show how to write configuration files and can be used to test. + +cases: + # service cases + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql service ls + expected: expected/service.yml + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_flink_jobManager_running_job_number --service-name=flink::flink-cluster + expected: expected/metrics-has-value.yml + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_flink_jobManager_taskManagers_registered_number --service-name=flink::flink-cluster + expected: expected/metrics-has-value.yml + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_flink_jobManager_taskManagers_slots_total --service-name=flink::flink-cluster + expected: expected/metrics-has-value.yml + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_flink_jobManager_taskManagers_slots_available --service-name=flink::flink-cluster + expected: expected/metrics-has-value.yml + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_flink_jobManager_jvm_cpu_load --service-name=flink::flink-cluster + expected: expected/metrics-has-value-jobManager-node-label.yml + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_flink_jobManager_jvm_cpu_time --service-name=flink::flink-cluster + expected: expected/metrics-has-value-jobManager-node-label.yml + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_flink_jobManager_jvm_memory_heap_used --service-name=flink::flink-cluster + expected: expected/metrics-has-value-jobManager-node-label.yml + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_flink_jobManager_jvm_memory_heap_available --service-name=flink::flink-cluster + expected: expected/metrics-has-value-jobManager-node-label.yml + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_flink_jobManager_jvm_memory_nonHeap_used --service-name=flink::flink-cluster + expected: expected/metrics-has-value-jobManager-node-label.yml + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_flink_jobManager_jvm_memory_nonHeap_available --service-name=flink::flink-cluster + expected: expected/metrics-has-value-jobManager-node-label.yml + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_flink_jobManager_jvm_thread_count --service-name=flink::flink-cluster + expected: expected/metrics-has-value-jobManager-node-label.yml + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_flink_jobManager_jvm_memory_metaspace_used --service-name=flink::flink-cluster + expected: expected/metrics-has-value-jobManager-node-label.yml + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_flink_jobManager_jvm_memory_metaspace_available --service-name=flink::flink-cluster + expected: expected/metrics-has-value-jobManager-node-label.yml + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_flink_jobManager_jvm_g1_young_generation_count --service-name=flink::flink-cluster + expected: expected/metrics-has-value-jobManager-node-label.yml + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_flink_jobManager_jvm_g1_old_generation_count --service-name=flink::flink-cluster + expected: expected/metrics-has-value-jobManager-node-label.yml + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_flink_jobManager_jvm_g1_young_generation_time --service-name=flink::flink-cluster + expected: expected/metrics-has-value-jobManager-node-label.yml + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_flink_jobManager_jvm_g1_old_generation_time --service-name=flink::flink-cluster + expected: expected/metrics-has-value-jobManager-node-label.yml + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_flink_jobManager_jvm_all_garbageCollector_count --service-name=flink::flink-cluster + expected: expected/metrics-has-value-jobManager-node-label.yml + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_flink_jobManager_jvm_all_garbageCollector_time --service-name=flink::flink-cluster + expected: expected/metrics-has-value-jobManager-node-label.yml + + # instance cases + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql instance ls --service-name=flink::flink-cluster + expected: expected/instance.yml + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_flink_taskManager_jvm_cpu_load --service-name=flink::flink-cluster --instance-name=taskmanager:9261 + expected: expected/metrics-has-value.yml + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_flink_taskManager_jvm_cpu_time --service-name=flink::flink-cluster --instance-name=taskmanager:9261 + expected: expected/metrics-has-value.yml + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_flink_taskManager_jvm_memory_heap_used --service-name=flink::flink-cluster --instance-name=taskmanager:9261 + expected: expected/metrics-has-value.yml + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_flink_taskManager_jvm_memory_heap_available --service-name=flink::flink-cluster --instance-name=taskmanager:9261 + expected: expected/metrics-has-value.yml + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_flink_taskManager_jvm_thread_count --service-name=flink::flink-cluster --instance-name=taskmanager:9261 + expected: expected/metrics-has-value.yml + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_flink_taskManager_jvm_memory_metaspace_used --service-name=flink::flink-cluster --instance-name=taskmanager:9261 + expected: expected/metrics-has-value.yml + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_flink_taskManager_jvm_memory_metaspace_available --service-name=flink::flink-cluster --instance-name=taskmanager:9261 + expected: expected/metrics-has-value.yml + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_flink_taskManager_jvm_memory_nonHeap_used --service-name=flink::flink-cluster --instance-name=taskmanager:9261 + expected: expected/metrics-has-value.yml + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_flink_taskManager_jvm_memory_nonHeap_available --service-name=flink::flink-cluster --instance-name=taskmanager:9261 + expected: expected/metrics-has-value.yml + + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_flink_taskManager_numRecordsIn --service-name=flink::flink-cluster --instance-name=taskmanager:9261 + expected: expected/metrics-has-value-job-task-label.yml + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_flink_taskManager_numRecordsOut --service-name=flink::flink-cluster --instance-name=taskmanager:9261 + expected: expected/metrics-has-value-job-task-label.yml + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_flink_taskManager_numBytesInPerSecond --service-name=flink::flink-cluster --instance-name=taskmanager:9261 + expected: expected/metrics-has-value-job-task-label.yml + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_flink_taskManager_numBytesOutPerSecond --service-name=flink::flink-cluster --instance-name=taskmanager:9261 + expected: expected/metrics-has-value-job-task-label.yml + + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_flink_taskManager_netty_usedMemory --service-name=flink::flink-cluster --instance-name=taskmanager:9261 + expected: expected/metrics-has-value.yml + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_flink_taskManager_netty_availableMemory --service-name=flink::flink-cluster --instance-name=taskmanager:9261 + expected: expected/metrics-has-value.yml + + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_flink_taskManager_inPoolUsage --service-name=flink::flink-cluster --instance-name=taskmanager:9261 + expected: expected/metrics-has-value-job-task-label.yml + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_flink_taskManager_outPoolUsage --service-name=flink::flink-cluster --instance-name=taskmanager:9261 + expected: expected/metrics-has-value-job-task-label.yml + + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_flink_taskManager_isBackPressured --service-name=flink::flink-cluster --instance-name=taskmanager:9261 + expected: expected/metrics-has-value-job-task-label.yml + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_flink_taskManager_idleTimeMsPerSecond --service-name=flink::flink-cluster --instance-name=taskmanager:9261 + expected: expected/metrics-has-value-job-task-label.yml + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_flink_taskManager_isBackPressured --service-name=flink::flink-cluster --instance-name=taskmanager:9261 + expected: expected/metrics-has-value-job-task-label.yml + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_flink_taskManager_idleTimeMsPerSecond --service-name=flink::flink-cluster --instance-name=taskmanager:9261 + expected: expected/metrics-has-value-job-task-label.yml + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_flink_taskManager_busyTimeMsPerSecond --service-name=flink::flink-cluster --instance-name=taskmanager:9261 + expected: expected/metrics-has-value-job-task-label.yml + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_flink_taskManager_softBackPressuredTimeMsPerSecond --service-name=flink::flink-cluster --instance-name=taskmanager:9261 + expected: expected/metrics-has-value-job-task-label.yml + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_flink_taskManager_hardBackPressuredTimeMsPerSecond --service-name=flink::flink-cluster --instance-name=taskmanager:9261 + expected: expected/metrics-has-value-job-task-label.yml + + # endpoint cases + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql endpoint ls --service-name=flink::flink-cluster + expected: expected/endpoint.yml + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_flink_job_restart_number --service-name=flink::flink-cluster --endpoint-name=Windowed_Join_Example + expected: expected/metrics-has-value.yml + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_flink_job_runningTime --service-name=flink::flink-cluster --endpoint-name=Windowed_Join_Example + expected: expected/metrics-has-value.yml + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_flink_job_restartingTime --service-name=flink::flink-cluster --endpoint-name=Windowed_Join_Example + expected: expected/metrics-has-value.yml + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_flink_job_cancellingTime --service-name=flink::flink-cluster --endpoint-name=Windowed_Join_Example + expected: expected/metrics-has-value.yml + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_flink_job_checkpoints_total --service-name=flink::flink-cluster --endpoint-name=Windowed_Join_Example + expected: expected/metrics-has-value.yml + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_flink_job_checkpoints_failed --service-name=flink::flink-cluster --endpoint-name=Windowed_Join_Example + expected: expected/metrics-has-value.yml + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_flink_job_checkpoints_completed --service-name=flink::flink-cluster --endpoint-name=Windowed_Join_Example + expected: expected/metrics-has-value.yml + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_flink_job_checkpoints_inProgress --service-name=flink::flink-cluster --endpoint-name=Windowed_Join_Example + expected: expected/metrics-has-value.yml + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_flink_job_lastCheckpointSize --service-name=flink::flink-cluster --endpoint-name=Windowed_Join_Example + expected: expected/metrics-has-value.yml + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_flink_job_lastCheckpointDuration --service-name=flink::flink-cluster --endpoint-name=Windowed_Join_Example + expected: expected/metrics-has-value.yml + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_flink_job_currentEmitEventTimeLag --service-name=flink::flink-cluster --endpoint-name=Windowed_Join_Example + expected: expected/metrics-has-value-operator-name-label.yml + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_flink_job_numRecordsIn --service-name=flink::flink-cluster --endpoint-name=Windowed_Join_Example + expected: expected/metrics-has-value-operator-name-label.yml + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_flink_job_numRecordsOut --service-name=flink::flink-cluster --endpoint-name=Windowed_Join_Example + expected: expected/metrics-has-value-operator-name-label.yml + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_flink_job_numBytesInPerSecond --service-name=flink::flink-cluster --endpoint-name=Windowed_Join_Example + expected: expected/metrics-has-value-operator-name-label.yml + - query: swctl --display yaml --base-url=http://${oap_host}:${oap_12800}/graphql metrics exec --expression=meter_flink_job_numBytesOutPerSecond --service-name=flink::flink-cluster --endpoint-name=Windowed_Join_Example + expected: expected/metrics-has-value-operator-name-label.yml + + + diff --git a/test/e2e-v2/cases/flink/otel-collector-config.yaml b/test/e2e-v2/cases/flink/otel-collector-config.yaml new file mode 100644 index 000000000000..e9c0a8c4d58e --- /dev/null +++ b/test/e2e-v2/cases/flink/otel-collector-config.yaml @@ -0,0 +1,75 @@ +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +receivers: + prometheus: + config: + scrape_configs: + - job_name: "flink-jobManager-monitoring" + scrape_interval: 30s + static_configs: + - targets: ['jobmanager:9260'] + labels: + cluster: flink-cluster + relabel_configs: + - source_labels: [ __address__ ] + target_label: jobManager_node + replacement: $$1 + metric_relabel_configs: + - source_labels: [ job_name ] + action: replace + target_label: flink_job_name + replacement: $$1 + - source_labels: [ ] + target_label: job_name + replacement: flink-jobManager-monitoring + + - job_name: "flink-taskManager-monitoring" + scrape_interval: 30s + static_configs: + - targets: [ "taskmanager:9261" ] + labels: + cluster: flink-cluster + relabel_configs: + - source_labels: [ __address__ ] + regex: (.+) + target_label: taskManager_node + replacement: $$1 + metric_relabel_configs: + - source_labels: [ job_name ] + action: replace + target_label: flink_job_name + replacement: $$1 + - source_labels: [ ] + target_label: job_name + replacement: flink-taskManager-monitoring + +exporters: + otlp: + endpoint: oap:11800 + tls: + insecure: true + +processors: + batch: +service: + pipelines: + metrics: + receivers: + - prometheus + processors: + - batch + exporters: + - otlp \ No newline at end of file