@dongjiang1989
Member

@dongjiang1989 dongjiang1989 commented Jan 14, 2025

What type of PR is this?

/kind feature
What this PR does / why we need it:
Expose more metrics for Prometheus and pod HPA.

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

(Instrumentation)`karmada-operator`: Introduced the `karmada_build_info` metric to emit build info, along with a set of Go runtime metrics.

@karmada-bot karmada-bot added the kind/feature Categorizes issue or PR as related to a new feature. label Jan 14, 2025
@karmada-bot karmada-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jan 14, 2025
@codecov-commenter

codecov-commenter commented Jan 14, 2025

Codecov Report

Attention: Patch coverage is 0% with 27 lines in your changes missing coverage. Please review.

Project coverage is 47.97%. Comparing base (43f2953) to head (276cb0d).
Report is 156 commits behind head on master.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| pkg/metrics/version.go | 0.00% | 17 Missing ⚠️ |
| pkg/version/version.go | 0.00% | 8 Missing ⚠️ |
| operator/cmd/operator/app/operator.go | 0.00% | 2 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6044      +/-   ##
==========================================
- Coverage   48.36%   47.97%   -0.40%     
==========================================
  Files         666      675       +9     
  Lines       54831    55880    +1049     
==========================================
+ Hits        26519    26806     +287     
- Misses      26594    27328     +734     
- Partials     1718     1746      +28     
| Flag | Coverage Δ |
| --- | --- |
| unittests | 47.97% <0.00%> (-0.40%) ⬇️ |

@RainbowMango
Member

Thanks @dongjiang1989, can you explain why these metrics are necessary for you?
Also, can you give a list of the metrics?

@dongjiang1989
Member Author

dongjiang1989 commented Jan 14, 2025

> Thanks @dongjiang1989, can you help to explain why these metrics are necessary for you? Also, can you give a list of metrics?

Thanks @RainbowMango.
When GOGC, GOMAXPROCS, or GOMEMLIMIT is set inappropriately in a container, Go's scheduling, memory usage, GC, and distributed lock time can all behave abnormally to varying degrees.
So several Go runtime metrics are needed for monitoring and alerting.

build_info is for monitoring image upgrades.

Core Metrics list:

# HELP karmada_operator_build_info A metric with a constant '1' value labeled by version, revision, branch, goversion from which karmada_operator was built, and the goos and goarch for the build.
# TYPE karmada_operator_build_info gauge
karmada_operator_build_info{branch="",goarch="amd64",goos="darwin",goversion="go1.23.3",revision="851c785640e5d66b653f31d5a5ece7f7288116c8-modified",tags="unknown",version=""} 1
# HELP go_sync_mutex_wait_total_seconds_total Approximate cumulative time goroutines have spent blocked on a sync.Mutex, sync.RWMutex, or runtime-internal lock. This metric is useful for identifying global changes in lock contention. Collect a mutex or block profile using the runtime/pprof package for more detailed contention data.
# TYPE go_sync_mutex_wait_total_seconds_total counter
go_sync_mutex_wait_total_seconds_total 0.000339344
# HELP go_sched_gomaxprocs_threads The current runtime.GOMAXPROCS setting, or the number of operating system threads that can execute user-level Go code simultaneously.
# TYPE go_sched_gomaxprocs_threads gauge
go_sched_gomaxprocs_threads 8
# HELP go_sched_goroutines_goroutines Count of live goroutines.
# TYPE go_sched_goroutines_goroutines gauge
go_sched_goroutines_goroutines 19
# HELP go_sched_latencies_seconds Distribution of the time goroutines have spent in the scheduler in a runnable state before actually running. Bucket counts increase monotonically.
# TYPE go_sched_latencies_seconds histogram
go_sched_latencies_seconds_bucket{le="6.399999999999999e-08"} 25
go_sched_latencies_seconds_bucket{le="6.399999999999999e-07"} 27
go_sched_latencies_seconds_bucket{le="7.167999999999999e-06"} 41
go_sched_latencies_seconds_bucket{le="8.191999999999999e-05"} 51
go_sched_latencies_seconds_bucket{le="0.0009175039999999999"} 52
go_sched_latencies_seconds_bucket{le="0.010485759999999998"} 52
go_sched_latencies_seconds_bucket{le="0.11744051199999998"} 52
go_sched_latencies_seconds_bucket{le="+Inf"} 52
go_sched_latencies_seconds_sum 0.000162688
go_sched_latencies_seconds_count 52

@RainbowMango
Member

Thanks for the clarification. I will take a look and get back to you.

@RainbowMango
Member

By the way, are you building some services based on Karmada?

@dongjiang1989
Member Author

> By the way, are you building some services based on Karmada?

Yes, for example a model hub and a cloud platform.
I am willing to keep contributing to Karmada. 😄

@RainbowMango
Member

Is it possible to introduce your platform at the Karmada community meeting?

Member

@RainbowMango RainbowMango left a comment

/assign

@dongjiang1989
Member Author

> Is it possible to introduce your platform at the Karmada community meeting?

I'd be very happy to share it.

@RainbowMango
Member

That would be great!
Note that the next English meeting will be on 2025-01-21 and the next Chinese meeting on 2025-02-11.
Feel free to add an agenda item to the Meeting Notes.

@karmada-bot karmada-bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 16, 2025
@dongjiang1989
Member Author

> That would be great! Note that the next English meeting will be on 2025-01-21 and the next Chinese meeting on 2025-02-11. Feel free to add an agenda item to the Meeting Notes.

@RainbowMango I don't have editing permissions. Maybe I can share it on 2025-02-11.

@RainbowMango
Member

Quoting from the meeting notes:

Please know:
By joining the google groups you will be able to edit the meeting notes.
Join google group mailing list: https://groups.google.com/forum/#!forum/karmada

Please join the mailing list; all of its members can edit this doc.

Member

@RainbowMango RainbowMango left a comment

Given that we already have some metrics related to GC and goroutines, such as:

# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 3.0146e-05
go_gc_duration_seconds{quantile="0.25"} 4.6354e-05
go_gc_duration_seconds{quantile="0.5"} 5.3922e-05
go_gc_duration_seconds{quantile="0.75"} 0.000103302
go_gc_duration_seconds{quantile="1"} 0.000227208
go_gc_duration_seconds_sum 0.001537905
go_gc_duration_seconds_count 18
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 43
# HELP go_info Information about the Go environment.
# TYPE go_info gauge
go_info{version="go1.22.9"} 1
# HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes 6.101576e+06

I would like to understand the list of metrics introduced or removed by this PR and the reasons behind these changes.
Additionally, if this change is accepted, my intention is to extend these metrics to all other components, not just the karmada-operator.

PS: Here is the full list of metrics grabbed from v1.12.2:

# HELP certwatcher_read_certificate_errors_total Total number of certificate read errors
# TYPE certwatcher_read_certificate_errors_total counter
certwatcher_read_certificate_errors_total 0
# HELP certwatcher_read_certificate_total Total number of certificate reads
# TYPE certwatcher_read_certificate_total counter
certwatcher_read_certificate_total 0
# HELP controller_runtime_active_workers Number of currently used workers per controller
# TYPE controller_runtime_active_workers gauge
controller_runtime_active_workers{controller="karmada-operator-controller"} 0
# HELP controller_runtime_max_concurrent_reconciles Maximum number of concurrent reconciles per controller
# TYPE controller_runtime_max_concurrent_reconciles gauge
controller_runtime_max_concurrent_reconciles{controller="karmada-operator-controller"} 5
# HELP controller_runtime_reconcile_errors_total Total number of reconciliation errors per controller
# TYPE controller_runtime_reconcile_errors_total counter
controller_runtime_reconcile_errors_total{controller="karmada-operator-controller"} 0
# HELP controller_runtime_reconcile_panics_total Total number of reconciliation panics per controller
# TYPE controller_runtime_reconcile_panics_total counter
controller_runtime_reconcile_panics_total{controller="karmada-operator-controller"} 0
# HELP controller_runtime_reconcile_time_seconds Length of time per reconciliation per controller
# TYPE controller_runtime_reconcile_time_seconds histogram
controller_runtime_reconcile_time_seconds_bucket{controller="karmada-operator-controller",le="0.005"} 0
controller_runtime_reconcile_time_seconds_bucket{controller="karmada-operator-controller",le="0.01"} 0
controller_runtime_reconcile_time_seconds_bucket{controller="karmada-operator-controller",le="0.025"} 0
controller_runtime_reconcile_time_seconds_bucket{controller="karmada-operator-controller",le="0.05"} 0
controller_runtime_reconcile_time_seconds_bucket{controller="karmada-operator-controller",le="0.1"} 0
controller_runtime_reconcile_time_seconds_bucket{controller="karmada-operator-controller",le="0.15"} 0
controller_runtime_reconcile_time_seconds_bucket{controller="karmada-operator-controller",le="0.2"} 0
controller_runtime_reconcile_time_seconds_bucket{controller="karmada-operator-controller",le="0.25"} 0
controller_runtime_reconcile_time_seconds_bucket{controller="karmada-operator-controller",le="0.3"} 0
controller_runtime_reconcile_time_seconds_bucket{controller="karmada-operator-controller",le="0.35"} 0
controller_runtime_reconcile_time_seconds_bucket{controller="karmada-operator-controller",le="0.4"} 0
controller_runtime_reconcile_time_seconds_bucket{controller="karmada-operator-controller",le="0.45"} 0
controller_runtime_reconcile_time_seconds_bucket{controller="karmada-operator-controller",le="0.5"} 0
controller_runtime_reconcile_time_seconds_bucket{controller="karmada-operator-controller",le="0.6"} 0
controller_runtime_reconcile_time_seconds_bucket{controller="karmada-operator-controller",le="0.7"} 0
controller_runtime_reconcile_time_seconds_bucket{controller="karmada-operator-controller",le="0.8"} 0
controller_runtime_reconcile_time_seconds_bucket{controller="karmada-operator-controller",le="0.9"} 0
controller_runtime_reconcile_time_seconds_bucket{controller="karmada-operator-controller",le="1"} 0
controller_runtime_reconcile_time_seconds_bucket{controller="karmada-operator-controller",le="1.25"} 0
controller_runtime_reconcile_time_seconds_bucket{controller="karmada-operator-controller",le="1.5"} 0
controller_runtime_reconcile_time_seconds_bucket{controller="karmada-operator-controller",le="1.75"} 0
controller_runtime_reconcile_time_seconds_bucket{controller="karmada-operator-controller",le="2"} 0
controller_runtime_reconcile_time_seconds_bucket{controller="karmada-operator-controller",le="2.5"} 0
controller_runtime_reconcile_time_seconds_bucket{controller="karmada-operator-controller",le="3"} 0
controller_runtime_reconcile_time_seconds_bucket{controller="karmada-operator-controller",le="3.5"} 0
controller_runtime_reconcile_time_seconds_bucket{controller="karmada-operator-controller",le="4"} 0
controller_runtime_reconcile_time_seconds_bucket{controller="karmada-operator-controller",le="4.5"} 0
controller_runtime_reconcile_time_seconds_bucket{controller="karmada-operator-controller",le="5"} 0
controller_runtime_reconcile_time_seconds_bucket{controller="karmada-operator-controller",le="6"} 0
controller_runtime_reconcile_time_seconds_bucket{controller="karmada-operator-controller",le="7"} 0
controller_runtime_reconcile_time_seconds_bucket{controller="karmada-operator-controller",le="8"} 0
controller_runtime_reconcile_time_seconds_bucket{controller="karmada-operator-controller",le="9"} 1
controller_runtime_reconcile_time_seconds_bucket{controller="karmada-operator-controller",le="10"} 1
controller_runtime_reconcile_time_seconds_bucket{controller="karmada-operator-controller",le="15"} 1
controller_runtime_reconcile_time_seconds_bucket{controller="karmada-operator-controller",le="20"} 1
controller_runtime_reconcile_time_seconds_bucket{controller="karmada-operator-controller",le="25"} 1
controller_runtime_reconcile_time_seconds_bucket{controller="karmada-operator-controller",le="30"} 1
controller_runtime_reconcile_time_seconds_bucket{controller="karmada-operator-controller",le="40"} 1
controller_runtime_reconcile_time_seconds_bucket{controller="karmada-operator-controller",le="50"} 1
controller_runtime_reconcile_time_seconds_bucket{controller="karmada-operator-controller",le="60"} 1
controller_runtime_reconcile_time_seconds_bucket{controller="karmada-operator-controller",le="+Inf"} 1
controller_runtime_reconcile_time_seconds_sum{controller="karmada-operator-controller"} 8.374751902
controller_runtime_reconcile_time_seconds_count{controller="karmada-operator-controller"} 1
# HELP controller_runtime_reconcile_total Total number of reconciliations per controller
# TYPE controller_runtime_reconcile_total counter
controller_runtime_reconcile_total{controller="karmada-operator-controller",result="error"} 0
controller_runtime_reconcile_total{controller="karmada-operator-controller",result="requeue"} 0
controller_runtime_reconcile_total{controller="karmada-operator-controller",result="requeue_after"} 0
controller_runtime_reconcile_total{controller="karmada-operator-controller",result="success"} 1
# HELP controller_runtime_terminal_reconcile_errors_total Total number of terminal reconciliation errors per controller
# TYPE controller_runtime_terminal_reconcile_errors_total counter
controller_runtime_terminal_reconcile_errors_total{controller="karmada-operator-controller"} 0
# HELP controller_runtime_webhook_panics_total Total number of webhook panics
# TYPE controller_runtime_webhook_panics_total counter
controller_runtime_webhook_panics_total 0
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 3.0146e-05
go_gc_duration_seconds{quantile="0.25"} 4.6354e-05
go_gc_duration_seconds{quantile="0.5"} 5.3922e-05
go_gc_duration_seconds{quantile="0.75"} 0.000103302
go_gc_duration_seconds{quantile="1"} 0.000227208
go_gc_duration_seconds_sum 0.001537905
go_gc_duration_seconds_count 18
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 43
# HELP go_info Information about the Go environment.
# TYPE go_info gauge
go_info{version="go1.22.9"} 1
# HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes 6.101576e+06
# HELP go_memstats_alloc_bytes_total Total number of bytes allocated, even if freed.
# TYPE go_memstats_alloc_bytes_total counter
go_memstats_alloc_bytes_total 5.3223024e+07
# HELP go_memstats_buck_hash_sys_bytes Number of bytes used by the profiling bucket hash table.
# TYPE go_memstats_buck_hash_sys_bytes gauge
go_memstats_buck_hash_sys_bytes 1.47988e+06
# HELP go_memstats_frees_total Total number of frees.
# TYPE go_memstats_frees_total counter
go_memstats_frees_total 362491
# HELP go_memstats_gc_sys_bytes Number of bytes used for garbage collection system metadata.
# TYPE go_memstats_gc_sys_bytes gauge
go_memstats_gc_sys_bytes 3.811048e+06
# HELP go_memstats_heap_alloc_bytes Number of heap bytes allocated and still in use.
# TYPE go_memstats_heap_alloc_bytes gauge
go_memstats_heap_alloc_bytes 6.101576e+06
# HELP go_memstats_heap_idle_bytes Number of heap bytes waiting to be used.
# TYPE go_memstats_heap_idle_bytes gauge
go_memstats_heap_idle_bytes 6.88128e+06
# HELP go_memstats_heap_inuse_bytes Number of heap bytes that are in use.
# TYPE go_memstats_heap_inuse_bytes gauge
go_memstats_heap_inuse_bytes 9.0112e+06
# HELP go_memstats_heap_objects Number of allocated objects.
# TYPE go_memstats_heap_objects gauge
go_memstats_heap_objects 47775
# HELP go_memstats_heap_released_bytes Number of heap bytes released to OS.
# TYPE go_memstats_heap_released_bytes gauge
go_memstats_heap_released_bytes 6.127616e+06
# HELP go_memstats_heap_sys_bytes Number of heap bytes obtained from system.
# TYPE go_memstats_heap_sys_bytes gauge
go_memstats_heap_sys_bytes 1.589248e+07
# HELP go_memstats_last_gc_time_seconds Number of seconds since 1970 of last garbage collection.
# TYPE go_memstats_last_gc_time_seconds gauge
go_memstats_last_gc_time_seconds 1.7373615791665132e+09
# HELP go_memstats_lookups_total Total number of pointer lookups.
# TYPE go_memstats_lookups_total counter
go_memstats_lookups_total 0
# HELP go_memstats_mallocs_total Total number of mallocs.
# TYPE go_memstats_mallocs_total counter
go_memstats_mallocs_total 410266
# HELP go_memstats_mcache_inuse_bytes Number of bytes in use by mcache structures.
# TYPE go_memstats_mcache_inuse_bytes gauge
go_memstats_mcache_inuse_bytes 4800
# HELP go_memstats_mcache_sys_bytes Number of bytes used for mcache structures obtained from system.
# TYPE go_memstats_mcache_sys_bytes gauge
go_memstats_mcache_sys_bytes 15600
# HELP go_memstats_mspan_inuse_bytes Number of bytes in use by mspan structures.
# TYPE go_memstats_mspan_inuse_bytes gauge
go_memstats_mspan_inuse_bytes 179840
# HELP go_memstats_mspan_sys_bytes Number of bytes used for mspan structures obtained from system.
# TYPE go_memstats_mspan_sys_bytes gauge
go_memstats_mspan_sys_bytes 228480
# HELP go_memstats_next_gc_bytes Number of heap bytes when next garbage collection will take place.
# TYPE go_memstats_next_gc_bytes gauge
go_memstats_next_gc_bytes 1.0777016e+07
# HELP go_memstats_other_sys_bytes Number of bytes used for other system allocations.
# TYPE go_memstats_other_sys_bytes gauge
go_memstats_other_sys_bytes 843752
# HELP go_memstats_stack_inuse_bytes Number of bytes in use by the stack allocator.
# TYPE go_memstats_stack_inuse_bytes gauge
go_memstats_stack_inuse_bytes 884736
# HELP go_memstats_stack_sys_bytes Number of bytes obtained from system for stack allocator.
# TYPE go_memstats_stack_sys_bytes gauge
go_memstats_stack_sys_bytes 884736
# HELP go_memstats_sys_bytes Number of bytes obtained from system.
# TYPE go_memstats_sys_bytes gauge
go_memstats_sys_bytes 2.3155976e+07
# HELP go_threads Number of OS threads created.
# TYPE go_threads gauge
go_threads 10
# HELP leader_election_master_status Gauge of if the reporting system is master of the relevant lease, 0 indicates backup, 1 indicates master. 'name' is the string used to identify the lease. Please make sure to group by name.
# TYPE leader_election_master_status gauge
leader_election_master_status{name="karmada-operator"} 1
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 2.51
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1.048576e+06
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 11
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 4.9225728e+07
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.73736126681e+09
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 1.307947008e+09
# HELP process_virtual_memory_max_bytes Maximum amount of virtual memory available in bytes.
# TYPE process_virtual_memory_max_bytes gauge
process_virtual_memory_max_bytes 1.8446744073709552e+19
# HELP rest_client_requests_total Number of HTTP requests, partitioned by status code, method, and host.
# TYPE rest_client_requests_total counter
rest_client_requests_total{code="200",host="10.96.0.1:443",method="GET"} 35
rest_client_requests_total{code="200",host="10.96.0.1:443",method="PUT"} 226
rest_client_requests_total{code="200",host="172.18.0.7:32574",method="GET"} 13
rest_client_requests_total{code="200",host="172.18.0.7:32574",method="PATCH"} 2
rest_client_requests_total{code="200",host="172.18.0.7:32574",method="PUT"} 8
rest_client_requests_total{code="201",host="10.96.0.1:443",method="POST"} 2
rest_client_requests_total{code="201",host="172.18.0.7:32574",method="POST"} 4
rest_client_requests_total{code="409",host="10.96.0.1:443",method="POST"} 17
rest_client_requests_total{code="409",host="172.18.0.7:32574",method="POST"} 26
# HELP workqueue_adds_total Total number of adds handled by workqueue
# TYPE workqueue_adds_total counter
workqueue_adds_total{controller="karmada-operator-controller",name="karmada-operator-controller"} 1
# HELP workqueue_depth Current depth of workqueue
# TYPE workqueue_depth gauge
workqueue_depth{controller="karmada-operator-controller",name="karmada-operator-controller"} 0
# HELP workqueue_longest_running_processor_seconds How many seconds has the longest running processor for workqueue been running.
# TYPE workqueue_longest_running_processor_seconds gauge
workqueue_longest_running_processor_seconds{controller="karmada-operator-controller",name="karmada-operator-controller"} 0
# HELP workqueue_queue_duration_seconds How long in seconds an item stays in workqueue before being requested
# TYPE workqueue_queue_duration_seconds histogram
workqueue_queue_duration_seconds_bucket{controller="karmada-operator-controller",name="karmada-operator-controller",le="1e-08"} 0
workqueue_queue_duration_seconds_bucket{controller="karmada-operator-controller",name="karmada-operator-controller",le="1e-07"} 0
workqueue_queue_duration_seconds_bucket{controller="karmada-operator-controller",name="karmada-operator-controller",le="1e-06"} 0
workqueue_queue_duration_seconds_bucket{controller="karmada-operator-controller",name="karmada-operator-controller",le="9.999999999999999e-06"} 0
workqueue_queue_duration_seconds_bucket{controller="karmada-operator-controller",name="karmada-operator-controller",le="9.999999999999999e-05"} 1
workqueue_queue_duration_seconds_bucket{controller="karmada-operator-controller",name="karmada-operator-controller",le="0.001"} 1
workqueue_queue_duration_seconds_bucket{controller="karmada-operator-controller",name="karmada-operator-controller",le="0.01"} 1
workqueue_queue_duration_seconds_bucket{controller="karmada-operator-controller",name="karmada-operator-controller",le="0.1"} 1
workqueue_queue_duration_seconds_bucket{controller="karmada-operator-controller",name="karmada-operator-controller",le="1"} 1
workqueue_queue_duration_seconds_bucket{controller="karmada-operator-controller",name="karmada-operator-controller",le="10"} 1
workqueue_queue_duration_seconds_bucket{controller="karmada-operator-controller",name="karmada-operator-controller",le="100"} 1
workqueue_queue_duration_seconds_bucket{controller="karmada-operator-controller",name="karmada-operator-controller",le="1000"} 1
workqueue_queue_duration_seconds_bucket{controller="karmada-operator-controller",name="karmada-operator-controller",le="+Inf"} 1
workqueue_queue_duration_seconds_sum{controller="karmada-operator-controller",name="karmada-operator-controller"} 1.3487e-05
workqueue_queue_duration_seconds_count{controller="karmada-operator-controller",name="karmada-operator-controller"} 1
# HELP workqueue_retries_total Total number of retries handled by workqueue
# TYPE workqueue_retries_total counter
workqueue_retries_total{controller="karmada-operator-controller",name="karmada-operator-controller"} 0
# HELP workqueue_unfinished_work_seconds How many seconds of work has been done that is in progress and hasn't been observed by work_duration. Large values indicate stuck threads. One can deduce the number of stuck threads by observing the rate at which this increases.
# TYPE workqueue_unfinished_work_seconds gauge
workqueue_unfinished_work_seconds{controller="karmada-operator-controller",name="karmada-operator-controller"} 0
# HELP workqueue_work_duration_seconds How long in seconds processing an item from workqueue takes.
# TYPE workqueue_work_duration_seconds histogram
workqueue_work_duration_seconds_bucket{controller="karmada-operator-controller",name="karmada-operator-controller",le="1e-08"} 0
workqueue_work_duration_seconds_bucket{controller="karmada-operator-controller",name="karmada-operator-controller",le="1e-07"} 0
workqueue_work_duration_seconds_bucket{controller="karmada-operator-controller",name="karmada-operator-controller",le="1e-06"} 0
workqueue_work_duration_seconds_bucket{controller="karmada-operator-controller",name="karmada-operator-controller",le="9.999999999999999e-06"} 0
workqueue_work_duration_seconds_bucket{controller="karmada-operator-controller",name="karmada-operator-controller",le="9.999999999999999e-05"} 0
workqueue_work_duration_seconds_bucket{controller="karmada-operator-controller",name="karmada-operator-controller",le="0.001"} 0
workqueue_work_duration_seconds_bucket{controller="karmada-operator-controller",name="karmada-operator-controller",le="0.01"} 0
workqueue_work_duration_seconds_bucket{controller="karmada-operator-controller",name="karmada-operator-controller",le="0.1"} 0
workqueue_work_duration_seconds_bucket{controller="karmada-operator-controller",name="karmada-operator-controller",le="1"} 0
workqueue_work_duration_seconds_bucket{controller="karmada-operator-controller",name="karmada-operator-controller",le="10"} 1
workqueue_work_duration_seconds_bucket{controller="karmada-operator-controller",name="karmada-operator-controller",le="100"} 1
workqueue_work_duration_seconds_bucket{controller="karmada-operator-controller",name="karmada-operator-controller",le="1000"} 1
workqueue_work_duration_seconds_bucket{controller="karmada-operator-controller",name="karmada-operator-controller",le="+Inf"} 1
workqueue_work_duration_seconds_sum{controller="karmada-operator-controller",name="karmada-operator-controller"} 8.374788526
workqueue_work_duration_seconds_count{controller="karmada-operator-controller",name="karmada-operator-controller"} 1

@RainbowMango
Member

> BTW, I will extend these metrics to all other components.

Hold on, we can postpone this until we make the final decision.

@dongjiang1989
Member Author

/cc @RainbowMango Removing gitbranch is done.

@RainbowMango
Member

I left my comments on #6044 (comment) after exploring some docs about the short commit id.

Member

@RainbowMango RainbowMango left a comment

Suggested a name for the short commit ID.

I'll focus on the metrics after we reach consensus on the name.

@karmada-bot karmada-bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Feb 14, 2025
Member

@RainbowMango RainbowMango left a comment

Thanks for the quick response. The short commit part generally looks good to me, apart from a few more nits.

I will take another look at the metrics.

func NewCollector(program string) prometheus.Collector {
	return prometheus.NewGaugeFunc(
		prometheus.GaugeOpts{
			Namespace: program,

Member

Maybe we can omit the component name?
A metric emitted by a component already indicates that it is that component's build info.

Member Author

The metrics convention defines a distinct metric name per program, rather than distinguishing programs by different labels on the same metric name.
ref: https://github.com/prometheus/client_golang/blob/main/prometheus/collectors/version/version.go#L28-L30
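
To make the naming point concrete: in client_golang, the `Namespace`, `Subsystem`, and `Name` fields of the opts struct are joined with underscores into the final metric name (the library exposes this as `prometheus.BuildFQName`). The sketch below imitates that joining with the standard library only, so it runs without dependencies. `fqName` is a hypothetical stand-in, not the library function.

```go
package main

import (
	"fmt"
	"strings"
)

// fqName (hypothetical) joins namespace, subsystem, and name with
// underscores, skipping empty parts. An empty name yields "".
func fqName(namespace, subsystem, name string) string {
	if name == "" {
		return ""
	}
	parts := make([]string, 0, 3)
	for _, p := range []string{namespace, subsystem} {
		if p != "" {
			parts = append(parts, p)
		}
	}
	parts = append(parts, name)
	return strings.Join(parts, "_")
}

func main() {
	// With the program name as namespace, each component gets its own metric name.
	fmt.Println(fqName("karmada_operator", "", "build_info"))
	// Without a namespace, every component would emit the same metric name.
	fmt.Println(fqName("", "", "build_info"))
}
```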

		),
		ConstLabels: prometheus.Labels{
			"version":  Get().GitVersion,
			"revision": Get().GitAbbreviativeCommit,

Member

Suggested change
- "revision": Get().GitAbbreviativeCommit,
+ "commit": Get().GitCommit,
+ "abbr-commit": Get().GitAbbreviativeCommit,

Member Author

Hmmmm....🤔 @RainbowMango Probably best to stay consistent with prometheus's specification.

Ref: https://github.com/prometheus/client_golang/blob/main/prometheus/collectors/version/version.go#L35-L42

Member

I apologize for not explaining why I suggested those changes.
I think the most important aspect of naming is clarity, and I feel the terms version and revision are ambiguous, which could lead to misunderstandings. For example, I can't tell the difference between them in the context of Prometheus. (Can you give an example of the build info metric from Prometheus?)

Regarding your point about staying consistent with Prometheus, I'd love to, but Karmada might emit both the full commit and the abbreviated commit, so we should differentiate them by name to avoid confusion.

Member Author

example:

prometheus_operator_build_info{branch="refs/tags/v0.67.1", goarch="amd64", goos="linux", goversion="go1.20.6", instance="xxx.xxx.xxx.xxx:10250", job="kube-prometheus-stack-operator", namespace="monitoring", pod="kube-prometheus-stack-operator-84fbdc5578-wxqll", revision="ce782eafa", tags="unknown", version="0.67.1"}

This keeps branch, revision, and version consistent with Prometheus's convention.

Member

An ideal metric in my mind would look something like this:

// NewBuildInfoCollector returns a collector that exports metrics about the current version
// information.
func NewBuildInfoCollector() prometheus.Collector {
	return prometheus.NewGaugeFunc(
		prometheus.GaugeOpts{
			Name: "component_build_info",
			Help: "Component build metadata exposed as labels with a constant value of 1.",
			ConstLabels: prometheus.Labels{
				"git_version":      Get().GitVersion,
				"git_commit":       Get().GitCommit,
				"git_short_commit": Get().GitShortCommit,
				"git_tree_state":   Get().GitTreeState,
				"build_date":       Get().BuildDate,
				"go_version":       Get().GoVersion,
				"compiler":         Get().Compiler,
				"platform":         Get().Platform,
			},
		},
		func() float64 { return 1 },
	)
}
  1. I don't think the component name needs to be the namespace, because the component emitting the metric inherently owns it.
  2. All build metadata is expressed through standardized labels to avoid ambiguity.
  3. I'd love to follow the mature practice you mentioned above, but I don't think it fits us, mostly because we want clearer label names.
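To make the "constant value of 1 with metadata in labels" pattern concrete, here is a minimal standard-library sketch that renders such a sample in the Prometheus text exposition format. The helper name `buildInfoLine` and the placeholder label values are assumptions for illustration only; the actual PR relies on the Prometheus client library rather than hand-rendered text.

```go
package main

import (
	"fmt"
	"runtime"
	"sort"
	"strings"
)

// buildInfoLine renders one sample in the Prometheus text exposition format:
// a gauge whose value is always 1 and whose labels carry the build metadata.
// Labels are sorted by name so the output is deterministic.
// (Hypothetical helper, not part of the PR.)
func buildInfoLine(name string, labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	pairs := make([]string, 0, len(keys))
	for _, k := range keys {
		// %q quotes and escapes the value, which is adequate for the
		// plain ASCII strings that build metadata normally contains.
		pairs = append(pairs, fmt.Sprintf("%s=%q", k, labels[k]))
	}
	return fmt.Sprintf("%s{%s} 1", name, strings.Join(pairs, ","))
}

func main() {
	labels := map[string]string{
		"git_version": "v0.0.0-master", // placeholder value
		"go_version":  runtime.Version(),
		"compiler":    runtime.Compiler,
		"platform":    runtime.GOOS + "/" + runtime.GOARCH,
	}
	fmt.Println(buildInfoLine("component_build_info", labels))
}
```

Because the value never changes, the sample is only useful for its labels, for example to join version information onto other series at query time.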

Comment on lines 116 to 118
	// Unregister the default Go collector.
	ctrlmetrics.Registry.Unregister(collectors.NewGoCollector())

	ctrlmetrics.Registry.MustRegister(
		// Go runtime metrics about debug.GCStats (base metrics) and
		// runtime/metrics.
		collectors.NewGoCollector(
			collectors.WithGoCollectorRuntimeMetrics(
				// Go runtime GC metrics (e.g. `go_gc_duration_seconds`
				// is the garbage collection cycle pause duration).
				collectors.MetricsGC,
				// Go runtime scheduler metrics (e.g. `go_sched_gomaxprocs_threads`
				// is the current runtime.GOMAXPROCS setting).
				collectors.MetricsScheduler,
				// Go runtime memory metrics (e.g. `go_memstats_alloc_bytes`
				// is the number of bytes allocated and still in use).
				collectors.MetricsMemory,
				// Go runtime sync lock metrics (e.g. `go_sync_mutex_wait_total_seconds_total`
				// is the approximate cumulative time goroutines have spent blocked
				// on a sync.Mutex, sync.RWMutex, or runtime-internal lock).
				collectors.GoRuntimeMetricsRule{Matcher: regexp.MustCompile(`^/sync/.*`)},
			),
		),
	)
	// `karmada_operator_build_info` metrics for operator version upgrade.
	ctrlmetrics.Registry.MustRegister(
		version.NewCollector("karmada_operator"),
	)
Member

Hi @dongjiang1989, a released version of controller-runtime now includes your PR (kubernetes-sigs/controller-runtime#3070). I remember you said at the last community meeting that we could do the same once we bump controller-runtime, right?

Member Author

Yes. update controller-runtime to v0.19.6
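The rule `^/sync/.*` in the snippet under review selects metrics by their `runtime/metrics` name. As a rough way to preview what such a pattern would pick up on a given Go toolchain, the following standard-library sketch lists the matching names. The helper `matchRuntimeMetrics` is hypothetical and only approximates the selection; how the collector turns those names into Prometheus series is up to the client library.

```go
package main

import (
	"fmt"
	"regexp"
	"runtime/metrics"
)

// matchRuntimeMetrics returns the names of all runtime/metrics entries
// supported by the running Go toolchain whose name matches the pattern.
// (Hypothetical helper for illustration.)
func matchRuntimeMetrics(pattern string) []string {
	re := regexp.MustCompile(pattern)
	var names []string
	for _, d := range metrics.All() {
		if re.MatchString(d.Name) {
			names = append(names, d.Name)
		}
	}
	return names
}

func main() {
	// Same expression as the GoRuntimeMetricsRule in the reviewed snippet.
	for _, name := range matchRuntimeMetrics(`^/sync/.*`) {
		fmt.Println(name)
	}
}
```

The set of names depends on the Go version used to build the binary, so the resulting metric list can change after a toolchain upgrade even if the rule stays the same.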

Contributor

Copilot AI left a comment

PR Overview

This PR adds new metrics to expose build information and Go runtime metrics for the operator, supporting enhanced observability for Prometheus and pod HPA.

  • Introduces a new field (GitAbbreviativeCommit) in version information with associated updates to metrics collection and tests.
  • Registers the new build_info metrics in the operator's controller manager.

Reviewed Changes

File | Description
--- | ---
pkg/version/version.go | Adds GitAbbreviativeCommit to Info struct and updates NewCollector.
operator/cmd/operator/app/operator.go | Registers the new build_info metrics for operator version upgrades.
pkg/version/base.go | Adds a gitAbbreviativeCommit variable to support the new metrics.
pkg/version/version_test.go | Updates the expected output to include the new GitAbbreviativeCommit field.

Copilot reviewed 17 out of 17 changed files in this pull request and generated no comments.

Comments suppressed due to low confidence (2)

pkg/version/version.go:30

  • [nitpick] The field name 'GitAbbreviativeCommit' might be less clear than alternatives such as 'GitShortCommit'. Consider renaming for clarity and to align with common versioning terminology.
+	GitAbbreviativeCommit string `json:"gitAbbreviativeCommit"`

pkg/version/base.go:29

  • [nitpick] Consider renaming 'gitAbbreviativeCommit' to 'gitShortCommit' to improve clarity and consistency with conventional naming.
+	gitAbbreviativeCommit = "unknown" // short sha1 from git, output of $(git rev-parse --short HEAD)

Member

I don't understand why we need this .gitignore file here. Was it committed by mistake?

gitVersion = "v0.0.0-master"
gitCommit = "unknown" // sha1 from git, output of $(git rev-parse HEAD)
gitTreeState = "unknown" // state of git tree, either "clean" or "dirty"
gitAbbreviativeCommit = "unknown" // short sha1 from git, output of $(git rev-parse --short HEAD)
Member

Copilot left one comment regarding the field name gitAbbreviativeCommit:

The field name 'GitAbbreviativeCommit' might be less clear than alternatives such as 'GitShortCommit'. Consider renaming for clarity and to align with common versioning terminology.

I'm sold on Copilot's suggestion: GitShortCommit does seem clearer. Apologies for the back-and-forth.
Let me know what you think.
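For context, a minimal sketch of how such a field is typically wired up: package-level defaults that the linker overwrites, surfaced through a `Get()` function. The trimmed-down `Info` struct, the `json` tag names, and the `-X main.…` flags in the comment are assumptions for this self-contained example and do not reproduce the project's actual `pkg/version` package.

```go
package main

import (
	"fmt"
	"runtime"
)

// Defaults used when nothing is injected at link time. A build could
// overwrite them with flags along these lines (hypothetical, matching
// this package main rather than the real package path):
//
//	go build -ldflags "-X main.gitCommit=$(git rev-parse HEAD) \
//	    -X main.gitShortCommit=$(git rev-parse --short HEAD)"
var (
	gitVersion     = "v0.0.0-master"
	gitCommit      = "unknown" // full sha1, output of $(git rev-parse HEAD)
	gitShortCommit = "unknown" // short sha1, output of $(git rev-parse --short HEAD)
	gitTreeState   = "unknown" // either "clean" or "dirty"
)

// Info is an illustrative, trimmed-down version record.
type Info struct {
	GitVersion     string `json:"gitVersion"`
	GitCommit      string `json:"gitCommit"`
	GitShortCommit string `json:"gitShortCommit"`
	GitTreeState   string `json:"gitTreeState"`
	GoVersion      string `json:"goVersion"`
	Compiler       string `json:"compiler"`
	Platform       string `json:"platform"`
}

// Get returns the build metadata, combining the injected variables with
// values reported by the Go runtime.
func Get() Info {
	return Info{
		GitVersion:     gitVersion,
		GitCommit:      gitCommit,
		GitShortCommit: gitShortCommit,
		GitTreeState:   gitTreeState,
		GoVersion:      runtime.Version(),
		Compiler:       runtime.Compiler,
		Platform:       runtime.GOOS + "/" + runtime.GOARCH,
	}
}

func main() {
	fmt.Printf("%+v\n", Get())
}
```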

),
ConstLabels: prometheus.Labels{
"version": Get().GitVersion,
"revision": Get().GitAbbreviativeCommit,
Member

An ideal metric in my mind would look something like this:

// NewBuildInfoCollector returns a collector that exports metrics about the current version
// information.
func NewBuildInfoCollector() prometheus.Collector {
	return prometheus.NewGaugeFunc(
		prometheus.GaugeOpts{
			Name: "component_build_info",
			Help: "Component build metadata exposed as labels with a constant value of 1.",
			ConstLabels: prometheus.Labels{
				"git_version":      Get().GitVersion,
				"git_commit":       Get().GitCommit,
				"git_short_commit": Get().GitShortCommit,
				"git_tree_state":   Get().GitTreeState,
				"build_date":       Get().BuildDate,
				"go_version":       Get().GoVersion,
				"compiler":         Get().Compiler,
				"platform":         Get().Platform,
			},
		},
		func() float64 { return 1 },
	)
}
  1. I don't think the component name needs to be the namespace, because the component emitting the metric inherently owns it.
  2. All build metadata is expressed through standardized labels to avoid ambiguity.
  3. I'd love to follow the mature practice you mentioned above, but I don't think it fits us, mostly because we want clearer label names.

@RainbowMango
Member

Two more comments are:

  1. I'd love to put the new build info metric in package pkg/metrics (maybe a new file version.go) where we manage metrics across the project.
  2. Squash the commits once we are ready to go.

@dongjiang1989 dongjiang1989 force-pushed the add-operator-metrics branch 2 times, most recently from 7488879 to 3d1ba66 Compare March 6, 2025 12:33
@dongjiang1989
Member Author

Two more comments are:

  1. I'd love to put the new build info metric in package pkg/metrics (maybe a new file version.go) where we manage metrics across the project.
  2. Squash the commits once we are ready to go.

Thanks @RainbowMango. Please take another look.

func NewCollector(program string) prometheus.Collector {
return prometheus.NewGaugeFunc(
prometheus.GaugeOpts{
Namespace: program,
Member

@CharlesQQ Do you think using the component name as a prefix for the metrics is necessary?

Member

No. These metrics are usually consumed through Prometheus, and when Prometheus aggregates data from different sources, users usually add labels that identify the component each series came from.
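The practical effect of the `Namespace` option discussed here is only on the final metric name: the namespace, subsystem, and name are joined with underscores. The sketch below mimics that joining with the standard library; `fqName` is a hypothetical stand-in written for illustration, so the client library's own helper should be treated as the source of truth.

```go
package main

import (
	"fmt"
	"strings"
)

// fqName joins the non-empty parts with underscores. An empty name yields
// an empty result, since a metric cannot exist without a name.
// (Illustrative re-implementation, not the library function.)
func fqName(namespace, subsystem, name string) string {
	if name == "" {
		return ""
	}
	parts := make([]string, 0, 3)
	for _, p := range []string{namespace, subsystem, name} {
		if p != "" {
			parts = append(parts, p)
		}
	}
	return strings.Join(parts, "_")
}

func main() {
	// With the component name as namespace.
	fmt.Println(fqName("karmada_operator", "", "build_info"))
	// Without a namespace; the component is identified by scrape-time labels.
	fmt.Println(fqName("", "", "build_info"))
}
```

With the prefix, each component exports a differently named series; without it, one name covers all components and the distinction comes from labels added at scrape time.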

@dongjiang1989 dongjiang1989 force-pushed the add-operator-metrics branch 2 times, most recently from 9bb6f2a to 9990b07 Compare March 11, 2025 03:43
@RainbowMango RainbowMango requested a review from Copilot March 11, 2025 07:43
Contributor

Copilot AI left a comment

PR Overview

This PR introduces additional metrics for the operator by adding a build_info collector that exposes versioning details and Go runtime information.

  • Added a new build_info metric collector in pkg/metrics/version.go using Prometheus GaugeFunc.
  • Updated the operator to register the new build_info metric.
  • Introduced a new GitShortCommit field in version information and updated corresponding tests.

Reviewed Changes

File | Description
--- | ---
pkg/metrics/version.go | Adds a build_info Prometheus collector for operator version metrics.
operator/cmd/operator/app/operator.go | Registers the new build_info metric collector in the operator runtime.
pkg/version/base.go, pkg/version/version.go, pkg/version/version_test.go | Introduces and tests the GitShortCommit field in version information.

Copilot reviewed 17 out of 17 changed files in this pull request and generated 1 comment.

@RainbowMango
Member

I tested this patch on my side. It introduces the build info metric as follows:

# HELP karmada_operator_build_info A metric with a constant '1' value labeled by version, commit, short_commit, tree_state, goversion. build_date from which karmada_operator was built, and the goos and goarch for the build.
# TYPE karmada_operator_build_info gauge
karmada_operator_build_info{build_date="2025-03-11T08:21:20Z",compiler="gc",git_commit="f365da6cab870f6d9ff9548ef0b3d9ba0a33ffbe",git_short_commit="f365da6ca",git_tree_state="clean",git_version="v1.13.0-alpha.1-17-gf365da6ca",go_version="go1.22.9",platform="linux/amd64"} 1

In addition, along with the controller-runtime version bump, some other Go runtime metrics are introduced.
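The names of these runtime metrics follow from the underlying `runtime/metrics` names, for example `/sync/mutex/wait/total:seconds` becomes `go_sync_mutex_wait_total_seconds_total`. The sketch below approximates that mapping so the names are easier to read. The function `promName` and its boolean parameters are assumptions made for illustration; the real conversion lives inside the Prometheus Go client and may differ in edge cases or across versions.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// promName approximates how a runtime/metrics name of the form
// "/path/to/name:unit" maps to a Prometheus metric name.
// cumulative marks monotonically increasing metrics; histogram marks
// distribution metrics, which do not get a "_total" suffix.
func promName(runtimeName string, cumulative, histogram bool) string {
	key, unit, found := strings.Cut(runtimeName, ":")
	if !found || !strings.HasPrefix(key, "/") {
		return ""
	}
	// Everything before the last path element acts as the subsystem.
	subsystem := path.Dir(key[1:])
	subsystem = strings.NewReplacer("/", "_", "-", "_").Replace(subsystem)

	// The last path element plus the unit forms the name.
	unit = strings.NewReplacer("-", "_", "*", "_", "/", "_per_").Replace(unit)
	name := strings.ReplaceAll(path.Base(key), "-", "_") + "_" + unit
	if cumulative && !histogram {
		name += "_total"
	}
	if subsystem == "." { // no subsystem component in the path
		return "go_" + name
	}
	return "go_" + subsystem + "_" + name
}

func main() {
	fmt.Println(promName("/sync/mutex/wait/total:seconds", true, false))
	fmt.Println(promName("/sched/gomaxprocs:threads", false, false))
	fmt.Println(promName("/gc/heap/allocs-by-size:bytes", true, true))
}
```

This also explains the doubled suffixes in names such as `go_gc_cycles_total_gc_cycles_total`: the first `total` comes from the runtime metric path and the trailing `_total` marks the series as a counter.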

Full list of the Go runtime metrics:
# HELP go_cgo_go_to_c_calls_calls_total Count of calls made from Go to C by the current process.
# TYPE go_cgo_go_to_c_calls_calls_total counter
go_cgo_go_to_c_calls_calls_total 0
# HELP go_cpu_classes_gc_mark_assist_cpu_seconds_total Estimated total CPU time goroutines spent performing GC tasks to assist the GC and prevent it from falling behind the application. This metric is an overestimate, and not directly comparable to system CPU time measurements. Compare only with other /cpu/classes metrics.
# TYPE go_cpu_classes_gc_mark_assist_cpu_seconds_total counter
go_cpu_classes_gc_mark_assist_cpu_seconds_total 0.002710444
# HELP go_cpu_classes_gc_mark_dedicated_cpu_seconds_total Estimated total CPU time spent performing GC tasks on processors (as defined by GOMAXPROCS) dedicated to those tasks. This metric is an overestimate, and not directly comparable to system CPU time measurements. Compare only with other /cpu/classes metrics.
# TYPE go_cpu_classes_gc_mark_dedicated_cpu_seconds_total counter
go_cpu_classes_gc_mark_dedicated_cpu_seconds_total 0.002630747
# HELP go_cpu_classes_gc_mark_idle_cpu_seconds_total Estimated total CPU time spent performing GC tasks on spare CPU resources that the Go scheduler could not otherwise find a use for. This should be subtracted from the total GC CPU time to obtain a measure of compulsory GC CPU time. This metric is an overestimate, and not directly comparable to system CPU time measurements. Compare only with other /cpu/classes metrics.
# TYPE go_cpu_classes_gc_mark_idle_cpu_seconds_total counter
go_cpu_classes_gc_mark_idle_cpu_seconds_total 0.006917879
# HELP go_cpu_classes_gc_pause_cpu_seconds_total Estimated total CPU time spent with the application paused by the GC. Even if only one thread is running during the pause, this is computed as GOMAXPROCS times the pause latency because nothing else can be executing. This is the exact sum of samples in /sched/pauses/total/gc:seconds if each sample is multiplied by GOMAXPROCS at the time it is taken. This metric is an overestimate, and not directly comparable to system CPU time measurements. Compare only with other /cpu/classes metrics.
# TYPE go_cpu_classes_gc_pause_cpu_seconds_total counter
go_cpu_classes_gc_pause_cpu_seconds_total 0.00441078
# HELP go_cpu_classes_gc_total_cpu_seconds_total Estimated total CPU time spent performing GC tasks. This metric is an overestimate, and not directly comparable to system CPU time measurements. Compare only with other /cpu/classes metrics. Sum of all metrics in /cpu/classes/gc.
# TYPE go_cpu_classes_gc_total_cpu_seconds_total counter
go_cpu_classes_gc_total_cpu_seconds_total 0.01666985
# HELP go_cpu_classes_idle_cpu_seconds_total Estimated total available CPU time not spent executing any Go or Go runtime code. In other words, the part of /cpu/classes/total:cpu-seconds that was unused. This metric is an overestimate, and not directly comparable to system CPU time measurements. Compare only with other /cpu/classes metrics.
# TYPE go_cpu_classes_idle_cpu_seconds_total counter
go_cpu_classes_idle_cpu_seconds_total 71.508330871
# HELP go_cpu_classes_scavenge_assist_cpu_seconds_total Estimated total CPU time spent returning unused memory to the underlying platform in response eagerly in response to memory pressure. This metric is an overestimate, and not directly comparable to system CPU time measurements. Compare only with other /cpu/classes metrics.
# TYPE go_cpu_classes_scavenge_assist_cpu_seconds_total counter
go_cpu_classes_scavenge_assist_cpu_seconds_total 4.4e-07
# HELP go_cpu_classes_scavenge_background_cpu_seconds_total Estimated total CPU time spent performing background tasks to return unused memory to the underlying platform. This metric is an overestimate, and not directly comparable to system CPU time measurements. Compare only with other /cpu/classes metrics.
# TYPE go_cpu_classes_scavenge_background_cpu_seconds_total counter
go_cpu_classes_scavenge_background_cpu_seconds_total 2.71e-07
# HELP go_cpu_classes_scavenge_total_cpu_seconds_total Estimated total CPU time spent performing tasks that return unused memory to the underlying platform. This metric is an overestimate, and not directly comparable to system CPU time measurements. Compare only with other /cpu/classes metrics. Sum of all metrics in /cpu/classes/scavenge.
# TYPE go_cpu_classes_scavenge_total_cpu_seconds_total counter
go_cpu_classes_scavenge_total_cpu_seconds_total 7.11e-07
# HELP go_cpu_classes_total_cpu_seconds_total Estimated total available CPU time for user Go code or the Go runtime, as defined by GOMAXPROCS. In other words, GOMAXPROCS integrated over the wall-clock duration this process has been executing for. This metric is an overestimate, and not directly comparable to system CPU time measurements. Compare only with other /cpu/classes metrics. Sum of all metrics in /cpu/classes.
# TYPE go_cpu_classes_total_cpu_seconds_total counter
go_cpu_classes_total_cpu_seconds_total 71.5817923
# HELP go_cpu_classes_user_cpu_seconds_total Estimated total CPU time spent running user Go code. This may also include some small amount of time spent in the Go runtime. This metric is an overestimate, and not directly comparable to system CPU time measurements. Compare only with other /cpu/classes metrics.
# TYPE go_cpu_classes_user_cpu_seconds_total counter
go_cpu_classes_user_cpu_seconds_total 0.056790868
# HELP go_gc_cycles_automatic_gc_cycles_total Count of completed GC cycles generated by the Go runtime.
# TYPE go_gc_cycles_automatic_gc_cycles_total counter
go_gc_cycles_automatic_gc_cycles_total 3
# HELP go_gc_cycles_forced_gc_cycles_total Count of completed GC cycles forced by the application.
# TYPE go_gc_cycles_forced_gc_cycles_total counter
go_gc_cycles_forced_gc_cycles_total 0
# HELP go_gc_cycles_total_gc_cycles_total Count of all completed GC cycles.
# TYPE go_gc_cycles_total_gc_cycles_total counter
go_gc_cycles_total_gc_cycles_total 3

# HELP go_gc_gogc_percent Heap size target percentage configured by the user, otherwise 100. This value is set by the GOGC environment variable, and the runtime/debug.SetGCPercent function.
# TYPE go_gc_gogc_percent gauge
go_gc_gogc_percent 100
# HELP go_gc_gomemlimit_bytes Go runtime memory limit configured by the user, otherwise math.MaxInt64. This value is set by the GOMEMLIMIT environment variable, and the runtime/debug.SetMemoryLimit function.
# TYPE go_gc_gomemlimit_bytes gauge
go_gc_gomemlimit_bytes 9.223372036854776e+18
# HELP go_gc_heap_allocs_by_size_bytes Distribution of heap allocations by approximate size. Bucket counts increase monotonically. Note that this does not include tiny objects as defined by /gc/heap/tiny/allocs:objects, only tiny blocks.
# TYPE go_gc_heap_allocs_by_size_bytes histogram
go_gc_heap_allocs_by_size_bytes_bucket{le="8.999999999999998"} 4265
go_gc_heap_allocs_by_size_bytes_bucket{le="24.999999999999996"} 14926
go_gc_heap_allocs_by_size_bytes_bucket{le="64.99999999999999"} 28563
go_gc_heap_allocs_by_size_bytes_bucket{le="144.99999999999997"} 37128
go_gc_heap_allocs_by_size_bytes_bucket{le="320.99999999999994"} 40099
go_gc_heap_allocs_by_size_bytes_bucket{le="704.9999999999999"} 41370
go_gc_heap_allocs_by_size_bytes_bucket{le="1536.9999999999998"} 41841
go_gc_heap_allocs_by_size_bytes_bucket{le="3200.9999999999995"} 42022
go_gc_heap_allocs_by_size_bytes_bucket{le="6528.999999999999"} 42142
go_gc_heap_allocs_by_size_bytes_bucket{le="13568.999999999998"} 42187
go_gc_heap_allocs_by_size_bytes_bucket{le="27264.999999999996"} 42211
go_gc_heap_allocs_by_size_bytes_bucket{le="+Inf"} 42231
go_gc_heap_allocs_by_size_bytes_sum 6.455864e+06
go_gc_heap_allocs_by_size_bytes_count 42231
# HELP go_gc_heap_allocs_bytes_total Cumulative sum of memory allocated to the heap by the application.
# TYPE go_gc_heap_allocs_bytes_total counter
go_gc_heap_allocs_bytes_total 6.455864e+06
# HELP go_gc_heap_allocs_objects_total Cumulative count of heap allocations triggered by the application. Note that this does not include tiny objects as defined by /gc/heap/tiny/allocs:objects, only tiny blocks.
# TYPE go_gc_heap_allocs_objects_total counter
go_gc_heap_allocs_objects_total 42231
# HELP go_gc_heap_frees_by_size_bytes Distribution of freed heap allocations by approximate size. Bucket counts increase monotonically. Note that this does not include tiny objects as defined by /gc/heap/tiny/allocs:objects, only tiny blocks.
# TYPE go_gc_heap_frees_by_size_bytes histogram
go_gc_heap_frees_by_size_bytes_bucket{le="8.999999999999998"} 3154
go_gc_heap_frees_by_size_bytes_bucket{le="24.999999999999996"} 9687
go_gc_heap_frees_by_size_bytes_bucket{le="64.99999999999999"} 16837
go_gc_heap_frees_by_size_bytes_bucket{le="144.99999999999997"} 23107
go_gc_heap_frees_by_size_bytes_bucket{le="320.99999999999994"} 24351
go_gc_heap_frees_by_size_bytes_bucket{le="704.9999999999999"} 25118
go_gc_heap_frees_by_size_bytes_bucket{le="1536.9999999999998"} 25387
go_gc_heap_frees_by_size_bytes_bucket{le="3200.9999999999995"} 25473
go_gc_heap_frees_by_size_bytes_bucket{le="6528.999999999999"} 25518
go_gc_heap_frees_by_size_bytes_bucket{le="13568.999999999998"} 25535
go_gc_heap_frees_by_size_bytes_bucket{le="27264.999999999996"} 25542
go_gc_heap_frees_by_size_bytes_bucket{le="+Inf"} 25551
go_gc_heap_frees_by_size_bytes_sum 3.086032e+06
go_gc_heap_frees_by_size_bytes_count 25551
# HELP go_gc_heap_frees_bytes_total Cumulative sum of heap memory freed by the garbage collector.
# TYPE go_gc_heap_frees_bytes_total counter
go_gc_heap_frees_bytes_total 3.086032e+06
# HELP go_gc_heap_frees_objects_total Cumulative count of heap allocations whose storage was freed by the garbage collector. Note that this does not include tiny objects as defined by /gc/heap/tiny/allocs:objects, only tiny blocks.
# TYPE go_gc_heap_frees_objects_total counter
go_gc_heap_frees_objects_total 25551
# HELP go_gc_heap_goal_bytes Heap size target for the end of the GC cycle.
# TYPE go_gc_heap_goal_bytes gauge
go_gc_heap_goal_bytes 7.086984e+06
# HELP go_gc_heap_live_bytes Heap memory occupied by live objects that were marked by the previous GC.
# TYPE go_gc_heap_live_bytes gauge
go_gc_heap_live_bytes 3.208352e+06
# HELP go_gc_heap_objects_objects Number of objects, live or unswept, occupying heap memory.
# TYPE go_gc_heap_objects_objects gauge
go_gc_heap_objects_objects 16680
# HELP go_gc_heap_tiny_allocs_objects_total Count of small allocations that are packed together into blocks. These allocations are counted separately from other allocations because each individual allocation is not tracked by the runtime, only their block. Each block is already accounted for in allocs-by-size and frees-by-size.
# TYPE go_gc_heap_tiny_allocs_objects_total counter
go_gc_heap_tiny_allocs_objects_total 1725
# HELP go_gc_limiter_last_enabled_gc_cycle GC cycle the last time the GC CPU limiter was enabled. This metric is useful for diagnosing the root cause of an out-of-memory error, because the limiter trades memory for CPU time when the GC's CPU time gets too high. This is most likely to occur with use of SetMemoryLimit. The first GC cycle is cycle 1, so a value of 0 indicates that it was never enabled.
# TYPE go_gc_limiter_last_enabled_gc_cycle gauge
go_gc_limiter_last_enabled_gc_cycle 0
# HELP go_gc_pauses_seconds Deprecated. Prefer the identical /sched/pauses/total/gc:seconds.
# TYPE go_gc_pauses_seconds histogram
go_gc_pauses_seconds_bucket{le="6.399999999999999e-08"} 0
go_gc_pauses_seconds_bucket{le="6.399999999999999e-07"} 0
go_gc_pauses_seconds_bucket{le="7.167999999999999e-06"} 0
go_gc_pauses_seconds_bucket{le="8.191999999999999e-05"} 4
go_gc_pauses_seconds_bucket{le="0.0009175039999999999"} 6
go_gc_pauses_seconds_bucket{le="0.010485759999999998"} 6
go_gc_pauses_seconds_bucket{le="0.11744051199999998"} 6
go_gc_pauses_seconds_bucket{le="+Inf"} 6
go_gc_pauses_seconds_sum 0.000192512
go_gc_pauses_seconds_count 6
# HELP go_gc_scan_globals_bytes The total amount of global variable space that is scannable.
# TYPE go_gc_scan_globals_bytes gauge
go_gc_scan_globals_bytes 640608
# HELP go_gc_scan_heap_bytes The total amount of heap space that is scannable.
# TYPE go_gc_scan_heap_bytes gauge
go_gc_scan_heap_bytes 2.780048e+06
# HELP go_gc_scan_stack_bytes The number of bytes of stack that were scanned last GC cycle.
# TYPE go_gc_scan_stack_bytes gauge
go_gc_scan_stack_bytes 29672
# HELP go_gc_scan_total_bytes The total amount space that is scannable. Sum of all metrics in /gc/scan.
# TYPE go_gc_scan_total_bytes gauge
go_gc_scan_total_bytes 3.450328e+06
# HELP go_gc_stack_starting_size_bytes The stack size of new goroutines.
# TYPE go_gc_stack_starting_size_bytes gauge
go_gc_stack_starting_size_bytes 2048
# HELP go_godebug_non_default_behavior_execerrdot_events_total The number of non-default behaviors executed by the os/exec package due to a non-default GODEBUG=execerrdot=... setting.
# TYPE go_godebug_non_default_behavior_execerrdot_events_total counter
go_godebug_non_default_behavior_execerrdot_events_total 0
# HELP go_godebug_non_default_behavior_gocachehash_events_total The number of non-default behaviors executed by the cmd/go package due to a non-default GODEBUG=gocachehash=... setting.
# TYPE go_godebug_non_default_behavior_gocachehash_events_total counter
go_godebug_non_default_behavior_gocachehash_events_total 0
# HELP go_godebug_non_default_behavior_gocachetest_events_total The number of non-default behaviors executed by the cmd/go package due to a non-default GODEBUG=gocachetest=... setting.
# TYPE go_godebug_non_default_behavior_gocachetest_events_total counter
go_godebug_non_default_behavior_gocachetest_events_total 0
# HELP go_godebug_non_default_behavior_gocacheverify_events_total The number of non-default behaviors executed by the cmd/go package due to a non-default GODEBUG=gocacheverify=... setting.
# TYPE go_godebug_non_default_behavior_gocacheverify_events_total counter
go_godebug_non_default_behavior_gocacheverify_events_total 0
# HELP go_godebug_non_default_behavior_gotypesalias_events_total The number of non-default behaviors executed by the go/types package due to a non-default GODEBUG=gotypesalias=... setting.
# TYPE go_godebug_non_default_behavior_gotypesalias_events_total counter
go_godebug_non_default_behavior_gotypesalias_events_total 0
# HELP go_godebug_non_default_behavior_http2client_events_total The number of non-default behaviors executed by the net/http package due to a non-default GODEBUG=http2client=... setting.
# TYPE go_godebug_non_default_behavior_http2client_events_total counter
go_godebug_non_default_behavior_http2client_events_total 0
# HELP go_godebug_non_default_behavior_http2server_events_total The number of non-default behaviors executed by the net/http package due to a non-default GODEBUG=http2server=... setting.
# TYPE go_godebug_non_default_behavior_http2server_events_total counter
go_godebug_non_default_behavior_http2server_events_total 0
# HELP go_godebug_non_default_behavior_httplaxcontentlength_events_total The number of non-default behaviors executed by the net/http package due to a non-default GODEBUG=httplaxcontentlength=... setting.
# TYPE go_godebug_non_default_behavior_httplaxcontentlength_events_total counter
go_godebug_non_default_behavior_httplaxcontentlength_events_total 0
# HELP go_godebug_non_default_behavior_httpmuxgo121_events_total The number of non-default behaviors executed by the net/http package due to a non-default GODEBUG=httpmuxgo121=... setting.
# TYPE go_godebug_non_default_behavior_httpmuxgo121_events_total counter
go_godebug_non_default_behavior_httpmuxgo121_events_total 0
# HELP go_godebug_non_default_behavior_installgoroot_events_total The number of non-default behaviors executed by the go/build package due to a non-default GODEBUG=installgoroot=... setting.
# TYPE go_godebug_non_default_behavior_installgoroot_events_total counter
go_godebug_non_default_behavior_installgoroot_events_total 0
# HELP go_godebug_non_default_behavior_jstmpllitinterp_events_total The number of non-default behaviors executed by the html/template package due to a non-default GODEBUG=jstmpllitinterp=... setting.
# TYPE go_godebug_non_default_behavior_jstmpllitinterp_events_total counter
go_godebug_non_default_behavior_jstmpllitinterp_events_total 0
# HELP go_godebug_non_default_behavior_multipartmaxheaders_events_total The number of non-default behaviors executed by the mime/multipart package due to a non-default GODEBUG=multipartmaxheaders=... setting.
# TYPE go_godebug_non_default_behavior_multipartmaxheaders_events_total counter
go_godebug_non_default_behavior_multipartmaxheaders_events_total 0
# HELP go_godebug_non_default_behavior_multipartmaxparts_events_total The number of non-default behaviors executed by the mime/multipart package due to a non-default GODEBUG=multipartmaxparts=... setting.
# TYPE go_godebug_non_default_behavior_multipartmaxparts_events_total counter
go_godebug_non_default_behavior_multipartmaxparts_events_total 0
# HELP go_godebug_non_default_behavior_multipathtcp_events_total The number of non-default behaviors executed by the net package due to a non-default GODEBUG=multipathtcp=... setting.
# TYPE go_godebug_non_default_behavior_multipathtcp_events_total counter
go_godebug_non_default_behavior_multipathtcp_events_total 0
# HELP go_godebug_non_default_behavior_netedns0_events_total The number of non-default behaviors executed by the net package due to a non-default GODEBUG=netedns0=... setting.
# TYPE go_godebug_non_default_behavior_netedns0_events_total counter
go_godebug_non_default_behavior_netedns0_events_total 0
# HELP go_godebug_non_default_behavior_panicnil_events_total The number of non-default behaviors executed by the runtime package due to a non-default GODEBUG=panicnil=... setting.
# TYPE go_godebug_non_default_behavior_panicnil_events_total counter
go_godebug_non_default_behavior_panicnil_events_total 0
# HELP go_godebug_non_default_behavior_randautoseed_events_total The number of non-default behaviors executed by the math/rand package due to a non-default GODEBUG=randautoseed=... setting.
# TYPE go_godebug_non_default_behavior_randautoseed_events_total counter
go_godebug_non_default_behavior_randautoseed_events_total 0
# HELP go_godebug_non_default_behavior_tarinsecurepath_events_total The number of non-default behaviors executed by the archive/tar package due to a non-default GODEBUG=tarinsecurepath=... setting.
# TYPE go_godebug_non_default_behavior_tarinsecurepath_events_total counter
go_godebug_non_default_behavior_tarinsecurepath_events_total 0
# HELP go_godebug_non_default_behavior_tls10server_events_total The number of non-default behaviors executed by the crypto/tls package due to a non-default GODEBUG=tls10server=... setting.
# TYPE go_godebug_non_default_behavior_tls10server_events_total counter
go_godebug_non_default_behavior_tls10server_events_total 0
# HELP go_godebug_non_default_behavior_tlsmaxrsasize_events_total The number of non-default behaviors executed by the crypto/tls package due to a non-default GODEBUG=tlsmaxrsasize=... setting.
# TYPE go_godebug_non_default_behavior_tlsmaxrsasize_events_total counter
go_godebug_non_default_behavior_tlsmaxrsasize_events_total 0
# HELP go_godebug_non_default_behavior_tlsrsakex_events_total The number of non-default behaviors executed by the crypto/tls package due to a non-default GODEBUG=tlsrsakex=... setting.
# TYPE go_godebug_non_default_behavior_tlsrsakex_events_total counter
go_godebug_non_default_behavior_tlsrsakex_events_total 0
# HELP go_godebug_non_default_behavior_tlsunsafeekm_events_total The number of non-default behaviors executed by the crypto/tls package due to a non-default GODEBUG=tlsunsafeekm=... setting.
# TYPE go_godebug_non_default_behavior_tlsunsafeekm_events_total counter
go_godebug_non_default_behavior_tlsunsafeekm_events_total 0
# HELP go_godebug_non_default_behavior_x509sha1_events_total The number of non-default behaviors executed by the crypto/x509 package due to a non-default GODEBUG=x509sha1=... setting.
# TYPE go_godebug_non_default_behavior_x509sha1_events_total counter
go_godebug_non_default_behavior_x509sha1_events_total 0
# HELP go_godebug_non_default_behavior_x509usefallbackroots_events_total The number of non-default behaviors executed by the crypto/x509 package due to a non-default GODEBUG=x509usefallbackroots=... setting.
# TYPE go_godebug_non_default_behavior_x509usefallbackroots_events_total counter
go_godebug_non_default_behavior_x509usefallbackroots_events_total 0
# HELP go_godebug_non_default_behavior_x509usepolicies_events_total The number of non-default behaviors executed by the crypto/x509 package due to a non-default GODEBUG=x509usepolicies=... setting.
# TYPE go_godebug_non_default_behavior_x509usepolicies_events_total counter
go_godebug_non_default_behavior_x509usepolicies_events_total 0
# HELP go_godebug_non_default_behavior_zipinsecurepath_events_total The number of non-default behaviors executed by the archive/zip package due to a non-default GODEBUG=zipinsecurepath=... setting.
# TYPE go_godebug_non_default_behavior_zipinsecurepath_events_total counter
go_godebug_non_default_behavior_zipinsecurepath_events_total 0

# HELP go_memory_classes_heap_free_bytes Memory that is completely free and eligible to be returned to the underlying system, but has not been. This metric is the runtime's estimate of free address space that is backed by physical memory.
# TYPE go_memory_classes_heap_free_bytes gauge
go_memory_classes_heap_free_bytes 393216
# HELP go_memory_classes_heap_objects_bytes Memory occupied by live objects and dead objects that have not yet been marked free by the garbage collector.
# TYPE go_memory_classes_heap_objects_bytes gauge
go_memory_classes_heap_objects_bytes 3.369832e+06
# HELP go_memory_classes_heap_released_bytes Memory that is completely free and has been returned to the underlying system. This metric is the runtime's estimate of free address space that is still mapped into the process, but is not backed by physical memory.
# TYPE go_memory_classes_heap_released_bytes gauge
go_memory_classes_heap_released_bytes 1.523712e+06
# HELP go_memory_classes_heap_stacks_bytes Memory allocated from the heap that is reserved for stack space, whether or not it is currently in-use. Currently, this represents all stack memory for goroutines. It also includes all OS thread stacks in non-cgo programs. Note that stacks may be allocated differently in the future, and this may change.
# TYPE go_memory_classes_heap_stacks_bytes gauge
go_memory_classes_heap_stacks_bytes 753664
# HELP go_memory_classes_heap_unused_bytes Memory that is reserved for heap objects but is not currently used to hold heap objects.
# TYPE go_memory_classes_heap_unused_bytes gauge
go_memory_classes_heap_unused_bytes 2.348184e+06
# HELP go_memory_classes_metadata_mcache_free_bytes Memory that is reserved for runtime mcache structures, but not in-use.
# TYPE go_memory_classes_metadata_mcache_free_bytes gauge
go_memory_classes_metadata_mcache_free_bytes 10800
# HELP go_memory_classes_metadata_mcache_inuse_bytes Memory that is occupied by runtime mcache structures that are currently being used.
# TYPE go_memory_classes_metadata_mcache_inuse_bytes gauge
go_memory_classes_metadata_mcache_inuse_bytes 4800
# HELP go_memory_classes_metadata_mspan_free_bytes Memory that is reserved for runtime mspan structures, but not in-use.
# TYPE go_memory_classes_metadata_mspan_free_bytes gauge
go_memory_classes_metadata_mspan_free_bytes 14720
# HELP go_memory_classes_metadata_mspan_inuse_bytes Memory that is occupied by runtime mspan structures that are currently being used.
# TYPE go_memory_classes_metadata_mspan_inuse_bytes gauge
go_memory_classes_metadata_mspan_inuse_bytes 115840
# HELP go_memory_classes_metadata_other_bytes Memory that is reserved for or used to hold runtime metadata.
# TYPE go_memory_classes_metadata_other_bytes gauge
go_memory_classes_metadata_other_bytes 3.206776e+06
# HELP go_memory_classes_os_stacks_bytes Stack memory allocated by the underlying operating system. In non-cgo programs this metric is currently zero. This may change in the future.In cgo programs this metric includes OS thread stacks allocated directly from the OS. Currently, this only accounts for one stack in c-shared and c-archive build modes, and other sources of stacks from the OS are not measured. This too may change in the future.
# TYPE go_memory_classes_os_stacks_bytes gauge
go_memory_classes_os_stacks_bytes 0
# HELP go_memory_classes_other_bytes Memory used by execution trace buffers, structures for debugging the runtime, finalizer and profiler specials, and more.
# TYPE go_memory_classes_other_bytes gauge
go_memory_classes_other_bytes 1.048419e+06
# HELP go_memory_classes_profiling_buckets_bytes Memory that is used by the stack trace hash map used for profiling.
# TYPE go_memory_classes_profiling_buckets_bytes gauge
go_memory_classes_profiling_buckets_bytes 1.453117e+06
# HELP go_memory_classes_total_bytes All memory mapped by the Go runtime into the current process as read-write. Note that this does not include memory mapped by code called via cgo or via the syscall package. Sum of all metrics in /memory/classes.
# TYPE go_memory_classes_total_bytes gauge
go_memory_classes_total_bytes 1.424308e+07

# HELP go_sched_gomaxprocs_threads The current runtime.GOMAXPROCS setting, or the number of operating system threads that can execute user-level Go code simultaneously.
# TYPE go_sched_gomaxprocs_threads gauge
go_sched_gomaxprocs_threads 4
# HELP go_sched_goroutines_goroutines Count of live goroutines.
# TYPE go_sched_goroutines_goroutines gauge
go_sched_goroutines_goroutines 29
# HELP go_sched_latencies_seconds Distribution of the time goroutines have spent in the scheduler in a runnable state before actually running. Bucket counts increase monotonically.
# TYPE go_sched_latencies_seconds histogram
go_sched_latencies_seconds_bucket{le="6.399999999999999e-08"} 45
go_sched_latencies_seconds_bucket{le="6.399999999999999e-07"} 54
go_sched_latencies_seconds_bucket{le="7.167999999999999e-06"} 66
go_sched_latencies_seconds_bucket{le="8.191999999999999e-05"} 100
go_sched_latencies_seconds_bucket{le="0.0009175039999999999"} 108
go_sched_latencies_seconds_bucket{le="0.010485759999999998"} 108
go_sched_latencies_seconds_bucket{le="0.11744051199999998"} 108
go_sched_latencies_seconds_bucket{le="+Inf"} 108
go_sched_latencies_seconds_sum 0.000907328
go_sched_latencies_seconds_count 108
# HELP go_sched_pauses_stopping_gc_seconds Distribution of individual GC-related stop-the-world stopping latencies. This is the time it takes from deciding to stop the world until all Ps are stopped. This is a subset of the total GC-related stop-the-world time (/sched/pauses/total/gc:seconds). During this time, some threads may be executing. Bucket counts increase monotonically.
# TYPE go_sched_pauses_stopping_gc_seconds histogram
go_sched_pauses_stopping_gc_seconds_bucket{le="6.399999999999999e-08"} 0
go_sched_pauses_stopping_gc_seconds_bucket{le="6.399999999999999e-07"} 1
go_sched_pauses_stopping_gc_seconds_bucket{le="7.167999999999999e-06"} 2
go_sched_pauses_stopping_gc_seconds_bucket{le="8.191999999999999e-05"} 5
go_sched_pauses_stopping_gc_seconds_bucket{le="0.0009175039999999999"} 6
go_sched_pauses_stopping_gc_seconds_bucket{le="0.010485759999999998"} 6
go_sched_pauses_stopping_gc_seconds_bucket{le="0.11744051199999998"} 6
go_sched_pauses_stopping_gc_seconds_bucket{le="+Inf"} 6
go_sched_pauses_stopping_gc_seconds_sum 0.000104128
go_sched_pauses_stopping_gc_seconds_count 6
# HELP go_sched_pauses_stopping_other_seconds Distribution of individual non-GC-related stop-the-world stopping latencies. This is the time it takes from deciding to stop the world until all Ps are stopped. This is a subset of the total non-GC-related stop-the-world time (/sched/pauses/total/other:seconds). During this time, some threads may be executing. Bucket counts increase monotonically.
# TYPE go_sched_pauses_stopping_other_seconds histogram
go_sched_pauses_stopping_other_seconds_bucket{le="6.399999999999999e-08"} 0
go_sched_pauses_stopping_other_seconds_bucket{le="6.399999999999999e-07"} 0
go_sched_pauses_stopping_other_seconds_bucket{le="7.167999999999999e-06"} 0
go_sched_pauses_stopping_other_seconds_bucket{le="8.191999999999999e-05"} 0
go_sched_pauses_stopping_other_seconds_bucket{le="0.0009175039999999999"} 0
go_sched_pauses_stopping_other_seconds_bucket{le="0.010485759999999998"} 0
go_sched_pauses_stopping_other_seconds_bucket{le="0.11744051199999998"} 0
go_sched_pauses_stopping_other_seconds_bucket{le="+Inf"} 0
go_sched_pauses_stopping_other_seconds_sum 0
go_sched_pauses_stopping_other_seconds_count 0
# HELP go_sched_pauses_total_gc_seconds Distribution of individual GC-related stop-the-world pause latencies. This is the time from deciding to stop the world until the world is started again. Some of this time is spent getting all threads to stop (this is measured directly in /sched/pauses/stopping/gc:seconds), during which some threads may still be running. Bucket counts increase monotonically.
# TYPE go_sched_pauses_total_gc_seconds histogram
go_sched_pauses_total_gc_seconds_bucket{le="6.399999999999999e-08"} 0
go_sched_pauses_total_gc_seconds_bucket{le="6.399999999999999e-07"} 0
go_sched_pauses_total_gc_seconds_bucket{le="7.167999999999999e-06"} 0
go_sched_pauses_total_gc_seconds_bucket{le="8.191999999999999e-05"} 4
go_sched_pauses_total_gc_seconds_bucket{le="0.0009175039999999999"} 6
go_sched_pauses_total_gc_seconds_bucket{le="0.010485759999999998"} 6
go_sched_pauses_total_gc_seconds_bucket{le="0.11744051199999998"} 6
go_sched_pauses_total_gc_seconds_bucket{le="+Inf"} 6
go_sched_pauses_total_gc_seconds_sum 0.000192512
go_sched_pauses_total_gc_seconds_count 6
# HELP go_sched_pauses_total_other_seconds Distribution of individual non-GC-related stop-the-world pause latencies. This is the time from deciding to stop the world until the world is started again. Some of this time is spent getting all threads to stop (measured directly in /sched/pauses/stopping/other:seconds). Bucket counts increase monotonically.
# TYPE go_sched_pauses_total_other_seconds histogram
go_sched_pauses_total_other_seconds_bucket{le="6.399999999999999e-08"} 0
go_sched_pauses_total_other_seconds_bucket{le="6.399999999999999e-07"} 0
go_sched_pauses_total_other_seconds_bucket{le="7.167999999999999e-06"} 0
go_sched_pauses_total_other_seconds_bucket{le="8.191999999999999e-05"} 0
go_sched_pauses_total_other_seconds_bucket{le="0.0009175039999999999"} 0
go_sched_pauses_total_other_seconds_bucket{le="0.010485759999999998"} 0
go_sched_pauses_total_other_seconds_bucket{le="0.11744051199999998"} 0
go_sched_pauses_total_other_seconds_bucket{le="+Inf"} 0
go_sched_pauses_total_other_seconds_sum 0
go_sched_pauses_total_other_seconds_count 0
# HELP go_sync_mutex_wait_total_seconds_total Approximate cumulative time goroutines have spent blocked on a sync.Mutex, sync.RWMutex, or runtime-internal lock. This metric is useful for identifying global changes in lock contention. Collect a mutex or block profile using the runtime/pprof package for more detailed contention data.
# TYPE go_sync_mutex_wait_total_seconds_total counter
go_sync_mutex_wait_total_seconds_total 0.000212208

@RainbowMango
Member

Generally looks good to me. (Though I still cannot understand why it is necessary to have the component name karmada_operator as the prefix of the metric name.)
@jabellard , I wonder if you use metrics for monitoring? Do you want to take another look?

@dongjiang1989
Member Author

> Generally looks good to me. (Though I still cannot understand why it is necessary to have the component name karmada_operator as the prefix of the metric name.)
> @jabellard , I wonder if you use metrics for monitoring? Do you want to take another look?

Thanks @RainbowMango

Some context, from the Prometheus "Metric and label naming" guide:

foobar_build_info (for a pseudo-metric that provides `metadata` about the running binary)

@RainbowMango
Member

Thanks for the info.
I'm not concerned with the naming practice. In my opinion, we are not violating the Prometheus naming practice you mentioned above: the build info is a generic metric that can be shared by all of the components, so another approach is to name the metric karmada_build_info.

Compared to the naming rule, I'm more curious about how you use the metric, especially what issues might arise if component names are not used as prefixes for the metric.
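To make the two naming options in this thread concrete, here is a minimal standard-library sketch that renders a build-info sample in the Prometheus text exposition format. Everything in it is illustrative: the `buildInfoLine` helper, the `component` and `git_version` label names, the `v0.0.0-example` value, and the prefixed name `karmada_operator_build_info` are assumptions made for the example, not taken from this PR's code.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// buildInfoLine (hypothetical helper) renders one sample line in the
// Prometheus text exposition format. Build-info metrics conventionally have
// the constant value 1 and carry their information in labels.
func buildInfoLine(name string, labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic label order

	pairs := make([]string, 0, len(keys))
	for _, k := range keys {
		// %q is adequate for the simple ASCII values used here; it is not a
		// complete implementation of Prometheus label escaping.
		pairs = append(pairs, fmt.Sprintf("%s=%q", k, labels[k]))
	}
	return fmt.Sprintf("%s{%s} 1", name, strings.Join(pairs, ","))
}

func main() {
	// Option 1: the component name is part of the metric name.
	fmt.Println(buildInfoLine("karmada_operator_build_info", map[string]string{
		"git_version": "v0.0.0-example",
	}))

	// Option 2: one shared metric name; the component is a label.
	fmt.Println(buildInfoLine("karmada_build_info", map[string]string{
		"component":   "karmada-operator",
		"git_version": "v0.0.0-example",
	}))
}
```

With the shared name, a single query over `karmada_build_info` would cover every component, at the cost of relying on a label (or on the scrape job) to tell components apart.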

@jabellard
Member

> Generally looks good to me. (Though I still cannot understand why it is necessary to have the component name karmada_operator as the prefix of the metric name.)
> @jabellard , I wonder if you use metrics for monitoring? Do you want to take another look?

Observability is very important for us, as we want to be informed via alerts/alarms of events such as a control plane failing to reach a registered member cluster, among other things. Currently, we're not using the metrics from the operator for anything, but we may make use of them at some point.

Signed-off-by: dongjiang <[email protected]>
Co-Authored-By: Copilot <[email protected]>
Member

@RainbowMango RainbowMango left a comment

/lgtm

Now waiting for CI signals.

@karmada-bot karmada-bot added the lgtm Indicates that a PR is ready to be merged. label Mar 20, 2025
Member

@RainbowMango RainbowMango left a comment

/lgtm
/approve

@karmada-bot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: RainbowMango

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@karmada-bot karmada-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 20, 2025
@karmada-bot karmada-bot merged commit 0713e4b into karmada-io:master Mar 20, 2025
24 checks passed
Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/feature Categorizes issue or PR as related to a new feature. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

7 participants