Description
Describe the bug
The JVM process CPU utilization metric reported by OTel is higher than the CPU usage captured by k8s cAdvisor for the same container.
Steps to reproduce
The OTel Java agent is injected into the Java service by opentelemetry-operator, and the service then runs for a while.
Expected behavior
The CPU utilization values captured by the OTel agent and by k8s cAdvisor are roughly the same.
Actual behavior
The JVM process CPU utilization values reported by OTel are higher than the CPU values captured by k8s cAdvisor.

Javaagent or library instrumentation version
1.33.5
Environment
JDK: Temurin-21.0.5+11
OS: CentOS Linux 7
start command: java -XX:+PrintFlagsFinal -XX:MaxRAMPercentage=75.0 -Djava.security.egd=file:/dev/./urandom -jar ./otel-demo-provider-0.0.1-SNAPSHOT.jar
Output of java -XshowSettings:system -version, run via exec inside the container:
Operating System Metrics:
Provider: cgroupv1
Effective CPU Count: 1
CPU Period: 100000us
CPU Quota: 50000us
CPU Shares: 307us
List of Processors, 16 total:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
List of Effective Processors, 16 total:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
List of Memory Nodes, 1 total:
0
List of Available Memory Nodes, 1 total:
0
Memory Limit: 1000.00M
Memory Soft Limit: Unlimited
Memory & Swap Limit: 1000.00M
Maximum Processes Limit: Unlimited
openjdk version "21.0.5" 2024-10-15 LTS
OpenJDK Runtime Environment Temurin-21.0.5+11 (build 21.0.5+11-LTS)
OpenJDK 64-Bit Server VM Temurin-21.0.5+11 (build 21.0.5+11-LTS, mixed mode, sharing)
Additional context
I tried looking at the corresponding source code, but I am not entirely sure these are the right source locations.
process_runtime_jvm_cpu_utilization Definition: https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/v1.33.5/instrumentation/runtime-telemetry/runtime-telemetry-java17/library/src/main/java/io/opentelemetry/instrumentation/runtimemetrics/java17/internal/cpu/OverallCpuLoadHandler.java#L23
It is implemented through the getProcessCpuLoad() function:
https://github.com/openjdk/jdk/blob/master/src/jdk.management/linux/native/libmanagement_ext/UnixOperatingSystem.c#L327
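To check the JVM-side number independently of the agent, a small probe like the following could be run in the same container. This is only a sketch: it assumes the agent's metric ultimately reflects `getProcessCpuLoad()` as described above, which is not confirmed here, and the class name `ProcessCpuLoadProbe` is made up for illustration.

```java
import com.sun.management.OperatingSystemMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper class, not part of the OTel agent.
public class ProcessCpuLoadProbe {

    /**
     * Returns the "recent CPU usage" of this JVM process as reported by the JDK.
     * Per the Javadoc the value is in [0.0, 1.0], or negative if it is not available.
     */
    static double sampleProcessCpuLoad() {
        OperatingSystemMXBean os =
                ManagementFactory.getPlatformMXBean(OperatingSystemMXBean.class);
        return os.getProcessCpuLoad();
    }

    public static void main(String[] args) throws InterruptedException {
        // Take a few samples one second apart; the first one is often 0 or negative.
        for (int i = 0; i < 3; i++) {
            System.out.printf("processCpuLoad=%.4f availableProcessors=%d%n",
                    sampleProcessCpuLoad(),
                    Runtime.getRuntime().availableProcessors());
            Thread.sleep(1000);
        }
    }
}
```

Printing `availableProcessors()` next to the load makes it easier to see which CPU count the JVM believes it has (1 effective CPU in the environment above, versus 16 host processors).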
container_cpu_usage_seconds_total Definition: https://github.com/kubernetes/kubernetes/blob/master/vendor/github.com/google/cadvisor/metrics/prometheus.go#L164
This is accomplished by reading cpuacct.usage under the container cgroup: https://github.com/kubernetes/kubernetes/blob/master/vendor/github.com/opencontainers/runc/libcontainer/cgroups/fs/cpuacct.go#L54
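For comparing the two sources, it may help to turn `cpuacct.usage` deltas into the same kind of ratio. The sketch below assumes `cpuacct.usage` is cumulative CPU time in nanoseconds and uses the quota and period from the environment above (50000us / 100000us); the sample readings and the class name `CgroupCpuMath` are hypothetical. It does not claim which denominator either tool actually uses, only that the same CPU time yields very different percentages depending on that choice.

```java
// Hypothetical helper for comparing cgroup CPU usage against different denominators.
public class CgroupCpuMath {

    /** Average number of cores used between two cpuacct.usage readings (nanoseconds). */
    static double coresUsed(long usageStartNanos, long usageEndNanos, long wallNanos) {
        if (wallNanos <= 0) {
            throw new IllegalArgumentException("wallNanos must be positive");
        }
        return (double) (usageEndNanos - usageStartNanos) / wallNanos;
    }

    /** CPU limit in cores derived from the CFS quota and period (both in microseconds). */
    static double limitCores(long quotaMicros, long periodMicros) {
        if (quotaMicros <= 0 || periodMicros <= 0) {
            throw new IllegalArgumentException("quota and period must be positive");
        }
        return (double) quotaMicros / periodMicros;
    }

    /** Utilization ratio of the used cores relative to a chosen denominator in cores. */
    static double utilization(double coresUsed, double denominatorCores) {
        return coresUsed / denominatorCores;
    }

    public static void main(String[] args) {
        // Made-up readings: 2.5 s of CPU time consumed over a 10 s window = 0.25 cores.
        double cores = coresUsed(0L, 2_500_000_000L, 10_000_000_000L);
        double limit = limitCores(50_000L, 100_000L); // 0.5 cores, from the issue's cgroup settings

        System.out.printf("cores used:            %.4f%n", cores);
        System.out.printf("vs. quota (0.5 CPU):   %.4f%n", utilization(cores, limit));
        System.out.printf("vs. 1 effective CPU:   %.4f%n", utilization(cores, 1.0));
        System.out.printf("vs. 16 host CPUs:      %.4f%n", utilization(cores, 16.0));
    }
}
```

In Prometheus terms, `rate(container_cpu_usage_seconds_total[...])` corresponds to the "cores used" figure here, so it needs to be divided by a core count before it can be compared with a 0..1 utilization value.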