
OTel jvm process cpu utilization metrics values are higher than cpu values captured by k8s cadvisor #12793

@chinaran


Describe the bug

The JVM process CPU utilization metric values reported by OTel are higher than the CPU values that k8s cAdvisor captures for the same service.

Steps to reproduce

The OTel Java agent is injected into the Java service by opentelemetry-operator, and the service runs for a while.

Expected behavior

The CPU utilization values captured by the OTel agent and by k8s cAdvisor should be roughly the same.

Actual behavior

The OTel JVM process CPU utilization metric values are higher than the CPU values captured by k8s cAdvisor.

Javaagent or library instrumentation version

1.33.5

Environment

JDK: Temurin-21.0.5+11
OS: CentOS Linux 7


start command: java -XX:+PrintFlagsFinal -XX:MaxRAMPercentage=75.0 -Djava.security.egd=file:/dev/./urandom -jar ./otel-demo-provider-0.0.1-SNAPSHOT.jar

Output of java -XshowSettings:system -version, run inside the container:

Operating System Metrics:
    Provider: cgroupv1
    Effective CPU Count: 1
    CPU Period: 100000us
    CPU Quota: 50000us
    CPU Shares: 307us
    List of Processors, 16 total: 
    0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 
    List of Effective Processors, 16 total: 
    0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 
    List of Memory Nodes, 1 total: 
    0 
    List of Available Memory Nodes, 1 total: 
    0 
    Memory Limit: 1000.00M
    Memory Soft Limit: Unlimited
    Memory & Swap Limit: 1000.00M
    Maximum Processes Limit: Unlimited

openjdk version "21.0.5" 2024-10-15 LTS
OpenJDK Runtime Environment Temurin-21.0.5+11 (build 21.0.5+11-LTS)
OpenJDK 64-Bit Server VM Temurin-21.0.5+11 (build 21.0.5+11-LTS, mixed mode, sharing)
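
For reference, the CPU limit implied by the cgroup values above can be worked out directly. The sketch below is only arithmetic on the reported CPU Period and CPU Quota. The function names are illustrative, and rounding the fractional limit up to a whole CPU is an assumption that happens to match the Effective CPU Count of 1 in the output.

```python
import math


def cpu_limit_cores(quota_us: int, period_us: int) -> float:
    """CPU limit in cores implied by a CFS quota and period."""
    return quota_us / period_us


def effective_cpu_count(quota_us: int, period_us: int) -> int:
    """Assumed rounding: the fractional limit is rounded up to whole CPUs."""
    return max(1, math.ceil(quota_us / period_us))


# Values taken from the -XshowSettings:system output above.
limit = cpu_limit_cores(50_000, 100_000)
count = effective_cpu_count(50_000, 100_000)
print(f"limit={limit} cores, effective CPU count={count}")
# prints: limit=0.5 cores, effective CPU count=1
```

The container is limited to half a core while the JVM sees one effective CPU, so a ratio computed against one of these denominators is not directly comparable with a ratio computed against the other. Whether that accounts for the gap reported here is not established.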

Additional context

I looked at the corresponding source code, but I am not entirely sure that these are the right locations.

process_runtime_jvm_cpu_utilization Definition: https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/v1.33.5/instrumentation/runtime-telemetry/runtime-telemetry-java17/library/src/main/java/io/opentelemetry/instrumentation/runtimemetrics/java17/internal/cpu/OverallCpuLoadHandler.java#L23

It is implemented through the getProcessCpuLoad() function:
https://github.com/openjdk/jdk/blob/master/src/jdk.management/linux/native/libmanagement_ext/UnixOperatingSystem.c#L327


container_cpu_usage_seconds_total Definition: https://github.com/kubernetes/kubernetes/blob/master/vendor/github.com/google/cadvisor/metrics/prometheus.go#L164

This is accomplished by reading cpuacct.usage under the container cgroup: https://github.com/kubernetes/kubernetes/blob/master/vendor/github.com/opencontainers/runc/libcontainer/cgroups/fs/cpuacct.go#L54
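
To compare the two metrics, they first need to be in the same unit. A minimal sketch, assuming that cpuacct.usage is a cumulative counter in nanoseconds (as in cgroup v1) and that the OTel utilization value is a ratio in [0, 1] relative to some CPU count. The function names and the sample numbers are hypothetical and are not measurements from this issue.

```python
NANOS_PER_SECOND = 1_000_000_000


def cores_from_cpuacct(usage_start_ns: int, usage_end_ns: int, interval_s: float) -> float:
    """Average number of cores used between two cpuacct.usage samples."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return (usage_end_ns - usage_start_ns) / NANOS_PER_SECOND / interval_s


def cores_from_utilization(utilization: float, cpu_count: float) -> float:
    """Convert a utilization ratio into cores, given its assumed denominator."""
    return utilization * cpu_count


# Hypothetical samples: 6 s of CPU time consumed over a 60 s window.
cadvisor_cores = cores_from_cpuacct(120_000_000_000, 126_000_000_000, 60.0)
# Hypothetical OTel reading of 0.25, assumed to be relative to 1 CPU.
otel_cores = cores_from_utilization(0.25, 1)
print(f"cAdvisor: {cadvisor_cores:.2f} cores, OTel: {otel_cores:.2f} cores")
# prints: cAdvisor: 0.10 cores, OTel: 0.25 cores
```

The first function mirrors what a rate over container_cpu_usage_seconds_total yields, namely cores. The second shows that the OTel ratio only becomes comparable once the CPU count it was normalized against is known.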

Labels: bug, needs author feedback, needs triage, stale
