gRPC server metric rpc_server_duration contains the high-cardinality label network_peer_port  #12126

@jianwu

Describe the bug

The gRPC server metric rpc_server_duration contains the high-cardinality label network_peer_port. Because the peer port can change with every request, the total number of time series becomes huge, which makes the metric not very useful.

Steps to reproduce

  • Start a standard gRPC application with the OTel Java Agent enabled and the Prometheus metrics server enabled on port 9464.
  • Issue gRPC requests continuously (in our case, Kubernetes issues a health check every few seconds).
  • Look at the metrics page: "http://localhost:9464"
  • Observe that the page is huge, with tens of thousands of metric lines that look like:
rpc_server_duration_milliseconds_bucket{network_peer_address="127.0.0.1",network_peer_port="32924",network_type="ipv4",otel_scope_name="io.opentelemetry.grpc-1.6",otel_scope_version="2.4.0-alpha",rpc_grpc_status_code="0",rpc_method="Check",rpc_service="grpc.health.v1.Health",rpc_system="grpc",server_address="",server_port="50051",le="0.0"} 0

The reason for the huge number of entries is that the metric contains high-cardinality labels such as network_peer_port.
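To illustrate the effect, here is a minimal sketch in plain Java (no OTel dependencies) that counts distinct series in Prometheus text output, with and without network_peer_port. The class name SeriesCardinality, the helper countSeries, and the extra port values 32926 and 32930 are made up for illustration; the sample lines are shortened versions of the line above.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of OpenTelemetry or Prometheus.
final class SeriesCardinality {
    // Matches one label pair such as network_peer_port="32924".
    private static final Pattern LABEL =
        Pattern.compile("([a-zA-Z_][a-zA-Z0-9_]*)=\"((?:[^\"\\\\]|\\\\.)*)\"");

    /** Counts distinct (metric name + label set) combinations, skipping the ignored label names. */
    static int countSeries(List<String> lines, Set<String> ignoredLabels) {
        Set<String> series = new HashSet<>();
        for (String line : lines) {
            int open = line.indexOf('{');
            int close = line.lastIndexOf('}');
            // Skip comments (# HELP / # TYPE) and samples without a label block.
            if (line.startsWith("#") || open < 0 || close < open) {
                continue;
            }
            // Sorted map so that label order does not affect series identity.
            TreeMap<String, String> labels = new TreeMap<>();
            Matcher m = LABEL.matcher(line.substring(open + 1, close));
            while (m.find()) {
                if (!ignoredLabels.contains(m.group(1))) {
                    labels.put(m.group(1), m.group(2));
                }
            }
            series.add(line.substring(0, open) + labels);
        }
        return series.size();
    }

    public static void main(String[] args) {
        // Three requests that differ only in the ephemeral client port (ports 32926 and 32930 are invented).
        List<String> sample = List.of(
            "rpc_server_duration_milliseconds_bucket{network_peer_address=\"127.0.0.1\",network_peer_port=\"32924\",rpc_method=\"Check\",le=\"0.0\"} 0",
            "rpc_server_duration_milliseconds_bucket{network_peer_address=\"127.0.0.1\",network_peer_port=\"32926\",rpc_method=\"Check\",le=\"0.0\"} 0",
            "rpc_server_duration_milliseconds_bucket{network_peer_address=\"127.0.0.1\",network_peer_port=\"32930\",rpc_method=\"Check\",le=\"0.0\"} 0");

        System.out.println(countSeries(sample, Set.of()));                    // prints 3
        System.out.println(countSeries(sample, Set.of("network_peer_port"))); // prints 1
    }
}
```

With the port label present, every new client connection adds a full set of histogram buckets; without it, all three requests collapse into one series.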

Expected behavior

For Prometheus metrics, we should avoid high-cardinality labels such as network_peer_port.

Actual behavior

rpc_server_duration is exported with the high-cardinality label network_peer_port.

Javaagent or library instrumentation version

1.38.0

Environment

JDK: 17
OS: linux

Additional context

No response

Labels: bug, needs author feedback, needs triage, stale