GRPC metric exporter doesn't reconnect #4435

@BalazsBago

Describe your environment

OS: Ubuntu
Python version: Python 3.8
SDK version: 1.27.0
API version: 1.27.0
Exporter: 1.27.0

Endpoint: Telegraf 1.28 (Docker) with the OpenTelemetry input (https://github.com/influxdata/telegraf/tree/master/plugins/inputs/opentelemetry)

What happened?

If the metric endpoint is not reachable when a "PeriodicExportingMetricReader" with an "OTLPMetricExporter" starts, the exporter never connects to it, even after the endpoint becomes available.
It keeps retrying the export, but without success:

Transient error StatusCode.UNAVAILABLE encountered while exporting metrics to localhost:4317, retrying in 1s.
...

Steps to Reproduce

  1. Start a "PeriodicExportingMetricReader" with an "OTLPMetricExporter"
import time

from opentelemetry import metrics
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader

# Export every 5 seconds to the (not yet running) endpoint.
reader = PeriodicExportingMetricReader(
    exporter=OTLPMetricExporter(
        endpoint="http://localhost:4317/v1/metrics",
        timeout=60,
    ),
    export_interval_millis=5000,
)

metrics.set_meter_provider(
    MeterProvider(
        metric_readers=[reader],
    )
)

meter = metrics.get_meter(name='test')

inst = meter.create_counter('counter')
inst.add(1)
inst.add(1)

# Keep the process alive so the reader keeps exporting while Telegraf is started.
time.sleep(120)
  2. Start Telegraf with the OpenTelemetry input enabled
    Config file (/tmp/tele.conf)
[global_tags]
[agent]
  interval = "10s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "10s"
  flush_jitter = "0s"
  precision = "0s"
  hostname = ""
  omit_hostname = false
[[inputs.opentelemetry]]          
[[outputs.file]]
  files = ["stdout"]
  data_format = "influx"

Start telegraf:

docker run --rm -it -p 4317:4317 -v /tmp/tele.conf:/etc/telegraf/telegraf.conf telegraf:1.28
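
To rule out the container simply not listening yet, it may help to confirm that the port accepts TCP connections before looking at the exporter logs. The following is a minimal sketch using only the standard library; `wait_for_port` is a hypothetical helper name, not part of OpenTelemetry or Telegraf, and a successful TCP connect only shows that something is listening on the port, not that it speaks OTLP.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 30.0, poll_s: float = 0.5) -> bool:
    """Hypothetical helper: True once a TCP connect to (host, port) succeeds, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # Open and immediately close a TCP connection to the endpoint.
            with socket.create_connection((host, port), timeout=poll_s):
                return True
        except OSError:
            # Refused or timed out: give up after the deadline, otherwise poll again.
            if time.monotonic() >= deadline:
                return False
            time.sleep(poll_s)


# Example call for the setup above (assumes Telegraf is published on localhost:4317):
# wait_for_port("localhost", 4317, timeout_s=10.0)
```

If this returns True while the exporter still logs "StatusCode.UNAVAILABLE", the endpoint is reachable at the TCP level and the problem is on the exporter side.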

Expected Result

The exporter should try to rebuild the connection to the endpoint in case of "StatusCode.UNAVAILABLE".
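
As an illustration of the behavior expected here, the sketch below drops a failed connection so that the next attempt builds a fresh one. It is a generic pattern, not the exporter's implementation: `ReconnectingSender`, `Connection`, and the `connect` factory are hypothetical names, and `ConnectionError` stands in for a "StatusCode.UNAVAILABLE" failure.

```python
from typing import Callable, Optional, Protocol


class Connection(Protocol):
    """Hypothetical minimal interface for something that can send a payload."""

    def send(self, payload: bytes) -> None: ...


class ReconnectingSender:
    """Hypothetical wrapper: forgets a failed connection so the next send reconnects."""

    def __init__(self, connect: Callable[[], Connection]) -> None:
        self._connect = connect
        self._conn: Optional[Connection] = None

    def send(self, payload: bytes) -> bool:
        try:
            if self._conn is None:
                # No usable connection: build a new one from the factory.
                self._conn = self._connect()
            self._conn.send(payload)
            return True
        except ConnectionError:
            # Stand-in for UNAVAILABLE: discard the connection instead of reusing it.
            self._conn = None
            return False
```

With this pattern, a caller that retries `send` periodically recovers on its own once the endpoint comes up, because no stale connection object is kept across failures.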

Actual Result

The exporter stays stuck in the "StatusCode.UNAVAILABLE" state, even after the endpoint is up.

Additional context

No response

Would you like to implement a fix?

None
