
Memory Leak in OpenTelemetry Tracing — Request to Upgrade to opentelemetry 1.35.0+ #53763

@bmoon4

Description


Apache Airflow version

3.0.3

If "Other Airflow 2 version" selected, which one?

No response

What happened?

When OpenTelemetry tracing is enabled in Airflow 3.0.2, we see memory leaks in our schedulers and triggerers.

We tested this by deploying three schedulers:

  • scheduler-1 with otel-metrics setup (AIRFLOW__METRICS__OTEL_ON=true, etc.)
  • scheduler-2 with otel-traces setup (AIRFLOW__TRACES__OTEL_ON=true, etc.)
  • scheduler-3 without otel setup (no metrics, no traces)

These schedulers have identical DAG files and run in the same k8s namespace.
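For anyone re-running this comparison, here is a minimal, hypothetical Python sketch of the three environments. It sets only the two switches named above and deliberately leaves out the remaining settings covered by "etc.", which depend on your deployment. `build_env`, `VARIANTS` and `OTEL_SWITCHES` are illustrative names, not Airflow APIs.

```python
import os
from typing import Dict, Optional

# Only the two switches named in the report; the other "etc." settings are omitted.
OTEL_SWITCHES = ("AIRFLOW__METRICS__OTEL_ON", "AIRFLOW__TRACES__OTEL_ON")

# Hypothetical mapping of each test scheduler to its OTel switches.
VARIANTS: Dict[str, Dict[str, str]] = {
    "scheduler-1": {"AIRFLOW__METRICS__OTEL_ON": "true"},  # metrics only
    "scheduler-2": {"AIRFLOW__TRACES__OTEL_ON": "true"},   # traces only
    "scheduler-3": {},                                      # no OTel at all
}


def build_env(variant: str, base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of `base` (default: os.environ) configured for one variant."""
    env = dict(os.environ if base is None else base)
    # Drop both switches first so a value inherited from `base` cannot leak in.
    for key in OTEL_SWITCHES:
        env.pop(key, None)
    env.update(VARIANTS[variant])
    return env
```

Each scheduler could then be started with something like `subprocess.Popen(["airflow", "scheduler"], env=build_env("scheduler-2"))`. Clearing both switches before applying a variant keeps the "no OTel" baseline clean even if the parent environment already enables one of them.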

We found that scheduler-2, the one with the otel-traces setup, shows the typical memory-leak trend in our Grafana dashboard. (Sorry, I can't post the screenshot here.)
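Reading a "typical memory leak trend" off a dashboard is a judgment call. One hypothetical way to make the three-scheduler comparison less subjective is to fit a least-squares slope to equally spaced memory readings exported from the panel. The helper names and the 1 MiB/h threshold below are assumptions for illustration, not part of Airflow or Grafana.

```python
from statistics import mean
from typing import Sequence


def memory_slope(samples_mib: Sequence[float], interval_s: float) -> float:
    """Least-squares slope of memory usage in MiB per hour.

    `samples_mib` are equally spaced readings (e.g. container memory exported
    from a dashboard panel); `interval_s` is the spacing between them in seconds.
    """
    if len(samples_mib) < 2:
        raise ValueError("need at least two samples")
    xs = [i * interval_s / 3600.0 for i in range(len(samples_mib))]  # time in hours
    x_bar, y_bar = mean(xs), mean(samples_mib)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, samples_mib))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


def looks_like_leak(samples_mib: Sequence[float], interval_s: float,
                    threshold_mib_per_h: float = 1.0) -> bool:
    """Flag a sustained upward trend; the default threshold is an arbitrary placeholder."""
    return memory_slope(samples_mib, interval_s) > threshold_mib_per_h
```

A slope is used rather than comparing first and last readings so that short spikes (for example from a burst of DAG runs) do not dominate; the series should span enough time for a steady climb to outweigh that noise.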

After some digging, we found that this issue has already been reported and fixed in the opentelemetry-python repo.

  • Current OpenTelemetry version in the Airflow constraints files: 1.27.0

https://raw.githubusercontent.com/apache/airflow/constraints-3.0.2/constraints-3.12.txt (we are using Airflow 3.0.2)
https://raw.githubusercontent.com/apache/airflow/constraints-3.0.3/constraints-3.12.txt

...
opentelemetry-api==1.27.0
opentelemetry-exporter-otlp-proto-common==1.27.0
opentelemetry-exporter-otlp-proto-grpc==1.27.0
opentelemetry-exporter-otlp-proto-http==1.27.0
opentelemetry-exporter-otlp==1.27.0
opentelemetry-exporter-prometheus==0.48b0
opentelemetry-proto==1.27.0
opentelemetry-sdk==1.27.0
opentelemetry-semantic-conventions==0.48b0
...
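To check a constraints file against the 1.35.0 floor requested in this issue, here is a small, hypothetical Python sketch. It only compares the stable 1.x-versioned `opentelemetry-*` pins; the 0.x beta-versioned packages (such as `opentelemetry-exporter-prometheus==0.48b0`) follow a separate numbering scheme and are skipped. The function names are illustrative.

```python
from typing import Dict, List, Tuple

MIN_VERSION = (1, 35, 0)  # the floor requested in this issue


def parse_pins(constraints: str) -> Dict[str, str]:
    """Map package -> pinned version for `name==version` lines; skip everything else."""
    pins: Dict[str, str] = {}
    for raw in constraints.splitlines():
        line = raw.strip()
        if line.startswith("#") or "==" not in line:
            continue  # comments, blank lines and "..." elisions are ignored
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def release_tuple(version: str) -> Tuple[int, ...]:
    # Adequate for plain X.Y.Z pins; pre-release suffixes are not handled here.
    return tuple(int(part) for part in version.split("."))


def outdated_otel(constraints: str) -> List[str]:
    """Return `name==version` for 1.x OpenTelemetry pins below MIN_VERSION."""
    out: List[str] = []
    for name, version in sorted(parse_pins(constraints).items()):
        if not name.startswith("opentelemetry-") or not version.startswith("1."):
            continue  # 0.x beta-versioned packages use a different scheme
        if release_tuple(version) < MIN_VERSION:
            out.append(f"{name}=={version}")
    return out
```

Fed the text of one of the constraints files linked above (for example via `urllib.request.urlopen(url).read().decode()`), it should report the 1.27.0 pins shown in the excerpt.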

Could you please consider upgrading the OpenTelemetry packages to the latest version (1.35.0+) in a future Airflow release to prevent this memory leak?

What you think should happen instead?

No response

How to reproduce

Enable OpenTelemetry tracing in Airflow as described in the official docs:

https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/logging-monitoring/traces.html

Operating System

Debian 12

Versions of Apache Airflow Providers

No response

Deployment

Other Docker-based deployment

Deployment details

No response

Anything else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

Metadata



    Labels

    area:core, kind:bug, needs-triage
