Additional metrics exported from Celery workers #3463

Open: rbagd wants to merge 4 commits into main from include_more_celery_metrics

Contributor

@rbagd rbagd commented May 5, 2025

Description

This PR extends the metrics exported from Celery workers with the following measurements:

  • Prefetched task up-down counter (number of tasks queued locally in the worker) and time spent in prefetch mode
  • Active task up-down counter and task processing duration
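
As a rough illustration of the second bullet, the sketch below pairs an up-down counter with a duration histogram. It is not the code in this PR: `TaskMetricsSketch`, `on_task_prerun`, and `on_task_postrun` are hypothetical names, and `meter` is assumed to be any object exposing `create_up_down_counter` and `create_histogram` the way an OpenTelemetry `Meter` does. The name `messaging.process.duration` is assumed to be the value of the semantic-conventions constant used for processing time, and the units are assumptions as well.

```python
import time
from typing import Any, Callable, Dict


class TaskMetricsSketch:
    """Hypothetical helper: tracks active tasks and task processing time."""

    def __init__(
        self, meter: Any, clock: Callable[[], float] = time.monotonic
    ) -> None:
        self._clock = clock
        # Start time per task id; entries are removed again in on_task_postrun.
        self._started: Dict[str, float] = {}
        self._active = meter.create_up_down_counter(
            "messaging.client.active_tasks",
            unit="{task}",  # assumed unit
            description="Tasks currently being executed by the worker",
        )
        self._duration = meter.create_histogram(
            "messaging.process.duration",  # assumed semconv metric name
            unit="s",
            description="Time spent processing a task",
        )

    def on_task_prerun(self, task_id: str, task_name: str) -> None:
        # A task starts executing: remember when, and bump the active count.
        self._started[task_id] = self._clock()
        self._active.add(1, {"celery.task_name": task_name})

    def on_task_postrun(self, task_id: str, task_name: str) -> None:
        # pop() releases the bookkeeping entry so the dict cannot grow forever.
        started = self._started.pop(task_id, None)
        if started is None:
            return  # postrun without a matching prerun: nothing to record
        attributes = {"celery.task_name": task_name}
        self._active.add(-1, attributes)
        self._duration.record(self._clock() - started, attributes)
```

In a real worker the two methods would presumably be connected to Celery's `task_prerun` and `task_postrun` signals, with `meter` obtained from `opentelemetry.metrics.get_meter(__name__)`. The prefetch measurements could follow the same pattern with a different pair of start and end events.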

Currently, only one metric is exported, flower.task.runtime.seconds. This PR renames it to follow semantic conventions (noted in the changelog).

Changes in this PR also fix a memory leak present in the current instrumentation.

Fixes #3458

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

How Has This Been Tested?

  • Unit tests added for all newly included and updated metrics

Does This PR Require a Core Repo Change?

  • No.

Checklist:

  • Followed the style guidelines of this project
  • Changelogs have been updated
  • Unit tests have been added

@rbagd rbagd requested a review from a team as a code owner May 5, 2025 10:31
@rbagd rbagd force-pushed the include_more_celery_metrics branch 4 times, most recently from 3755a64 to 8867346 Compare May 5, 2025 11:40
@rbagd rbagd force-pushed the include_more_celery_metrics branch 2 times, most recently from 09038ae to 2d192ca Compare May 13, 2025 09:40
@rbagd rbagd force-pushed the include_more_celery_metrics branch from 2d192ca to 63b559b Compare May 22, 2025 10:02
Contributor Author

rbagd commented May 22, 2025

@xrmx @emdneto May I ask one of you to take a first look at this? I understand that nobody owns the Celery component, so I'm afraid this will drift into oblivion.

@rbagd rbagd force-pushed the include_more_celery_metrics branch from 2221552 to e9e80e1 Compare May 26, 2025 07:05
Contributor

bschoenmaeckers commented May 28, 2025

It would also be nice to have task result counters, like the following

  • total finished tasks
  • total successful tasks
  • total failed tasks
  • total retried tasks
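
One possible shape for such counters, sketched under assumptions: a single monotonic counter with an outcome attribute, from which the individual totals can be derived at query time. The metric name `celery.task.results`, the attribute key `celery.task.outcome`, and the class `TaskResultCounterSketch` are hypothetical; none of them is part of this PR or of the semantic conventions. `meter` is assumed to expose `create_counter` the way an OpenTelemetry `Meter` does.

```python
from typing import Any

# Hypothetical outcome values; a real implementation might use Celery's states.
_OUTCOMES = ("success", "failure", "retry")


class TaskResultCounterSketch:
    """Hypothetical helper: counts task results by outcome."""

    def __init__(self, meter: Any) -> None:
        self._results = meter.create_counter(
            "celery.task.results",  # hypothetical metric name
            unit="{task}",
            description="Number of task results, by outcome",
        )

    def record(self, task_name: str, outcome: str) -> None:
        if outcome not in _OUTCOMES:
            raise ValueError(f"unknown outcome: {outcome!r}")
        # One data point per result; the outcome is carried as an attribute.
        self._results.add(
            1,
            {"celery.task_name": task_name, "celery.task.outcome": outcome},
        )
```

With this layout, the finished total would be the sum of the `success` and `failure` series, while retries stay visible as their own series. Four separate counters would work too, at the cost of more instruments to keep in sync.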

@rbagd rbagd force-pushed the include_more_celery_metrics branch from 99db9da to a417078 Compare June 11, 2025 11:08
Contributor Author

rbagd commented Jun 23, 2025

> It would also be nice to have task result counters, like the following
>
> * total finished tasks
> * total successful tasks
> * total failed tasks
> * total retried tasks

Agreed that it'd be nice to have. I don't want to increase the scope of this PR, as it's already quite large, but I can look into this in the near future.

@rbagd rbagd force-pushed the include_more_celery_metrics branch from e3a2c0f to c42bbb8 Compare June 23, 2025 12:24
@@ -96,6 +97,12 @@ def add(x, y):
_TASK_REVOKED_TERMINATED_SIGNAL_KEY = "celery.terminated.signal"
_TASK_NAME_KEY = "celery.task_name"

# Metric names
_TASK_COUNT_ACTIVE = "messaging.client.active_tasks"
Contributor

These 3 new metrics are not in the semantic conventions, right? I don't think we should add them to the same namespace as the others, if we add unspecified metrics at all.

Contributor Author

Just to summarize my viewpoint:

# Already in semconv
* _TASK_PROCESSING_TIME = messaging_metrics.MESSAGING_PROCESS_DURATION

# Probably should be in semconv
* _TASK_COUNT_ACTIVE = "messaging.client.active_tasks"

# Debatable, as this is more linked to Celery
* _TASK_COUNT_PREFETCHED = "messaging.client.prefetched_tasks"
* _TASK_PREFETCH_TIME = "messaging.prefetch.duration"

By a different namespace, do you mean replacing messaging with something like celery for the last two (or three) of these?


### Fixed

- `opentelemetry-instrumentation-celery` Fix a memory leak where a reference to a task identifier is kept indefinitely
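
For readers unfamiliar with this class of bug, here is a minimal sketch of how per-task bookkeeping can leak. It is an illustration only, not the actual instrumentation code and not the actual fix; `LeakyTracker` and `FixedTracker` are hypothetical names. The leaky variant reads the start time but never deletes the entry, so the dictionary keeps one key per task id for the lifetime of the worker; the fixed variant removes the entry with `dict.pop`.

```python
import time
from typing import Dict, Optional


class LeakyTracker:
    """Hypothetical tracker that keeps every task id forever."""

    def __init__(self) -> None:
        self.start_times: Dict[str, float] = {}

    def start(self, task_id: str) -> None:
        self.start_times[task_id] = time.monotonic()

    def finish(self, task_id: str) -> Optional[float]:
        # get() leaves the entry in place: the dict grows with every task.
        started = self.start_times.get(task_id)
        return None if started is None else time.monotonic() - started


class FixedTracker(LeakyTracker):
    """Same bookkeeping, but the entry is released when the task finishes."""

    def finish(self, task_id: str) -> Optional[float]:
        # pop() returns the start time and drops the reference to the task id.
        started = self.start_times.pop(task_id, None)
        return None if started is None else time.monotonic() - started
```

Running many tasks through both trackers shows the difference: the leaky dictionary ends up with one entry per task, while the fixed one is empty once all tasks have finished.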
Contributor

Could you please extract this change into a separate PR so we can get it reviewed and hopefully merged before the next release?

Member

I would say the same. The memory leak fix seems more feasible to review and merge.

Contributor Author

I opened this small PR to just fix the leak.


    def create_celery_metrics(self, meter) -> None:
        self.metrics = {
            "flower.task.runtime.seconds": meter.create_histogram(
Member

We need to discuss whether we can keep this alongside the new metric to avoid breakage.

Contributor Author

It's fine, of course, to keep this for backwards compatibility, although I'm not sure it's worth it in this case. The metric name has nothing to do with OTel or semantic conventions, and it references an unrelated project. :/
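
If both names were kept for a transition period, one way to do it is to record each measurement into two histograms. This is only a sketch of a possible approach, not something decided in this thread: `DualRuntimeRecorderSketch` and its `emit_legacy` flag are hypothetical, the new metric name is assumed to be `messaging.process.duration`, the units are assumptions, and `meter` is assumed to behave like an OpenTelemetry `Meter`.

```python
from typing import Any, Dict, List, Optional

_LEGACY_NAME = "flower.task.runtime.seconds"
_NEW_NAME = "messaging.process.duration"  # assumed semconv metric name


class DualRuntimeRecorderSketch:
    """Hypothetical helper: emits the new metric and, optionally, the old one."""

    def __init__(self, meter: Any, emit_legacy: bool = True) -> None:
        self._histograms: List[Any] = [
            meter.create_histogram(
                _NEW_NAME, unit="s", description="Time spent processing a task"
            )
        ]
        if emit_legacy:
            # Kept only so existing dashboards and alerts continue to work.
            self._histograms.append(
                meter.create_histogram(
                    _LEGACY_NAME, unit="s", description="The time it took to run the task"
                )
            )

    def record(
        self, seconds: float, attributes: Optional[Dict[str, str]] = None
    ) -> None:
        # The same measurement goes to every configured histogram.
        for histogram in self._histograms:
            histogram.record(seconds, attributes)
```

The trade-off is doubled export volume for this measurement while the flag is on; switching `emit_legacy` off later would complete the rename.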

Contributor Author

rbagd commented Aug 13, 2025

Sorry for the delays, summer time and all that. I'll go through the review comments.

@rbagd rbagd force-pushed the include_more_celery_metrics branch from 0ef3755 to d9f3fee Compare August 13, 2025 07:36
@rbagd rbagd force-pushed the include_more_celery_metrics branch from d9f3fee to 1df9e8d Compare August 13, 2025 07:39
Development

Successfully merging this pull request may close these issues.

Improve metrics in Celery instrumentation
4 participants