Skip to content

Conversation

xrmx
Copy link
Contributor

@xrmx xrmx commented Feb 28, 2025

Changes

Celery site here:
https://docs.celeryq.dev/en/stable/index.html

And we have an instrumentation for celery in opentelemetry-python-contrib.

Merge requirement checklist

Copy link
Member

@lmolkova lmolkova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Celery seems to be a background task processing library rather than a messaging system. Messaging semantic conventions might not really apply to it since they are focused on common messaging scenarios involving network calls. I would imaging celery integrations for specific systems (like RabbitMQ) to either leverage underlying instrumentation for this library or, if celery needs to be instrumented, it should set the messaging system to the actual broker configured by the user.

@github-project-automation github-project-automation bot moved this from Untriaged to Blocked in Semantic Conventions Triage Mar 4, 2025
@xrmx
Copy link
Contributor Author

xrmx commented Mar 4, 2025

Celery seems to be a background task processing library rather than a messaging system. Messaging semantic conventions might not really apply to it since they are focused on common messaging scenarios involving network calls. I would imaging celery integrations for specific systems (like RabbitMQ) to either leverage underlying instrumentation for this library or, if celery needs to be instrumented, it should set the messaging system to the actual broker configured by the user.

We already have a celery instrumentation but we were not setting the messaging.system attribute, see open-telemetry/opentelemetry-python-contrib#3265 . The original PR had queue as value which does not seem much informative.
We may be able to get the broker from the client, a popular broker for celery is redis, will it fit in that list?

@lmolkova
Copy link
Member

lmolkova commented Mar 5, 2025

We already have a celery instrumentation but we were not setting the messaging.system attribute, see open-telemetry/opentelemetry-python-contrib#3265 . The original PR had queue as value which does not seem much informative. We may be able to get the broker from the client, a popular broker for celery is redis, will it fit in that list?

yep, redis would fit.
If you have strong reasons to believe we need celery to be in the messaging.system, e.g. if there are some celery-specific behaviors, or the underlying broker is not known, please let me know. I'm happy to change my opinion

@shivanshuraj1333
Copy link
Member

shivanshuraj1333 commented Mar 5, 2025

I believe that since users typically configure the underlying broker (e.g., Redis, RabbitMQ) through Celery, the Python auto-instrumentation should extract as much relevant information as possible from the Celery instrumentation.

Currently, however, certain details, such as the broker URL are missing from the context.

Ideally, if the broker URL can be identified, it should be used to populate the messaging.system attribute. In scenarios where only the task processing library is instrumented (as opposed to, say, a Kafka or RMQ client), it might be more appropriate to set messaging.system to the task processing library. This approach helps avoid incomplete instrumentation. Also from the end user perspective, the broker URL is just for queuing the tasks.

If the sem conv says, that messaging.system attribute is reserved for brokers, we may need to think of some another attribute which helps in identifying the processing library.

PTAL at open-telemetry/opentelemetry-python-contrib#3265 (comment) I'm not sure if there's an actual way to extract broker info from the context.

@lmolkova
Copy link
Member

lmolkova commented Mar 5, 2025

@shivanshuraj1333 @xrmx I replied in the https://github.com/open-telemetry/opentelemetry-python-contrib/pull/3265/files#r1981937391 - looking into existing instrumentation I don't believe it has a significant intersection with messaging semconv - it only populates message id, conversation id and destination name from messaging, but populates a lot of celery-specific attributes.

I don't see how setting messaging.system would be helpful - backends will assume that spans follow messaging semconv, but these spans would lack the details.

I'm suggesting to think about generic task processing (which maybe in-process) provided by celery as a layer on top of messaging semconv.

@shivanshuraj1333
Copy link
Member

what could be the attribute name for it? any suggestions?

@lmolkova
Copy link
Member

lmolkova commented Mar 5, 2025

created #1963 - we should document the limitations of messaging semconv.

//cc @open-telemetry/semconv-messaging-approvers

@lmolkova
Copy link
Member

lmolkova commented Mar 5, 2025

what could be the attribute name for it? any suggestions?

could you elaborate? attribute name for what?

@shivanshuraj1333
Copy link
Member

shivanshuraj1333 commented Mar 5, 2025

if you run a demo application like this, you'd see attributes coming like below, which captures all the otel-python instrumented celery specific attributes:

- celery.action ⇒ `run`
- celery.delivery_info ⇒ `{'exchange': '', 'routing_key': 'distilled-guidance-dpo-25-10k-v1', 'priority': 0, 'redelivered': False}`
- celery.hostname ⇒ `gen1@bestapi-6dfffd8486-6h9x5`
- celery.reply_to ⇒ `86b14e3b-bbdb-3d12-8838-c6125a236bc6`
- celery.state ⇒ `SUCCESS`
- celery.task_name ⇒ `agent_telemetry_processor`
- messaging.conversation_id ⇒ `90054b2c-4e21-4edd-8999-605c1aa703b0`
- messaging.destination ⇒ `distilled-guidance-dpo-25-10k-v1`
- messaging.message_id ⇒ `90054b2c-4e21-4edd-8999-605c1aa703b0`

Now I see attributes with messaging.* in the backend to do correlations, but as per discussion we can not have messaging.system=celery.

Now one way to know that task processor celery is used is to check the presence of celery.* attributes or capture it under some attribute name. Is there a sem conv for that or we can have something like messaging.task.processor=celery

Copy link

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Mar 21, 2025
@joaopgrassi
Copy link
Member

joaopgrassi commented Mar 25, 2025

Now I see attributes with messaging.* in the backend to do correlations, but as per discussion we can not have messaging.system=celery.

Now one way to know that task processor celery is used is to check the presence of celery.* attributes or capture it under some attribute name. Is there a sem conv for that or we can have something like messaging.task.processor=celery

It is not that we cannot have, more so that by using it, is misleading and can lead to issues as backends will expect messaging semantics which celery in this case lacks as @lmolkova pointed out.

AFAIK we don't have an attribute you could set. I think the path forward is this also by @lmolkova

I'm suggesting to think about generic task processing (which maybe in-process) provided by celery as a layer on top of messaging semconv.

@github-actions github-actions bot removed the Stale label Mar 26, 2025
Copy link

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Apr 10, 2025
Copy link

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions github-actions bot closed this Apr 17, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request Stale

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

4 participants