@smoke smoke commented Oct 6, 2025

When a Job is pushed via `perform_in` or friends, it goes through the following life-cycle:

  1. First, the Job is pushed to Redis with the `at` attribute from `Sidekiq::Client`. This now creates a `<...> scheduled` span (previously it was `<...> publish`).
  2. Then the Job is handled by `Sidekiq::Scheduled#enqueue` and pushed to Redis without the `at` attribute from `Sidekiq::Client`. This still creates a `<...> publish` span.
  3. Finally, the Job is handled by the Worker as usual.

Additionally, when using `:propagation_style = :link` and `:trace_poller_enqueue = true`, a handy set of links is added between the three actors above.
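
For reference, a minimal sketch of the option set this description relies on. The option names and defaults are the ones mentioned in this thread; the list of allowed values and the `validate_options!` helper are assumptions made for illustration and are not part of the gem.

```ruby
# Options discussed in this PR, as they would be handed to the Sidekiq instrumentation.
SIDEKIQ_INSTRUMENTATION_OPTIONS = {
  span_naming: :job_class,     # default is :queue
  propagation_style: :link,    # default is :link
  trace_poller_enqueue: true   # default is false
}.freeze

# Assumed set of accepted values, written down only for this sketch.
ALLOWED_VALUES = {
  span_naming: %i[queue job_class],
  propagation_style: %i[link child none],
  trace_poller_enqueue: [true, false]
}.freeze

# Hypothetical helper (not part of the gem): reject unknown keys or values early.
def validate_options!(options)
  options.each do |key, value|
    allowed = ALLOWED_VALUES.fetch(key) { raise ArgumentError, "unknown option: #{key}" }
    raise ArgumentError, "invalid #{key}: #{value.inspect}" unless allowed.include?(value)
  end
  options
end

validate_options!(SIDEKIQ_INSTRUMENTATION_OPTIONS)

# With the SDK and instrumentation gems loaded, the hash would typically be used as:
#   OpenTelemetry::SDK.configure do |c|
#     c.use 'OpenTelemetry::Instrumentation::Sidekiq', SIDEKIQ_INSTRUMENTATION_OPTIONS
#   end
```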

@smoke smoke marked this pull request as draft October 6, 2025 13:41
@smoke smoke force-pushed the feat/sidekiq-links-to-scheduled-jobs branch 4 times, most recently from 03a5fdb to 4eca039 Compare October 6, 2025 13:57
@smoke smoke closed this Oct 6, 2025
@smoke smoke reopened this Oct 8, 2025
@smoke smoke marked this pull request as ready for review October 8, 2025 13:51
smoke added 3 commits October 8, 2025 16:52
…d add links

@smoke smoke force-pushed the feat/sidekiq-links-to-scheduled-jobs branch from 44cda99 to 6a3983f Compare October 8, 2025 13:53
@arielvalentin arielvalentin changed the title feat(sidekiq): scheduled jobs improvements - scheduled operation and add links feat: sidekiq scheduled jobs improvements - scheduled operation and add links Oct 10, 2025
@arielvalentin arielvalentin left a comment

Thank you for your submission. I have some concerns about how this changes/breaks primary use cases.

Please review my comments and provide any feedback to help me understand how we can support your needs.

attributes[SemanticConventions::Trace::PEER_SERVICE] = instrumentation_config[:peer_service] if instrumentation_config[:peer_service]

scheduled_at = job['at']
op = scheduled_at.nil? ? 'publish' : 'scheduled'
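
A simplified, standalone model of the naming decision in the two lines above may help when reading the discussion that follows. `span_name_for` is a hypothetical helper, and the way the subject is chosen per `span_naming` value is an assumption about the surrounding middleware, not a copy of it.

```ruby
# Hypothetical helper: derives the producer span name from a Sidekiq job hash.
# A job that still carries 'at' is being scheduled; otherwise it is a plain publish.
def span_name_for(job, span_naming: :queue)
  operation = job['at'].nil? ? 'publish' : 'scheduled'
  subject = span_naming == :job_class ? (job['wrapped'] || job['class']).to_s : job['queue'].to_s
  "#{subject} #{operation}"
end

immediate = { 'class' => 'SomeJob', 'queue' => 'high' }
delayed   = immediate.merge('at' => Time.now.to_f + 60)

puts span_name_for(immediate, span_naming: :job_class) # SomeJob publish
puts span_name_for(delayed, span_naming: :job_class)   # SomeJob scheduled
puts span_name_for(delayed)                            # high scheduled
```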
Contributor

🤔 This naming does not match any of the semantic convention naming for messaging.

https://github.com/open-telemetry/semantic-conventions/blob/v1.24.0/docs/messaging/messaging-spans.md#span-name

Is there something special about scheduled spans that helps you disambiguate them from any other publisher?

Contributor Author

I will answer in the next comment ...

tracer.in_span(span_name, attributes: attributes, kind: :producer) do |span|
  # In case this is Scheduled job, there is already context injected, so link to that context
  # NOTE: :propagation_style = :child is not supported as it is quite tricky when :trace_poller_enqueue = true
  extracted_context = OpenTelemetry.propagation.extract(job, context: OpenTelemetry::Context::ROOT)
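
To make the idea of "context already injected in the job" concrete, here is a rough standalone sketch. It does not use the OpenTelemetry API: it only parses a W3C `traceparent` value, under the assumption that the propagator stored it at the top level of the job hash. `extract_remote_span` is a hypothetical helper and the example IDs are sample values.

```ruby
# W3C Trace Context layout: version-traceid-spanid-flags, all lowercase hex.
TRACEPARENT_PATTERN = /\A([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})\z/

# Hypothetical helper: returns the remote span reference carried by a job hash, or nil.
def extract_remote_span(job)
  match = TRACEPARENT_PATTERN.match(job['traceparent'].to_s)
  return nil unless match

  { trace_id: match[2], span_id: match[3], sampled: (match[4].hex & 1) == 1 }
end

# A job re-pushed by the poller still carries the context injected at scheduling time.
repushed_job = {
  'class' => 'SomeJob',
  'queue' => 'high',
  'traceparent' => '00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01'
}
fresh_job = { 'class' => 'SomeJob', 'queue' => 'high' }

# Only the re-pushed job yields something to link the new producer span to.
link_targets = [repushed_job, fresh_job].map { |job| extract_remote_span(job) }.compact
puts link_targets.length # 1
```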
Contributor

This is quite interesting, and I do not see this case covered in the spec: https://github.com/open-telemetry/semantic-conventions/blob/v1.24.0/docs/messaging/messaging-spans.md#span-name

If I understand this correctly, this breaks the parent/child context and disconnects a Sidekiq scheduled job from the place where it was enqueued?

Doing this from the client in all link cases seems unexpected, because it disconnects the clients, publishers, and consumers from each other, generating 3 trace IDs instead of 2. By default, even in a link situation, I would expect the publisher span to use parent/child propagation and the consumer to generate links.

For example, if a web request uses `wait_until`, I would expect the `http.server` span to have a parent/child relationship with the `messaging.producer` span:

def checkout
  # ... schedule housekeeping clean up
  GuestsCleanupJob.set(wait_until: Date.tomorrow.noon).perform_later(guest)
end

When the GuestsCleanupJob then executes it would create a new trace, and link the web request trace to it.
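
The expected shape described here can be modelled without any tracing library. In this sketch, `SketchSpan`, `start_span`, and the span names are all hypothetical and exist only to count trace IDs.

```ruby
require 'securerandom'

SketchSpan = Struct.new(:name, :kind, :trace_id, :span_id, :parent_id, :links, keyword_init: true)

# Hypothetical helper: a span with a parent joins the parent's trace,
# a span without one starts a new trace.
def start_span(name, kind:, parent: nil, links: [])
  SketchSpan.new(
    name: name,
    kind: kind,
    trace_id: parent ? parent.trace_id : SecureRandom.hex(16),
    span_id: SecureRandom.hex(8),
    parent_id: parent&.span_id,
    links: links
  )
end

http_server = start_span('POST /checkout', kind: :server)
producer    = start_span('GuestsCleanupJob publish', kind: :producer, parent: http_server)
consumer    = start_span('GuestsCleanupJob process', kind: :consumer, links: [producer])

# The producer shares the web request's trace; the consumer gets its own and links back.
puts [http_server, producer, consumer].map(&:trace_id).uniq.length # 2
```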

Sidekiq maps wait_until to the at attribute AFAICT: https://github.com/sidekiq/sidekiq/blob/96f867cb58b7fa0a6a832af1a732a339aa0eb61f/lib/sidekiq/job.rb#L194

With these changes, it will end up changing this expected behavior and that is undesirable in my opinion.

My recommendation here would be for you to add a utility method to your code or identify something more specific in Sidekiq that would address your use case.

Contributor Author

@smoke smoke Oct 13, 2025

Here is how Sidekiq works, as far as I have found:

  1. `<SomeJob>.perform_in(1.minute)` pushes the Job definition to a dedicated queue (a sorted set) named `schedule`, using `at` to sort the set.
  2. `Sidekiq::Scheduled::Poller` runs roughly every `average_scheduled_poll_interval`.
  3. During each run, it picks the items in `schedule` that are due and pushes them to the defined `<SomeJob>.queue`.
  4. `<SomeJob>.perform` is invoked.
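
The steps above can be sketched with plain Ruby collections standing in for Redis. This is only a toy model of the flow as described in this comment; `push_scheduled` and `poll` are hypothetical helpers, not Sidekiq methods.

```ruby
# In-memory stand-ins for Redis: the `schedule` sorted set and the per-queue lists.
schedule = []                               # [[at, payload], ...] kept sorted by `at`
queues   = Hash.new { |hash, name| hash[name] = [] }

# Step 1: perform_in stores the payload under its due time.
def push_scheduled(schedule, payload, at)
  schedule << [at, payload]
  schedule.sort_by!(&:first)
end

# Steps 2 and 3: one poller run moves every due payload to its target queue.
def poll(schedule, queues, now)
  due, pending = schedule.partition { |at, _payload| at <= now }
  schedule.replace(pending)
  due.each { |_at, payload| queues[payload['queue']] << payload }
  due.length
end

now = 1_000.0
push_scheduled(schedule, { 'class' => 'SomeJob', 'queue' => 'high' }, now + 60)
push_scheduled(schedule, { 'class' => 'OtherJob', 'queue' => 'default' }, now + 600)

puts poll(schedule, queues, now + 30) # 0, nothing is due yet
puts poll(schedule, queues, now + 61) # 1, SomeJob moves to "high"
puts queues['high'].length            # 1, ready for step 4 (the worker)
puts schedule.length                  # 1, OtherJob is still waiting
```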

When it comes to the OpenTelemetry instrumentation of that:

Currently, when using `span_naming=:job_class` (default is `:queue`) and `propagation_style=:link` (the default), this results in the following spans:

  1. `<SomeJob> publish` with attrs `{"messaging.destination.name": "high", ...}`, time=T, id=123, trace=ABC, parent=123
  2. If using `trace_poller_enqueue: true`: `Sidekiq::Scheduled::Poller#enqueue`, time~=T+1.minute, id=567, trace=XYZ, parent=nil. It has no parent and enqueues a batch of Jobs.
    • no relation / link to P.1
  3. If using `trace_poller_enqueue: true`: `<SomeJob> publish` with attrs `{"messaging.destination.name": "high", ...}`, time~=T+1.minute, id=678, trace=XYZ, parent=567
    • no relation / link to P.1
  4. `<SomeJob> process` with attrs `{"messaging.destination.name": "high", ...}`, time~=T+1.minute+operational delay, id=987, trace=DEF
    • relation / link to P.3, but only if using `trace_poller_enqueue: true`

Caveats:

  • With the default `trace_poller_enqueue: false`, the P.2 and P.3 spans and their correlation are missing entirely.
  • With `trace_poller_enqueue: true`, P.2, P.3, and P.4 are correlated, but the span from P.2 (`Sidekiq::Scheduled::Poller#enqueue`) cannot be a child of many parents: there are N P.1 spans, each on its own Trace, so only a Link can correlate them.
  • With `trace_poller_enqueue: true`, the P.3 (`<SomeJob> publish`) spans are put on the trace of P.2. With quite some work, the P.3 spans could eventually be refactored to be children of P.1 and carry links to P.2, but that is not something I am keen on doing, as it is very hard. [image]

My goal is to have a meaningful correlation from P.1 to P.4, whether through the same Trace or through Links, whichever is easy and feasible.
With the above caveats in mind, this PR changes the spans as follows:

  1. `<SomeJob> scheduled` with attrs `{"messaging.destination.name": "high", ...}`, time=T, id=123, trace=ABC, parent=123
    • changed from `publish` to `scheduled` to distinguish it from P.3
  2. If using `trace_poller_enqueue: true`: `Sidekiq::Scheduled::Poller#enqueue`, time~=T+1.minute, id=567, trace=XYZ, parent=nil. It has no parent and enqueues a batch of Jobs.
    • no relation / link to P.1
  3. If using `trace_poller_enqueue: true`: `<SomeJob> publish` with attrs `{"messaging.destination.name": "high", ...}`, time~=T+1.minute, id=678, trace=XYZ, parent=567
    • if using `propagation_style=:link`, a Link to P.1 is added
  4. `<SomeJob> process` with attrs `{"messaging.destination.name": "high", ...}`, time~=T+1.minute+operational delay, id=987, trace=DEF
    • relation / link to P.3, but only if using `trace_poller_enqueue: true`

This results in a more meaningful span list, correlated through links or traces:

image
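
As a rough check of the two span layouts listed above, the following sketch models the four spans with the IDs from this comment and walks parent and link references backwards from P.4. It assumes `trace_poller_enqueue: true` and `propagation_style=:link`; `SpanRecord`, `build_spans`, and `reachable_ids` are hypothetical names, and P.1's own parent is left out of the model.

```ruby
SpanRecord = Struct.new(:id, :name, :trace, :parent, :links, keyword_init: true)

# Builds P.1..P.4 indexed by span id, with or without the link added by this PR.
def build_spans(with_pr:)
  [
    SpanRecord.new(id: 123, name: with_pr ? 'SomeJob scheduled' : 'SomeJob publish',
                   trace: 'ABC', parent: nil, links: []),
    SpanRecord.new(id: 567, name: 'Sidekiq::Scheduled::Poller#enqueue',
                   trace: 'XYZ', parent: nil, links: []),
    SpanRecord.new(id: 678, name: 'SomeJob publish',
                   trace: 'XYZ', parent: 567, links: with_pr ? [123] : []),
    SpanRecord.new(id: 987, name: 'SomeJob process',
                   trace: 'DEF', parent: nil, links: [678])
  ].to_h { |span| [span.id, span] }
end

# Follows parent and link references transitively, starting from one span id.
def reachable_ids(start_id, spans_by_id)
  seen = []
  pending = [start_id]
  until pending.empty?
    id = pending.shift
    next if seen.include?(id) || !spans_by_id.key?(id)

    seen << id
    span = spans_by_id[id]
    pending.concat([span.parent, *span.links].compact)
  end
  seen
end

puts reachable_ids(987, build_spans(with_pr: false)).include?(123) # false
puts reachable_ids(987, build_spans(with_pr: true)).include?(123)  # true
```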

I considered monkey patching or something similar, but that is much harder; moreover, this PR has value for others.

So @arielvalentin, how about the following?

  1. Keep the correlation as improved here (limited Trace and added Links).
  2. If you are still keen on following the standards, I can easily change the P.1 spans to address #1717 (comment):
    • from `<SomeJob> **scheduled**` with attrs `{"messaging.destination.name": "**high**", ...}`, time=T, id=123, trace=ABC, parent=123
    • to `<SomeJob> **publish**` with attrs `{"messaging.destination.name": "**schedule**", ...}`, time=T, id=123, trace=ABC, parent=123
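
The two alternatives for P.1 can be compared side by side. This is only an illustration of the proposal under the `:job_class` naming: `first_push_span` and its `option:` values are hypothetical names, not part of the instrumentation.

```ruby
# Hypothetical helper comparing the two naming options for the first push (P.1).
def first_push_span(job, option:)
  scheduled = !job['at'].nil?
  job_class = (job['wrapped'] || job['class']).to_s

  case option
  when :scheduled_operation  # what this PR currently does
    operation   = scheduled ? 'scheduled' : 'publish'
    destination = job['queue']
  when :schedule_destination # the alternative offered above
    operation   = 'publish'
    destination = scheduled ? 'schedule' : job['queue']
  else
    raise ArgumentError, "unknown option: #{option}"
  end

  { name: "#{job_class} #{operation}", attributes: { 'messaging.destination.name' => destination } }
end

delayed = { 'class' => 'SomeJob', 'queue' => 'high', 'at' => Time.now.to_f + 60 }

puts first_push_span(delayed, option: :scheduled_operation)[:name]  # SomeJob scheduled
puts first_push_span(delayed, option: :schedule_destination)[:name] # SomeJob publish
```

In the second option the span name stays within the `publish` vocabulary and the scheduling is expressed through the destination name instead.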

Contributor

I'm not a Sidekiq user, so I'm thinking about this from the perspective of an Active Job user who may use the Sidekiq adapter.

I see your point, though, about using non-standard naming in the cases where the span naming pattern is the job name rather than the semantic conventions. In those cases, using `scheduled` in the span name should be fine; however, I think the semconv cases should remain untouched.

I'd need to review your comments more carefully since, again, I'm not a Sidekiq user, so I don't quite understand the nuances of what causes the disconnect between the producer and consumer spans in the cases you outlined above.
