feat: Add Rage instrumentation #1904

Closed
rsamoilov wants to merge 6 commits into open-telemetry:main from rage-rb:feature/rage-instrumentation

Conversation

@rsamoilov rsamoilov commented Jan 5, 2026

Summary

This PR adds OpenTelemetry instrumentation for Rage, a fiber-based Ruby web framework.

The instrumentation leverages Rage's built-in Telemetry interface to instrument:

  • HTTP requests - Enriches Rack spans with controller/action information and route patterns
  • WebSocket connections - Traces connections, channel subscriptions, and broadcasts with span linking
  • Event bus - Instruments event publishing and subscriber processing with context propagation
  • Background jobs - Traces job enqueuing and execution with context propagation and retry tracking
  • Fiber context propagation - Ensures OpenTelemetry context flows correctly across application-level fibers
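
To illustrate the last point: in Ruby, `Thread.current[:key]` is fiber-local, so state kept there is not visible inside a newly created fiber unless it is carried over explicitly. The following is a minimal sketch of that carry-over. `DemoContext` and `fiber_with_context` are hypothetical stand-ins written for this example; they are neither the OpenTelemetry API nor the code in this PR.

```ruby
# Hypothetical stand-in for a tracing context store (not the OpenTelemetry API).
module DemoContext
  KEY = :demo_trace_context

  def self.current
    Thread.current[KEY] # Thread#[] reads a fiber-local variable
  end

  # Set the context for the duration of the block, then restore the previous one.
  def self.with_current(ctx)
    previous = Thread.current[KEY]
    Thread.current[KEY] = ctx
    yield
  ensure
    Thread.current[KEY] = previous
  end
end

# Capture the creator's context now and re-establish it inside the new fiber.
def fiber_with_context(&block)
  captured = DemoContext.current
  Fiber.new { DemoContext.with_current(captured, &block) }
end

results = {}
DemoContext.with_current({ trace_id: 'abc123' }) do
  # A plain fiber starts with empty fiber-local storage.
  results[:plain] = Fiber.new { DemoContext.current }.resume
  # The wrapped fiber sees the context captured from its creator.
  results[:propagated] = fiber_with_context { DemoContext.current }.resume
end
```

In this sketch the plain fiber sees no context, while the wrapped fiber sees the one that was active where it was created.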

Implementation Details:

  • Uses Semantic Conventions v1.36+
  • Implements span linking to tie WebSocket connections back to initial handshakes
  • Adds trace/span IDs to Rage logs
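
As a rough illustration of span linking in general, the sketch below models a link as a reference from one span to another span's context, without making that other span the parent. All types here (`DemoSpanContext`, `DemoLink`, `DemoLinkedSpan`, `demo_start_span`) are hypothetical stand-ins, the channel name is invented, and the PR's actual implementation may be structured differently.

```ruby
require 'securerandom'

# Hypothetical stand-ins for tracing types (not the OpenTelemetry API).
DemoSpanContext = Struct.new(:trace_id, :span_id)
DemoLink = Struct.new(:span_context)
DemoLinkedSpan = Struct.new(:name, :context, :links)

# Start a span with fresh IDs and an optional list of links to other spans.
def demo_start_span(name, links: [])
  context = DemoSpanContext.new(SecureRandom.hex(16), SecureRandom.hex(8))
  DemoLinkedSpan.new(name, context, links)
end

# The handshake is an ordinary, short-lived HTTP request span.
handshake = demo_start_span('GET /cable')

# Later WebSocket activity gets its own short spans, each pointing back at the
# handshake through a link rather than staying open for the whole connection.
message_spans = %w[subscribe receive].map do |action|
  demo_start_span("ChatChannel #{action}", links: [DemoLink.new(handshake.context)])
end
```

The point of the design is that each unit of WebSocket work can be a short span of its own while still being traceable back to the request that opened the connection.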

Note: The Rage version compatible with this instrumentation (1.20.0+) has not been released yet. This PR is intended to be merged before the Rage release so that the instrumentation is available immediately when users upgrade.

@linux-foundation-easycla

linux-foundation-easycla bot commented Jan 5, 2026

CLA Signed

The committers listed above are authorized under a signed CLA.

@arielvalentin
Contributor

Thank you @rsamoilov for your contribution. I see that you are a core contributor of rage-rb so I am pleased that you all have taken the time to instrument the framework using OpenTelemetry!

Our preference would eventually be for library maintainers to maintain first party instrumentations in their respective repositories as opposed to the contrib package. In other words, have the rage-rb-otel instrumentation be included in your library and we work with you to publish it in the OTel registry.

This would give you the flexibility to maintain the instrumentation as a part of your organization and reduce friction trying to get changes merged into the contrib repo. We have limited resources and lack the expertise in the framework so it will be difficult for us to provide you with a thorough review.

The areas where I think we will be able to help are code reviews and unblocking you when you run into incompatibilities with the SDK.

Is this something you would be amenable to?

@rsamoilov
Author

Hi @arielvalentin

This makes sense to me. I will create a repo and get back to you. Thanks!


OpenTelemetry::Context.with_current(otel_context) do
attributes = {
SemConv::Incubating::MESSAGING::MESSAGING_SYSTEM => 'rage.deferred',

Contributor

As above

http_route = request.route_uri_pattern
span.name = "#{request.method} #{http_route}"

attributes = {

Contributor

Are there any additional semconv attributes that could be added?

Author

Not that I can think of - these are pretty much the same attributes Action Pack adds to requests. Any recommendations?

Contributor

Author

You're referring to a client span for outbound connections. handlers/request.rb processes inbound connections.

Contributor

In that case look at the server span.

Author

Most of those attributes are already added to the span by the Rack instrumentation:

  • http.request.method
  • server.address
  • url.scheme
  • url.path
  • url.query
  • user_agent.original
  • http.route
  • http.response.status_code

Contributor

Are you saying that those attributes are being added to this span elsewhere or are they on different spans?

Author

They are added by the Rack instrumentation.
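
The relationship described in this thread can be sketched as follows: the framework handler does not create a second span, it renames the span that the Rack instrumentation already started and leaves the Rack-provided attributes in place. `DemoRackSpan` and `name_span_from_route` are hypothetical names used only for this illustration, and the attribute values are made up.

```ruby
# Hypothetical stand-in for the span that the Rack instrumentation has already started.
DemoRackSpan = Struct.new(:name, :attributes)

# Rename the existing span using the route pattern rather than the raw path,
# so that requests to /users/1 and /users/2 share one span name.
def name_span_from_route(span, http_method:, route_pattern:)
  span.name = "#{http_method} #{route_pattern}"
  span
end

rack_span = DemoRackSpan.new(
  'GET',
  { 'http.request.method' => 'GET', 'url.path' => '/users/42', 'url.scheme' => 'https' }
)
attributes_before = rack_span.attributes.dup

name_span_from_route(rack_span, http_method: 'GET', route_pattern: '/users/:id')
```

After the call, the span is named after the route pattern and its existing attributes are unchanged.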

@rsamoilov
Author

@arielvalentin I've created rage-rb/opentelemetry-instrumentation. Let me know how you'd like to proceed.

OpenTelemetry::Context.with_current(handshake_context) do
attributes = {
SemConv::Incubating::MESSAGING::MESSAGING_SYSTEM => 'rage.cable',
SemConv::Incubating::MESSAGING::MESSAGING_DESTINATION_NAME => channel.class.name,

Contributor

Suggested change
SemConv::Incubating::MESSAGING::MESSAGING_DESTINATION_NAME => channel.class.name,
'rage.cable.stream.name' => connection.class.name,

attributes = {
SemConv::Incubating::MESSAGING::MESSAGING_SYSTEM => 'rage.cable',
SemConv::Incubating::MESSAGING::MESSAGING_DESTINATION_NAME => connection.class.name,
SemConv::Incubating::CODE::CODE_FUNCTION_NAME => "#{connection.class}##{action}"

Contributor

Suggested change
SemConv::Incubating::CODE::CODE_FUNCTION_NAME => "#{connection.class}##{action}"
'rpc.operation.type' => "{action}"

Contributor

Cable is a pub/sub (broadcast to subscribers) system, not RPC (call and wait for response). While channels can be triggered using RPC actions, there's no return value. Using rpc.operation.type essentially misclassifies the system.

RPC also considers streaming which is not request and response.

Author

It's a bit hard for me to reason about it without understanding what a messaging system is in the OpenTelemetry terms.

Anyway, the latest semantic conventions state that for streaming RPCs, the server span should cover the full lifetime of the request and/or response streams until they are closed or terminated.

This implies that:

  1. RPC is considered request/response, which is not true for WebSockets.
  2. The RPC span should cover the entire lifetime of the WebSocket connection. This is unrealistic for WebSockets as connections can be open for days and weeks.

Please let me know if I've misinterpreted it.

Contributor

@thompson-tomo thompson-tomo Jan 26, 2026

I agree that the definition of RPC covering the entire duration as one span makes it harder; hopefully it will be addressed as part of the RPC stabilisation effort. I do know streaming was initially out of scope.

It's a bit hard for me to reason about it without understanding what a messaging system is in the OpenTelemetry terms.

Agree that having the definition/requirements of things like messaging.system documented is important.

Author

But you do recommend going with the RPC attributes, right?

attributes = {
SemConv::Incubating::MESSAGING::MESSAGING_SYSTEM => 'rage.cable',
SemConv::Incubating::MESSAGING::MESSAGING_DESTINATION_NAME => channel.class.name,
SemConv::Incubating::CODE::CODE_FUNCTION_NAME => "#{channel.class}##{action}"

Contributor

Suggested change
SemConv::Incubating::CODE::CODE_FUNCTION_NAME => "#{channel.class}##{action}"
'rpc.operation.type' => "{action}"

def self.create_broadcast_span(stream:)
attributes = {
SemConv::Incubating::MESSAGING::MESSAGING_SYSTEM => 'rage.cable',
SemConv::Incubating::MESSAGING::MESSAGING_OPERATION_TYPE => 'publish',

Contributor

Suggested change
SemConv::Incubating::MESSAGING::MESSAGING_OPERATION_TYPE => 'publish',
'rpc.operation.type' => 'publish',


@rsamoilov
Author

Hey @thompson-tomo, thanks for the review!

'rage.cable.stream.name' => connection.class.name

Same as with Action Cable, streams aren't properties of connections - they are being created when the application code executes stream_from. A single client can be associated with zero or multiple streams, so using rage.cable.stream.name when accepting connections/subscriptions doesn't seem semantically correct.

Using messaging.destination.name with the connection/channel class name identifies the actual destination of the message.
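
The argument about streams can be made concrete with a small model. In the sketch below, a connection has no streams when it is accepted, and streams only appear once application code asks for them, so a stream name is not available as an attribute at accept time. `DemoConnection` is a hypothetical class written for this example and the stream names are invented; only the `stream_from` name comes from the discussion above.

```ruby
# Hypothetical in-memory model of a cable connection (illustrative only).
class DemoConnection
  attr_reader :streams

  def initialize
    @streams = [] # a freshly accepted connection has no streams yet
  end

  # Streams are created only when application code subscribes to them.
  def stream_from(name)
    @streams << name unless @streams.include?(name)
    self
  end
end

connection = DemoConnection.new
streams_at_accept = connection.streams.dup # nothing to report at accept time

connection.stream_from('chat_1')
connection.stream_from('notifications')
streams_after_subscribing = connection.streams.dup
```

A single connection ends up with zero streams at accept time and two afterwards, which is why a per-connection stream attribute has no single well-defined value.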

'messaging.client.id' => subscriber.class

Looking at other implementations and OTel docs, messaging.client.id seems to be used to identify the instance that processes the message (process/host) rather than the code being executed.

'rpc.operation.type' => 'publish',

Cable is a pub/sub (broadcast to subscribers) system, not RPC (call and wait for response). While channels can be triggered using RPC actions, there's no return value. Using rpc.operation.type essentially misclassifies the system.

The Workflow proposal should cater for this.

Background job queues have all attributes of messaging systems; since workflow conventions are still a proposal, I'd propose using established messaging conventions now and migrating to workflow once it's stable.

@thompson-tomo
Contributor

I have transferred the above feedback to the review comments and responded accordingly.

@rsamoilov
Author

@arielvalentin @thompson-tomo Just checking in on the PR - please let me know how we can proceed.

Currently, most disagreements are about the Cable instrumentation. I’m happy to remove it and tackle it in another PR if you think that makes sense.

@rsamoilov
Author

@arielvalentin @thompson-tomo Hey folks, thank you for your feedback! I'll give the PR one more week for review. If there are no concerns, I'll then move forward with maintaining this under the Rage org.

@dazuma Would you be able to take a look at the instrumentation when you get a chance?

@arielvalentin
Contributor

@rsamoilov please proceed with releasing the code as a part of your repo!

Happy to help answer questions or provide feedback there as well.

The "what code ran" information is carried in the span name; `code.*` is used for debugging, not primary semantic modeling.

@rsamoilov
Author

@thompson-tomo Made some updates:

  • Reverted to the server span kind for Cable channels
  • Removed code.* attributes
  • Updated Cable instrumentation to use custom websocket.* attributes - let me know if you think rpc.* or rage.cable.* attributes will work better
  • Updated Deferred instrumentation to use the workflow.* attributes

@rsamoilov
Author

Closed in favor of #1975.

@rsamoilov rsamoilov closed this Feb 1, 2026