Contributor

@kapoorabhishek24 kapoorabhishek24 commented Jan 2, 2026

GCP logs were missing request/correlation IDs, which made it hard to correlate logs with traces. This PR implements the following:

Summary

  • Add a context-aware OTLP log layer that injects request_id into log records.
  • Centralize request context propagation in tracing spans with a thread-local fallback and request-boundary cleanup.
  • Normalize request/correlation ID extraction and fallback behavior to keep IDs consistent across logs and traces.

Changes

  • Replace the OpenTelemetry tracing appender with a custom log layer + formatter.
  • Store request context in span extensions and thread-local fallback; add Rocket fairing cleanup at request boundaries.
  • Update request guards to set both IDs and emit them as trace attributes for cross-system correlation.
  • Simplify endpoint wiring by relying on the Tracing guard for context setup.

@kapoorabhishek24 kapoorabhishek24 changed the title from "Add request_id/correlation_id in logs" to "Feature: Add request_id/ correlation_id in logs" Jan 5, 2026
@kapoorabhishek24 kapoorabhishek24 marked this pull request as ready for review January 5, 2026 23:52
@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 03aeb04a09


Comment on lines 26 to 29
thread_local! {
/// Fallback storage for disconnected span scenarios.
/// Thread-bound, not task-bound; clear at request boundaries to prevent leakage.
static THREAD_REQUEST_CONTEXT: RefCell<Option<RequestContext>> = const { RefCell::new(None) };

P2: Avoid thread-local fallback in async request handling

Using a thread-local fallback for request context means concurrent async requests that happen to run on the same worker thread will overwrite each other’s IDs. In a tokio-based Rocket server, multiple requests can interleave on one thread, so logs from request A can be tagged with request B’s request_id/correlation_id once B calls set_request_context. This is a regression from the prior span-only approach and will misattribute logs under load. Consider using task-local storage or only span extensions for async servers so context stays per-request.
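
The overwrite described above can be reproduced without an async runtime. The following is a minimal sketch, not the PR's code: `RequestContext` is reduced to a single field, `current_request_id` and `interleave_on_one_thread` are hypothetical helpers, and the interleaving of two requests on one worker thread is simulated by plain sequential calls.

```rust
use std::cell::RefCell;

// Simplified stand-in for the PR's RequestContext (assumption: one field only).
#[derive(Clone, Debug, PartialEq)]
struct RequestContext {
    request_id: String,
}

thread_local! {
    // One slot per OS thread, shared by every task that runs on that thread.
    static THREAD_REQUEST_CONTEXT: RefCell<Option<RequestContext>> =
        const { RefCell::new(None) };
}

fn set_request_context(request_id: &str) {
    THREAD_REQUEST_CONTEXT.with(|slot| {
        *slot.borrow_mut() = Some(RequestContext {
            request_id: request_id.to_string(),
        });
    });
}

// Hypothetical helper: what a log layer would read when falling back.
fn current_request_id() -> Option<String> {
    THREAD_REQUEST_CONTEXT.with(|slot| slot.borrow().as_ref().map(|ctx| ctx.request_id.clone()))
}

// Simulates request A yielding at an await point, request B starting on the
// same worker thread, and A then resuming and emitting a log line.
fn interleave_on_one_thread() -> Option<String> {
    set_request_context("req-A"); // request A starts
    set_request_context("req-B"); // A is suspended; B starts on the same thread
    current_request_id() // A resumes and reads the shared slot
}
```

Calling `interleave_on_one_thread()` returns `Some("req-B")`, so the log line that belongs to request A would carry request B's ID.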

Contributor Author

Nice, I switched to task-local storage keyed by tokio::task::Id instead of thread-local. Span extensions remain the primary lookup; task-local is just a fallback for logs that lose span context.
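
A rough sketch of that lookup order is shown below, under stated assumptions: `TaskId` is a hypothetical stand-in for `tokio::task::Id` so the example needs only the standard library, the span extension is modelled as an `Option<&RequestContext>` argument, and `fallback_store`, `set_fallback`, `clear_fallback`, and `resolve_request_id` are illustrative names rather than the functions in this PR.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

// Hypothetical stand-in for tokio::task::Id.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct TaskId(u64);

// Simplified stand-in for the PR's RequestContext (assumption: one field only).
#[derive(Clone, Debug, PartialEq)]
struct RequestContext {
    request_id: String,
}

// Process-wide fallback map: one entry per task, not per thread.
fn fallback_store() -> &'static Mutex<HashMap<TaskId, RequestContext>> {
    static STORE: OnceLock<Mutex<HashMap<TaskId, RequestContext>>> = OnceLock::new();
    STORE.get_or_init(|| Mutex::new(HashMap::new()))
}

fn set_fallback(task: TaskId, ctx: RequestContext) {
    fallback_store().lock().unwrap().insert(task, ctx);
}

// Called at request boundaries so entries do not outlive their request.
fn clear_fallback(task: TaskId) {
    fallback_store().lock().unwrap().remove(&task);
}

// The span context wins when present; the task-keyed map is consulted only
// when the log record has lost its span.
fn resolve_request_id(span_ctx: Option<&RequestContext>, task: TaskId) -> Option<String> {
    if let Some(ctx) = span_ctx {
        return Some(ctx.request_id.clone());
    }
    fallback_store()
        .lock()
        .unwrap()
        .get(&task)
        .map(|ctx| ctx.request_id.clone())
}
```

Because entries are keyed by task rather than stored in a per-thread slot, two requests that share a worker thread keep separate IDs, and clearing one task's entry leaves the other untouched.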

Contributor

@GTC6244 GTC6244 left a comment

Looks pretty good, Abhi! Couple of comments that I may be wrong about - hoping this is the case! ;-)

extract_correlation_id(req).unwrap_or_else(|| format!("LD-{}", Uuid::new_v4()));
let (request_id, correlation_id) = extract_request_and_correlation_ids(req);

// Generate fallback ID only if BOTH headers are missing

Contributor

Is this an anticipated situation? I ask because each node will generate a different fallback ID, so we'll only be able to trace what's happening on a single node this way.

Contributor

yeah i don't think we want a fallback ID. leave it empty if there's no ID parsed.

Contributor Author

Removed this entirely per Chris's feedback. If no header is provided, we leave it empty.


use crate::error::{EC, Error, Result, conversion_err_code, validation_err_code};

pub const HEADER_KEY_X_CORRELATION_ID: &str = "X-Correlation-Id";

Contributor

as far as I can tell, X-Correlation-Id is just used for backwards compatibility, and the JS SDK stuffs in the request-id value.

Contributor

yes i believe the JS SDK uses X-Request-Id. code is here: https://github.com/LIT-Protocol/js-sdk/blob/6832ff6255f1320946cfba270eeb7e62fd341ba1/packages/lit-client/src/lib/LitNodeClient/LitNodeApi/src/helper/sendNodeRequest.ts#L42

we can standardize on X-Request-Id since that's what the client uses, or we can support both

Contributor

nevermind - i see you handle both header options below.

Contributor Author

Yes, we support both. X-Request-Id is preferred, but we fall back to X-Correlation-Id for older clients. I remember discussing this before.
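
The header precedence agreed on in this thread (prefer X-Request-Id, fall back to X-Correlation-Id, and leave the ID empty when neither is present) could look roughly like the sketch below. It is an illustration under assumptions: a `HashMap` stands in for Rocket's header map, and `HEADER_KEY_X_REQUEST_ID`, `header`, and `extract_request_id` are hypothetical names; only `HEADER_KEY_X_CORRELATION_ID` appears in the diff.

```rust
use std::collections::HashMap;

// Hypothetical constant; only the correlation-ID constant appears in the diff.
const HEADER_KEY_X_REQUEST_ID: &str = "X-Request-Id";
const HEADER_KEY_X_CORRELATION_ID: &str = "X-Correlation-Id";

// Case-insensitive header lookup that treats blank values as missing.
fn header<'a>(headers: &'a HashMap<String, String>, name: &str) -> Option<&'a str> {
    headers
        .iter()
        .find(|(key, _)| key.eq_ignore_ascii_case(name))
        .map(|(_, value)| value.trim())
        .filter(|value| !value.is_empty())
}

// Prefers X-Request-Id, falls back to X-Correlation-Id for older clients,
// and returns None (no generated fallback ID) when neither header is set.
fn extract_request_id(headers: &HashMap<String, String>) -> Option<String> {
    header(headers, HEADER_KEY_X_REQUEST_ID)
        .or_else(|| header(headers, HEADER_KEY_X_CORRELATION_ID))
        .map(str::to_owned)
}
```

Returning `None` instead of a generated `LD-…` value keeps a node from inventing an ID that no other node or client would recognise.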

use rocket::fairing::{Fairing, Info, Kind};
use rocket::{Data, Request, Response};

/// Clears thread-local request context at request start and after response.

Contributor

Can we do this? I thought (maybe incorrectly) that Rocket uses tokio async, and that thread-local storage didn't hold up under load. Rocket has a per-request cache for this (Request::local_cache), IIRC.

Contributor Author

Thanks for pointing this out. I have since switched to task-local storage keyed by tokio::task::Id. I looked into Rocket's request cache, but the tracing layer doesn't have access to the Request object, so I went with the task-local approach. I also added a fairing to clear it at request boundaries. What do you think of this?

Contributor

I was thinking maybe the other way: use Rocket's cache from inside the fairing to sync the tracing context ... but you might have the same (though inverse) issue. At the end of the day (pardon the expression), if this works in a test environment, I'd give it a 👍!

Contributor

@glitch003 glitch003 left a comment

this looks good - only change i would make is not to make up a fallback request id. if there's no ID parsed, just leave the field empty



@kapoorabhishek24 kapoorabhishek24 changed the title from "Feature: Add request_id/ correlation_id in logs" to "Feature: Add request_id in logs" Jan 7, 2026
github-actions bot commented Jan 8, 2026

PASS [ 43.056s] (3/3) lit_node::test toxiproxy::perf_tests::load_with_no_latency
PASS [ 43.215s] (2/3) lit_node::test toxiproxy::perf_tests::load_with_50ms_latency_single_link
PASS [ 87.899s] (1/3) lit_node::test toxiproxy::perf_tests::load_with_50ms_latency_all_links
