First draft

rpanackal · rpanackal · commit 07ce6be1b967 · 2025-11-10T11:28:53.000+01:00
diff --git a/docs/adrs/006-logging-strategy.md b/docs/adrs/006-logging-strategy.md
@@ -1,36 +1,111 @@
 # Logging Strategy
 
-## Status: Proposed
+## Status
+
+Proposed
 
 ## Context
 
-We would like to log useful information that in future helps us and the user debug and resolve problems quickly. 
+We identified several challenges with debugging and problem resolution in the AI SDK that better logging can address.
+Typically, users had to enable wire logs to access the information needed for troubleshooting; besides the helpful entries, wire logs surface a large volume of unrelated output.
+We also identified a need to give users better visibility into what happens behind the scenes as the application progresses.
+
+Key challenges that drove this decision:
+
+- **Debugging difficulties**: Limited visibility into request flows and processing steps
+- **Security concerns**: Risk of accidentally logging sensitive information
+- **User experience**: Users needed better insight into long-running AI operations
+- **Trace ownership**: Reliance on external parties for troubleshooting details
 
-Currently, the user would have to enable wire logs to show information that enables us to assist them. 
-We also identified than there is benefit to show a story of what is happening behind the scenes to the user which informs them on application progress.
-We want to rely on application level logs from ai sdk for the same.
+## Decision
+
+Devise and follow comprehensive logging guidelines that prioritize **debugging capability** and **user visibility** while maintaining **security** and **performance**. The approach emphasizes descriptive, human-readable logs with structured request tracking through MDC (Mapped Diagnostic Context).
 
 ## Guidelines
 
-- Don't log any sensitive information eg: names, ids etc. The general rule to follow is to avoid logging request or response payloads.
-- Logs must contain useful and non-redundant information to keep them concise and to the point.
-- Keep per-request log below INFO level to avoid littering the output and further making it difficult to identify related logs
-- Prefer to keep one-time logs in a single application run at INFO level.
-- Make use of MDC as appropriate and deliberate about clearing the context.
-- MDC must not be cleared as a whole but only by key basis to avoid unintentionally clearing context items that merit a longer lifespan
-- To be deliberate about logging, we limit logging out of generated classes, consequently limiting logs produced from modules like document-grounding, prompt-registry.
-- Limit unnecessary warning logs to also avoid littering the console logging
-- MDC is currently leveraged to contain and transport information when logging is performed on a different block down the flow of control chain. We need special care on where the context is set. Any per-request context set in MDC is clearing up at response completion. Consequently, setting per-request context outside of blocks/objects that are not invoked/consumed per-request can lead to incomplete MDC states which will in turn corrupt logs produced. 
-- We are generally inclined towards descriptive and human-readable logs and ideally contain, service invoked, endpoint etc and request logging. We do not expect machine readability of logs but still following "metric=value" patterns for the sake of extensibility and readability of them.
-- We may also correlate request level logs with request identifier reqId to assist debugging.
-- In the case for logging information not simply read but involve computation or inference cost, we must deliberately deflect the effort based on log level or other available context.
-- Centralize MDC handling. Using magic string as keys or values for the context may be lost in maintenance and add towards collaborative effort. We may wrap the access in a format that explicitly declares context keys and assist discoverability of all expected context items, so they are not lost in refactoring effort. 
-- The responsibility of MDC context clearing lies with the class that sets the context for the sake of readability. 
-- Explicitly log at request start and don't purely rely on logging at response time. A request may take long or even fail. Additionally, inferring response metrics even for a success case may be non-trivial eg: duration for streaming. The most critical benefit is immediate visibility for the user about the effect of their action for even long-running requests. Additionally, this avoids the the scenario where the surface error is ambiguous and needs investigation of stack-trace. 
-
-### Unattended
-- Unit test logging
-- Logback
-- logging pattern
-- Separate logging into per request and per application runtime.
-- A separate section for MDC Handling
+### 1. Content and Security
+
+* **Avoid sensitive information.**
+  Do not log any personally identifiable or confidential data such as names, IDs, tokens, or payload content. As a general rule, avoid logging full request or response bodies.
+
+* **Keep logs concise and relevant.**
+  Every log should convey meaningful information without redundancy. Avoid verbose, repetitive, or cosmetic details.
+
+* **Use descriptive, human-readable formats.**
+  Logs should read naturally and give a developer enough context to understand what happened without consulting the code. Include details such as the invoked service and endpoint, as applicable.
+
+* **Maintain extensible readability.**
+  While logs are intended for humans, follow the `metric=value` pattern where practical. This improves future extensibility and allows easier parsing if machine analysis becomes necessary.
+
+* **Correlate related logs.**
+  Include a request identifier (e.g., `reqId`) in per-request logs to assist with correlation and debugging.
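+
+A message that follows these rules reads as a plain sentence followed by `key=value` details. The sketch below uses a hypothetical `LogMessages.describe` helper (not part of the SDK) to keep that shape consistent; the service, endpoint, and `reqId` values are illustrative only. In practice the resulting string would be passed to an SLF4J logger, and `reqId` would more likely come from MDC than be passed explicitly.
+
+```java
+import java.util.Map;
+import java.util.StringJoiner;
+
+// Hypothetical helper (not an SDK API): renders "<description> key1=value1 key2=value2".
+final class LogMessages {
+  private LogMessages() {}
+
+  static String describe(String description, Map<String, ?> details) {
+    StringJoiner joiner = new StringJoiner(" ");
+    joiner.add(description);
+    // Entries are appended in the map's iteration order; pass a LinkedHashMap for a stable order.
+    // Only metadata belongs here; request and response bodies are never passed in.
+    details.forEach((key, value) -> joiner.add(key + "=" + value));
+    return joiner.toString();
+  }
+}
+```
+
+For example, a `LinkedHashMap` holding `service=orchestration`, `endpoint=/completion`, and `reqId=4f2a`, combined with the description `Sending request to AI service.`, yields `Sending request to AI service. service=orchestration endpoint=/completion reqId=4f2a`.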
+
+---
+
+### 2. Log Levels and Scope
+
+* **Per-request logs.**
+  Keep per-request logs **below INFO level** (e.g., DEBUG or TRACE) to prevent cluttering normal application output.
+
+* **Application runtime logs.**
+  Prefer **INFO level** for logs that occur only once per application run, such as startup or shutdown messages.
+
+* **Avoid unnecessary warnings.**
+  Use the WARN level only for actionable or genuinely concerning conditions. Do not use it as a placeholder or for expected transient states.
+
+* **Explicit request lifecycle logging.**
+  Always log at **request start** to provide immediate visibility that an operation has begun. Do not rely solely on response-time logging: requests may fail, hang, or run for a long time, and response metrics such as duration are hard to determine for streaming responses. Logging at the start lets users see that their request is being processed before a result is available.
+
+* **Performance-aware logging.**
+  If a log statement requires computation or inference to generate information, guard it with a log-level check (e.g., `if (log.isDebugEnabled())`) to avoid unnecessary overhead when that level is disabled.
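+
+The lifecycle and performance rules can be combined around each outgoing call. This is a minimal sketch assuming `slf4j-api` is on the classpath; `RequestLogging.execute` is a hypothetical wrapper, not an existing SDK API, and the duration stands in for any derived value that is costly to compute.
+
+```java
+import java.util.function.Supplier;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+// Hypothetical wrapper (not an SDK API) around a single outgoing request.
+final class RequestLogging {
+  private static final Logger log = LoggerFactory.getLogger(RequestLogging.class);
+
+  private RequestLogging() {}
+
+  static <T> T execute(String service, String endpoint, Supplier<T> call) {
+    // Per-request log at DEBUG, emitted before the call so the request is visible
+    // even if it is slow, hangs, or fails. Placeholders avoid string building when DEBUG is off.
+    log.debug("Sending request. service={} endpoint={}", service, endpoint);
+    long startNanos = System.nanoTime();
+    T result = call.get();
+    // Guard: derived values are only computed when DEBUG is actually enabled.
+    if (log.isDebugEnabled()) {
+      long durationMs = (System.nanoTime() - startNanos) / 1_000_000;
+      log.debug("Received response. service={} endpoint={} durationMs={}", service, endpoint, durationMs);
+    }
+    return result;
+  }
+}
+```
+
+Because the start log is emitted before `call.get()`, a user running at DEBUG level sees the request even when the call never returns.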
+
+---
+
+### 3. MDC (Mapped Diagnostic Context)
+
+* **Purpose and usage.**
+  MDC is used to carry contextual information (e.g., `reqId`, `endpoint`, `service`) across execution blocks within the same thread.
+
+* **Setting and clearing context.**
+  Set MDC values deliberately and close to their scope of relevance. Per-request MDC context must be cleared when the response completes. Avoid setting per-request values in code that does not run once per request (for example, in long-lived objects created at startup); because the context is cleared at response completion, later requests would then log with incomplete or corrupted context.
+
+* **Granular clearing only.**
+  Never clear the entire MDC context (for example, with `MDC.clear()`). Instead, remove entries key-by-key (`MDC.remove(key)`) to preserve unrelated context items that may remain valid for longer periods.
+
+* **Centralized MDC management.**
+  Avoid using magic strings for MDC keys or values. Define them in a dedicated structure or utility (e.g., an `MdcKeys` class) to ensure discoverability and prevent errors during refactoring.
+
+* **Responsibility and ownership.**
+  The component or class that sets MDC context values is also responsible for clearing them. This maintains clarity and ensures proper lifecycle management.
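+
+The MDC rules above can be sketched with SLF4J's `MDC.putCloseable`, which returns a handle that removes only its own key when closed. This assumes `slf4j-api` (with a binding such as logback-classic) on the classpath; `MdcKeys` and `RequestContext` are hypothetical names, and the sketch assumes a synchronous call whose response is complete when `call.get()` returns. For streaming responses, the keys would instead have to be removed when the stream completes.
+
+```java
+import java.util.UUID;
+import java.util.function.Supplier;
+import org.slf4j.MDC;
+
+// Hypothetical central registry of MDC keys; key names follow the examples above.
+enum MdcKeys {
+  REQUEST_ID("reqId"),
+  SERVICE("service"),
+  ENDPOINT("endpoint");
+
+  private final String key;
+
+  MdcKeys(String key) {
+    this.key = key;
+  }
+
+  String key() {
+    return key;
+  }
+
+  // Sets only this key; closing the returned handle calls MDC.remove for this key only.
+  MDC.MDCCloseable put(String value) {
+    return MDC.putCloseable(key, value);
+  }
+}
+
+// Hypothetical owner of the per-request context: the class that sets the keys also clears them.
+final class RequestContext {
+  private RequestContext() {}
+
+  static <T> T run(String service, String endpoint, Supplier<T> call) {
+    try (MDC.MDCCloseable reqId = MdcKeys.REQUEST_ID.put(UUID.randomUUID().toString());
+        MDC.MDCCloseable svc = MdcKeys.SERVICE.put(service);
+        MDC.MDCCloseable ep = MdcKeys.ENDPOINT.put(endpoint)) {
+      // MDC is thread-local: logging inside call on this thread can print these
+      // entries, e.g. through a logback pattern such as %X{reqId}.
+      return call.get();
+    }
+    // On exit the three keys are removed one by one; other MDC entries are untouched.
+  }
+}
+```
+
+Declaring all resources in one try-with-resources statement keeps setting and clearing in the same class, and each key is removed individually, so context set elsewhere (for example, once per application run) is left intact.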
+
+---
+
+### 4. Logging Boundaries and Generation
+
+* **Deliberate logging boundaries.**
+  Generated code (such as the classes in the *document-grounding* and *prompt-registry* modules) should log minimally, preferably not at all. Logging should be centralized in higher-level components to maintain consistency and reduce noise.
+
+## Alternatives
+
+### Logging Framework Options
+
+1. **slf4j-simple** (previous): Simple, but offers limited configuration and is poorly suited to production use
+2. **logback-classic** (chosen): Widely adopted, with rich configuration options, good performance, and a broad feature set
+
+### Log Format Approaches
+
+1. **Structured JSON logging**: Machine-readable but harder for human debugging
+2. **Descriptive human-readable** (chosen): Prioritizes developer and user experience while maintaining some structure
+
+## Consequences
+
+### Positive
+
+- **Improved debugging**: Comprehensive request tracking enables faster problem resolution
+- **Better user experience**: Users can see progress of long-running AI operations
+- **Security compliance**: Systematic approach prevents accidental logging of sensitive data
+
+### Negative
+
+- **Increased debug log volume**: More detailed logging may increase storage and processing overhead
+- **Development overhead**: Developers must manage MDC carefully, since improper handling can lead to incorrect request correlation