| 1 | +# Logging Strategy |
| 2 | + |
| 3 | +## Status |
| 4 | + |
| 5 | +Proposed |
| 6 | + |
| 7 | +## Context |
| 8 | + |
| 9 | +We identified challenges with debugging and problem resolution in the AI SDK that better logging can address. |
| 10 | +Typically, users had to enable wire logs to get the information needed for troubleshooting, which surfaces a large volume of unrelated logs alongside the helpful ones. |
| 11 | +Additionally, we identified the need to give users better visibility into application progress and into what is happening behind the scenes. |
| 12 | + |
| 13 | +Key challenges that drove this decision: |
| 14 | + |
| 15 | +- **Debugging difficulties**: Limited visibility into request flows and processing steps |
| 16 | +- **Security concerns**: Risk of accidentally logging sensitive information |
| 17 | +- **User experience**: Users needed better insight into long-running AI operations |
| 18 | +- **Trace ownership**: Reliance on external parties for troubleshooting details |
| 19 | + |
| 20 | +## Decision |
| 21 | + |
| 22 | +Devise and follow comprehensive logging guidelines that prioritize **debugging capability** and **user visibility** while maintaining **security** and **performance**. The approach emphasizes descriptive, human-readable logs with structured request tracking through MDC (Mapped Diagnostic Context). |
| 23 | + |
| 24 | +## Guidelines |
| 25 | + |
| 26 | +### 1. Content and Security |
| 27 | + |
| 28 | +* **Avoid sensitive information.** |
| 29 | + Do not log any personally identifiable or confidential data such as names, IDs, tokens, or payload content. As a general rule, avoid logging full request or response bodies. |
| 30 | + |
| 31 | +* **Keep logs concise and relevant.** |
| 32 | + Every log should convey meaningful information without redundancy. Avoid verbose, repetitive, or cosmetic details. |
| 33 | + |
| 34 | +* **Use descriptive, human-readable formats.** |
| 35 | + Logs should read naturally and provide enough context for a developer to understand what happened without consulting the code. Include details such as the invoked service and endpoint, as applicable. |
| 36 | + |
| 37 | +* **Maintain extensible readability.** |
| 38 | + While logs are intended for humans, follow the `metric=value` pattern where practical. This improves future extensibility and allows easier parsing if machine analysis becomes necessary. |
| 39 | + |
| 40 | +* **Correlate related logs.** |
| 41 | + Include a request identifier (e.g., `reqId`) in per-request logs to assist with correlation and debugging. |
| 42 | + |
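The content guidelines above can be illustrated with a small message builder. This is a minimal sketch, not SDK code: the `LogMessage` class, its methods, and the example field values are hypothetical, chosen only to show a human-readable description followed by `key=value` pairs that include a request identifier.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper (not part of the SDK) for building key=value log messages. */
final class LogMessage {
    private final String description;
    // LinkedHashMap keeps the fields in the order they were added.
    private final Map<String, Object> fields = new LinkedHashMap<>();

    private LogMessage(String description) {
        this.description = description;
    }

    /** Starts a message with a short, human-readable description. */
    static LogMessage of(String description) {
        return new LogMessage(description);
    }

    /** Adds one key=value pair; pass identifiers and metadata only, never payloads. */
    LogMessage with(String key, Object value) {
        fields.put(key, value);
        return this;
    }

    /** Renders the description, then the fields as space-separated key=value pairs. */
    @Override
    public String toString() {
        if (fields.isEmpty()) {
            return description;
        }
        StringJoiner joined = new StringJoiner(" ", description + ": ", "");
        fields.forEach((key, value) -> joined.add(key + "=" + value));
        return joined.toString();
    }
}
```

For example, `LogMessage.of("Starting request").with("reqId", "a1b2").with("endpoint", "/completions")` renders as `Starting request: reqId=a1b2 endpoint=/completions`. Only identifiers and metadata are passed in; request and response bodies stay out of the message.
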
| 43 | +--- |
| 44 | + |
| 45 | +### 2. Log Levels and Scope |
| 46 | + |
| 47 | +* **Per-request logs.** |
| 48 | + Keep per-request logs **below INFO level** (e.g., DEBUG or TRACE) to prevent cluttering normal application output. |
| 49 | + |
| 50 | +* **Application runtime logs.** |
| 51 | + Prefer **INFO level** for one-time or startup/shutdown logs that occur once per application run. |
| 52 | + |
| 53 | +* **Avoid unnecessary warnings.** |
| 54 | + Use the WARN level only for actionable or genuinely concerning conditions. Do not use it as a placeholder or for expected transient states. |
| 55 | + |
| 56 | +* **Explicit request lifecycle logging.** |
| 57 | + Always log at **request start** to provide immediate visibility that an operation has begun. Do not rely solely on response-time logging, since requests may fail, hang, or run for a long time (e.g., streaming cases). This helps users understand that their request is being processed even before a result is available. |
| 58 | + |
| 59 | +* **Performance-aware logging.** |
| 60 | + If a log statement requires computation or inference to generate information, guard it with a log-level check (e.g., `if (log.isDebugEnabled())`) to avoid unnecessary overhead when that level is disabled. |
| 61 | + |
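The request-start and level-guard guidelines above might look roughly like the following. This is a sketch under stated assumptions: it uses `java.util.logging` purely so that it runs without third-party dependencies, with `Level.FINE` standing in for DEBUG; with SLF4J the guard would be the `log.isDebugEnabled()` check mentioned above. The `GuardedLogging` class, the `summarize` method, and the counter are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical example class; all names are illustrative only. */
final class GuardedLogging {
    private static final Logger LOG = Logger.getLogger(GuardedLogging.class.getName());

    /** Counts how often the costly summary was actually computed. */
    static final AtomicInteger SUMMARY_COMPUTATIONS = new AtomicInteger();

    private GuardedLogging() {}

    /** Lets callers switch the level, e.g. to compare INFO against FINE. */
    static void setLevel(Level level) {
        LOG.setLevel(level);
    }

    /** Stands in for any computation that exists only to feed a log line. */
    static String summarize(int[] tokenCounts) {
        SUMMARY_COMPUTATIONS.incrementAndGet();
        int total = 0;
        for (int count : tokenCounts) {
            total += count;
        }
        return "chunks=" + tokenCounts.length + " totalTokens=" + total;
    }

    static void handle(String reqId, int[] tokenCounts) {
        // Per-request log below INFO, emitted at request start so that a
        // hanging or streaming request is visible before any response exists.
        LOG.log(Level.FINE, "Request started: reqId={0}", reqId);

        // Guard: summarize() only runs when the FINE level is enabled.
        if (LOG.isLoggable(Level.FINE)) {
            LOG.log(Level.FINE, "Request summary: reqId={0} {1}",
                    new Object[] {reqId, summarize(tokenCounts)});
        }
    }
}
```

With the logger at `Level.INFO`, calling `GuardedLogging.handle(...)` leaves `SUMMARY_COMPUTATIONS` untouched. After `GuardedLogging.setLevel(Level.FINE)`, each call computes the summary once.
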
| 62 | +--- |
| 63 | + |
| 64 | +### 3. MDC (Mapped Diagnostic Context) |
| 65 | + |
| 66 | +* **Purpose and usage.** |
| 67 | + MDC is used to carry contextual information (e.g., `reqId`, `endpoint`, `service`) across execution blocks within the same thread. |
| 68 | + |
| 69 | +* **Setting and clearing context.** |
| 70 | + Set MDC values deliberately and close to their scope of relevance. Per-request MDC context must be cleared when the response completes. Avoid setting per-request values in long-lived objects that outlive the request lifecycle, as this can result in corrupted or incomplete log context. |
| 71 | + |
| 72 | +* **Granular clearing only.** |
| 73 | + Never clear the entire MDC context. Instead, remove entries key-by-key to preserve unrelated context items that may remain valid for longer periods. |
| 74 | + |
| 75 | +* **Centralized MDC management.** |
| 76 | + Avoid using magic strings for MDC keys or values. Define them in a dedicated structure or utility (e.g., `MdcKeys` class) to ensure discoverability and prevent errors during refactoring. |
| 77 | + |
| 78 | +* **Responsibility and ownership.** |
| 79 | + The component or class that sets MDC context values is also responsible for clearing them. This maintains clarity and ensures proper lifecycle management. |
| 80 | + |
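The MDC rules above (central key definitions, key-by-key removal, and the setter owning the cleanup) can be sketched as follows. To stay runnable without dependencies, the sketch uses `DemoMdc`, a hypothetical per-thread stand-in for `org.slf4j.MDC`; the real class offers static `put`, `get`, and `remove` methods of the same shape. `MdcKeys` follows the class name suggested above, while its constants and `RequestLogContext` are assumptions.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Central definition of MDC keys; the constants are assumptions. */
final class MdcKeys {
    static final String REQUEST_ID = "reqId";
    static final String ENDPOINT = "endpoint";
    static final String SERVICE = "service";

    private MdcKeys() {}
}

/** Dependency-free stand-in for org.slf4j.MDC: a per-thread key/value map. */
final class DemoMdc {
    private static final ThreadLocal<Map<String, String>> CONTEXT =
            ThreadLocal.withInitial(HashMap::new);

    private DemoMdc() {}

    static void put(String key, String value) {
        CONTEXT.get().put(key, value);
    }

    static String get(String key) {
        return CONTEXT.get().get(key);
    }

    static void remove(String key) {
        CONTEXT.get().remove(key);
    }
}

/** Sets per-request entries on creation and removes exactly those entries on close. */
final class RequestLogContext implements AutoCloseable {
    private static final List<String> OWNED_KEYS =
            List.of(MdcKeys.REQUEST_ID, MdcKeys.ENDPOINT);

    RequestLogContext(String reqId, String endpoint) {
        DemoMdc.put(MdcKeys.REQUEST_ID, reqId);
        DemoMdc.put(MdcKeys.ENDPOINT, endpoint);
    }

    @Override
    public void close() {
        // Key-by-key removal: longer-lived entries such as "service" survive.
        OWNED_KEYS.forEach(DemoMdc::remove);
    }
}
```

Wrapping a request in `try (RequestLogContext ctx = new RequestLogContext(reqId, endpoint)) { ... }` removes the per-request keys even when the request fails, while a `service` entry set elsewhere on the same thread stays in place.
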
| 81 | +--- |
| 82 | + |
| 83 | +### 4. Logging Boundaries and Generation |
| 84 | + |
| 85 | +* **Deliberate logging boundaries.** |
| 86 | + Generated code (such as that in modules like *document-grounding* or *prompt-registry*) should log minimally, and preferably not at all. Logging should be centralized in higher-level components to maintain consistency and reduce noise. |
| 87 | + |
| 88 | +## Alternatives |
| 89 | + |
| 90 | +### Logging Framework Options |
| 91 | + |
| 92 | +1. **slf4j-simple** (previous): Simple but limited configuration and poor production suitability |
| 93 | +2. **logback-classic** (chosen): Industry standard with rich configuration, performance, and features |
| 94 | + |
| 95 | +### Log Format Approaches |
| 96 | + |
| 97 | +1. **Structured JSON logging**: Machine-readable but harder for humans to read while debugging |
| 98 | +2. **Descriptive human-readable** (chosen): Prioritizes developer and user experience while maintaining some structure |
| 99 | + |
| 100 | +## Consequences |
| 101 | + |
| 102 | +### Positive |
| 103 | + |
| 104 | +- **Improved debugging**: Comprehensive request tracking enables faster problem resolution |
| 105 | +- **Better user experience**: Users can see progress of long-running AI operations |
| 106 | +- **Security compliance**: Systematic approach prevents accidental logging of sensitive data |
| 107 | + |
| 108 | +### Negative |
| 109 | + |
| 110 | +- **Increased debug log volume**: More detailed logging may increase storage and processing overhead |
| 111 | +- **Development overhead**: Developers must manage MDC carefully, as improper handling could lead to incorrect request correlation |