|
1 | 1 | # Logging Strategy |
2 | 2 |
|
3 | | -## Status: Proposed |
| 3 | +## Status |
| 4 | + |
| 5 | +Proposed |
4 | 6 |
|
5 | 7 | ## Context |
6 | 8 |
|
7 | | -We would like to log useful information that in future helps us and the user debug and resolve problems quickly. |
| 9 | +We identified challenges with debugging and problem resolution in the AI SDK that better logging can address.
| 10 | +Typically, users had to enable wire logs to get the information needed for troubleshooting, which surfaces a large volume of unrelated output alongside the helpful entries.
| 11 | +Additionally, we identified the need to give users better visibility into application progress and what is happening behind the scenes.
| 12 | + |
| 13 | +Key challenges that drove this decision: |
| 14 | + |
| 15 | +- **Debugging difficulties**: Limited visibility into request flows and processing steps |
| 16 | +- **Security concerns**: Risk of accidentally logging sensitive information |
| 17 | +- **User experience**: Users needed better insight into long-running AI operations |
| 18 | +- **Trace ownership**: Reliance on external parties for troubleshooting details |
8 | 19 |
|
9 | | -Currently, the user would have to enable wire logs to show information that enables us to assist them. |
10 | | -We also identified than there is benefit to show a story of what is happening behind the scenes to the user which informs them on application progress. |
11 | | -We want to rely on application level logs from ai sdk for the same. |
| 20 | +## Decision |
| 21 | + |
| 22 | +Devise and follow comprehensive logging guidelines that prioritize **debugging capability** and **user visibility** while maintaining **security** and **performance**. The approach emphasizes descriptive, human-readable logs with structured request tracking through MDC (Mapped Diagnostic Context).
12 | 23 |
|
13 | 24 | ## Guidelines |
14 | 25 |
|
15 | | -- Don't log any sensitive information eg: names, ids etc. The general rule to follow is to avoid logging request or response payloads. |
16 | | -- Logs must contain useful and non-redundant information to keep them concise and to the point. |
17 | | -- Keep per-request log below INFO level to avoid littering the output and further making it difficult to identify related logs |
18 | | -- Prefer to keep one-time logs in a single application run at INFO level. |
19 | | -- Make use of MDC as appropriate and deliberate about clearing the context. |
20 | | -- MDC must not be cleared as a whole but only by key basis to avoid unintentionally clearing context items that merit a longer lifespan |
21 | | -- To be deliberate about logging, we limit logging out of generated classes, consequently limiting logs produced from modules like document-grounding, prompt-registry. |
22 | | -- Limit unnecessary warning logs to also avoid littering the console logging |
23 | | -- MDC is currently leveraged to contain and transport information when logging is performed on a different block down the flow of control chain. We need special care on where the context is set. Any per-request context set in MDC is clearing up at response completion. Consequently, setting per-request context outside of blocks/objects that are not invoked/consumed per-request can lead to incomplete MDC states which will in turn corrupt logs produced. |
24 | | -- We are generally inclined towards descriptive and human-readable logs and ideally contain, service invoked, endpoint etc and request logging. We do not expect machine readability of logs but still following "metric=value" patterns for the sake of extensibility and readability of them. |
25 | | -- We may also correlate request level logs with request identifier reqId to assist debugging. |
26 | | -- In the case for logging information not simply read but involve computation or inference cost, we must deliberately deflect the effort based on log level or other available context. |
27 | | -- Centralize MDC handling. Using magic string as keys or values for the context may be lost in maintenance and add towards collaborative effort. We may wrap the access in a format that explicitly declares context keys and assist discoverability of all expected context items, so they are not lost in refactoring effort. |
28 | | -- The responsibility of MDC context clearing lies with the class that sets the context for the sake of readability. |
29 | | -- Explicitly log at request start and don't purely rely on logging at response time. A request may take long or even fail. Additionally, inferring response metrics even for a success case may be non-trivial eg: duration for streaming. The most critical benefit is immediate visibility for the user about the effect of their action for even long-running requests. Additionally, this avoids the the scenario where the surface error is ambiguous and needs investigation of stack-trace. |
30 | | - |
31 | | -### Unattended |
32 | | -- Unit test logging |
33 | | -- Logback |
34 | | -- logging pattern |
35 | | -- Separate logging into per request and per application runtime. |
36 | | -- A separate section for MDC Handling |
| 26 | +### 1. Content and Security |
| 27 | + |
| 28 | +* **Avoid sensitive information.** |
| 29 | + Do not log any personally identifiable or confidential data such as names, IDs, tokens, or payload content. As a general rule, avoid logging full request or response bodies. |
| 30 | + |
| 31 | +* **Keep logs concise and relevant.** |
| 32 | + Every log should convey meaningful information without redundancy. Avoid verbose, repetitive, or cosmetic details. |
| 33 | + |
| 34 | +* **Use descriptive, human-readable formats.** |
| 35 | +  Logs should read naturally and provide enough context for a developer to understand what happened without consulting the code. Include details such as the invoked service and endpoint where applicable.
| 36 | + |
| 37 | +* **Maintain extensible readability.** |
| 38 | + While logs are intended for humans, follow the `metric=value` pattern where practical. This improves future extensibility and allows easier parsing if machine analysis becomes necessary. |
| 39 | + |
| 40 | +* **Correlate related logs.** |
| 41 | + Include a request identifier (e.g., `reqId`) in per-request logs to assist with correlation and debugging. |
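
The readable-message-plus-`metric=value` shape described above can be sketched as follows. This is a minimal illustration, not the SDK's implementation: `LogLine` is a hypothetical helper, and the service name, endpoint, and request ID in the usage note are made-up values. Only non-sensitive metadata is appended, never payload content.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper: a human-readable message followed by key=value pairs. */
final class LogLine {
    private final String message;
    // LinkedHashMap keeps the fields in the order they were added.
    private final Map<String, String> fields = new LinkedHashMap<>();

    private LogLine(String message) {
        this.message = message;
    }

    static LogLine of(String message) {
        return new LogLine(message);
    }

    /** Adds one metadata field; callers must not pass payloads or personal data. */
    LogLine with(String key, Object value) {
        fields.put(key, String.valueOf(value));
        return this;
    }

    @Override
    public String toString() {
        StringJoiner joiner = new StringJoiner(" ");
        joiner.add(message);
        fields.forEach((key, value) -> joiner.add(key + "=" + value));
        return joiner.toString();
    }
}
```

For example, `LogLine.of("Request started").with("service", "orchestration").with("endpoint", "/completions").with("reqId", "a1b2")` renders as `Request started service=orchestration endpoint=/completions reqId=a1b2`, which stays readable for humans while remaining easy to parse later.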
| 42 | + |
| 43 | +--- |
| 44 | + |
| 45 | +### 2. Log Levels and Scope |
| 46 | + |
| 47 | +* **Per-request logs.** |
| 48 | + Keep per-request logs **below INFO level** (e.g., DEBUG or TRACE) to prevent cluttering normal application output. |
| 49 | + |
| 50 | +* **Application runtime logs.** |
| 51 | + Prefer **INFO level** for one-time or startup/shutdown logs that occur once per application run. |
| 52 | + |
| 53 | +* **Avoid unnecessary warnings.** |
| 54 | +  Use the WARN level only for actionable or genuinely concerning conditions. Do not use it as a placeholder or for expected transient states.
| 55 | + |
| 56 | +* **Explicit request lifecycle logging.** |
| 57 | +  Always log at **request start** to provide immediate visibility that an operation has begun. Do not rely solely on response-time logging, because requests may fail, hang, or take a long time (e.g., streaming cases). This helps users understand that their request is being processed even before a result is available.
| 58 | + |
| 59 | +* **Performance-aware logging.** |
| 60 | + If a log statement requires computation or inference to generate information, guard it with a log-level check (e.g., `if (log.isDebugEnabled())`) to avoid unnecessary overhead when that level is disabled. |
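
The level guard from the performance-aware guideline can be sketched as below. To keep the example runnable without dependencies it uses `java.util.logging`, which is an assumption made for illustration only; with SLF4J the guard would be `log.isDebugEnabled()` as mentioned above. `GuardedLogging`, `expensiveSummary`, and the counter are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

final class GuardedLogging {
    private static final Logger LOG = Logger.getLogger(GuardedLogging.class.getName());

    // Counts how often the costly summary is computed; for illustration only.
    static final AtomicInteger summaryComputations = new AtomicInteger();

    /** Hypothetical costly computation, e.g. aggregating per-chunk token counts. */
    static String expensiveSummary(int[] tokenCounts) {
        summaryComputations.incrementAndGet();
        int total = 0;
        for (int count : tokenCounts) {
            total += count;
        }
        return "chunks=" + tokenCounts.length + " totalTokens=" + total;
    }

    static void logResponse(int[] tokenCounts) {
        // Guard: the summary is only computed when FINE (roughly DEBUG) is enabled.
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine("Response received " + expensiveSummary(tokenCounts));
        }
    }
}
```

With the logger at `INFO`, calling `GuardedLogging.logResponse(...)` never invokes `expensiveSummary`; once the level is lowered to `FINE`, each call computes the summary exactly once.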
| 61 | + |
| 62 | +--- |
| 63 | + |
| 64 | +### 3. MDC (Mapped Diagnostic Context) |
| 65 | + |
| 66 | +* **Purpose and usage.** |
| 67 | + MDC is used to carry contextual information (e.g., `reqId`, `endpoint`, `service`) across execution blocks within the same thread. |
| 68 | + |
| 69 | +* **Setting and clearing context.** |
| 70 | + Set MDC values deliberately and close to their scope of relevance. Per-request MDC context must be cleared when the response completes. Avoid setting per-request values in long-lived objects that outlive the request lifecycle, as this can result in corrupted or incomplete log context. |
| 71 | + |
| 72 | +* **Granular clearing only.** |
| 73 | + Never clear the entire MDC context. Instead, remove entries key-by-key to preserve unrelated context items that may remain valid for longer periods. |
| 74 | + |
| 75 | +* **Centralized MDC management.** |
| 76 | + Avoid using magic strings for MDC keys or values. Define them in a dedicated structure or utility (e.g., `MdcKeys` class) to ensure discoverability and prevent errors during refactoring. |
| 77 | + |
| 78 | +* **Responsibility and ownership.** |
| 79 | + The component or class that sets MDC context values is also responsible for clearing them. This maintains clarity and ensures proper lifecycle management. |
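
The MDC guidelines above (declared keys, key-by-key clearing, and the setter owning the cleanup) might be combined as in the following sketch. It is an assumption-laden illustration: `MdcKey`, `Context`, and `RequestScope` are hypothetical names, and `Context` is a thread-local stand-in for `org.slf4j.MDC` so the example runs without dependencies. In real code its methods would delegate to `MDC.put`, `MDC.remove`, and `MDC.get`.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical central declaration of every context key, replacing magic strings. */
enum MdcKey {
    REQ_ID("reqId"),
    ENDPOINT("endpoint"),
    SERVICE("service");

    final String key;

    MdcKey(String key) {
        this.key = key;
    }
}

/** Stand-in for org.slf4j.MDC (assumption): a per-thread map of context values. */
final class Context {
    private static final ThreadLocal<Map<String, String>> VALUES =
            ThreadLocal.withInitial(HashMap::new);

    static void put(MdcKey key, String value) {
        VALUES.get().put(key.key, value);
    }

    static void remove(MdcKey key) {
        VALUES.get().remove(key.key);
    }

    static String get(MdcKey key) {
        return VALUES.get().get(key.key);
    }
}

/** Sets the per-request keys and removes exactly those keys when closed. */
final class RequestScope implements AutoCloseable {
    private RequestScope() {}

    static RequestScope open(String reqId, String endpoint) {
        Context.put(MdcKey.REQ_ID, reqId);
        Context.put(MdcKey.ENDPOINT, endpoint);
        return new RequestScope();
    }

    @Override
    public void close() {
        // Key-by-key removal: longer-lived entries such as SERVICE stay untouched.
        Context.remove(MdcKey.REQ_ID);
        Context.remove(MdcKey.ENDPOINT);
    }
}
```

A synchronous caller could wrap the request in `try (RequestScope scope = RequestScope.open("a1b2", "/completions")) { ... }`, so the class that sets the context also clears it. For asynchronous or streaming responses, `close()` would instead need to be called when the response completes, on the thread that holds the context.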
| 80 | + |
| 81 | +--- |
| 82 | + |
| 83 | +### 4. Logging Boundaries and Generation |
| 84 | + |
| 85 | +* **Deliberate logging boundaries.** |
| 86 | +  Generated code (such as the classes in modules like *document-grounding* or *prompt-registry*) should log minimally and preferably not at all. Logging should be centralized in higher-level components to maintain consistency and reduce noise.
| 87 | + |
| 88 | +## Alternatives |
| 89 | + |
| 90 | +### Logging Framework Options |
| 91 | + |
| 92 | +1. **slf4j-simple** (previous): Simple but limited configuration and poor production suitability |
| 93 | +2. **logback-classic** (chosen): Industry standard with rich configuration, performance, and features |
| 94 | + |
| 95 | +### Log Format Approaches |
| 96 | + |
| 97 | +1. **Structured JSON logging**: Machine-readable but harder for human debugging |
| 98 | +2. **Descriptive human-readable** (chosen): Prioritizes developer and user experience while maintaining some structure |
| 99 | + |
| 100 | +## Consequences |
| 101 | + |
| 102 | +### Positive |
| 103 | + |
| 104 | +- **Improved debugging**: Comprehensive request tracking enables faster problem resolution |
| 105 | +- **Better user experience**: Users can see progress of long-running AI operations |
| 106 | +- **Security compliance**: Systematic approach prevents accidental logging of sensitive data |
| 107 | + |
| 108 | +### Negative |
| 109 | + |
| 110 | +- **Increased debug log volume**: More detailed logging may increase storage and processing overhead |
| 111 | +- **Development overhead**: Developers must be mindful of proper MDC management as improper MDC handling could lead to incorrect request correlation |