
Commit 6cf2fa2

chore(logging): update docs
1 parent ac8f7aa commit 6cf2fa2

File tree

1 file changed: +223 −5 lines changed


docs/ADRs/2025-12-29-unified-logging-library.md

Lines changed: 223 additions & 5 deletions
@@ -177,7 +177,7 @@ The library provides multiple layers of defense:

## Implementation Status

-### Completed (December 2025 - January 2026)
### Phase 1: Core Library & Initial Integration (Completed)

The core library is complete with production/development API separation, automatic PII sanitization through Microsoft Presidio integration, multi-transport support, global singleton architecture, in-memory buffer, and console override implementation.

@@ -191,10 +191,228 @@ Tooling includes automated Presidio pattern updates via `yarn presidio:update`,

Comprehensive documentation covers the library, Presidio integration, and this architectural decision record. Full test coverage validates the implementation.

-### Remaining Work
-The console override code exists but needs activation in production builds. The last remaining console.log statement should be migrated to the new logger.
-### Future Enhancements
-Optional improvements include log sampling for high-volume scenarios, ESLint rules to enforce logger usage patterns, custom transport plugins, and build-time log statement analysis.

### Phase 2: Production Validation & Distribution (In Progress)

**Testing & Validation**:
- Test Datadog integration in the production environment to verify that log delivery, correlation IDs, and RUM integration work correctly
- Test Electron file logging to ensure logs are written to disk with proper rotation and that the global singleton pattern works across the main and renderer processes
- Verify console override behavior in production builds

**Distribution**:
- Publish the library to the NPM registry as `@wireapp/logger` for external use and version management
- Establish a versioning strategy and release process

**Electron Migration**:
- Remove the old file logging implementation from the Electron wrapper
- Ensure the file transport is initialized first in the Electron main process (see the sketch below)
- Verify log file location, rotation, and accessibility for support
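
For illustration, a possible main-process initialization sketch. It assumes a configuration call shaped like the `updateLoggerConfig` example later in this document; the file-transport option names and the log directory are assumptions, not the library's confirmed API:

```typescript
import {app} from 'electron';
import {updateLoggerConfig} from '@wireapp/logger';

// Configure the file transport before any other code logs, so early messages
// are captured and the global singleton is set up once for the whole app.
updateLoggerConfig({
  transports: {
    file: {
      enabled: true,
      directory: app.getPath('logs'), // Electron's per-app log directory (hypothetical option)
    },
  },
});
```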

### Phase 3: Development Logging Migration (Not Started)

**Codebase-Wide Migration**:
- Replace all existing logger calls throughout web-packages with `.development.*` methods
- This ensures all current logging is explicitly marked as development-only and will not be sent to Datadog
- Review each logger instance to understand its purpose and the appropriate log level

**Rationale**: Start by making all logs development-only to establish a secure baseline. This prevents accidentally sending sensitive data to Datadog while we review what should become production logging. A migration sketch follows below.
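
A minimal before/after sketch of the migration, assuming the `getLogger` factory and `.development.*` methods described in this ADR; the import path follows the planned `@wireapp/logger` package name and the component name is hypothetical:

```typescript
import {getLogger} from '@wireapp/logger';

const logger = getLogger('ConversationRepository'); // hypothetical component name
const conversationId = 'debug-only-id';

// Before (ambiguous about whether it may reach Datadog):
//   logger.info('Conversation loaded', {conversationId});

// After (explicitly development-only, never forwarded to Datadog):
logger.development.info('Conversation loaded', {conversationId});
```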

### Phase 4: Production Logging Strategy (Not Started)

**Define Production Logging Standards**:
- Identify which events and metrics are valuable for production monitoring
- Define which errors require production tracking versus development-only debugging
- Establish guidelines for production log context (what data is safe to include)
- Document production logging patterns and anti-patterns

**Add Production Logging**:
- Systematically add `.production.*` logs for critical events, errors, and metrics
- Focus on operational visibility: API failures, authentication issues, feature usage, performance metrics
- Ensure all production logs contain only whitelisted context keys
- Review each production log for PII safety

**Examples of Production-Worthy Logs** (one is sketched after this list):
- Authentication failures with error codes
- API request failures with status codes and endpoints
- Feature flag activations
- Critical user flows (login, message send, call start)
- Performance metrics (API latency, render times)
- Client configuration issues
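
An illustrative sketch of one such production-worthy log, assuming the `.production.*` API from this ADR; the logger name, context keys, and values are hypothetical and would need to match the whitelist:

```typescript
import {getLogger} from '@wireapp/logger';

const logger = getLogger('APIClient'); // hypothetical logger name

// Operational event with only whitelisted, non-PII context keys (assumed names).
logger.production.error('API request failed', {
  endpoint: '/conversations',
  statusCode: 503,
  durationMs: 1240,
});
```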

### Phase 5: Final Hardening (Not Started)

**Console Override Activation**:
- Activate `installConsoleOverride()` in production builds to silence accidental console.log calls (see the sketch below)
- Test in a staging environment first
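
A minimal activation sketch; `installConsoleOverride()` is the function named in this ADR, while the import path and the production-build check are assumptions about the build setup:

```typescript
import {installConsoleOverride} from '@wireapp/logger';

if (process.env.NODE_ENV === 'production') {
  // Route or silence stray console.log/info/debug calls instead of letting
  // them print unsanitized output in production builds.
  installConsoleOverride();
}
```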

**Remaining Cleanup**:
- Migrate the last console.log statement in MLSConversations
- Perform a final audit of all logging to ensure the production/development separation is correct

### Improvements & Enhancements

The following improvements would enhance the library's usability and functionality:

#### Timer API for Performance Measurement

**What**: Console.time/console.timeEnd equivalent that integrates with the logger.

**Why**: Performance measurements often need to correlate with other logs. A timer API that uses the same logger infrastructure would provide better context and ensure timing data is properly sanitized when logged.

**How**:

```typescript
// Start a timer
logger.development.timer.start('apiCall');

// Log intermediate time
logger.development.log('apiCall', 'Request sent'); // Logs elapsed time

// End timer and log final duration
logger.development.timer.end('apiCall'); // Logs total duration
```

**Benefits**:
- Automatic duration calculation
- Consistent formatting across the codebase
- Integration with production/development logging (timing data can be production-safe)
- Correlation IDs can link timers to other log events

#### Log Sampling for High-Volume Scenarios

**What**: Configurable sampling rate to reduce log volume and costs for high-frequency events.

**Why**: Some events (like network requests or render loops) can generate thousands of logs per second. Sending all of these to Datadog is expensive and creates noise. Sampling allows us to capture representative data without overwhelming the system.

**How**:

```typescript
// Sample 10% of logs for this logger
const logger = getLogger('HighVolumeComponent', {sampleRate: 0.1});

// Or configure per-transport
{
  datadog: {
    enabled: true,
    sampleRate: 0.1, // Only send 10% of production logs to Datadog
  }
}
```

**Use Cases**:
- Audio/video signaling logs (AVS filtering already exists; sampling would be additional)
- Mouse move or scroll events
- Network request logging in high-traffic scenarios
- Render performance logging
- WebSocket message logging

**Benefits**:
- Reduces Datadog costs significantly for high-volume loggers
- Still provides representative data for understanding system behavior
- Can sample differently per environment (100% in staging, 1% in production)
- Sampling decisions are made before sanitization, saving processing time

**Implementation Considerations** (a deterministic-sampling sketch follows this list):
- Sampling can be deterministic (the same session always samples or doesn't) or random per log call (different logs from the same session)
- Sample rate configurable per logger, per transport, or globally
- Important errors should bypass sampling (ERROR/FATAL always sent)
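
A possible deterministic-sampling sketch, independent of the library's internals; it assumes a stable session identifier is available and that ERROR/FATAL entries must always pass:

```typescript
// Decide once per session whether a non-critical log entry is kept.
function shouldSample(sessionId: string, sampleRate: number, level: string): boolean {
  if (level === 'error' || level === 'fatal') {
    return true; // important errors bypass sampling
  }
  // Hash the session id into a stable value in [0, 1), so the same session
  // is consistently sampled in or out across all of its log calls.
  let hash = 0;
  for (const char of sessionId) {
    hash = (hash * 31 + char.charCodeAt(0)) >>> 0;
  }
  return hash / 0xffffffff < sampleRate;
}
```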

#### ESLint Rules for Logger Enforcement

**What**: Custom ESLint rules that enforce proper logger usage patterns.

**Why**: Prevent common mistakes before code reaches production. Automated enforcement is more reliable than code review alone.

**Rules** (a possible configuration sketch follows this list):
- `no-console`: Disallow console.log/info/debug/warn (except in tests)
- `require-production-method`: Prevent accidentally using generic `logger.info()` instead of `logger.production.info()` or `logger.development.info()`
- `no-sensitive-context-keys`: Warn when using non-whitelisted context keys in production logs
- `production-log-review`: Require a special comment or annotation for new `.production.*` logs
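
For illustration, an ESLint flat-config sketch: the built-in `no-console` rule exists today, while the `@wireapp/logger/*` rule names are hypothetical and would come from an internal plugin that has not been written yet:

```typescript
// eslint.config.ts (sketch)
export default [
  {
    files: ['**/*.{ts,tsx}'],
    ignores: ['**/*.test.{ts,tsx}'], // allow console usage in tests
    rules: {
      'no-console': 'error',
      // Hypothetical custom rules from a future internal plugin:
      // '@wireapp/logger/require-production-method': 'error',
      // '@wireapp/logger/no-sensitive-context-keys': 'warn',
      // '@wireapp/logger/production-log-review': 'warn',
    },
  },
];
```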
**Benefits**:
332+
- Catch issues during development, not in production
333+
- Enforce team standards automatically
334+
- Reduce cognitive load in code review
335+
- Build institutional knowledge into tooling
336+
337+
#### Build-Time Log Analysis
338+
339+
**What**: Script that analyzes all log statements during build and generates a report.
340+
341+
**Why**: Understanding what gets logged where helps with security auditing and cost management.
342+
343+
**Output**:
344+
- Count of production vs development logs
345+
- List of all production log context keys used
346+
- Loggers with highest call count (candidates for sampling)
347+
- Production logs that might need review
348+
- Unused loggers
349+
350+
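
A rough sketch of the counting portion of such a script; it assumes production and development logs appear as plain `.production.*` / `.development.*` method calls and that source files are passed on the command line:

```typescript
// analyze-logs.ts (sketch): count production vs development log calls.
import {readFileSync} from 'node:fs';

let production = 0;
let development = 0;

for (const file of process.argv.slice(2)) {
  const source = readFileSync(file, 'utf8');
  production += (source.match(/\.production\.\w+\(/g) ?? []).length;
  development += (source.match(/\.development\.\w+\(/g) ?? []).length;
}

// A build script may print directly; the no-console rule targets app code.
console.log(`production logs: ${production}, development logs: ${development}`);
```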
**Benefits**:
351+
- Security audit trail
352+
- Cost estimation for Datadog
353+
- Identify high-volume loggers before they become problems
354+
- Documentation of logging surface area
355+
356+

#### Custom Transport Plugins

**What**: Public API for adding custom transport implementations.

**Why**: Organizations might want to send logs to destinations beyond Datadog and the file transport (e.g., Sentry, Elasticsearch, custom internal systems).

**How**:

```typescript
class CustomTransport implements Transport {
  log(entry: LogEntry): void {
    // Custom logic
  }
}

updateLoggerConfig({
  transports: {
    custom: new CustomTransport(),
  },
});
```

**Benefits**:
- Flexibility for different deployment scenarios
- Easier testing (mock transport)
- Community contributions
- Gradual migration between monitoring services

#### Structured Error Context

**What**: Helper for extracting safe error context from Error objects.

**Why**: Error objects contain useful debugging information, but they can also contain sensitive data in messages or stack traces. A helper that extracts only safe fields would be useful.

**How**:

```typescript
try {
  await riskyOperation();
} catch (error) {
  logger.production.error('Operation failed', error, {
    operation: 'riskyOperation',
    // Error name, code, and sanitized message automatically extracted
  });
}
```
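
A possible shape for the extraction helper itself; the chosen fields are assumptions, and the message sanitizer shown is only a placeholder for the library's Presidio-based sanitization:

```typescript
// Placeholder for the real Presidio-backed sanitizer.
const sanitizeMessage = (message: string): string => message;

// Extract only fields that are safe to attach to a production log.
function toSafeErrorContext(error: unknown): Record<string, string | undefined> {
  if (!(error instanceof Error)) {
    return {errorType: typeof error};
  }
  return {
    errorName: error.name,
    errorCode: (error as {code?: string}).code,
    errorMessage: sanitizeMessage(error.message),
  };
}
```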

**Benefits**:
- Consistent error logging
- Automatic stack trace sanitization
- Extract error codes and types safely
- Integration with error tracking services

#### Performance Monitoring Integration

**What**: Hooks for integrating with performance monitoring tools beyond Datadog RUM.

**Why**: Some teams use specialized performance monitoring tools (e.g., Sentry, New Relic) and want to correlate logs with performance data.

**How**: Callbacks or events fired when certain log types occur, allowing integration code to forward data to other systems without coupling the logger to specific services. A possible shape is sketched below.
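
A hypothetical hook sketch; neither the `onLog` function nor the `LogEntry` shape shown here is part of the current library, they only illustrate how integration code could subscribe without the logger knowing about the destination service:

```typescript
type LogEntry = {level: string; message: string; durationMs?: number};

const listeners: Array<(entry: LogEntry) => void> = [];

// Hypothetical subscription point that the logger could expose.
export function onLog(listener: (entry: LogEntry) => void): void {
  listeners.push(listener);
}

// Integration code forwards timing data to another monitoring tool.
onLog(entry => {
  if (entry.durationMs !== undefined) {
    // e.g., send entry.durationMs to a performance monitoring service
  }
});
```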

**Benefits**:
- Flexible performance monitoring strategy
- Correlate logs with performance traces
- Use best-of-breed tools for different purposes
