Description
Overview
Implement comprehensive OpenTelemetry (OTel) tracing and logging for py-shiny, providing observability into Shiny application behavior with minimal performance overhead. This implementation follows the R Shiny approach but adapts it to Python idioms (async/await, context managers, decorators, contextvars).
Goals
- 5-level collection granularity: none, session, reactive_update, reactivity, all
- Async-aware span propagation using Python's contextvars
- Lazy initialization with minimal performance impact (< 50μs per span)
- Dual configuration: environment variable (SHINY_OTEL_COLLECT) and programmatic API (otel_collect context manager)
- Source location attribution for reactive computations
- Error sanitization and proper exception recording
- Standard OTel compatibility with backends like Jaeger, Zipkin, etc.
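As a rough illustration of the five collection levels and the SHINY_OTEL_COLLECT environment variable, the sketch below models the levels as an ordered enum and parses the variable into one of them. The names CollectLevel, parse_collect_level, level_from_env, and should_collect are hypothetical, and the default of "all" when the variable is unset is an assumption rather than something stated in this issue.

```python
import os
from enum import IntEnum


class CollectLevel(IntEnum):
    """Hypothetical ordered collection levels; higher values collect more."""

    NONE = 0
    SESSION = 1
    REACTIVE_UPDATE = 2
    REACTIVITY = 3
    ALL = 4


def parse_collect_level(raw, default=CollectLevel.ALL):
    """Map a string such as "reactive_update" to a CollectLevel.

    The default used for a missing or empty value is an assumption.
    """
    if raw is None or raw.strip() == "":
        return default
    try:
        return CollectLevel[raw.strip().upper()]
    except KeyError:
        valid = ", ".join(level.name.lower() for level in CollectLevel)
        raise ValueError(
            f"Invalid collection level {raw!r}; expected one of: {valid}"
        ) from None


def level_from_env(environ=None):
    """Read SHINY_OTEL_COLLECT from the given mapping (os.environ by default)."""
    environ = os.environ if environ is None else environ
    return parse_collect_level(environ.get("SHINY_OTEL_COLLECT"))


def should_collect(current, required):
    """A span is emitted only if the current level is at least `required`."""
    return current >= required
```

Ordering the levels lets each instrumentation point make a single comparison, for example a reactive execution span would check should_collect(current, CollectLevel.REACTIVITY) before doing any span work.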
Architecture
New public module shiny/otel/ containing:
- Core tracer/logger initialization
- Collection level management
- Context propagation utilities
- Span creation helpers
- Attribute extraction (source refs, HTTP metadata)
- Error handling and sanitization
- User-facing decorators and context managers
Dependencies
Phase 1: Add opentelemetry-api>=1.20.0 as a required dependency
Phase 10: Evaluate an optional dependency group approach (pip install shiny[telemetry])
Success Criteria
- OTel integration works with all 5 collection levels
- Session lifecycle spans include correct attributes (session ID, HTTP metadata)
- Reactive execution spans nest correctly under reactive update spans
- Async context propagation maintains correct parent-child relationships
- Users can control collection via environment variable and context managers (environment variable ✅, context manager pending Phase 7)
- Performance overhead < 50μs per span
- Test coverage > 90% for OTel code
- Complete documentation and example applications
- Compatible with standard OTel backends
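One way to check the per-span overhead criterion is a small micro-benchmark that averages the cost of entering and exiting a span over many iterations. The sketch below is only an assumption about how that could be measured: noop_span is a hypothetical placeholder for the real span helper, and mean_overhead_us is not part of py-shiny.

```python
import time
from contextlib import contextmanager


@contextmanager
def noop_span(name):
    """Hypothetical placeholder; swap in the real span helper when measuring."""
    yield None


def mean_overhead_us(make_span, iterations=10_000):
    """Return the mean time in microseconds to enter and exit one span."""
    start = time.perf_counter_ns()
    for _ in range(iterations):
        with make_span("bench"):
            pass
    elapsed_ns = time.perf_counter_ns() - start
    # Nanoseconds per iteration, converted to microseconds.
    return elapsed_ns / iterations / 1_000
```

The figure includes loop overhead, so it is an upper bound on the span cost; comparing the result at each collection level against the 50μs target gives a quick regression check.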
Sub-Issues
This epic is broken down into 10 phases, each corresponding to a sub-issue:
1. ✅ Foundation (Core OTel Infrastructure) [#2132]
Set up basic OTel infrastructure with tracer/logger initialization and collection level management.
Status: Merged in PR #2143
2. ✅ Session Lifecycle Instrumentation [#2133]
Add OTel spans for session start/end and HTTP/WebSocket connections.
Status: Merged in PR #2146
3. ✅ Reactive Flush Instrumentation [#2134]
Add "reactive update" spans that wrap each flush cycle, serving as parent for all reactive spans.
Status: Merged in PR #2148
4. ✅ Reactive Execution Instrumentation [#2135]
Instrument individual reactive computations (calcs, effects, outputs) with descriptive labels and source attribution.
Status: Merged in PR #2149
5. ✅ Value Updates and Logging [#2136]
Log reactive value updates as OTel log events.
Status: In Review - PR #2169
- Added emit_otel_log() helper function
- Value updates logged with DEBUG severity
- Namespace support for module-scoped values
- 12 comprehensive tests
- Example application
6. Error Handling and Sanitization [#2137]
Record exceptions in spans with proper sanitization for sensitive information.
Status: Not Started
7. User-Facing API [#2138]
Provide context managers and decorators for user control over OTel collection.
Status: Not Started
8. Testing Infrastructure [#2139]
Add comprehensive tests for OTel functionality with in-memory exporters and fixtures.
Status: Not Started (test infrastructure added incrementally in each phase)
9. Documentation [#2140]
Document OTel features for users including API docs, user guide, and examples.
Status: Not Started
10. Follow-up Evaluation [#2141]
Evaluate whether an optional dependency group is a better fit than a required dependency.
Status: Not Started
See linked sub-issues below for detailed implementation plans for each phase.
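For the user-facing API in Phase 7, a context manager and a decorator for overriding the collection level could look roughly like the sketch below. The real otel_collect signature is not specified in this issue, so the argument shape, the with_collect_level decorator, current_level, and the default of "all" are all assumptions made for illustration.

```python
import contextvars
from contextlib import contextmanager
from functools import wraps

LEVELS = ("none", "session", "reactive_update", "reactivity", "all")

# Assumed default; a real implementation would likely seed this from
# the SHINY_OTEL_COLLECT environment variable.
_collect_level = contextvars.ContextVar("collect_level", default="all")


def current_level():
    """Return the collection level in effect for the current context."""
    return _collect_level.get()


@contextmanager
def otel_collect(level):
    """Temporarily override the collection level inside a with block."""
    if level not in LEVELS:
        raise ValueError(f"Unknown collection level: {level!r}")
    token = _collect_level.set(level)
    try:
        yield
    finally:
        # Restore the outer level so overrides nest cleanly.
        _collect_level.reset(token)


def with_collect_level(level):
    """Hypothetical decorator form: run the function under `level`."""

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            with otel_collect(level):
                return func(*args, **kwargs)

        return wrapper

    return decorator
```

Because the level lives in a contextvar, an override made in one session's task does not leak into other concurrently running sessions.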
Implementation Notes
Non-Goals (Out of Scope)
- ASGI middleware for automatic HTTP tracing
- Metrics collection (only traces and logs)
- Custom span processors or exporters
- Integration with specific observability platforms
- Automatic instrumentation of third-party libraries
Open Questions
- Extended Tasks: Should we instrument shiny/reactive/_extended_task.py?
  - Decision: Add in Phase 4 if time permits, otherwise defer
- Bookmark Operations: Should bookmark save/restore be instrumented?
  - Decision: Defer to follow-up, focus on reactive core
- Module Namespacing: Use existing _current_namespace contextvar from shiny/_namespaces.py
- Performance Impact: Target < 50μs per span overhead
Critical Files Modified
Core:
- pyproject.toml
- shiny/__init__.py
- shiny/otel/* (all new files)
Integration Points:
- shiny/session/_session.py (lines ~599, ~719, ~1821)
- shiny/reactive/_core.py (line ~175)
- shiny/reactive/_reactives.py (lines ~180, ~235, ~305, ~575)
Testing:
- tests/pytest/test_otel_*.py (all new files)