Overview
Integrate OpenTelemetry distributed tracing for production observability.
Background
From STAFF_REVIEW.md: "Can you trace a request through all abstraction layers?"
Current stack has 5+ abstraction layers:
User Request
└── Conduit Router
    └── PydanticAI Agent
        └── OpenAI/Anthropic SDK
            └── HTTP client
                └── Provider API
Goals
- Trace requests end-to-end across all layers
- Identify latency bottlenecks in production
- Debug issues across distributed components
- Monitor bandit algorithm decision-making
Implementation
1. Install OpenTelemetry
pip install opentelemetry-api opentelemetry-sdk
pip install opentelemetry-instrumentation-fastapi # if using FastAPI
pip install opentelemetry-instrumentation-httpx # for HTTP calls
pip install opentelemetry-exporter-otlp # for export
2. Instrument Conduit Router
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
# Initialize tracing
provider = TracerProvider()
processor = BatchSpanProcessor(OTLPSpanExporter())
provider.add_span_processor(processor)
trace.set_tracer_provider(provider)
tracer = trace.get_tracer(__name__)
# Instrument routing decision
with tracer.start_as_current_span("bandit.select_arm") as span:
    arm = bandit.select_arm(context)
    span.set_attribute("selected_model", arm.model_name)
    span.set_attribute("ucb_score", arm.score)
3. Key Spans to Instrument
Routing Layer:
- conduit.route - Overall routing decision
- bandit.select_arm - Arm selection logic
- bandit.update - Reward feedback update
- embeddings.generate - Query embedding generation
Execution Layer:
- model.execute - LLM API call
- evaluation.score - Quality evaluation with Arbiter
- cost.calculate - Cost tracking
Persistence Layer:
- db.save_state - Bandit state persistence
- db.load_state - State recovery
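The span names above could be attached to functions with a small decorator instead of repeating `with` blocks at every call site. The following is a minimal sketch, not the project's actual code: `traced`, `RecordingTracer`, `generate_embedding`, and `route` are hypothetical names. The decorator accepts any object that exposes `start_as_current_span`, which is the method the tracer in step 2 provides; a stand-in tracer is used here so the sketch runs without an exporter.

```python
import functools
from contextlib import contextmanager


def traced(tracer, span_name):
    """Wrap a function so each call runs inside a span named `span_name`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            with tracer.start_as_current_span(span_name):
                return func(*args, **kwargs)
        return wrapper
    return decorator


class RecordingTracer:
    """Hypothetical stand-in that records span names in the order they start."""

    def __init__(self):
        self.started = []

    @contextmanager
    def start_as_current_span(self, name):
        self.started.append(name)
        yield None


tracer = RecordingTracer()


@traced(tracer, "embeddings.generate")
def generate_embedding(query):
    # Placeholder for the real embedding call.
    return [float(len(query))]


@traced(tracer, "conduit.route")
def route(query):
    # The outer span starts first, then the nested embedding span.
    return {"embedding": generate_embedding(query)}


result = route("hello")
```

In real use, the `tracer` returned by `trace.get_tracer(__name__)` would be passed in place of `RecordingTracer`, so nested calls produce nested spans.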
4. Attributes to Capture
span.set_attribute("query.text", query[:100]) # Truncate to limit exposure; truncation alone does not anonymize
span.set_attribute("query.category", category)
span.set_attribute("query.complexity", complexity)
span.set_attribute("model.selected", model_name)
span.set_attribute("model.cost", cost)
span.set_attribute("model.latency_ms", latency)
span.set_attribute("quality.score", quality)
span.set_attribute("bandit.algorithm", algo_name)
span.set_attribute("bandit.exploration", is_exploration)
5. Export to Observability Backend
Choose backend:
- Jaeger (self-hosted, good for development)
- Honeycomb (SaaS, excellent UX)
- Datadog (enterprise, full APM)
- Grafana Tempo (open-source, cost-effective)
Recommend: Jaeger for development, Honeycomb for production
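Switching between the recommended backends could be done through the standard OTLP environment variables rather than code changes. The sketch below is one possible approach under stated assumptions: `otlp_env` is a hypothetical helper, the service name `conduit` is a placeholder, the Jaeger endpoint assumes a local instance with OTLP gRPC ingestion on its default port 4317, and the Honeycomb endpoint and header name should be checked against Honeycomb's current documentation.

```python
def otlp_env(backend, api_key=None, service_name="conduit"):
    """Return OTEL_* environment variables for the chosen backend (hypothetical helper)."""
    env = {"OTEL_SERVICE_NAME": service_name}
    if backend == "jaeger":
        # Assumes a local Jaeger accepting OTLP over gRPC.
        env["OTEL_EXPORTER_OTLP_ENDPOINT"] = "http://localhost:4317"
    elif backend == "honeycomb":
        if not api_key:
            raise ValueError("Honeycomb requires an API key")
        env["OTEL_EXPORTER_OTLP_ENDPOINT"] = "https://api.honeycomb.io:443"
        env["OTEL_EXPORTER_OTLP_HEADERS"] = f"x-honeycomb-team={api_key}"
    else:
        raise ValueError(f"unsupported backend: {backend}")
    return env


dev_env = otlp_env("jaeger")
prod_env = otlp_env("honeycomb", api_key="YOUR_API_KEY")
```

Applying the result with `os.environ.update(...)` before constructing `OTLPSpanExporter()` with no arguments should be enough, since the exporter reads its endpoint and headers from these variables.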
Success Criteria
- OpenTelemetry instrumentation in conduit_bench/tracing.py
- All 10+ key operations instrumented
- Trace export to Jaeger/Honeycomb working
- Can visualize full request flow in trace UI
- Latency breakdown by layer visible
- Documentation in docs/OBSERVABILITY.md
- Example traces in docs
Example Trace Visualization
Request [200ms total]
├─ conduit.route [180ms]
│ ├─ embeddings.generate [50ms]
│ ├─ bandit.select_arm [5ms]
│ │ └─ linucb.compute_ucb [4ms]
│ └─ model.execute [120ms]
│ └─ openai.chat.completions [115ms]
└─ evaluation.score [20ms]
└─ arbiter.semantic_similarity [18ms]
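To read a latency breakdown from a trace like the one above, it helps to separate each span's self time (its duration minus its direct children) from its total. The sketch below uses the example's numbers; `SpanNode` and `self_times` are hypothetical names, and the calculation assumes child spans run sequentially and that span names are unique within the trace.

```python
from dataclasses import dataclass, field


@dataclass
class SpanNode:
    name: str
    duration_ms: float
    children: list = field(default_factory=list)


def self_times(node):
    """Return {span name: time not accounted for by its direct children}."""
    child_total = sum(child.duration_ms for child in node.children)
    result = {node.name: node.duration_ms - child_total}
    for child in node.children:
        result.update(self_times(child))
    return result


trace_tree = SpanNode("Request", 200, [
    SpanNode("conduit.route", 180, [
        SpanNode("embeddings.generate", 50),
        SpanNode("bandit.select_arm", 5, [SpanNode("linucb.compute_ucb", 4)]),
        SpanNode("model.execute", 120, [SpanNode("openai.chat.completions", 115)]),
    ]),
    SpanNode("evaluation.score", 20, [SpanNode("arbiter.semantic_similarity", 18)]),
])

breakdown = self_times(trace_tree)
slowest = max(breakdown, key=breakdown.get)
```

With these numbers, the provider call `openai.chat.completions` dominates at 115 ms of self time, while `conduit.route` itself adds only 5 ms beyond its children.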
Priority
MEDIUM - Essential for production debugging, not blocking research
Difficulty
Intermediate - Requires observability platform knowledge