Commit 2726a25

docs
1 parent 3f0f28d commit 2726a25

2 files changed: +180 -0 lines changed

README.md

Lines changed: 1 addition & 0 deletions
@@ -7,6 +7,7 @@
 - **Sample REST Service** exposed publicly with HTTPS
 - **Terraform** for infrastructure as code
 - **Let's Encrypt SSL** certificates with cert-manager
+- **Distributed Tracing** with Jaeger and OpenTelemetry

 ## Prerequisites

docs/OPENTELEMETRY_GUIDE.md

Lines changed: 179 additions & 0 deletions
@@ -0,0 +1,179 @@
# OpenTelemetry Guide

OpenTelemetry integration for distributed tracing in items-service.

## What is OpenTelemetry?

**OpenTelemetry (OTel)** is an observability framework. In this project it produces **traces**: records of requests as they flow through your system.

**Key Concepts:**

- **Trace**: The complete journey of a request through your system
- **Span**: A single operation within a trace (e.g., HTTP request, database query)
- **Attributes**: Key-value pairs that provide context (e.g., HTTP method, status code)
- **Events**: Timestamped log entries attached to a span
- **Exceptions**: Errors that occurred during a span, recorded on that span

## Use Cases

**Find Performance Bottlenecks:**
- Identify slow database queries
- Measure endpoint response times
- Pinpoint time-consuming operations

**Debug Issues:**
- See the sequence of operations leading up to an error
- View exception details and stack traces
- Understand request context

**Analyze Behavior:**
- Count database queries per endpoint
- Track typical response times
- Identify frequently called operations

## Generate Test Traces

```bash
# Health check
curl http://localhost:8081/v1/health

# List items
curl http://localhost:8081/v1/items

# Create an item
curl -X POST http://localhost:8081/v1/items \
  -H "Content-Type: application/json" \
  -d '{"name":"test-item"}'
```
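
To make one particular test request easy to find later, one option is to send a W3C Trace Context `traceparent` header carrying a trace ID you already know. The sketch below assumes items-service uses the default W3C propagator (typical for OpenTelemetry auto-instrumentation, but not confirmed by this guide); `make_traceparent` is a hypothetical helper name.

```bash
# Build a W3C traceparent header value: version-traceid-spanid-flags.
# Random IDs come from /dev/urandom; the trailing 01 flag marks the request as sampled.
make_traceparent() {
  trace_id="$(od -An -tx1 -N16 /dev/urandom | tr -d ' \n')"
  span_id="$(od -An -tx1 -N8 /dev/urandom | tr -d ' \n')"
  printf '00-%s-%s-01\n' "$trace_id" "$span_id"
}

# Example call (assumes the service from this guide is listening on localhost:8081):
#   tp="$(make_traceparent)"
#   curl -H "traceparent: $tp" http://localhost:8081/v1/items
#   echo "Trace ID to search for: $(echo "$tp" | cut -d- -f2)"
```

If the header is honored, the second field of the value is the trace ID under which the request should appear in Jaeger.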

## What's Captured

**HTTP Requests:**
- Operation: `GET /v1/items`, `POST /v1/items`
- Attributes: method, URL, status code, host

**Database Operations:**
- SQL queries and duration
- Connection details
- Query parameters (sanitized)

**Custom Attributes:**
- `items.count`: Number of items returned
- `item.id`: Item ID
- `item.name`: Item name

**Errors:**
- Exception type and message
- Stack trace
- Error context

## Common Workflows

**Debug Slow Endpoint:**
1. Search for traces of the endpoint (for example the `GET /v1/items` operation)
2. Sort by duration (longest first)
3. Check the trace timeline for bottlenecks

**Investigate Errors:**
1. Filter by `error=true` tag
2. View exception details
3. Check request context

**Analyze Database Usage:**
1. Open any trace
2. Count database spans
3. Look for N+1 query patterns
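
The N+1 check in the last workflow can be approximated outside the UI. The sketch below assumes you have already extracted the span operation names of a single trace into plain text, one name per line (how you export them is up to you); `repeated_spans` is a hypothetical helper, and its default threshold of 5 is arbitrary.

```bash
# Read span operation names (one per line) on stdin and print those that
# occur at least $1 times (default 5), most frequent first, as "count<TAB>name".
repeated_spans() {
  threshold="${1:-5}"
  sort | uniq -c | sort -rn |
    awk -v t="$threshold" '$1 >= t { n = $1; sub(/^ *[0-9]+ /, ""); print n "\t" $0 }'
}

# Example call (span-names.txt is a hypothetical export of one trace):
#   repeated_spans 10 < span-names.txt
```

An operation that repeats many times inside one trace is a candidate N+1 pattern, though legitimate batch work can look the same.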

## Filtering Traces

Use the search form in the Jaeger UI:

**By Duration:** `minDuration=100ms maxDuration=1s`

**By Tags:** `http.status_code=500`, `http.method=POST`

**By Time:** Use the time picker for specific periods
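
The duration filter can also be expressed as a URL against the HTTP JSON endpoint that the Jaeger UI itself calls (`/api/traces`). That endpoint is internal to Jaeger rather than a stable public API, so treat this as a sketch; the service name `items-service`, the `JAEGER_URL` variable, and the helper name `jaeger_search_url` are assumptions.

```bash
# Print a Jaeger trace-search URL for a service and a duration window.
# Arguments: service, minDuration, maxDuration (duration strings such as 100ms or 1s).
# JAEGER_URL is a hypothetical override; the default matches the UI address used in this guide.
jaeger_search_url() {
  base="${JAEGER_URL:-http://localhost:16686}"
  printf '%s/api/traces?service=%s&minDuration=%s&maxDuration=%s&limit=20\n' \
    "$base" "$1" "$2" "$3"
}

# Example call (requires a running Jaeger instance):
#   curl -s "$(jaeger_search_url items-service 100ms 1s)"
```

Tag filters are left out here because the endpoint expects them as a URL-encoded JSON object in a `tags` parameter, which is easier to get right in the UI.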

## Health Indicators

**Good:**
- Requests < 100ms
- Few errors
- Consistent timing
- Minimal DB queries

**Warning:**
- High variance in response times
- N+1 query problems
- Frequent errors

**Critical:**
- Timeouts
- Cascading failures
- Slow database queries (> 1s)
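
The two numeric thresholds above can be encoded as a small check, for example when scripting over exported span durations. This sketch uses only the numbers the lists give (100 ms and 1 s) and applies them to any duration, although the guide states them for requests and database queries respectively. Treating everything in between as "warning" is an assumption, as is the helper name `classify_duration_ms`.

```bash
# Classify a duration given in whole milliseconds.
classify_duration_ms() {
  if [ "$1" -lt 100 ]; then
    echo good
  elif [ "$1" -le 1000 ]; then
    echo warning    # assumption: the guide does not name this middle band
  else
    echo critical   # slower than 1s
  fi
}

# Example call:
#   classify_duration_ms 42
```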

## 🔧 Troubleshooting

### No Traces Appearing

1. **Check if OpenTelemetry is enabled:**
   ```bash
   kubectl get deployment items-service -o yaml | grep -A1 OTEL_ENABLED
   ```
   Should show `value: "true"` on the line after the variable name

2. **Check items-service logs:**
   Look for the "📊 OpenTelemetry initialized" message

3. **Check Jaeger is running:**
   ```bash
   kubectl get pods | grep jaeger
   ```

4. **Check connectivity:**
   ```bash
   kubectl exec -it <items-service-pod> -- curl http://jaeger:4318
   ```
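
A plain GET against the OTLP port will usually not return 200 even when everything works, so a more telling check is to POST an empty export request to the OTLP/HTTP traces path (`/v1/traces`, the path defined by the OTLP specification). This sketch assumes that Jaeger's OTLP/HTTP receiver is what listens on `jaeger:4318` and that `curl` exists in the container; `otlp_status_hint` is a hypothetical helper that only interprets the status code.

```bash
# Interpret the HTTP status code returned by the OTLP/HTTP traces endpoint.
# curl reports 000 when it could not get any HTTP response at all.
otlp_status_hint() {
  case "$1" in
    200) echo "OTLP endpoint reachable and accepting traces" ;;
    000) echo "no HTTP response: check Service name, port, and network policies" ;;
    *)   echo "endpoint reachable but returned HTTP $1" ;;
  esac
}

# Example call (same placeholder pod name as in step 4):
#   code="$(kubectl exec <items-service-pod> -- curl -s -o /dev/null -w '%{http_code}' \
#     -X POST -H 'Content-Type: application/json' -d '{}' http://jaeger:4318/v1/traces)"
#   otlp_status_hint "$code"
```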

### Traces Missing Information

1. **Check OTEL_LOG_LEVEL:**
   Set to "debug" to see detailed logs:
   ```yaml
   - name: OTEL_LOG_LEVEL
     value: "debug"
   ```

2. **Check auto-instrumentation:**
   Some libraries are not covered by auto-instrumentation and may need manual instrumentation

### Performance Impact

OpenTelemetry has minimal overhead:
- Roughly 1-5ms per request
- Sampling can reduce overhead further
- Can be disabled in production if needed (see `OTEL_ENABLED` above)
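
Sampling is commonly configured through the standard OpenTelemetry SDK environment variables `OTEL_TRACES_SAMPLER` and `OTEL_TRACES_SAMPLER_ARG`. Whether items-service reads them depends on how its SDK is initialized, which this guide does not show, so the following is only a sketch; `sampler_env` is a hypothetical helper that prints the two variables for a given ratio.

```bash
# Print environment settings for parent-based, ratio-based trace sampling.
# $1 is the fraction of traces to keep, between 0 and 1 (for example 0.1 for 10%).
sampler_env() {
  if ! printf '%s' "$1" | grep -Eq '^(0(\.[0-9]+)?|1(\.0+)?)$'; then
    echo "ratio must be a number between 0 and 1" >&2
    return 1
  fi
  printf 'OTEL_TRACES_SAMPLER=parentbased_traceidratio\n'
  printf 'OTEL_TRACES_SAMPLER_ARG=%s\n' "$1"
}

# Example call (deployment name taken from the troubleshooting steps above):
#   kubectl set env deployment/items-service $(sampler_env 0.1)
```

A parent-based sampler keeps the decision made by an upstream caller, so traces that span several services are not cut in half.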

## 📚 Additional Resources

- [OpenTelemetry Documentation](https://opentelemetry.io/docs/)
- [Jaeger Documentation](https://www.jaegertracing.io/docs/)
- [OpenTelemetry JavaScript SDK](https://opentelemetry.io/docs/instrumentation/js/)
- [Distributed Tracing Best Practices](https://opentelemetry.io/docs/concepts/observability-primer/)

## 🎯 Next Steps

1. **Run the test script** to validate your setup:
   ```bash
   ./scripts/test-otel.sh
   ```

2. **Explore Jaeger UI** at http://localhost:16686

3. **Make some requests** and watch the traces appear

4. **Try the use cases** above to get familiar with the UI

5. **Consider adding custom spans** for important business operations

6. **Set up alerts** based on trace data (requires additional setup)
