Commit 9157312

Merge pull request #9 from nextinterfaces/remove-jaeger-base64-auth
Remove jaeger base64 auth
2 parents f7bcf01 + 2726a25 commit 9157312

File tree

3 files changed

+361
-0
lines changed

README.md

Lines changed: 1 addition & 0 deletions
@@ -7,6 +7,7 @@
 - **Sample REST Service** exposed publicly with HTTPS
 - **Terraform** for infrastructure as code
 - **Let's Encrypt SSL** certificates with cert-manager
+- **Distributed Tracing** with Jaeger and OpenTelemetry
 
 ## Prerequisites
 

docs/JAEGER.md

Lines changed: 181 additions & 0 deletions
@@ -0,0 +1,181 @@
1+
# Jaeger Distributed Tracing
2+
3+
Jaeger is deployed for distributed tracing and observability of the items-service. This guide covers deployment, configuration, and usage.
4+
5+
---
6+
7+
## Quick Start
8+
9+
### Deploy Jaeger
10+
11+
```bash
12+
task deploy:jaeger
13+
```
14+
15+
### View Traces
16+
17+
1. Open **https://app.roussev.com/jaeger**
18+
2. Select the service (for example `items-service`) from the **Service** dropdown
19+
3. Click **"Find Traces"**
20+
21+
### Generate Test Traces
22+
23+
```bash
24+
# Make some requests to generate traces
25+
curl https://app.roussev.com/items/v1/health
26+
curl https://app.roussev.com/items/v1/items
27+
curl -X POST https://app.roussev.com/items/v1/items \
28+
-H "Content-Type: application/json" \
29+
-d '{"name":"test item"}'
30+
```
31+
32+
---
33+
34+
## Architecture
35+
36+
```
37+
┌─────────────────┐
38+
│ Items Service │
39+
│ (OTEL enabled) │
40+
└────────┬────────┘
41+
│ OTLP/HTTP
42+
│ :4318
43+
44+
┌─────────────────┐
45+
│ Jaeger │
46+
│ (all-in-one) │
47+
└────────┬────────┘
48+
49+
50+
┌─────────────────────────┐
51+
│ Jaeger UI │
52+
│ app.roussev.com/jaeger │
53+
│ (publicly accessible) │
54+
└─────────────────────────┘
55+
```
56+
57+
**Components:**
58+
- **Items Service**: Sends traces via OpenTelemetry Protocol (OTLP)
59+
- **Jaeger Collector**: Receives traces on port 4318 (HTTP) and 4317 (gRPC)
60+
- **Jaeger UI**: Web interface for viewing and analyzing traces
61+
- **Storage**: In-memory (configurable to use Elasticsearch, Cassandra, etc.)
62+
63+
---
64+
65+
## Configuration
66+
67+
### Jaeger Deployment
68+
69+
Location: `infra/k8s/observability/jaeger-deployment.yaml`
70+
71+
**Key settings:**
72+
- **Image**: `jaegertracing/all-in-one:1.52`
73+
- **Storage**: In-memory (stores up to 10,000 traces)
74+
- **Resources**: 512Mi-1Gi memory, 200m-500m CPU
75+
- **Base Path**: `/jaeger` (configured via `QUERY_BASE_PATH` env var)
76+
77+
**Ports:**
78+
- `16686`: Jaeger UI
79+
- `4318`: OTLP HTTP collector
80+
- `4317`: OTLP gRPC collector
81+
82+
**Environment Variables:**
83+
```yaml
84+
env:
85+
- name: COLLECTOR_OTLP_ENABLED
86+
value: "true"
87+
- name: QUERY_BASE_PATH
88+
value: "/jaeger"
89+
- name: SPAN_STORAGE_TYPE
90+
value: "memory"
91+
- name: MEMORY_MAX_TRACES
92+
value: "10000"
93+
```
94+
95+
### Items Service Configuration
96+
97+
Location: `infra/k8s/apps/items-service-deployment.yaml`
98+
99+
**OpenTelemetry environment variables:**
100+
```yaml
101+
- name: OTEL_ENABLED
102+
value: "true"
103+
- name: OTEL_SERVICE_NAME
104+
value: "items-service"
105+
- name: OTEL_SERVICE_VERSION
106+
value: "1.0.0"
107+
- name: OTEL_EXPORTER_OTLP_ENDPOINT
108+
value: "http://jaeger:4318"
109+
- name: OTEL_LOG_LEVEL
110+
value: "info"
111+
```
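
To make the endpoint setting concrete: under the OpenTelemetry exporter specification, an OTLP/HTTP exporter appends `/v1/traces` to `OTEL_EXPORTER_OTLP_ENDPOINT`, unless the signal-specific `OTEL_EXPORTER_OTLP_TRACES_ENDPOINT` is set, in which case that value is used as-is. The Python sketch below illustrates only that rule. `resolve_traces_url` is a hypothetical helper, not code from items-service, and the SDK actually in use may apply further rules.

```python
import os


def resolve_traces_url(env=None):
    """Return the URL an OTLP/HTTP exporter would post spans to (sketch)."""
    env = os.environ if env is None else env
    # A signal-specific endpoint, when set, is used exactly as given.
    specific = env.get("OTEL_EXPORTER_OTLP_TRACES_ENDPOINT")
    if specific:
        return specific
    # Otherwise the traces path is appended to the base endpoint.
    base = env.get("OTEL_EXPORTER_OTLP_ENDPOINT", "http://localhost:4318")
    return base.rstrip("/") + "/v1/traces"


# With the value configured above:
print(resolve_traces_url({"OTEL_EXPORTER_OTLP_ENDPOINT": "http://jaeger:4318"}))
# -> http://jaeger:4318/v1/traces
```

With the configuration shown above, spans would therefore be posted to the `jaeger` service on port 4318 under `/v1/traces`.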
112+
113+
### Ingress Configuration
114+
115+
**URL**: `https://app.roussev.com/jaeger`
116+
117+
**Features:**
118+
- Path-based routing (shares domain with items-service)
119+
- HTTPS with Let's Encrypt certificate (shared with items-service)
120+
- Publicly accessible (no authentication required)
121+
122+
**Annotations:**
123+
```yaml
124+
nginx.ingress.kubernetes.io/ssl-redirect: "true"
125+
cert-manager.io/cluster-issuer: "letsencrypt-prod"
126+
```
127+
128+
---
129+
130+
## What You Can See in Jaeger
131+
132+
### 1. Request Timeline
133+
- Total request duration
134+
- Time spent in each operation
135+
- Database query performance
136+
- HTTP request/response timing
137+
138+
### 2. Request Details
139+
- HTTP method, URL, status code
140+
- Request headers and parameters
141+
- Database queries executed
142+
- Error messages and stack traces
143+
144+
### 3. Service Dependencies
145+
- Which services call which
146+
- Request flow through the system
147+
- Performance bottlenecks
148+
149+
### 4. Custom Attributes
150+
The items-service adds custom attributes like:
151+
- `items.count`: Number of items returned
152+
- `item.id`: Item ID for specific operations
153+
- `db.query`: Database queries executed
154+
155+
---
156+
157+
## Storage Options
158+
159+
### Current: In-Memory Storage
160+
161+
**Cons:**
162+
- Traces are lost on pod restart
163+
- Limited to 10,000 traces
164+
- Not suitable for high-traffic workloads
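
The trace cap can be pictured with a small Python sketch. It assumes oldest-first eviction once the limit is reached, which is how a capped in-memory store typically behaves. `BoundedTraceStore` is a hypothetical class for illustration, not Jaeger's actual implementation.

```python
from collections import OrderedDict


class BoundedTraceStore:
    """Toy in-memory trace store that keeps at most `max_traces` traces."""

    def __init__(self, max_traces):
        self.max_traces = max_traces
        self._traces = OrderedDict()  # trace_id -> list of spans, oldest first

    def add_span(self, trace_id, span):
        if trace_id not in self._traces:
            if len(self._traces) >= self.max_traces:
                # Assumption: the oldest trace is dropped to make room.
                self._traces.popitem(last=False)
            self._traces[trace_id] = []
        self._traces[trace_id].append(span)

    def __contains__(self, trace_id):
        return trace_id in self._traces

    def __len__(self):
        return len(self._traces)


store = BoundedTraceStore(max_traces=2)
for trace_id in ("a", "b", "c"):
    store.add_span(trace_id, {"name": "GET /v1/items"})
print(len(store), "a" in store, "c" in store)  # -> 2 False True
```

Nothing in such a store survives a process restart either, which is why a persistent backend is needed beyond development use.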
165+
166+
### Production Storage Options
167+
168+
- Elasticsearch, Cassandra or Badger
169+
170+
---
171+
172+
## Summary
173+
174+
**Deployment**: `task deploy:jaeger`
175+
176+
**Access**: https://app.roussev.com/jaeger
177+
178+
**Storage**: In-memory (10,000 traces max)
179+
180+
**Integration**: Items-service automatically sends traces via OpenTelemetry
181+

docs/OPENTELEMETRY_GUIDE.md

Lines changed: 179 additions & 0 deletions
@@ -0,0 +1,179 @@
1+
# OpenTelemetry Guide
2+
3+
This guide covers the OpenTelemetry integration that provides distributed tracing for the items-service.
4+
5+
## What is OpenTelemetry?
6+
7+
**OpenTelemetry (OTel)** is an observability framework. In items-service it is used to produce **traces** - records of requests as they flow through your system.
8+
9+
**Key Concepts:**
10+
11+
- **Trace**: The complete journey of a request through your system
12+
- **Span**: A single operation within a trace (e.g., HTTP request, database query)
13+
- **Attributes**: Key-value pairs that provide context (e.g., HTTP method, status code)
14+
- **Events**: Timestamped logs within a span
15+
- **Exceptions**: Errors that occurred during a span
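
The relationship between these concepts can be sketched as a small data model. This is an illustrative Python sketch, not the OpenTelemetry SDK's types: the `Span` class and its field names are hypothetical, and a trace is represented simply as the set of spans that share a root.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """One operation within a trace (hypothetical model, not the SDK type)."""
    name: str
    start_ms: float
    end_ms: float
    parent: Optional["Span"] = None                 # None marks the root span
    attributes: dict = field(default_factory=dict)  # key-value context
    events: list = field(default_factory=list)      # timestamped log entries

    @property
    def duration_ms(self):
        return self.end_ms - self.start_ms


# A trace with a root HTTP span and one child database span.
root = Span("GET /v1/items", 0, 42, attributes={"http.method": "GET"})
query = Span("db query", 5, 30, parent=root, attributes={"items.count": 3})
print(query.duration_ms, query.parent.name)  # -> 25 GET /v1/items
```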
16+
## Use Cases
17+
18+
**Find Performance Bottlenecks:**
19+
- Identify slow database queries
20+
- Measure endpoint response times
21+
- Pinpoint time-consuming operations
22+
23+
**Debug Issues:**
24+
- See the sequence of operations leading to errors
25+
- View exception details and stack traces
26+
- Understand request context
27+
28+
**Analyze Behavior:**
29+
- Count database queries per endpoint
30+
- Track typical response times
31+
- Identify frequently called operations
32+
33+
## Generate Test Traces
34+
35+
```bash
36+
# Health check
37+
curl http://localhost:8081/v1/health
38+
39+
# List items
40+
curl http://localhost:8081/v1/items
41+
42+
# Create an item
43+
curl -X POST http://localhost:8081/v1/items \
44+
-H "Content-Type: application/json" \
45+
-d '{"name":"test-item"}'
46+
```
47+
48+
## What's Captured
49+
50+
**HTTP Requests:**
51+
- Operation: `GET /v1/items`, `POST /v1/items`
52+
- Attributes: method, URL, status code, host
53+
54+
**Database Operations:**
55+
- SQL queries and duration
56+
- Connection details
57+
- Query parameters (sanitized)
58+
59+
**Custom Attributes:**
60+
- `items.count`: Number of items returned
61+
- `item.id`: Item ID
62+
- `item.name`: Item name
63+
64+
**Errors:**
65+
- Exception type and message
66+
- Stack trace
67+
- Error context
68+
69+
## Common Workflows
70+
71+
**Debug Slow Endpoint:**
72+
1. Search for endpoint traces
73+
2. Sort by duration (longest first)
74+
3. Check timeline for bottlenecks
75+
76+
**Investigate Errors:**
77+
1. Filter by `error=true` tag
78+
2. View exception details
79+
3. Check request context
80+
81+
**Analyze Database Usage:**
82+
1. Open any trace
83+
2. Count database spans
84+
3. Look for N+1 query patterns
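
An N+1 pattern shows up as the same query repeated many times within one trace. A minimal Python sketch of that check follows. It assumes each span is available as a dictionary of attributes and that database spans carry the `db.query` attribute. `find_repeated_queries` and its `threshold` parameter are hypothetical.

```python
from collections import Counter


def find_repeated_queries(spans, threshold=5):
    """Return queries that appear at least `threshold` times in one trace."""
    counts = Counter(s["db.query"] for s in spans if "db.query" in s)
    return {query: n for query, n in counts.items() if n >= threshold}


# One list query followed by one lookup per item: a typical N+1 shape.
trace = (
    [{"http.method": "GET"}]
    + [{"db.query": "SELECT * FROM items"}]
    + [{"db.query": "SELECT * FROM tags WHERE item_id = ?"}] * 10
)
print(find_repeated_queries(trace))
# -> {'SELECT * FROM tags WHERE item_id = ?': 10}
```

The table names in the example are made up. The useful signal is the repeat count per trace, not the specific query text.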
85+
86+
## Filtering Traces
87+
88+
**By Duration:** Set the Min Duration and Max Duration fields, e.g. `100ms` and `1s`
89+
90+
**By Tags:** `http.status_code=500`, `http.method=POST`
91+
92+
**By Time:** Use time picker for specific periods
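
These filters can also be encoded in a search link for sharing. The Python sketch below assumes the Jaeger UI accepts `service`, `tags` (as a JSON object), `minDuration`, `maxDuration`, and `limit` as query parameters on its `/search` page. Treat the parameter names as an assumption to verify against your Jaeger version. `jaeger_search_url` is a hypothetical helper.

```python
import json
from urllib.parse import urlencode


def jaeger_search_url(base, service, tags=None,
                      min_duration=None, max_duration=None, limit=20):
    """Build a Jaeger UI search link (parameter names are assumed)."""
    params = {"service": service, "limit": limit}
    if tags:
        # Tags are passed as a compact JSON object, e.g. {"error":"true"}.
        params["tags"] = json.dumps(tags, separators=(",", ":"))
    if min_duration:
        params["minDuration"] = min_duration
    if max_duration:
        params["maxDuration"] = max_duration
    return f"{base.rstrip('/')}/search?{urlencode(params)}"


print(jaeger_search_url(
    "https://app.roussev.com/jaeger",
    "items-service",
    tags={"http.status_code": "500"},
    min_duration="100ms",
    max_duration="1s",
))
```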
93+
94+
## Health Indicators
95+
96+
**Good:**
97+
- Requests < 100ms
98+
- Few errors
99+
- Consistent timing
100+
- Minimal DB queries
101+
102+
**Warning:**
103+
- High variance in response times
104+
- N+1 query problems
105+
- Frequent errors
106+
107+
**Critical:**
108+
- Timeouts
109+
- Cascading failures
110+
- Slow database queries (> 1s)
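
The numeric thresholds above (under 100 ms, over 1 s) can be checked against a batch of span durations. The following Python sketch assumes the durations have already been exported from Jaeger as a list of milliseconds. `summarize` and the keys it returns are hypothetical, and the percentile is a simple nearest-rank estimate.

```python
import statistics


def summarize(durations_ms):
    """Summarize request durations against the 100 ms and 1 s thresholds."""
    ordered = sorted(durations_ms)
    # Nearest-rank style estimate of the 95th percentile.
    p95 = ordered[min(len(ordered) - 1, int(0.95 * len(ordered)))]
    return {
        "median_ms": statistics.median(ordered),
        "p95_ms": p95,
        "share_under_100ms": sum(d < 100 for d in ordered) / len(ordered),
        "count_over_1s": sum(d > 1000 for d in ordered),
    }


print(summarize([20, 30, 40, 50, 1500]))
# -> {'median_ms': 40, 'p95_ms': 1500, 'share_under_100ms': 0.8, 'count_over_1s': 1}
```

A large gap between the median and the 95th percentile is one simple way to spot the high variance listed under **Warning**.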
111+
112+
## 🔧 Troubleshooting
113+
114+
### No Traces Appearing
115+
116+
1. **Check if OpenTelemetry is enabled:**
117+
```bash
118+
kubectl get deployment items-service -o yaml | grep -A1 OTEL_ENABLED
119+
```
120+
Should show `value: "true"`
121+
122+
2. **Check items-service logs:**
123+
Look for the "📊 OpenTelemetry initialized" message
124+
125+
3. **Check Jaeger is running:**
126+
```bash
127+
kubectl get pods | grep jaeger
128+
```
129+
130+
4. **Check connectivity:**
131+
```bash
132+
kubectl exec -it <items-service-pod> -- curl http://jaeger:4318
133+
```
Any HTTP response, even a `404`, confirms that the pod can reach the collector. A connection error or timeout points to a network or DNS problem. This check assumes `curl` is available in the container image.
134+
135+
### Traces Missing Information
136+
137+
1. **Check OTEL_LOG_LEVEL:**
138+
Set to "debug" to see detailed logs:
139+
```yaml
140+
- name: OTEL_LOG_LEVEL
141+
value: "debug"
142+
```
143+
144+
2. **Check auto-instrumentation:**
145+
Some libraries may not be auto-instrumented and may need manual instrumentation.
147+
148+
### Performance Impact
149+
150+
OpenTelemetry typically adds little overhead:
151+
- Roughly 1-5 ms per request (an estimate; measure in your own environment)
152+
- Sampling can reduce overhead further
153+
- Can be disabled in production if needed
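
Ratio-based sampling keeps a fixed fraction of traces while making the same decision for every span of a given trace. The Python sketch below shows the idea in simplified form. `should_sample` is a hypothetical function, and real SDK samplers differ in detail. In OpenTelemetry SDKs that honor the standard environment variables, sampling is usually configured through `OTEL_TRACES_SAMPLER` and `OTEL_TRACES_SAMPLER_ARG` rather than in code. Whether items-service reads them is not stated here.

```python
def should_sample(trace_id_hex, ratio):
    """Decide deterministically whether to keep a trace (simplified sketch)."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    # Use the low 64 bits of the 128-bit trace ID so that every span
    # belonging to the same trace gets the same decision.
    low64 = int(trace_id_hex, 16) & 0xFFFFFFFFFFFFFFFF
    return low64 < int(ratio * 2**64)


print(should_sample("0" * 32, 0.1))  # -> True  (lowest possible ID)
print(should_sample("f" * 32, 0.1))  # -> False (highest possible ID)
```

Because trace IDs are effectively random, a ratio of `0.1` keeps about one trace in ten.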
154+
155+
## 📚 Additional Resources
156+
157+
- [OpenTelemetry Documentation](https://opentelemetry.io/docs/)
158+
- [Jaeger Documentation](https://www.jaegertracing.io/docs/)
159+
- [OpenTelemetry JavaScript SDK](https://opentelemetry.io/docs/instrumentation/js/)
160+
- [Distributed Tracing Best Practices](https://opentelemetry.io/docs/concepts/observability-primer/)
161+
162+
## 🎯 Next Steps
163+
164+
1. **Run the test script** to validate your setup:
165+
```bash
166+
./scripts/test-otel.sh
167+
```
168+
169+
2. **Explore the Jaeger UI** at http://localhost:16686
170+
171+
3. **Make some requests** and watch the traces appear
172+
173+
4. **Try the use cases** above to get familiar with the UI
174+
175+
5. **Consider adding custom spans** for important business operations
176+
177+
6. **Set up alerts** based on trace data (requires additional setup)
178+
179+
