
Commit e1c4eaa

Merge pull request #17 from sassoftware/monitoring-utilities
Monitoring utilities
2 parents 75f8af8 + 26c7211 commit e1c4eaa

7 files changed: 1122 additions & 5 deletions


README.md

Lines changed: 11 additions & 4 deletions
@@ -20,6 +20,7 @@

- [Optional Components](#optional-components)
- [Backup and Restore Guide](#backup-and-restore-guide)
- [Connect an LLM](#connecting-different-llms)
- [Monitoring and Logging](#monitoring-and-logging)
- [Troubleshooting](#troubleshooting)
- [Common Issues](#common-issues)
- [Debug Commands](#debug-commands)
@@ -238,10 +239,12 @@ After you have access to the Kubernetes cluster, you must install the necessary

SAS has partnered with [Weaviate](https://weaviate.io/) and supports it as a vector database alternative to PGVector storage. This installation is not required but is compatible with RAM.

| Component | Version | Example Values File | Installation Instructions |
|-----------|---------|---------------------|---------------------------|
| **Weaviate** | v17.3.3 | [weaviate.yaml](./examples/weaviate.yaml) | [instructions](./docs/user/DependencyInstall.md#weaviate) |
| **Ollama** | v1.12.0 | [ollama.yaml](./examples/ollama.yaml) | [instructions](./docs/llm-connection/ollama.md) |
| **Vector** | 0.46.0 | [example](./docs/monitoring/README.md) | [instructions](https://vector.dev/installation/) |
| **Phoenix** | v4.0.7 | [phoenix.yaml](./examples/phoenix.yaml) | [instructions](./docs/monitoring/traces.md) |

### Install SAS Retrieval Agent Manager
@@ -283,6 +286,10 @@ To backup and restore the data you use RAM for, visit the [Backup and Restore pa

To add different LLMs for RAM to use, visit the [Connecting an LLM page](./docs/llm-connection/README.md).

## Monitoring and Logging

To monitor and log agent and LLM activity, visit the [Monitoring setup page](./docs/monitoring/README.md).

## Troubleshooting

### Common Issues

docs/monitoring/README.md

Lines changed: 18 additions & 0 deletions
# Monitoring and Logging Guide

This folder provides documentation and instructions for managing logs, metrics, and traces using [Vector](https://vector.dev/), [Phoenix](https://phoenix.arize.com/), and [Langfuse](https://langfuse.com/).

## Contents

- [logs-and-metrics.md](./logs-and-metrics.md): How to collect and view logs and metrics with [Vector](https://vector.dev/).
- [traces.md](./traces.md): How to collect and view traces with Vector and either [Phoenix](https://phoenix.arize.com/) or [Langfuse](https://langfuse.com/).

## Purpose

These documents are intended to help operators and users:

- Deploy Vector and Phoenix in various cloud and on-premises environments
- Configure the trace-collection endpoints for Phoenix or Langfuse
- Adapt the values files to deploy Phoenix on your cluster alongside RAM

Refer to each file for detailed, step-by-step instructions tailored to your platform and use case.
Lines changed: 209 additions & 0 deletions
# Logs and Metrics in RAM

The SAS Retrieval Agent Manager (RAM) system collects and stores logs and metrics using [Vector](https://vector.dev/), a high-performance observability data pipeline. Vector aggregates telemetry data from Kubernetes clusters and routes it to PostgreSQL via PostgREST for persistent storage and querying.

## Architecture Overview

```text
Kubernetes Logs / RAM API Metrics → Vector → PostgREST → PostgreSQL
```

Vector runs as a DaemonSet in the cluster, collecting:

- **Logs**: Container logs from all pods, read from the Kubernetes log files on each node

- **Metrics**: Performance metrics, resource usage, and custom application metrics

## Configuration

### Vector Pipeline Components

The Vector configuration consists of three main components:

1. **Sources**: Data collection from Kubernetes
2. **Transforms**: Data processing and enrichment using VRL (Vector Remap Language)
3. **Sinks**: Delivery to PostgREST endpoints
26+
27+
### Logs Pipeline
28+
29+
Vector collects Kubernetes pod logs and enriches them with metadata:
30+
31+
```yaml
32+
sources:
33+
kube_logs:
34+
type: kubernetes_logs
35+
auto_partial_merge: true
36+
37+
transforms:
38+
logs_transform:
39+
type: remap
40+
inputs:
41+
- kube_logs
42+
source: |
43+
# Remove fields not in database schema
44+
del(.source_type)
45+
del(.stream)
46+
47+
sinks:
48+
logs_postgrest:
49+
type: http
50+
inputs:
51+
- logs_transform
52+
uri: "http://sas-retrieval-agent-manager-postgrest.retagentmgr.svc.cluster.local:3002/logs"
53+
encoding:
54+
codec: json
55+
method: post
56+
```
57+
58+
> Note: See a full [Vector example values file here](../../examples/vector.yaml)
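If you only want RAM's own pods in the `logs` table, one option is a `filter` transform between the source and the remap step. This is a sketch under assumptions: it assumes RAM runs in the `retagentmgr` namespace (the namespace of the PostgREST service above), and the transform name `ram_logs_only` is hypothetical.

```yaml
transforms:
  # Hypothetical filter: forward only events from pods in the
  # retagentmgr namespace (assumed to be where RAM runs); drop the rest
  ram_logs_only:
    type: filter
    inputs:
      - kube_logs
    condition: '.kubernetes.pod_namespace == "retagentmgr"'

  logs_transform:
    type: remap
    # Read from the filter instead of directly from kube_logs
    inputs:
      - ram_logs_only
    source: |
      # Remove fields not in database schema
      del(.source_type)
      del(.stream)
```

The `logs_postgrest` sink needs no change, because it still reads from `logs_transform`.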
59+
60+
#### Log Schema
61+
62+
Logs are stored in PostgreSQL with the following schema:
63+
64+
| Column | Type | Description |
65+
|--------|------|-------------|
66+
| `file` | TEXT | Path to the log file in Kubernetes |
67+
| `kubernetes` | JSONB | Kubernetes metadata (pod, namespace, labels, etc.) |
68+
| `message` | TEXT | The actual log message |
69+
| `timestamp` | TIMESTAMPTZ | When the log entry was created |
70+
71+
#### Kubernetes Metadata
72+
73+
The `kubernetes` JSONB column includes the following context:
74+
75+
- `pod_name`, `pod_namespace`, `pod_uid`
76+
77+
- `container_name`, `container_image`
78+
79+
- `node_labels`
80+
81+
- `pod_labels`
82+
83+
- `pod_ip`, `pod_owner`
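Because the whole `kubernetes` object is stored with every log line, the JSONB column can grow quickly. The following minimal sketch assumes you do not need node labels in the database. It extends the existing remap transform to drop them before the event reaches PostgREST:

```yaml
transforms:
  logs_transform:
    type: remap
    inputs:
      - kube_logs
    source: |
      # Remove fields not in database schema
      del(.source_type)
      del(.stream)
      # Assumption: node labels are not needed for queries, so drop them
      # to keep the kubernetes JSONB column smaller
      del(.kubernetes.node_labels)
```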
84+
85+
### Metrics Pipeline
86+
87+
Metrics collection follows a similar pattern but does not need transformations:
88+
89+
```yaml
90+
sources:
91+
otel:
92+
type: opentelemetry
93+
grpc:
94+
address: 0.0.0.0:4317
95+
http:
96+
address: 0.0.0.0:4318
97+
98+
sinks:
99+
metrics_postgrest:
100+
type: http
101+
inputs:
102+
- otel.metrics
103+
uri: "http://sas-retrieval-agent-manager-postgrest.retagentmgr.svc.cluster.local:3002/metrics"
104+
headers:
105+
Content-Type: "Application/json"
106+
encoding:
107+
codec: json
108+
```
109+
110+
## Installation
111+
112+
To install Vector, edit the [example Vector values file](../../examples/vector.yaml) to your desired settings and run the following commands:
113+
114+
```sh
115+
helm repo add vector https://helm.vector.dev
116+
helm repo update
117+
118+
helm install vector vector/vector \
119+
-n vector -f .\values.yaml \
120+
--create-namespace --version 0.46.0
121+
```
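The values file wraps the pipeline configuration shown earlier. The minimal sketch below assumes the upstream Vector chart's `role` and `customConfig` values. `role: Agent` makes the chart deploy Vector as a DaemonSet, and `customConfig` replaces the chart's default pipeline. The `data_dir` path is assumed to match the chart's default agent volume; check it against the [example Vector values file](../../examples/vector.yaml).

```yaml
# Deploy Vector as a DaemonSet so every node ships its pod logs
role: Agent

customConfig:
  # Assumed to match the chart's default hostPath mount for the Agent role;
  # kubernetes_logs stores its read checkpoints here
  data_dir: /vector-data-dir
  sources:
    kube_logs:
      type: kubernetes_logs
      auto_partial_merge: true
  transforms:
    logs_transform:
      type: remap
      inputs:
        - kube_logs
      source: |
        del(.source_type)
        del(.stream)
  sinks:
    logs_postgrest:
      type: http
      inputs:
        - logs_transform
      uri: "http://sas-retrieval-agent-manager-postgrest.retagentmgr.svc.cluster.local:3002/logs"
      encoding:
        codec: json
      method: post
```

The `otel` source and `metrics_postgrest` sink from the metrics pipeline go under the same `customConfig` keys.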
## PostgREST Integration

Vector sends data directly to PostgREST's HTTP endpoints. PostgREST provides:

- Automatic API generation from the PostgreSQL schema

- Role-based access control via PostgreSQL roles

- Validation of incoming JSON against the table schema: unknown columns and type mismatches are rejected
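If your PostgREST deployment requires a JWT to select a PostgreSQL role, Vector's `http` sink can attach the token as a bearer token. This is a sketch under assumptions: it assumes RAM's PostgREST is configured for JWT authentication. `POSTGREST_JWT` is a hypothetical environment variable that you would populate from a Kubernetes Secret, for example through the chart's `env` setting.

```yaml
sinks:
  logs_postgrest:
    type: http
    inputs:
      - logs_transform
    uri: "http://sas-retrieval-agent-manager-postgrest.retagentmgr.svc.cluster.local:3002/logs"
    encoding:
      codec: json
    method: post
    auth:
      # PostgREST reads the role claim from this JWT and switches
      # to that PostgreSQL role for the request
      strategy: bearer
      # Hypothetical environment variable, interpolated by Vector at startup
      token: "${POSTGREST_JWT}"
```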
132+
133+
## Testing
134+
135+
### Manual Log Injection
136+
137+
Test the postgREST endpoint with a curl from within the cluster:
138+
139+
```bash
140+
curl -X POST \
141+
"http://sas-retrieval-agent-manager-postgrest.retagentmgr.svc.cluster.local:3002/logs" \
142+
-H "Content-Type: application/json" \
143+
-H "Prefer: return=representation" \
144+
-d '{
145+
"file": "/var/log/pods/test_pod/container/0.log",
146+
"kubernetes": {
147+
"container_name": "test-container",
148+
"pod_name": "test-pod",
149+
"pod_namespace": "default",
150+
"pod_uid": "test-uid-12345"
151+
},
152+
"message": "Test log message",
153+
"timestamp": "2025-11-10T18:00:00.000000Z"
154+
}'
155+
```
156+
157+
### Verify Vector is Running
158+
159+
```bash
160+
# Check Vector pods
161+
kubectl get pods -n vector
162+
163+
# View Vector logs
164+
kubectl logs -n vector -l app.kubernetes.io/name=vector --tail=100
165+
166+
# Check for errors
167+
kubectl logs -n vector -l app.kubernetes.io/name=vector | grep ERROR
168+
```
## Troubleshooting

### Common Issues

#### 1. Schema Mismatch Errors

**Error**: `Could not find the 'source_type' column`

**Solution**: Add a VRL transform to remove fields not in your database schema:

```yaml
transforms:
  remove_extra_fields:
    type: remap
    inputs:
      - kube_logs
    source: |
      del(.source_type)
      del(.stream)
```
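Deleting fields one at a time stops working whenever Vector adds another top-level field. An alternative sketch, assuming the `logs` table has exactly the four columns listed in the Log Schema table, rebuilds each event from an allowlist instead:

```yaml
transforms:
  logs_transform:
    type: remap
    inputs:
      - kube_logs
    source: |
      # Replace the whole event with an object that holds only the table's
      # columns; source_type, stream, and any other extra fields are discarded
      . = {
        "file": .file,
        "kubernetes": .kubernetes,
        "message": .message,
        "timestamp": .timestamp
      }
```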
#### 2. PostgREST Connection Failures

**Error**: `Service call failed. No retries or retries exhausted`

Check that the PostgREST service and pods are available:

```bash
kubectl get svc -n retagentmgr sas-retrieval-agent-manager-postgrest
kubectl get pods -n retagentmgr -l app.kubernetes.io/name=postgrest
```

## Related Documentation

- [Vector Documentation](https://vector.dev/docs/)
- [PostgREST API Reference](https://postgrest.org/en/stable/api.html)
- [OpenTelemetry Documentation](https://opentelemetry.io/docs/)
- [VRL Language Reference](https://vector.dev/docs/reference/vrl/)
