15 changes: 11 additions & 4 deletions README.md
@@ -20,6 +20,7 @@
- [Optional Components](#optional-components)
- [Backup and Restore Guide](#backup-and-restore-guide)
- [Connect an LLM](#connecting-different-llms)
- [Monitoring and Logging](#monitoring-and-logging)
- [Troubleshooting](#troubleshooting)
- [Common Issues](#common-issues)
- [Debug Commands](#debug-commands)
@@ -238,10 +239,12 @@ After you have access to the Kubernetes cluster, you must install the necessary

SAS has partnered with [Weaviate](https://weaviate.io/) and supports it as a vector database alternative to PGVector storage. This installation is not required, but it is compatible with RAM.

| Component | Version | Example Values File | Installation Instructions |
|-----------|---------------|---------------------|----------------------------------------------------------------------------------------------|
| **Weaviate** |v17.3.3 |[weaviate.yaml](./examples/weaviate.yaml) | [instructions](./docs/user/DependencyInstall.md#weaviate) |
| **Ollama** |v1.12.0 |[ollama.yaml](./examples/ollama.yaml) | [instructions](./docs/llm-connection/ollama.md) |
| **Vector** | 0.46.0 |[example](./docs/monitoring/README.md) | [instructions](https://vector.dev/installation/) |
| **Phoenix** |v4.0.7 |[phoenix.yaml](./examples/phoenix.yaml) | [instructions](./docs/monitoring/traces.md) |

### Install SAS Retrieval Agent Manager

@@ -283,6 +286,10 @@ To backup and restore the data you use RAM for, visit the [Backup and Restore pa

To add different LLMs for RAM to use, visit the [Connecting an LLM page](./docs/llm-connection/README.md).

## Monitoring and Logging

To monitor and log agent and LLM activity, visit the [Monitoring setup page](./docs/monitoring/README.md).

## Troubleshooting

### Common Issues
18 changes: 18 additions & 0 deletions docs/monitoring/README.md
@@ -0,0 +1,18 @@
# Monitoring and Logging Guide

This folder provides documentation and instructions for managing logs, metrics, and traces using [Vector](https://vector.dev/), [Phoenix](https://phoenix.arize.com/), and [Langfuse](https://langfuse.com/).

## Contents

- [logs-and-metrics.md](./logs-and-metrics.md): Instructions for how to track and view logs and metrics using [Vector](https://vector.dev/).
- [traces.md](./traces.md): Instructions for how to track and view traces using Vector and [Phoenix](https://phoenix.arize.com/) or [Langfuse](https://langfuse.com/).

## Purpose

These documents are intended to help operators and users:

- Deploy Vector and Phoenix in various cloud and on-premises environments
- Configure the endpoints for trace collection using Phoenix or Langfuse
- Adapt the values files to deploy Phoenix on your cluster alongside RAM

Refer to each file for detailed, step-by-step instructions tailored to your platform and use case.
209 changes: 209 additions & 0 deletions docs/monitoring/logs-and-metrics.md
@@ -0,0 +1,209 @@
# Logs and Metrics in RAM

The SAS Retrieval Agent Manager (RAM) system collects and stores logs and metrics using [Vector](https://vector.dev/), a high-performance observability data pipeline. Vector aggregates telemetry data from Kubernetes clusters and routes it to PostgreSQL via PostgREST for persistent storage and querying.

## Architecture Overview

```text
Kubernetes Logs / RAM API Metrics → Vector → PostgREST → PostgreSQL
```

Vector runs as a DaemonSet in the cluster, collecting:

- **Logs**: Container logs from all pods via Kubernetes log files

- **Metrics**: Performance metrics, resource usage, and custom application metrics

## Configuration

### Vector Pipeline Components

The Vector configuration consists of three main components:

1. **Sources**: Data collection from Kubernetes
2. **Transforms**: Data processing and enrichment using VRL (Vector Remap Language)
3. **Sinks**: Delivery to PostgREST endpoints

### Logs Pipeline

Vector collects Kubernetes pod logs and enriches them with metadata:

```yaml
sources:
  kube_logs:
    type: kubernetes_logs
    auto_partial_merge: true

transforms:
  logs_transform:
    type: remap
    inputs:
      - kube_logs
    source: |
      # Remove fields not in database schema
      del(.source_type)
      del(.stream)

sinks:
  logs_postgrest:
    type: http
    inputs:
      - logs_transform
    uri: "http://sas-retrieval-agent-manager-postgrest.retagentmgr.svc.cluster.local:3002/logs"
    encoding:
      codec: json
    method: post
```

> Note: See the full [example Vector values file](../../examples/vector.yaml) for a complete configuration.

#### Log Schema

Logs are stored in PostgreSQL with the following schema:

| Column | Type | Description |
|--------|------|-------------|
| `file` | TEXT | Path to the log file in Kubernetes |
| `kubernetes` | JSONB | Kubernetes metadata (pod, namespace, labels, etc.) |
| `message` | TEXT | The actual log message |
| `timestamp` | TIMESTAMPTZ | When the log entry was created |

#### Kubernetes Metadata

The `kubernetes` JSONB column includes the following context:

- `pod_name`, `pod_namespace`, `pod_uid`

- `container_name`, `container_image`

- `node_labels`

- `pod_labels`

- `pod_ip`, `pod_owner`
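
Once logs are flowing, the same PostgREST endpoint that Vector writes to can be queried directly, including filters on fields inside the `kubernetes` JSONB column. A minimal sketch, assuming you port-forward the PostgREST service locally; the namespace value in the filter is only illustrative:

```bash
# Forward the PostgREST service to your workstation
# (service name and port are taken from the sink configuration above)
kubectl port-forward -n retagentmgr svc/sas-retrieval-agent-manager-postgrest 3002:3002

# In another shell: fetch the ten most recent log entries for a namespace,
# filtering on a nested field of the kubernetes JSONB column
curl "http://localhost:3002/logs?kubernetes->>pod_namespace=eq.retagentmgr&order=timestamp.desc&limit=10"
```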

### Metrics Pipeline

Metrics collection follows a similar pattern but does not need transformations:

```yaml
sources:
  otel:
    type: opentelemetry
    grpc:
      address: 0.0.0.0:4317
    http:
      address: 0.0.0.0:4318

sinks:
  metrics_postgrest:
    type: http
    inputs:
      - otel.metrics
    uri: "http://sas-retrieval-agent-manager-postgrest.retagentmgr.svc.cluster.local:3002/metrics"
    headers:
      Content-Type: "application/json"
    encoding:
      codec: json
```
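
To exercise the metrics path end to end, you can post a single data point to Vector's OpenTelemetry HTTP receiver using the OTLP/JSON encoding. This is a hedged sketch: it assumes port 4318 is reachable from where you run it (for example through a Service or a port-forward), and the metric name and attributes are placeholders:

```bash
# Send one gauge data point to the OTLP HTTP receiver defined in the source above.
# Replace localhost with however you expose port 4318 (Service, port-forward, etc.).
curl -X POST "http://localhost:4318/v1/metrics" \
  -H "Content-Type: application/json" \
  -d '{
    "resourceMetrics": [{
      "resource": {
        "attributes": [{ "key": "service.name", "value": { "stringValue": "ram-metrics-test" } }]
      },
      "scopeMetrics": [{
        "metrics": [{
          "name": "ram.test.gauge",
          "gauge": { "dataPoints": [{ "asDouble": 1.0, "timeUnixNano": "1731261600000000000" }] }
        }]
      }]
    }]
  }'
```

If the pipeline is healthy, the data point should appear in the `metrics` table via the `metrics_postgrest` sink.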

## Installation

To install Vector, edit the [example Vector values file](../../examples/vector.yaml) to match your environment, then run the following commands:

```sh
helm repo add vector https://helm.vector.dev
helm repo update

helm install vector vector/vector \
  -n vector -f ./values.yaml \
  --create-namespace --version 0.46.0
```
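
You can confirm the release installed and the agents rolled out before moving on; the DaemonSet name below assumes the chart's default naming for a release called `vector`:

```bash
# Confirm the Helm release
helm list -n vector

# Check that the Vector DaemonSet pods are scheduled and ready
kubectl get daemonset -n vector
kubectl rollout status daemonset/vector -n vector
```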

## PostgREST Integration

Vector sends data directly to PostgREST HTTP endpoints. PostgREST provides:

- Automatic API generation from PostgreSQL schema

- Role-based access control via PostgreSQL roles

- JSON validation and type safety
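
Because the API is generated from the database schema, you can list the exposed endpoints from the OpenAPI description PostgREST serves at its root path. A quick check from inside the cluster (assumes `jq` is available where you run it):

```bash
curl -s "http://sas-retrieval-agent-manager-postgrest.retagentmgr.svc.cluster.local:3002/" | jq '.paths | keys'
```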

## Testing

### Manual Log Injection

Test the PostgREST endpoint with curl from within the cluster:

```bash
curl -X POST \
"http://sas-retrieval-agent-manager-postgrest.retagentmgr.svc.cluster.local:3002/logs" \
-H "Content-Type: application/json" \
-H "Prefer: return=representation" \
-d '{
"file": "/var/log/pods/test_pod/container/0.log",
"kubernetes": {
"container_name": "test-container",
"pod_name": "test-pod",
"pod_namespace": "default",
"pod_uid": "test-uid-12345"
},
"message": "Test log message",
"timestamp": "2025-11-10T18:00:00.000000Z"
}'
```
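
To confirm the row was stored, read it back through the same endpoint; the filter matches the test message above (URL-encoded):

```bash
curl "http://sas-retrieval-agent-manager-postgrest.retagentmgr.svc.cluster.local:3002/logs?message=eq.Test%20log%20message&select=timestamp,message&limit=1"
```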

### Verify Vector is Running

```bash
# Check Vector pods
kubectl get pods -n vector

# View Vector logs
kubectl logs -n vector -l app.kubernetes.io/name=vector --tail=100

# Check for errors
kubectl logs -n vector -l app.kubernetes.io/name=vector | grep ERROR
```

## Troubleshooting

### Common Issues

#### 1. Schema Mismatch Errors

**Error**: `Could not find the 'source_type' column`

**Solution**: Add a VRL transform to remove fields not in your database schema:

```yaml
transforms:
  remove_extra_fields:
    type: remap
    inputs:
      - kube_logs
    source: |
      del(.source_type)
      del(.stream)
```

#### 2. PostgREST Connection Failures

**Error**: `Service call failed. No retries or retries exhausted`

Check that PostgREST is accessible:

```bash
kubectl get svc -n retagentmgr sas-retrieval-agent-manager-postgrest
kubectl get pods -n retagentmgr -l app.kubernetes.io/name=postgrest
```
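
If the Service and pods look healthy but the sink still fails, a throwaway curl pod can confirm in-cluster connectivity; the pod name and image here are illustrative:

```bash
kubectl run postgrest-check --rm -it --restart=Never --image=curlimages/curl --command -- \
  curl -s -o /dev/null -w "%{http_code}\n" \
  "http://sas-retrieval-agent-manager-postgrest.retagentmgr.svc.cluster.local:3002/"
```

A `200` response indicates PostgREST is reachable from inside the cluster, which points the investigation back at the Vector sink configuration.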

## Related Documentation

- [Vector Documentation](https://vector.dev/docs/)
- [PostgREST API Reference](https://postgrest.org/en/stable/api.html)
- [OpenTelemetry Specification](https://opentelemetry.io/docs/)
- [VRL Language Reference](https://vector.dev/docs/reference/vrl/)