diff --git a/README.md b/README.md index 3ee0895..14ce323 100644 --- a/README.md +++ b/README.md @@ -20,6 +20,7 @@ - [Optional Components](#optional-components) - [Backup and Restore Guide](#backup-and-restore-guide) - [Connect an LLM](#connecting-different-llms) + - [Monitoring and Logging](#monitoring-and-logging) - [Troubleshooting](#troubleshooting) - [Common Issues](#common-issues) - [Debug Commands](#debug-commands) @@ -238,10 +239,12 @@ After you have access to the Kubernetes cluster, you must install the necessary SAS has partnered with [Weaviate](https://weaviate.io/) and supports it as a vector database alternative to PGVector storage. This installation is not required but is compatible with RAM. -| Component | Version | Example Values File | Installation Instructions | -|-----------|---------------|---------------------|---------------------------------------------------------------------------------------------| -| **Weaviate** |v17.3.3 |[weaviate.yaml](./examples/weaviate.yaml) | [instructions](./docs/user/DependencyInstall.md#weaviate) | -| **Ollama** |v1.12.0 |[ollama.yaml](./examples/ollama.yaml) | [instructions](./docs/llm-connection/ollama.md) | +| Component | Version | Example Values File | Installation Instructions | +|-----------|---------------|---------------------|----------------------------------------------------------------------------------------------| +| **Weaviate** |v17.3.3 |[weaviate.yaml](./examples/weaviate.yaml) | [instructions](./docs/user/DependencyInstall.md#weaviate) | +| **Ollama** |v1.12.0 |[ollama.yaml](./examples/ollama.yaml) | [instructions](./docs/llm-connection/ollama.md) | +| **Vector** |v0.46.0 |[vector.yaml](./examples/vector.yaml) | [instructions](https://vector.dev/installation/) | +| **Phoenix** |v4.0.7 |[phoenix.yaml](./examples/phoenix.yaml) | [instructions](./docs/monitoring/traces.md) | ### Install SAS Retrieval Agent Manager @@ -283,6 +286,10 @@ To backup and restore the data you use RAM for, visit the [Backup and Restore pa To add different LLMs for RAM to use, visit the [Connecting an LLM page](./docs/llm-connection/README.md). +## Monitoring and Logging + +To monitor and log agent and LLM activity, visit the [Monitoring setup page](./docs/monitoring/README.md). + ## Troubleshooting ### Common Issues diff --git a/docs/monitoring/README.md b/docs/monitoring/README.md new file mode 100644 index 0000000..2f4f8d2 --- /dev/null +++ b/docs/monitoring/README.md @@ -0,0 +1,18 @@ +# Monitoring and Logging Guide + +This folder provides documentation and instructions for managing logs, metrics, and traces using [Vector](https://vector.dev/), [Phoenix](https://phoenix.arize.com/), and [Langfuse](https://langfuse.com/). + +## Contents + +- [logs-and-metrics.md](./logs-and-metrics.md): Instructions for tracking and viewing logs and metrics using [Vector](https://vector.dev/). +- [traces.md](./traces.md): Instructions for tracking and viewing traces using Vector and [Phoenix](https://phoenix.arize.com/) or [Langfuse](https://langfuse.com/). + +## Purpose + +These documents are intended to help operators and users: + +- Deploy Vector and Phoenix in various cloud and on-premises environments +- Configure the endpoints for trace collection using Phoenix or Langfuse +- Adapt the values files to deploy Phoenix on your cluster alongside RAM + +Refer to each file for detailed, step-by-step instructions tailored to your platform and use case.
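## Quick Health Check

Once you have followed either guide, the commands below give a quick way to confirm the monitoring components are up. This is a minimal sketch that assumes the `vector` and `phoenix` namespaces used in the install commands in these documents:

```bash
# Vector agent pods (one per node when deployed as a DaemonSet)
kubectl get pods -n vector

# Phoenix pods (only if you deployed Phoenix for traces)
kubectl get pods -n phoenix

# Tail Vector's own logs for startup or sink errors
kubectl logs -n vector -l app.kubernetes.io/name=vector --tail=50
```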
diff --git a/docs/monitoring/logs-and-metrics.md b/docs/monitoring/logs-and-metrics.md new file mode 100644 index 0000000..5d9ba7a --- /dev/null +++ b/docs/monitoring/logs-and-metrics.md @@ -0,0 +1,209 @@ +# Logs and Metrics in RAM + +The SAS Retrieval Agent Manager (RAM) system collects and stores logs and metrics using [Vector](https://vector.dev/), a high-performance observability data pipeline. Vector aggregates telemetry data from Kubernetes clusters and routes it to PostgreSQL via PostgREST for persistent storage and querying. + +## Architecture Overview + +```text +Kubernetes Logs / RAM API Metrics → Vector → PostgREST → PostgreSQL +``` + +Vector runs as a DaemonSet in the cluster, collecting: + +- **Logs**: Container logs from all pods via Kubernetes log files + +- **Metrics**: Performance metrics, resource usage, and custom application metrics + +## Configuration + +### Vector Pipeline Components + +The Vector configuration consists of three main components: + +1. **Sources**: Data collection from Kubernetes +2. **Transforms**: Data processing and enrichment using VRL (Vector Remap Language) +3. **Sinks**: Delivery to PostgREST endpoints + +### Logs Pipeline + +Vector collects Kubernetes pod logs and enriches them with metadata: + +```yaml +sources: + kube_logs: + type: kubernetes_logs + auto_partial_merge: true + +transforms: + logs_transform: + type: remap + inputs: + - kube_logs + source: | + # Remove fields not in database schema + del(.source_type) + del(.stream) + +sinks: + logs_postgrest: + type: http + inputs: + - logs_transform + uri: "http://sas-retrieval-agent-manager-postgrest.retagentmgr.svc.cluster.local:3002/logs" + encoding: + codec: json + method: post +``` + +> Note: See a full [Vector example values file here](../../examples/vector.yaml). + +#### Log Schema + +Logs are stored in PostgreSQL with the following schema: + +| Column | Type | Description | +|--------|------|-------------| +| `file` | TEXT | Path to the log file in Kubernetes | +| `kubernetes` | JSONB | Kubernetes metadata (pod, namespace, labels, etc.)
| +| `message` | TEXT | The actual log message | +| `timestamp` | TIMESTAMPTZ | When the log entry was created | + +#### Kubernetes Metadata + +The `kubernetes` JSONB column includes the following context: + +- `pod_name`, `pod_namespace`, `pod_uid` + +- `container_name`, `container_image` + +- `node_labels` + +- `pod_labels` + +- `pod_ip`, `pod_owner` + +### Metrics Pipeline + +Metrics collection follows a similar pattern but does not need transformations: + +```yaml +sources: + otel: + type: opentelemetry + grpc: + address: 0.0.0.0:4317 + http: + address: 0.0.0.0:4318 + +sinks: + metrics_postgrest: + type: http + inputs: + - otel.metrics + uri: "http://sas-retrieval-agent-manager-postgrest.retagentmgr.svc.cluster.local:3002/metrics" + headers: + Content-Type: "Application/json" + encoding: + codec: json +``` + +## Installation + +To install Vector, edit the [example Vector values file](../../examples/vector.yaml) to your desired settings and run the following commands: + +```sh +helm repo add vector https://helm.vector.dev +helm repo update + +helm install vector vector/vector \ + -n vector -f .\values.yaml \ + --create-namespace --version 0.46.0 +``` + +## PostgREST Integration + +Vector sends data directly to PostgREST HTTP endpoints, which provides: + +- Automatic API generation from PostgreSQL schema + +- Role-based access control via PostgreSQL roles + +- JSON validation and type safety + +## Testing + +### Manual Log Injection + +Test the postgREST endpoint with a curl from within the cluster: + +```bash +curl -X POST \ + "http://sas-retrieval-agent-manager-postgrest.retagentmgr.svc.cluster.local:3002/logs" \ + -H "Content-Type: application/json" \ + -H "Prefer: return=representation" \ + -d '{ + "file": "/var/log/pods/test_pod/container/0.log", + "kubernetes": { + "container_name": "test-container", + "pod_name": "test-pod", + "pod_namespace": "default", + "pod_uid": "test-uid-12345" + }, + "message": "Test log message", + "timestamp": "2025-11-10T18:00:00.000000Z" + }' +``` + +### Verify Vector is Running + +```bash +# Check Vector pods +kubectl get pods -n vector + +# View Vector logs +kubectl logs -n vector -l app.kubernetes.io/name=vector --tail=100 + +# Check for errors +kubectl logs -n vector -l app.kubernetes.io/name=vector | grep ERROR +``` + +## Troubleshooting + +### Common Issues + +#### 1. Schema Mismatch Errors + +**Error**: `Could not find the 'source_type' column` + +**Solution**: Add a VRL transform to remove fields not in your database schema: + +```yaml +transforms: + remove_extra_fields: + type: remap + inputs: + - kube_logs + source: | + del(.source_type) + del(.stream) +``` + +#### 2. PostgREST Connection Failures + +**Error**: `Service call failed. 
No retries or retries exhausted` + +Check that PostgREST is accessible: + +```bash + +kubectl get svc -n retagentmgr sas-retrieval-agent-manager-postgrest +kubectl get pods -n retagentmgr -l app.kubernetes.io/name=postgrest + +``` + +## Related Documentation + +- [Vector Documentation](https://vector.dev/docs/) +- [PostgREST API Reference](https://postgrest.org/en/stable/api.html) +- [OpenTelemetry Specification](https://opentelemetry.io/docs/) +- [VRL Language Reference](https://vector.dev/docs/reference/vrl/) \ No newline at end of file diff --git a/docs/monitoring/traces.md b/docs/monitoring/traces.md new file mode 100644 index 0000000..6d4d885 --- /dev/null +++ b/docs/monitoring/traces.md @@ -0,0 +1,276 @@ +# Traces in RAM + +The SAS Retrieval Agent Manager (RAM) system collects and processes distributed traces using [Vector](https://vector.dev/), a high-performance observability data pipeline. Vector receives OpenTelemetry traces from applications and routes them to multiple observability backends for analysis and visualization. + +## Architecture Overview + +```text +RAM API → Vector (OTLP) → Phoenix / Langfuse +``` + +Vector runs as a DaemonSet in the cluster, collecting: + +- **Traces**: Distributed tracing data from AI agent operations, LangChain executions, and tool calls + +## Configuration + +### Vector Pipeline Components + +The Vector configuration consists of three main components: + +1. **Sources**: OTLP data collection from applications +2. **Transforms**: OTLP format reconstruction using VRL (Vector Remap Language) +3. **Sinks**: Delivery to observability platforms (Phoenix, Langfuse) + +### Traces Pipeline + +Vector accepts OpenTelemetry traces via both gRPC and HTTP. However, the traces must be rebuilt into an OTLP-compliant format before they are passed to the observability platforms: + +```yaml +sources: + # Collects traces, logs, and metrics from RAM + # Accessible through otel.logs, otel.metrics, and otel.traces + otel: + type: opentelemetry + grpc: + address: 0.0.0.0:4317 + http: + address: 0.0.0.0:4318 + +transforms: + # Transforms otel.traces into an OTLP-compliant form + rebuild_otlp_format: + type: remap + inputs: + - otel.traces + source: | + start_time_nanos = to_unix_timestamp(parse_timestamp!(.start_time_unix_nano, format: "%+"), unit: "nanoseconds") + end_time_nanos = to_unix_timestamp(parse_timestamp!(.end_time_unix_nano, format: "%+"), unit: "nanoseconds") + + attrs = [] + for_each(object!(.attributes)) -> |key, val| { + if is_string(val) { + attrs = push(attrs, {"key": key, "value": {"stringValue": to_string!(val)}}) + } else if is_integer(val) { + attrs = push(attrs, {"key": key, "value": {"intValue": to_string!(val)}}) + } else if is_float(val) { + attrs = push(attrs, {"key": key, "value": {"doubleValue": to_string!(val)}}) + } else if is_boolean(val) { + attrs = push(attrs, {"key": key, "value": {"boolValue": to_string!(val)}}) + } else { + attrs = push(attrs, {"key": key, "value": {"stringValue": to_string!(val)}}) + } + } + + # Convert resources to OTLP attribute array format + resource_attrs = [] + for_each(object!(.resources)) -> |key, value| { + resource_attrs = push(resource_attrs, {"key": key, "value": {"stringValue": string!(value)}}) + } + + # Build OTLP structure + .
= { + "resourceSpans": [{ + "resource": { + "attributes": resource_attrs + }, + "scopeSpans": [{ + "spans": [{ + "traceId": .trace_id, + "spanId": .span_id, + "parentSpanId": .parent_span_id, + "name": .name, + "kind": .kind, + "startTimeUnixNano": start_time_nanos, + "endTimeUnixNano": end_time_nanos, + "attributes": attrs, + "status": .status, + "droppedAttributesCount": .dropped_attributes_count, + "droppedEventsCount": .dropped_events_count, + "droppedLinksCount": .dropped_links_count + }] + }] + }] + } + +sinks: + # Requires Phoenix deployment in cluster + phoenix: + inputs: + - rebuild_otlp_format + protocol: + compression: none + encoding: + codec: otlp + type: http + uri: http://phoenix-svc.phoenix.svc.cluster.local:6006/v1/traces + type: opentelemetry + + # Requires account and API key creation in Langfuse + # Requires a public Langfuse account + langfuse: + inputs: + - rebuild_otlp_format + protocol: + compression: none + encoding: + codec: otlp + type: http + uri: https://us.cloud.langfuse.com/api/public/otel/v1/traces + headers: + Accept: "*/*" + Authorization: "Basic " + type: opentelemetry +``` + +> Note: See a full [Vector example values file here](../../examples/vector.yaml) and full [Phoenix example values file here](../../examples/phoenix.yaml) + +### OTLP Format Transformation + +The `rebuild_otlp_format` transform is critical for ensuring traces conform to the OpenTelemetry Protocol specification: + +- **Timestamp Conversion**: Converts timestamps to Unix nanoseconds format +- **Attribute Mapping**: Maps Vector's internal attribute format to OTLP's typed key-value pairs +- **Resource Attributes**: Restructures resource metadata into OTLP format +- **Span Structure**: Builds the complete `resourceSpans` → `scopeSpans` → `spans` hierarchy + +## Installation + +To install Vector, edit the [example Vector values file](../../examples/vector.yaml) to your desired settings and run the following commands: + +```sh +helm install vector vector/vector \ + -n vector -f .\values.yaml \ + --create-namespace --version 0.46.0 +``` + +To install Phoenix, edit the [example Phoenix values file](../../examples/phoenix.yaml) to your desired settings and run the following commands: + +```sh +helm install phoenix oci://registry-1.docker.io/arizephoenix/phoenix-helm \ + -f .\values.yaml --version 4.0.7 \ + -n phoenix --create-namespace +``` + +## Observability Backends + +### Phoenix + +[Phoenix](https://github.com/Arize-ai/phoenix) is an open-source observability platform for LLM applications. + +**Requirements:** + +- Phoenix must be deployed in the cluster + +- Default endpoint: `http://phoenix-svc.phoenix.svc.cluster.local:6006` + +**Features:** + +- Real-time trace visualization + +- LLM performance metrics + +- Token usage tracking + +- Latency analysis + +### Langfuse + +[Langfuse](https://langfuse.com/) is a hosted observability and analytics platform for LLM applications. 
+ +**Requirements:** + +- Langfuse account + +- API key pair (public key and secret key) + +- Base64-encoded credentials in format: `base64(public_key:secret_key)` + +**Features:** + +- Trace persistence and historical analysis + +- Cost tracking and budgeting + +- Team collaboration + +- Advanced filtering and search + +## Testing + +### Verify Vector is Running + +```bash +# Check Vector pods +kubectl get pods -n vector + +# View Vector logs +kubectl logs -n vector -l app.kubernetes.io/name=vector --tail=100 + +# Check for trace processing +kubectl logs -n vector -l app.kubernetes.io/name=vector | grep "otel.traces" +``` + +### Verify Traces in Phoenix + +```bash +# Port-forward Phoenix UI +kubectl port-forward -n phoenix svc/phoenix-svc 6006:6006 + +# Open in browser +open http://localhost:6006 +``` + +## Troubleshooting + +### Common Issues + +#### 1. OTLP Format Errors + +**Error**: Traces not appearing in Phoenix/Langfuse + +**Solution**: Verify the OTLP transformation is working: + +```bash +# Check Vector logs for transform errors +kubectl logs -n vector -l app.kubernetes.io/name=vector | grep "rebuild_otlp_format" + +# Verify trace structure is correct +kubectl logs -n vector -l app.kubernetes.io/name=vector | grep "resourceSpans" +``` + +#### 2. Phoenix Connection Failures + +**Error**: `Service call failed` or connection timeouts + +Check Phoenix is accessible: + +```bash +kubectl get svc -n phoenix phoenix-svc +kubectl get pods -n phoenix + +# Test connectivity from Vector pod +kubectl exec -n vector -- curl http://phoenix-svc.phoenix.svc.cluster.local:6006/healthz +``` + +#### 3. Langfuse Authentication Issues + +**Error**: 401 Unauthorized + +Verify credentials are properly encoded: + +```bash +# Test your credentials +echo -n "pk-lf-xxx:sk-lf-xxx" | base64 + +# Verify the header in Vector config matches +kubectl get configmap -n vector vector-config -o yaml | grep Authorization +``` + +## Related Documentation + +- [Vector Documentation](https://vector.dev/docs/) +- [OpenTelemetry Specification](https://opentelemetry.io/docs/) +- [Phoenix Documentation](https://docs.arize.com/phoenix/) +- [Langfuse Documentation](https://langfuse.com/docs) +- [VRL Language Reference](https://vector.dev/docs/reference/vrl/) \ No newline at end of file diff --git a/docs/user/DependencyInstall.md b/docs/user/DependencyInstall.md index 3681472..bb9394c 100644 --- a/docs/user/DependencyInstall.md +++ b/docs/user/DependencyInstall.md @@ -76,7 +76,7 @@ helm install kueue oci://registry.k8s.io/kueue/charts/kueue \ SAS Retrieval Agent Manager requires the NGINX Ingress controller for managing incoming traffic. -Here is an [Example NGINX Controller Values File](../../examples/nginx.yaml). +Here is an [Example NGINX Controller Values File](../../examples/nginx.yaml). You can edit it as you'd like to fit your deployment. You can install it onto your cluster with the following commands: @@ -94,6 +94,14 @@ helm install nginx-ingress-nginx-controller \ --create-namespace ``` +### Vector + +SAS Retrieval Agent Manager requires Vector for collecting, viewing, and managing logs/metrics. + +Here is an [Example Vector Values File](../../examples/vector.yaml). You can edit it as you'd like to fit your deployment. + +You can install it onto your cluster by reading the [installation instructions found here](../monitoring/logs-and-metrics.md#installation). 
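After the Helm install completes, you can confirm the agent is healthy before moving on. This is a minimal check, assuming the `vector` release name and namespace used in those instructions:

```bash
# The Agent role deploys Vector as a DaemonSet, so expect one pod per node
helm status vector -n vector
kubectl get daemonset -n vector
kubectl get pods -n vector -o wide
```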
+ ## Optional Components ### Weaviate @@ -116,3 +124,11 @@ helm install weaviate weaviate/weaviate \ -f \ --create-namespace ``` + +### Phoenix + +SAS Retrieval Agent Manager supports [Phoenix](https://github.com/Arize-ai/phoenix), an open-source observability platform for LLM applications. + +Here is an [Example Phoenix Values File](../../examples/phoenix.yaml). You can edit it as you'd like to fit your deployment. + +You can look at [installation instructions here](../monitoring/traces.md#installation). diff --git a/examples/aws/aws-ram-values.yaml b/examples/aws/aws-ram-values.yaml index 6200aa0..b38151d 100644 --- a/examples/aws/aws-ram-values.yaml +++ b/examples/aws/aws-ram-values.yaml @@ -232,8 +232,15 @@ global: # -- Embedding service configuration embedding: + # -- Embedding service database user + user: 'sas_ram_embedding_user' + # -- Embedding service database user password + password: # -- Persistent storage configuration for embeddings pvc: + storageClassName: "efs-sc" + # -- Must be false for AWS EFS + createStorageClass: false # -- PVC Size for embedding storage size: 20Gi @@ -250,12 +257,6 @@ global: sslVerify: 'True' # -- SAS license content license: - # -- Embedding service user credentials - embedding: - # -- Embedding service database user - user: 'sas_ram_embedding_user' - # -- Embedding service database user password - password: # -- Evaluation service configuration # eval: diff --git a/examples/azure/azure-ram-values.yaml b/examples/azure/azure-ram-values.yaml index 7350f62..38dac41 100644 --- a/examples/azure/azure-ram-values.yaml +++ b/examples/azure/azure-ram-values.yaml @@ -227,6 +227,10 @@ global: # -- Embedding service configuration embedding: + # -- Embedding service database user + user: 'sas_ram_embedding_user' + # -- Embedding service database user password + password: # -- Persistent storage configuration for embeddings pvc: # -- PVC Size for embedding storage @@ -245,12 +249,6 @@ global: sslVerify: 'True' # -- SAS license content license: - # -- Embedding service user credentials - embedding: - # -- Embedding service database user - user: 'sas_ram_embedding_user' - # -- Embedding service database user password - password: # -- Evaluation service configuration # eval: diff --git a/examples/k8s/k8s-ram-values.yaml b/examples/k8s/k8s-ram-values.yaml index 5f8b02a..a80e778 100644 --- a/examples/k8s/k8s-ram-values.yaml +++ b/examples/k8s/k8s-ram-values.yaml @@ -228,6 +228,10 @@ global: # -- Embedding service configuration embedding: + # -- Embedding service database user + user: 'sas_ram_embedding_user' + # -- Embedding service database user password + password: # -- Persistent storage configuration for embeddings pvc: # -- PVC Size for embedding storage @@ -246,12 +250,6 @@ global: sslVerify: 'True' # -- SAS license content license: - # -- Embedding service user credentials - embedding: - # -- Embedding service database user - user: 'sas_ram_embedding_user' - # -- Embedding service database user password - password: # -- Evaluation service configuration # eval: diff --git a/examples/phoenix.yaml b/examples/phoenix.yaml new file mode 100644 index 0000000..315a694 --- /dev/null +++ b/examples/phoenix.yaml @@ -0,0 +1,309 @@ +# Phoenix Helm Chart Values +# This file contains configuration values for deploying Phoenix via Helm. +# Each value corresponds to an environment variable described in https://arize.com/docs/phoenix/self-hosting/configuration. 
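#
# A typical install using this file is described in docs/monitoring/traces.md#installation;
# the release name and namespace below are examples only:
#
#   helm install phoenix oci://registry-1.docker.io/arizephoenix/phoenix-helm \
#     -f values.yaml --version 4.0.7 \
#     -n phoenix --create-namespace
#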
+extraObjects: [] + ### REQUIRED ### + # -- Ingress TLS secret for RAM HTTPS termination + # -- TLS Certificate for secure external access + # - apiVersion: v1 + # kind: Secret + # metadata: + # name: ingress-tls + # namespace: retagentmgr + # data: + # tls.crt: >- + # + # tls.key: >- + # + # type: kubernetes.io/tls + +# Replica count +# -- Number of Phoenix pod replicas +replicaCount: 1 + +# Deployment strategy +deployment: + # -- Deployment strategy + strategy: + type: RollingUpdate + rollingUpdate: + maxUnavailable: "25%" + maxSurge: "25%" + + # -- Tolerations, nodeSelector and affinity + # For Pod scheduling strategy on the nodes + tolerations: [] + nodeSelector: {} + affinity: {} + +postgresql: + # -- Enable PostgreSQL deployment. Set to false if you have your own postgres instance (e.g., RDS, CloudSQL) + # When disabled, you must configure database.url or database.postgres settings to point to your external database + # IMPORTANT: Cannot be enabled simultaneously with persistence.enabled=true (for SQLite) + # Choose one persistence strategy: + # - groundhog2k PostgreSQL: postgresql.enabled=true, persistence.enabled=false + # - SQLite: postgresql.enabled=false, persistence.enabled=true + # - External DB: postgresql.enabled=false, persistence.enabled=false, database.url configured + enabled: false + +ingress: + # -- Annotations to add to the ingress resource + annotations: {} + + # -- Path prefix for the Phoenix API + apiPath: + + # -- Enable ingress controller for external access + enabled: true + + # -- Hostname for ingress + host: + + # -- Labels to add to the ingress resource + labels: {} + + # -- Ingress path type (Prefix, Exact, or ImplementationSpecific) + pathType: "Prefix" + + tls: + # -- Enable TLS/HTTPS for ingress + enabled: true + secretName: + +server: + # -- Annotations to add to the Phoenix service + annotations: {} + + # -- Enable Prometheus metrics endpoint on port 9090 + enablePrometheus: false + + # -- Port for OpenTelemetry gRPC collector (PHOENIX_GRPC_PORT) + grpcPort: 4317 + + # -- Host IP to bind Phoenix server (PHOENIX_HOST) + host: "0.0.0.0" + + # -- Root path prefix for Phoenix UI and API (PHOENIX_HOST_ROOT_PATH) + hostRootPath: "" + + # -- Labels to add to the Phoenix service + labels: {} + + # -- Port for Phoenix web UI and HTTP API (PHOENIX_PORT) + port: 6006 + + rootUrl: + + # -- The working directory for saving, loading, and exporting data (PHOENIX_WORKING_DIR) + # Set to empty string to use container's $HOME directory (not recommended for persistence) + # Use `/data` as a default for volume mount - enables proper permissions in both strict and normal security contexts + # IMPORTANT: When persistence.enabled=true, this directory must be writable by the Phoenix container (UID 65532) + # The fsGroup setting in securityContext.pod ensures proper permissions when enabled + workingDir: "/data" + + # -- Allows calls to external resources, like Google Fonts in the web interface (PHOENIX_ALLOW_EXTERNAL_RESOURCES) + # Set to false in air-gapped environments to prevent external requests that can cause UI loading delays + allowExternalResources: true + +# Service configuration +service: + # -- Service type for Phoenix service (ClusterIP, NodePort, LoadBalancer, or ExternalName) + # Use ClusterIP for service mesh deployments (Istio, Linkerd, etc.) 
+ # Use NodePort for direct external access without ingress + type: "ClusterIP" + + # -- Annotations to add to the Phoenix service (useful for service mesh configurations) + annotations: + {} + # For Istio service mesh, you might want: + # service.istio.io/canonical-name: phoenix + # service.istio.io/canonical-revision: stable + + # -- Labels to add to the Phoenix service + labels: + {} + # For service mesh deployments, you might want: + # app: phoenix + # version: stable + +# Persistence configuration for Phoenix home directory +persistence: + enabled: false + + +database: + # -- Storage allocation in GiB for the database persistent volume + allocatedStorageGiB: 20 + + # -- Default retention policy for traces in days (PHOENIX_DEFAULT_RETENTION_POLICY_DAYS) + # Set to 0 to disable automatic trace cleanup. When set to a positive value, + # traces older than this many days will be automatically removed from the database. + defaultRetentionPolicyDays: 0 + + postgres: + # -- Name of the PostgreSQL database (PHOENIX_POSTGRES_DB) + db: "SASRetrievalAgentManagerMonitoring" + + # -- Postgres Host (PHOENIX_POSTGRES_HOST) + # Default points to the groundhog2k PostgreSQL service when postgresql.enabled=true + # IMPORTANT: Only change this when using external PostgreSQL (postgresql.enabled=false, database.url empty) + # Examples: "localhost", "postgres.example.com", "your-rds-endpoint.region.rds.amazonaws.com" + host: + + # -- PostgreSQL password (should match auth.secret."PHOENIX_POSTGRES_PASSWORD", PHOENIX_POSTGRES_PASSWORD) + password: + + # -- Port number for PostgreSQL connections (PHOENIX_POSTGRES_PORT) + port: 5432 + + # -- PostgreSQL schema to use (PHOENIX_SQL_DATABASE_SCHEMA) + schema: "phoenix" + + # -- PostgreSQL username (PHOENIX_POSTGRES_USER) + user: + + # -- Full database connection URL (overrides postgres settings if provided) + # IMPORTANT: Only set this for external databases (Strategy 3) + # - When using SQLite (Strategy 1): MUST be empty - SQLite auto-uses persistent volume + # - When using built-in PostgreSQL (Strategy 2): MUST be empty - auto-configured + # - When using external database (Strategy 3): MUST be configured with full connection string + # + # Examples for external databases: + # PostgreSQL: "postgresql://username:password@your-rds-endpoint.region.rds.amazonaws.com:5432/phoenix" + # SQLite: "sqlite:///path/to/database.db" (only for external SQLite files, not recommended) + # + # WARNING: Setting this will override all database.postgres.* settings and disable built-in PostgreSQL validation + # url: "" + +# Authentication and security +auth: + # -- Duration in minutes before access tokens expire and require renewal (PHOENIX_ACCESS_TOKEN_EXPIRY_MINUTES) + accessTokenExpiryMinutes: 60 + + # FIX: Add your domain to CORS/CSRF + allowedOrigins: + - + - "http://localhost:6006" + + csrfTrustedOrigins: + - + - "http://localhost:6006" + + defaultAdminPassword: "iotorion123!" 
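  # NOTE: the defaultAdminPassword above is an example value only. Change it before deploying,
  # and avoid committing real credentials to this file (the same applies to the
  # PHOENIX_POSTGRES_PASSWORD value under auth.secret further below).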
+ + enableAuth: false + + # -- Name of the Kubernetes secret containing authentication credentials + name: "phoenix-secret" + + # -- Duration in minutes before password reset tokens expire (PHOENIX_PASSWORD_RESET_TOKEN_EXPIRY_MINUTES) + passwordResetTokenExpiryMinutes: 60 + + # -- Duration in minutes before refresh tokens expire (PHOENIX_REFRESH_TOKEN_EXPIRY_MINUTES) + refreshTokenExpiryMinutes: 43200 + + secret: + # -- Environment variable name for the main Phoenix secret key used for encryption + - key: "PHOENIX_SECRET" + # -- Autogenerated if empty + value: "" + # -- Use this for existing Secrets / Configmaps, takes precedence over auth.secret[].value + # valueFrom: + # secretKeyRef: + # name: my-secret + # key: phoenix-secret-key + + # -- Environment variable name for the admin secret key + - key: "PHOENIX_ADMIN_SECRET" + # -- Autogenerated if empty + value: "" + + # -- Environment variable name for the PostgreSQL password + - key: "PHOENIX_POSTGRES_PASSWORD" + # -- If using postgres in this chart, password must match with database.postgres.password + value: "iotorion123!" + + # -- Environment variable name for the SMTP password + - key: "PHOENIX_SMTP_PASSWORD" + # -- Autogenerated if empty + value: "" + + # -- Environment variable name for the default admin password + - key: "PHOENIX_DEFAULT_ADMIN_INITIAL_PASSWORD" + # -- Default password for the admin user on initial setup, uses defaultAdminPassword if empty + value: + + # -- Enable secure cookies (should be true when using HTTPS) + useSecureCookies: false + + # OAuth2/OIDC Identity Provider Configuration + # Configure OAuth2 identity providers for authentication + oauth2: + # -- Enable OAuth2/OIDC authentication + enabled: false + + # -- List of OAuth2 identity providers to configure + # Each provider requires client_id, client_secret, and oidc_config_url + # Optional settings include display_name, allow_sign_up, and auto_login + # You can also define corresponding ENVs via auth.secrets[].valueFrom to use existing secrets + # ENVs: PHOENIX_OAUTH2_{{ $provider_upper }}_{{ setting }}, e.g. 
PHOENIX_OAUTH2_GOOGLE_CLIENT_SECRET + providers: + # Example Google configuration: + # google: + # client_id: "your-google-client-id" + # client_secret: "your-google-client-secret" + # oidc_config_url: "https://accounts.google.com/.well-known/openid-configuration" + # display_name: "Google" # Optional, defaults to provider name + # allow_sign_up: true # Optional, defaults to true + # auto_login: false # Optional, defaults to false + + # Example AWS Cognito configuration: + # aws_cognito: + # client_id: "your-aws-cognito-client-id" + # client_secret: "your-aws-cognito-client-secret" + # oidc_config_url: "https://cognito-idp.us-east-1.amazonaws.com/us-east-1_xxxxx/.well-known/openid-configuration" + # display_name: "AWS Cognito" + # allow_sign_up: true + # auto_login: false + + # Example Microsoft Entra ID configuration: + # microsoft_entra_id: + # client_id: "your-microsoft-entra-id-client-id" + # client_secret: "your-microsoft-entra-id-client-secret" + # oidc_config_url: "https://login.microsoftonline.com/your-tenant-id/v2.0/.well-known/openid-configuration" + # display_name: "Microsoft Entra ID" + # allow_sign_up: true + # auto_login: false + + # Example Keycloak configuration: + # keycloak: + # client_id: "phoenix" + # client_secret: "your-keycloak-client-secret" + # oidc_config_url: "https://your-keycloak-server/realms/your-realm/.well-known/openid-configuration" + # display_name: "Keycloak" + # allow_sign_up: true + # auto_login: false + + +# Logging +logging: + # -- Database logging level (debug, info, warning, error) PHOENIX_DB_LOGGING_LEVEL + dbLevel: "warning" + + # -- Application logging level (debug, info, warning, error) PHOENIX_LOGGING_LEVEL + level: "info" + + # -- Enable logging of database migration operations (PHOENIX_LOG_MIGRATIONS) + logMigrations: true + + # -- Logging mode configuration - PHOENIX_LOGGING_MODE (default|structured) + mode: "default" + +# Instrumentation +instrumentation: + # -- OpenTelemetry collector gRPC endpoint for sending traces (PHOENIX_SERVER_INSTRUMENTATION_OTLP_TRACE_COLLECTOR_GRPC_ENDPOINT) + otlpTraceCollectorGrpcEndpoint: "" + + # -- OpenTelemetry collector HTTP endpoint for sending traces (PHOENIX_SERVER_INSTRUMENTATION_OTLP_TRACE_COLLECTOR_HTTP_ENDPOINT) + otlpTraceCollectorHttpEndpoint: "" + diff --git a/examples/vector.yaml b/examples/vector.yaml new file mode 100644 index 0000000..e33e34f --- /dev/null +++ b/examples/vector.yaml @@ -0,0 +1,282 @@ +# extraObjects -- Create extra manifests via values. Would be passed through `tpl` for templating. +extraObjects: + - apiVersion: rbac.authorization.k8s.io/v1 + kind: ClusterRole + metadata: + name: vector + rules: + - apiGroups: ["*"] + resources: ["*"] + verbs: ["*"] + - nonResourceURLs: ["*"] + verbs: ["*"] + + - apiVersion: rbac.authorization.k8s.io/v1 + kind: ClusterRoleBinding + metadata: + name: vector + roleRef: + apiGroup: rbac.authorization.k8s.io + kind: ClusterRole + name: vector + subjects: + - kind: ServiceAccount + name: vector + namespace: vector +# Default values for Vector +# See Vector helm documentation to learn more: +# https://vector.dev/docs/setup/installation/package-managers/helm/ + +# nameOverride -- Override the name of resources. +nameOverride: "" + +# fullnameOverride -- Override the full name of resources. +fullnameOverride: "" + +# role -- [Role](https://vector.dev/docs/setup/deployment/roles/) for this Vector instance, valid options are: +# "Agent", "Aggregator", and "Stateless-Aggregator". 
+ +# Each role is created with the following workloads: +# Agent = DaemonSet +# Aggregator = StatefulSet +# Stateless-Aggregator = Deployment +role: "Agent" + +# rollWorkload -- Add a checksum of the generated ConfigMap to workload annotations. +rollWorkload: true + +# rollWorkloadSecrets -- Add a checksum of the generated Secret to workload annotations. +rollWorkloadSecrets: false + +# rollWorkloadExtraObjects -- Add a checksum of the generated ExtraObjects to workload annotations. +rollWorkloadExtraObjects: false + +# commonLabels -- Add additional labels to all created resources. +commonLabels: {} + +# Define the Vector image to use. +image: + # image.repository -- Override default registry and name for Vector's image. + repository: timberio/vector + # image.pullPolicy -- The [pullPolicy](https://kubernetes.io/docs/concepts/containers/images/#image-pull-policy) for + # Vector's image. + pullPolicy: IfNotPresent + # image.pullSecrets -- The [imagePullSecrets](https://kubernetes.io/docs/concepts/containers/images/#specifying-imagepullsecrets-on-a-pod) + # to reference for the Vector Pods. + pullSecrets: [] + # image.tag -- The tag to use for Vector's image. + # @default -- Derived from the Chart's appVersion. + tag: "nightly-2025-10-25-debian" + # image.sha -- The SHA to use for Vector's image. + sha: "" + # image.base -- The base distribution to use for vector. If set, then the base in appVersion will be replaced with this base alongside the version. + # For example: with a `base` of `debian` `0.38.0-distroless-libc` becomes `0.38.0-debian` + base: "" + +# replicas -- Specify the number of Pods to create. Valid for the "Aggregator" and "Stateless-Aggregator" roles. +replicas: 1 + +# Adding additional entries with hostAliases +hostAliases: [] +# - ip: "127.0.0.1" +# hostnames: +# - "foo.local" +# - "bar.local" + + +# podManagementPolicy -- Specify the [podManagementPolicy](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#pod-management-policies) +# for the StatefulSet. Valid for the "Aggregator" role. +podManagementPolicy: OrderedReady + +# Create a Secret resource for Vector to use. +secrets: + # secrets.generic -- Each Key/Value will be added to the Secret's data key, each value should be raw and NOT base64 + # encoded. Any secrets can be provided here. It's commonly used for credentials and other access related values. + # **NOTE: Don't commit unencrypted secrets to git!** + generic: {} + # my_variable: "my-secret-value" + # datadog_api_key: "api-key" + # awsAccessKeyId: "access-key" + # awsSecretAccessKey: "secret-access-key" + + +# args -- Override Vector's default arguments. +args: + - --config-dir + - "/etc/vector/" + +# env -- Set environment variables for Vector containers. +env: + - name: VECTOR_LOG + value: info + - name: VECTOR_SELF_NODE_NAME + valueFrom: + fieldRef: + apiVersion: v1 + fieldPath: spec.nodeName + +service: + enabled: true + type: ClusterIP + ports: + # Add these OpenTelemetry ports + - name: otel-http + port: 4318 + targetPort: 4318 + protocol: TCP + - name: otel-grpc + port: 4317 + targetPort: 4317 + protocol: TCP + # Keep the API port + - name: api + port: 8686 + targetPort: 8686 + protocol: TCP + +# customConfig -- Override Vector's default configs, if used **all** options need to be specified. This section supports +# using helm templates to populate dynamic values. See Vector's [configuration documentation](https://vector.dev/docs/reference/configuration/) +# for all options. 
+customConfig: + data_dir: /vector-data-dir + api: + enabled: true + address: 0.0.0.0:8686 + playground: false + + sources: + # Collects traces, logs, and metrics from ram. Accessible through otel.logs, otel.metrics, and otel.traces + otel: + type: opentelemetry + grpc: + address: 0.0.0.0:4317 + http: + address: 0.0.0.0:4318 + + # Collects logs from Kubernetes + kube_logs: + type: kubernetes_logs + auto_partial_merge: true + timezone: local + + # Transforms otel.traces into an otel-compliant form + transforms: + rebuild_otlp_format: + type: remap + inputs: + - otel.traces + source: | + start_time_nanos = to_unix_timestamp(parse_timestamp!(.start_time_unix_nano, format: "%+"), unit: "nanoseconds") + end_time_nanos = to_unix_timestamp(parse_timestamp!(.end_time_unix_nano, format: "%+"), unit: "nanoseconds") + + attrs = [] + for_each(object!(.attributes)) -> |key, val| { + if is_string(val) { + attrs = push(attrs, {"key": key, "value": {"stringValue": to_string!(val)}}) + } else if is_integer(val) { + attrs = push(attrs, {"key": key, "value": {"intValue": to_string!(val)}}) + } else if is_float(val) { + attrs = push(attrs, {"key": key, "value": {"doubleValue": to_string!(val)}}) + } else if is_boolean(val) { + attrs = push(attrs, {"key": key, "value": {"boolValue": to_string!(val)}}) + } else { + attrs = push(attrs, {"key": key, "value": {"stringValue": to_string!(val)}}) + } + } + + # Convert resources to OTLP attribute array format + resource_attrs = [] + for_each(object!(.resources)) -> |key, value| { + resource_attrs = push(resource_attrs, {"key": key, "value": {"stringValue": string!(value)}}) + } + + # Build OTLP structure + . = { + "resourceSpans": [{ + "resource": { + "attributes": resource_attrs + }, + "scopeSpans": [{ + "spans": [{ + "traceId": .trace_id, + "spanId": .span_id, + "parentSpanId": .parent_span_id, + "name": .name, + "kind": .kind, + "startTimeUnixNano": start_time_nanos, + "endTimeUnixNano": end_time_nanos, + "attributes": attrs, + "status": .status, + "droppedAttributesCount": .dropped_attributes_count, + "droppedEventsCount": .dropped_events_count, + "droppedLinksCount": .dropped_links_count + }] + }] + }] + } + + # Removes unecessary fields from logs to be inserted via postgrest + sanitize_logs: + type: remap + inputs: + - kube_logs + source: | + # Remove fields that don't exist in the database schema + del(.source_type) + del(.stream) + + sinks: + # Sends otel metrics to postgres via postgrest + metrics_postgrest: + type: http + inputs: + - otel.metrics + uri: "http://sas-retrieval-agent-manager-postgrest.retagentmgr.svc.cluster.local:3002/metrics" + headers: + Content-Type: "Application/json" + encoding: + codec: json + + # Sends kubernetes logs to postgres via postgrest + logs_postgrest: + type: http + inputs: + - sanitize_logs + # Could be different URL depending on postgrest settings + uri: "http://sas-retrieval-agent-manager-postgrest.retagentmgr.svc.cluster.local:3002/logs" + headers: + Content-Type: "Application/json" + encoding: + codec: json + + # Requires Phoenix deployment in cluster + phoenix: + inputs: + - rebuild_otlp_format + protocol: + compression: none + encoding: + codec: otlp + type: http + uri: http://phoenix-svc.phoenix.svc.cluster.local:6006/v1/traces + type: opentelemetry + + # Requires account and API key creation in langfuse + # Requires a public langfuse account + langfuse: + inputs: + - rebuild_otlp_format + protocol: + compression: none + encoding: + codec: otlp + type: http + uri: https://us.cloud.langfuse.com/api/public/otel/v1/traces 
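      # The Authorization header below must be "Basic " followed by
      # base64(public_key:secret_key) for your Langfuse project
      # (see docs/monitoring/traces.md#langfuse).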
+ headers: + Accept: "*/*" + Authorization: "Basic " + type: opentelemetry + +extraVolumeMounts: [] + +extraVolumes: [] \ No newline at end of file