Commit d74c88a

Merge pull request #51 from graphistry/telemetry-k8s-docs
Add documentation for deploying Telemetry services on Kubernetes
2 parents d7be55a + 5382a38 commit d74c88a

5 files changed

+106
-8
lines changed

docs/admin.rst

Lines changed: 1 addition & 0 deletions
@@ -16,6 +16,7 @@
1616
app-config/index
1717
debugging/index
1818
security/index
19+
telemetry/index
1920
tools/index
2021
faq
2122
support
Lines changed: 8 additions & 7 deletions
@@ -1,12 +1,12 @@
1-
# Telemetry
1+
# Docker Compose Telemetry
22

33
## Overview
44

55
Graphistry services export telemetry information (metrics and traces) using the [OpenTelemetry](https://opentelemetry.io/) standard.
66

77
Graphistry services push their telemetry data to the [opentelemetry-collector](https://opentelemetry.io/docs/collector/) service (alias `otel-collector`) and this will forward the data to any observability tool that is compatible with the OpenTelemetry standard (e.g. Prometheus, Jaeger, Grafana Cloud, etc.).
88

9-
## Telemetry Deployment Modes in Graphistry
9+
## Telemetry Deployment Modes
1010

1111
When telemetry services are enabled, the OpenTelemetry Collector will be included in all deployment scenarios:
1212

@@ -36,6 +36,9 @@ cd $GRAPHISTRY_HOME
3636
If you need to manage individual telemetry services, you can use the following commands. Each command starts a specific service:
3737

3838
```bash
39+
# Start the Node Exporter to collect and expose system-level metrics (e.g., CPU, memory, disk, and network).
40+
./release up -d node-exporter
41+
3942
# Start the NVIDIA Data Center GPU Manager Exporter (DCGM Exporter) for GPU monitoring
4043
./release up -d dcgm-exporter
4144

@@ -73,11 +76,9 @@ Use this URL when the service is [behind Caddy](#caddyfile---reverse-proxy-set-u
7376
Use this URL when the service is **not behind Caddy**: `https://$GRAPHISTRY_HOST:16686/jaeger/`
7477

7578
### Grafana dashboard
76-
Grafana will include GPU metrics and dashboards from NVIDIA Data Center GPU Manager: `DCGM Exporter Dashboards`:
77-
78-
Use this URL when the service is [behind Caddy](#caddyfile---reverse-proxy-set-up): `https://$GRAPHISTRY_HOST/grafana/`
79-
80-
Use this URL when the service is **not behind Caddy**: `https://$GRAPHISTRY_HOST:3000`
79+
Grafana will include GPU metrics and dashboards from NVIDIA Data Center GPU Manager: `DCGM Exporter Dashboards`, as well as the Node Exporter Dashboard for system-level metrics (e.g., CPU, memory, disk, and network).
80+
- Use this URL when the service is [behind Caddy](#caddyfile---reverse-proxy-set-up): `https://$GRAPHISTRY_HOST/grafana/`
81+
- Use this URL when the service is **not behind Caddy**: `https://$GRAPHISTRY_HOST:3000`
8182

8283
## Configuration
8384

docs/telemetry/index.rst

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
1+
Telemetry
2+
========================
3+
4+
.. toctree::
5+
:maxdepth: 1
6+
7+
docker-compose
8+
kubernetes

docs/telemetry/kubernetes.md

Lines changed: 89 additions & 0 deletions
@@ -0,0 +1,89 @@
1+
# Kubernetes Telemetry
2+
3+
## Overview
4+
To deploy OpenTelemetry services for Graphistry in a Kubernetes environment, you will need to configure the system using Helm values. For comprehensive documentation on deploying Graphistry with Helm, refer to the official documentation at [Graphistry Helm Documentation](https://graphistry-helm.readthedocs.io/). Additionally, you can explore the open-source Helm project for Graphistry on GitHub at [Graphistry Helm GitHub](https://github.com/graphistry/graphistry-helm).
5+
6+
## Telemetry Deployment Modes
7+
Graphistry services export telemetry data (metrics and traces) using the OpenTelemetry standard. In Kubernetes, the telemetry data is pushed to the OpenTelemetry Collector (otel-collector), which forwards it to observability tools such as Prometheus, Jaeger, Grafana Cloud, etc.
8+
9+
Kubernetes supports two primary modes of telemetry deployment, similar to Docker Compose:
10+
11+
### Forwarding to External Services (Cloud Mode)
12+
When the Helm value `telemetryEnv.OTEL_CLOUD_MODE` is `true`, telemetry data is forwarded to external services like `Grafana Cloud`, similar to [Docker Compose’s Forwarding to External Services mode](./docker-compose.md#forwarding-to-external-services).
13+
14+
### Using Packaged Observability Tools
15+
When the Helm value `telemetryEnv.OTEL_CLOUD_MODE` is `false`, the stack bundled with Graphistry (Prometheus, Jaeger, Grafana) is deployed, and telemetry data is exported to these tools, similar to [Docker Compose’s Using Packaged Observability Tools mode](docker-compose.md#using-packaged-observability-tools).
16+
17+
### Hybrid Mode
18+
You can also configure a Hybrid Mode, combining both local tools and external services. This requires custom Helm chart adjustments to forward data to both local and external observability services. See [Docker Compose’s Hybrid Mode](docker-compose.md#hybrid-mode) for more information.
19+
20+
## Prerequisites
21+
22+
Before deploying OpenTelemetry services for Graphistry on Kubernetes, ensure you have the following prerequisites in place:
23+
24+
1. **Kubernetes Cluster**: You must have access to a running Kubernetes cluster.
25+
2. **Helm**: Helm is the package manager for Kubernetes that simplifies the deployment and management of applications.
26+
3. **Graphistry Helm Project**: You must have the `graphistry-helm` project cloned or downloaded to your local machine. This project contains the necessary Helm charts and configurations for deploying Graphistry services with Kubernetes. You can find the project and instructions in the official [Graphistry Helm GitHub repository](https://github.com/graphistry/graphistry-helm).
27+
4. **Access to Required Resources**: Ensure you have the necessary permissions to deploy applications to the Kubernetes cluster. You may need appropriate access rights to the cloud provider's Kubernetes resources or administrative permissions for your self-hosted Kubernetes environment.
28+
29+
## Helm Values for OpenTelemetry in Kubernetes
30+
31+
To deploy OpenTelemetry for Graphistry in a Kubernetes environment, you'll need to configure the Helm deployment with specific values. These values are typically defined in a `values.yaml` file, which will replace the Docker Compose configuration in your setup.
32+
33+
The following is an example of the configuration you would include in your `values.yaml` file to deploy OpenTelemetry services within Kubernetes:
34+
35+
```yaml
global: ## global settings for all charts
  ENABLE_OPEN_TELEMETRY: true

# Graphistry Telemetry values and environment variables for observability tools
# can be set like helm upgrade -i chart_name --name release_name \
#--set stENVPublic.LOG_LEVEL="FOO"
# Telemetry documentation:
# https://github.com/graphistry/graphistry-cli/blob/master/docs/tools/telemetry.md#kubernetes-deployment
telemetryEnv:
  OTEL_CLOUD_MODE: false # false: deploy our stack: jaeger, prometheus, grafana etc.; else fill OTEL_COLLECTOR_OTLP_HTTP_ENDPOINT and credentials below
  openTelemetryCollector:
    image: "otel/opentelemetry-collector-contrib:0.87.0"
    OTEL_COLLECTOR_OTLP_HTTP_ENDPOINT: "" # e.g. Grafana OTLP HTTP endpoint for Graphistry Hub https://otlp-gateway-prod-us-east-0.grafana.net/otlp
    OTEL_COLLECTOR_OTLP_USERNAME: "" # e.g. Grafana Cloud Instance ID for OTLP
    OTEL_COLLECTOR_OTLP_PASSWORD: "" # e.g. Grafana Cloud API Token for OTLP
  grafana:
    image: "grafana/grafana:11.0.0"
    GF_SERVER_ROOT_URL: "/grafana"
    GF_SERVER_SERVE_FROM_SUB_PATH: "true"
  dcgmExporter:
    image: "nvcr.io/nvidia/k8s/dcgm-exporter:3.3.5-3.4.1-ubuntu22.04"
    DCGM_EXPORTER_CLOCK_EVENTS_COUNT_WINDOW_SIZE: 1000 # milliseconds
  jaeger:
    image: "jaegertracing/all-in-one:1.50.0"
  nodeExporter:
    image: "prom/node-exporter:v1.8.2"
  prometheus:
    image: "prom/prometheus:v2.47.2"
```
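As a variation on the example values, the following is a minimal sketch of how a cloud-mode override file might be generated. The file name `telemetry-cloud-values.yaml`, the release name `my-graphistry`, the chart path, and the `OTLP_*` shell variables are assumptions for illustration, not names defined by the Graphistry Helm project. The script only writes the file and prints a possible Helm command; it does not contact a cluster.

```shell
#!/bin/sh
# Sketch: write a cloud-mode override file for the telemetry values.
set -eu

# Hypothetical names; adjust to your deployment.
VALUES_FILE="${VALUES_FILE:-telemetry-cloud-values.yaml}"
RELEASE="${RELEASE:-my-graphistry}"
CHART="${CHART:-./charts/graphistry-helm}"
NAMESPACE="${NAMESPACE:-graphistry}"

# Placeholder connection details; supply real values through the environment.
OTLP_ENDPOINT="${OTLP_ENDPOINT:-https://otlp-gateway-prod-us-east-0.grafana.net/otlp}"
OTLP_USERNAME="${OTLP_USERNAME:-<instance-id>}"
OTLP_PASSWORD="${OTLP_PASSWORD:-<api-token>}"

# Only the keys that differ from the packaged-stack example are overridden.
cat > "$VALUES_FILE" <<EOF
global:
  ENABLE_OPEN_TELEMETRY: true
telemetryEnv:
  OTEL_CLOUD_MODE: true
  openTelemetryCollector:
    OTEL_COLLECTOR_OTLP_HTTP_ENDPOINT: "${OTLP_ENDPOINT}"
    OTEL_COLLECTOR_OTLP_USERNAME: "${OTLP_USERNAME}"
    OTEL_COLLECTOR_OTLP_PASSWORD: "${OTLP_PASSWORD}"
EOF

# Print, rather than run, a command that would apply the override.
echo "helm upgrade -i $RELEASE $CHART -f $VALUES_FILE --namespace $NAMESPACE"
```

Keeping the override in its own file leaves the base `values.yaml` untouched, so switching back to the packaged stack only requires dropping the extra `-f` argument.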
65+
66+
## Configuration Overview
67+
68+
1. **`global`**: This section in the `values.yaml` file is used to define values that are accessible across all charts within the parent-child hierarchy. Both the parent chart (e.g., `charts/graphistry-helm`) and its child charts (e.g., `charts/graphistry-helm/charts/telemetry`) can reference these global values using `.Values.global.<value_name>`, providing a unified configuration across the deployment.
69+
2. **`telemetryEnv`**: This section defines environment variables that control the OpenTelemetry configuration in Kubernetes. These variables replicate the settings that were originally defined in the Docker Compose setup.
70+
3. **`global.ENABLE_OPEN_TELEMETRY`**: Set to `true` to enable the OpenTelemetry stack within the Kubernetes environment. This will ensure that telemetry data is collected and processed by the relevant tools in your stack.
71+
4. **`telemetryEnv.OTEL_CLOUD_MODE`**:
72+
- When set to `false`, the internal observability stack (`Jaeger`, `Prometheus`, `Grafana`, `NVIDIA DCGM Exporter` and `Node Exporter`) is deployed locally within your Kubernetes cluster. So, setting it to `false` is similar to [using packaged observability tools](./docker-compose.md#using-packaged-observability-tools) within the Kubernetes environment.
73+
- When set to `true`, telemetry data is forwarded to external services, such as Grafana Cloud or other OTLP-compatible services. So, setting this to `true` is equivalent to [forwarding telemetry to external services](./docker-compose.md#forwarding-to-external-services).
74+
5. **`telemetryEnv.openTelemetryCollector.OTEL_COLLECTOR_OTLP_HTTP_ENDPOINT`**, **`telemetryEnv.openTelemetryCollector.OTEL_COLLECTOR_OTLP_USERNAME`**, and **`telemetryEnv.openTelemetryCollector.OTEL_COLLECTOR_OTLP_PASSWORD`**: These fields are required only if `OTEL_CLOUD_MODE` is set to `true`. They provide the necessary connection details (such as the endpoint, username, and password) for forwarding telemetry data to external services like Grafana Cloud or other OTLP-compatible services.
75+
6. **`telemetryEnv.grafana.GF_SERVER_ROOT_URL`** and **`telemetryEnv.grafana.GF_SERVER_SERVE_FROM_SUB_PATH`**: These settings are used to configure Grafana, especially when it's deployed behind a reverse proxy or using an ingress controller.
76+
- **`telemetryEnv.grafana.GF_SERVER_ROOT_URL`** defines the root URL for accessing Grafana (e.g., `/grafana`).
77+
- **`telemetryEnv.grafana.GF_SERVER_SERVE_FROM_SUB_PATH`** should be set to `true` if Grafana is accessed from a sub-path (e.g., `/grafana`) behind a reverse proxy or ingress.
78+
7. **`telemetryEnv.dcgmExporter.DCGM_EXPORTER_CLOCK_EVENTS_COUNT_WINDOW_SIZE`**: This environment variable is used when `OTEL_CLOUD_MODE` is set to `true` and the `dcgm-exporter` is deployed to export GPU metrics to Prometheus. It controls how often the GPU is sampled to gather metrics. The value `1000` is the window size, in milliseconds, for counting clock events on the GPU.
79+
8. **`telemetryEnv.*.image`**: These values let you change the image versions of the observability tools.
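The dotted names in the list above map directly onto Helm `--set` arguments. The sketch below assembles such a command as text. The helper `build_helm_cmd`, the release name `my-graphistry`, and the chart path are hypothetical, chosen only for illustration; the command is printed, not executed.

```shell
#!/bin/sh
# Sketch: turn dotted value paths into a printed `helm upgrade` command.
set -eu

# Hypothetical names; adjust to your deployment.
RELEASE="${RELEASE:-my-graphistry}"
CHART="${CHART:-./charts/graphistry-helm}"
NAMESPACE="${NAMESPACE:-graphistry}"

# build_helm_cmd KEY=VALUE... prints one --set flag per argument.
build_helm_cmd() {
  printf 'helm upgrade -i %s %s --namespace %s' "$RELEASE" "$CHART" "$NAMESPACE"
  for kv in "$@"; do
    printf ' --set %s' "$kv"
  done
  printf '\n'
}

# Example: enable telemetry with the packaged stack and pin one image version.
build_helm_cmd \
  "global.ENABLE_OPEN_TELEMETRY=true" \
  "telemetryEnv.OTEL_CLOUD_MODE=false" \
  "telemetryEnv.prometheus.image=prom/prometheus:v2.47.2"
```

For more than a handful of overrides, a values file is usually easier to review than a long list of `--set` flags.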
80+
81+
## Caddyfile - reverse proxy set up
82+
In Kubernetes, you can customize the Caddy configuration to expose or route telemetry data to different observability endpoints, offering flexibility for your deployment. By default, the Kubernetes setup includes ingress configurations for `Prometheus`, `Jaeger`, and `Grafana` dashboards. However, if you need more control over the routing or wish to modify the reverse proxy settings, you can refer to the [Docker Compose section for guidance on configuring Caddy](docker-compose.md#caddyfile---reverse-proxy-set-up). To modify the Caddy configuration in Kubernetes, such as on [GKE (Google Kubernetes Engine)](https://github.com/graphistry/graphistry-helm/tree/main/charts/values-overrides/examples/gke), follow these steps:
83+
1. Edit the [Caddy ConfigMap](https://github.com/graphistry/graphistry-helm/blob/main/charts/graphistry-helm/templates/caddy/caddy-cfg.yml) and update the configuration as needed.
84+
2. Delete the existing Caddy ConfigMap (`kubectl delete configmap caddy-config -n graphistry`).
85+
3. [Update the Graphistry Helm chart](https://github.com/graphistry/graphistry-helm/tree/main/charts/values-overrides/examples/gke#update-graphistry-deployment) to apply the new configuration.
86+
4. Delete the current Caddy pod to trigger a restart with the updated settings (`kubectl delete $(kubectl get pods -n graphistry -o name | grep caddy-graphistry) -n graphistry`).
87+
5. Verify that the new ConfigMap is created and applied to the new Caddy pod (`kubectl get configmap caddy-config -n graphistry -o yaml`).
88+
89+
Additionally, review the general and global [values in the Graphistry chart](https://github.com/graphistry/graphistry-helm/blob/main/charts/graphistry-helm/values.yaml), as some are related to the Caddy configuration.
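The `kubectl` commands from steps 2, 4, and 5 above can be collected into a small script, sketched here under a few assumptions: the function names and the `DRY_RUN` switch are hypothetical, and the namespace defaults to `graphistry` as in the commands above. With `DRY_RUN=1` (the default) the script only prints what it would run. Step 3, updating the Helm release, is left as a comment because it depends on your deployment.

```shell
#!/bin/sh
# Sketch: reload the Caddy ConfigMap after editing it (steps 2, 4 and 5).
set -eu

NAMESPACE="${NAMESPACE:-graphistry}"
DRY_RUN="${DRY_RUN:-1}"   # hypothetical switch: 1 prints commands, 0 runs them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Step 2: delete the existing Caddy ConfigMap.
delete_caddy_configmap() {
  run kubectl delete configmap caddy-config -n "$NAMESPACE"
}

# Step 4: delete the Caddy pod(s) so they restart with the new settings.
restart_caddy() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ kubectl delete \$(kubectl get pods -n $NAMESPACE -o name | grep caddy-graphistry) -n $NAMESPACE"
  else
    pods=$(kubectl get pods -n "$NAMESPACE" -o name | grep caddy-graphistry)
    # Word splitting is intentional: one argument per pod name.
    kubectl delete $pods -n "$NAMESPACE"
  fi
}

# Step 5: show the ConfigMap that the new pod uses.
verify_caddy_configmap() {
  run kubectl get configmap caddy-config -n "$NAMESPACE" -o yaml
}

delete_caddy_configmap
# Step 3: update the Graphistry Helm release here (deployment specific).
restart_caddy
verify_caddy_configmap
```

Running the script once in dry-run mode is a cheap way to confirm the namespace and resource names before setting `DRY_RUN=0`.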

docs/tools/index.rst

Lines changed: 0 additions & 1 deletion
@@ -6,7 +6,6 @@ Operations & Tools
66

77
user-creation
88
developer
9-
telemetry
109
backup-and-restore
1110
update-backup-migrate
1211
bridge
