Add documentation for deploying in Cluster mode using K8s #54
aucahuasi commented on Mar 19, 2025
- Improve the cluster documentation by adding a new page for k8s
- Improve the organization of the cluster documentation by adding a folder to contain common topics
- Improve the telemetry documentation by adding an overview
DataBoyTX left a comment
Just a couple of comments to consider, but looks great overall Percy!
docs/install/cluster/index.rst
Outdated
**Note**: *This deployment configuration is currently **experimental** and subject to future updates.*

In this installation, both the **Leader** and **Follower** nodes can ingest datasets and files, with all nodes accessing the same **PostgreSQL** instance on the **Leader** node. As a result, **Follower** nodes can also perform data uploads, ensuring that both **Leader** and **Follower** nodes have equal access to dataset ingestion and visualization.
"ensuring that both Leader and Follower nodes have equal access to dataset ingestion and visualization" --> "allowing both Leader and Follower nodes to ingest datasets and visualize data"
Thanks Thomas, done!
docs/telemetry/kubernetes.md
Outdated
7. **`telemetryStack.grafana.GF_SERVER_ROOT_URL`** and **`telemetryStack.grafana.GF_SERVER_SERVE_FROM_SUB_PATH`**: These settings are used to configure Grafana, especially when it's deployed behind a reverse proxy or using an ingress controller.
   - **`telemetryStack.grafana.GF_SERVER_ROOT_URL`** defines the root URL for accessing Grafana (e.g., `/grafana`).
   - **`telemetryStack.grafana.GF_SERVER_SERVE_FROM_SUB_PATH`** should be set to `true` if Grafana is accessed from a sub-path (e.g., `/grafana`) behind a reverse proxy or ingress.
8. **`telemetryStack.dcgmExporter.DCGM_EXPORTER_CLOCK_EVENTS_COUNT_WINDOW_SIZE`**: This environment variable is used when `OTEL_CLOUD_MODE` is set to `true`, and the `dcgm-exporter` is deployed to export GPU metrics to Prometheus. It controls the frequency of GPU sampling to gather metrics. The value `1000` represents the window size for counting clock events on the GPU.
telemetryStack.dcgmExporter.DCGM_EXPORTER_CLOCK_EVENTS_COUNT_WINDOW_SIZE:
This environment variable controls the GPU metric sampling resolution for dcgm-exporter, which exports GPU telemetry to Prometheus. It defines the window size (in milliseconds) for counting clock events on the GPU.
- A smaller value (e.g., 500) results in higher-resolution telemetry with more frequent GPU metric updates.
- A larger value (e.g., 2000) reduces the data rate but lowers monitoring overhead.
This setting applies regardless of OTEL_CLOUD_MODE and affects both local and cloud-based telemetry setups.
Thanks Thomas, done!
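For readers following this thread, the settings discussed above might be expressed in a Helm values file roughly as sketched below. This is only an illustration: the nested layout is inferred from the dotted key names quoted in the diff, the values are the examples mentioned in the thread (`/grafana`, `true`, `1000`), and the chart may expect different types or quoting.

```yaml
# Hypothetical values.yaml fragment. The nesting is an assumption derived from
# the dotted keys in docs/telemetry/kubernetes.md, not copied from the chart.
telemetryStack:
  grafana:
    # Root URL for Grafana when served behind a reverse proxy or ingress.
    GF_SERVER_ROOT_URL: "/grafana"
    # Set to true when Grafana is accessed from a sub-path such as /grafana.
    GF_SERVER_SERVE_FROM_SUB_PATH: true
  dcgmExporter:
    # Window size (in milliseconds, per the review comment) for counting GPU
    # clock events. Smaller values give more frequent metric updates; larger
    # values reduce the data rate and monitoring overhead.
    DCGM_EXPORTER_CLOCK_EVENTS_COUNT_WINDOW_SIZE: 1000
```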