
Add documentation for deploying in Cluster mode using K8s #54

Merged: aucahuasi merged 3 commits into master from dev/cluster-mode-k8s on Mar 21, 2025

@aucahuasi
Contributor

  • Improve the cluster documentation by adding a new page for K8s.
  • Improve the organization of the cluster documentation by adding a folder for common topics.
  • Improve the telemetry documentation by adding an overview.

@aucahuasi requested a review from DataBoyTX on March 19, 2025 02:52
@aucahuasi self-assigned this on Mar 19, 2025
Contributor

@DataBoyTX left a comment

Just a couple of comments to consider, but looks great overall, Percy!

**Note**: *This deployment configuration is currently **experimental** and subject to future updates.*


In this installation, both the **Leader** and **Follower** nodes can ingest datasets and files, with all nodes accessing the same **PostgreSQL** instance on the **Leader** node. As a result, **Follower** nodes can also perform data uploads, ensuring that both **Leader** and **Follower** nodes have equal access to dataset ingestion and visualization.
Contributor

"ensuring that both Leader and Follower nodes have equal access to dataset ingestion and visualization" --> "allowing both Leader and Follower nodes to ingest datasets and visualize data"

Contributor Author

Thanks Thomas, done!

7. **`telemetryStack.grafana.GF_SERVER_ROOT_URL`** and **`telemetryStack.grafana.GF_SERVER_SERVE_FROM_SUB_PATH`**: These settings are used to configure Grafana, especially when it's deployed behind a reverse proxy or using an ingress controller.
   - **`telemetryStack.grafana.GF_SERVER_ROOT_URL`** defines the root URL for accessing Grafana (e.g., `/grafana`).
   - **`telemetryStack.grafana.GF_SERVER_SERVE_FROM_SUB_PATH`** should be set to `true` if Grafana is accessed from a sub-path (e.g., `/grafana`) behind a reverse proxy or ingress.
8. **`telemetryStack.dcgmExporter.DCGM_EXPORTER_CLOCK_EVENTS_COUNT_WINDOW_SIZE`**: This environment variable is used when `OTEL_CLOUD_MODE` is set to `true`, and the `dcgm-exporter` is deployed to export GPU metrics to Prometheus. It controls the frequency of GPU sampling to gather metrics. The value `1000` represents the window size for counting clock events on the GPU.
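
For reference, items 7 and 8 might look roughly like this in a values file. This is only a sketch: it assumes the dotted names map directly to nested YAML keys and that the values are passed through as strings, neither of which is confirmed here. The values shown are the examples quoted in the text, not recommended defaults.

```yaml
# Hypothetical values fragment: the nesting is inferred from the dotted key
# names quoted above and may not match the chart's actual layout.
telemetryStack:
  grafana:
    # Root URL for Grafana; "/grafana" is the example value given in the text.
    GF_SERVER_ROOT_URL: "/grafana"
    # Set to true when Grafana is served from a sub-path behind a reverse proxy or ingress.
    GF_SERVER_SERVE_FROM_SUB_PATH: "true"
  dcgmExporter:
    # Window size for counting GPU clock events; 1000 is the value cited in the text.
    DCGM_EXPORTER_CLOCK_EVENTS_COUNT_WINDOW_SIZE: "1000"
```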
Contributor

`telemetryStack.dcgmExporter.DCGM_EXPORTER_CLOCK_EVENTS_COUNT_WINDOW_SIZE`:
This environment variable controls the GPU metric sampling resolution for `dcgm-exporter`, which exports GPU telemetry to Prometheus. It defines the window size (in milliseconds) for counting clock events on the GPU.

  • A smaller value (e.g., 500) results in higher-resolution telemetry with more frequent GPU metric updates.
  • A larger value (e.g., 2000) reduces the data rate but lowers monitoring overhead.

This setting applies regardless of `OTEL_CLOUD_MODE` and affects both local and cloud-based telemetry setups.

Contributor Author

Thanks Thomas, done!

@aucahuasi merged commit 27c2068 into master on Mar 21, 2025
5 checks passed