improve wording

aucahuasi · aucahuasi · commit 55f50becdcbf · 2025-03-21T11:39:36.000-05:00
diff --git a/docs/README.md b/docs/README.md
@@ -0,0 +1,42 @@
+# Welcome to Graphistry: Admin Guide
+
+Graphistry is the most scalable graph-based visual analysis and investigation automation platform. It supports both cloud and on-prem deployment options. Big graphs are tons of fun!
+
+
+## Quick administration links
+
+* [Top commands](https://graphistry-admin-docs.readthedocs.io/en/latest/commands.html)
+* [Plan deployments](https://graphistry-admin-docs.readthedocs.io/en/latest/planning/hardware-software.html)
+* Install: [Cloud](https://graphistry-admin-docs.readthedocs.io/en/latest/install/cloud/index.html) & [On-prem](https://graphistry-admin-docs.readthedocs.io/en/latest/install/on-prem/index.html)
+* [Configure](https://graphistry-admin-docs.readthedocs.io/en/latest/app-config/index.html)
+* [Debugging & performance](https://graphistry-admin-docs.readthedocs.io/en/latest/debugging/index.html)
+* [Security](https://graphistry-admin-docs.readthedocs.io/en/latest/security/index.html)
+* [Operations & tools](https://graphistry-admin-docs.readthedocs.io/en/latest/tools/index.html)
+* [FAQ](https://graphistry-admin-docs.readthedocs.io/en/latest/faq/index.html) & [support options](https://graphistry-admin-docs.readthedocs.io/en/latest/support.html)
+
+## Further reading
+
+* [Main Graphistry documentation](https://hub.graphistry.com/docs), also available at the same path on your local server
+* [Release portal](https://graphistry.zendesk.com/hc/en-us/articles/360033184174) for enterprise admins to download the latest release
+* [Release notes](https://graphistry.zendesk.com/hc/en-us/articles/360033184174)
+* [Graphistry Hub](https://hub.graphistry.com): Graphistry-managed GPU servers, including free and team tiers
+* Docker (self-hosted): See [enterprise release portal](https://graphistry.zendesk.com/hc/en-us/articles/360033184174)
+* [Kubernetes Helm charts](https://github.com/graphistry/graphistry-helm) - Experimental
+
+
+## Quick GPU Docker environment test
+
+You can test your GPU environment via Graphistry's [base RAPIDS Docker image on DockerHub](https://hub.docker.com/r/graphistry/graphistry-forge-base):
+
+```bash
+docker run --rm -it --entrypoint=/bin/bash graphistry/graphistry-forge-base:latest -c "source activate base && python3 -c \"import cudf; print(cudf.DataFrame({'x': [0,1,2]})['x'].sum())\""
+```
+
+=>
+```
+3
+```
+
+See the installation and debugging sections for additional scenarios, such as verifying that Docker Compose defaults to a GPU runtime.
+
+
diff --git a/docs/install/cluster/index.rst b/docs/install/cluster/index.rst
@@ -13,8 +13,7 @@ Multinode Deployment Overview
 
 **Note**: *This deployment configuration is currently **experimental** and subject to future updates.*
 
-
-In this installation, both the **Leader** and **Follower** nodes can ingest datasets and files, with all nodes accessing the same **PostgreSQL** instance on the **Leader** node. As a result, **Follower** nodes can also perform data uploads, ensuring that both **Leader** and **Follower** nodes have equal access to dataset ingestion and visualization.
+In this installation, both the **Leader** and **Follower** nodes can ingest datasets and files, with all nodes accessing the same **PostgreSQL** instance on the **Leader** node. As a result, **Follower** nodes can also perform data uploads, allowing both **Leader** and **Follower** nodes to ingest datasets and visualize data.
 
 The leader and followers will share datasets using a **Distributed File System**, for example, using the **Network File System (NFS)** protocol. This setup allows all nodes to access the same dataset directory. This configuration ensures that **Graphistry** can be deployed across multiple machines, each with different **GPU** configuration profiles (some with more powerful GPUs, enabling **multi-GPU** on multinode setups), while keeping the dataset storage centralized and synchronized.
 
diff --git a/docs/telemetry/kubernetes.md b/docs/telemetry/kubernetes.md
@@ -71,17 +71,28 @@ global:  ## global settings for all charts
 ## Configuration Overview
 
 1. **`global`**: This section in the `values.yaml` file is used to define values that are accessible across all charts within the parent-child hierarchy.  Both the parent chart (e.g., `charts/graphistry-helm`) and its child charts (e.g., `charts/graphistry-helm/charts/telemetry`) can reference these global values using `.Values.global.<value_name>`, providing a unified configuration across the deployment.
+
 2. **`telemetryStack`**: This section defines environment variables that control the OpenTelemetry configuration in Kubernetes. These variables replicate the settings that were originally defined in the Docker Compose setup.
+
 3. **`global.ENABLE_OPEN_TELEMETRY`**: Set to `true` to enable the OpenTelemetry stack within the Kubernetes environment. This will ensure that telemetry data is collected and processed by the relevant tools in your stack.
+
 4. **`telemetryStack.OTEL_CLOUD_MODE`**:
   - When set to `false`, the internal observability stack (`Jaeger`, `Prometheus`, `Grafana`, `NVIDIA DCGM Exporter` and `Node Exporter`) is deployed locally within your Kubernetes cluster.  So, setting it to `false` is similar to [using packaged observability tools](./docker-compose.md#using-packaged-observability-tools) within the Kubernetes environment.
   - When set to `true`, telemetry data is forwarded to external services, such as Grafana Cloud or other OTLP-compatible services.  So, setting this to `true` is equivalent to [forwarding telemetry to external services](./docker-compose.md#forwarding-to-external-services).
+
 5. **`telemetryStack.openTelemetryCollector.OTEL_COLLECTOR_OTLP_HTTP_ENDPOINT`**, **`telemetryStack.openTelemetryCollector.OTEL_COLLECTOR_OTLP_USERNAME`**, and **`telemetryStack.openTelemetryCollector.OTEL_COLLECTOR_OTLP_PASSWORD`**: These fields are required only if `OTEL_CLOUD_MODE` is set to `true`. They provide the necessary connection details (such as the endpoint, username, and password) for forwarding telemetry data to external services like Grafana Cloud or other OTLP-compatible services.
+
 6. **`telemetryStack.openTelemetryCollector.LEADER_OTEL_EXPORTER_OTLP_ENDPOINT`**: This field is used by all follower collectors when `global.ENABLE_CLUSTER_MODE` is set to `true`.  In this case, all follower collectors will export their telemetry data to the leader's collector, which will then export the data to Grafana, Prometheus, Jaeger, etc. For example: `"otel-collector.graphistry1.svc.cluster.local:4317"`.  See the guide on [Configuring Telemetry for a Graphistry Cluster on Kubernetes](https://github.com/graphistry/graphistry-helm/tree/main/charts/values-overrides/examples/cluster#configuring-telemetry-for-graphistry-cluster-on-kubernetes).
+
 7. **`telemetryStack.grafana.GF_SERVER_ROOT_URL`** and **`telemetryStack.grafana.GF_SERVER_SERVE_FROM_SUB_PATH`**: These settings are used to configure Grafana, especially when it's deployed behind a reverse proxy or using an ingress controller.
   - **`telemetryStack.grafana.GF_SERVER_ROOT_URL`** defines the root URL for accessing Grafana (e.g., `/grafana`).
   - **`telemetryStack.grafana.GF_SERVER_SERVE_FROM_SUB_PATH`** should be set to `true` if Grafana is accessed from a sub-path (e.g., `/grafana`) behind a reverse proxy or ingress.
-8. **`telemetryStack.dcgmExporter.DCGM_EXPORTER_CLOCK_EVENTS_COUNT_WINDOW_SIZE`**: This environment variable is used when `OTEL_CLOUD_MODE` is set to `true`, and the `dcgm-exporter` is deployed to export GPU metrics to Prometheus. It controls the frequency of GPU sampling to gather metrics. The value `1000` represents the window size for counting clock events on the GPU.
+
+8. **`telemetryStack.dcgmExporter.DCGM_EXPORTER_CLOCK_EVENTS_COUNT_WINDOW_SIZE`**: This environment variable controls the GPU metric sampling resolution for `dcgm-exporter`, which exports GPU telemetry to `Prometheus`. It defines the window size (in milliseconds) for counting clock events on the GPU.
+  - A smaller value (e.g., `500`) results in higher-resolution telemetry with more frequent GPU metric updates.
+  - A larger value (e.g., `2000`) yields lower-resolution telemetry with less frequent updates, but reduces monitoring overhead.
+This setting applies regardless of `OTEL_CLOUD_MODE` and affects both local and cloud-based telemetry setups.
+
 9. **`telemetryStack.*.image`**: These values let you change the image versions of the observability tools.
 
 ## Caddyfile - reverse proxy set up