From 37d5fdc02230ad566c2e1d6bcb58fbf73f1bc91b Mon Sep 17 00:00:00 2001 From: krishna-kg732 Date: Fri, 13 Mar 2026 00:12:49 +0530 Subject: [PATCH 1/4] docs: add XGBoost distributed training user guide Add a user guide for distributed XGBoost training on Kubernetes via Kubeflow Trainer at content/en/docs/components/trainer/user-guides/xgboost.md. The guide provides: - An overview of the XGBoost Collective protocol and how Kubeflow Trainer integrates with it (DMLC_* env vars, JobSet, built-in runtime) - Worker count formula for CPU and GPU training - A redirect to the comprehensive XGBoost tutorial at https://xgboost.readthedocs.io/en/latest/tutorials/kubernetes.html Signed-off-by: krishna-kg732 --- .../components/trainer/user-guides/xgboost.md | 63 +++++++++++++++++++ 1 file changed, 63 insertions(+) create mode 100644 content/en/docs/components/trainer/user-guides/xgboost.md diff --git a/content/en/docs/components/trainer/user-guides/xgboost.md b/content/en/docs/components/trainer/user-guides/xgboost.md new file mode 100644 index 0000000000..27eacb4fac --- /dev/null +++ b/content/en/docs/components/trainer/user-guides/xgboost.md @@ -0,0 +1,63 @@ ++++ +title = "XGBoost Guide" +description = "How to run distributed XGBoost on Kubernetes with Kubeflow Trainer" +weight = 20 ++++ + +This guide describes how to use TrainJob to run distributed +[XGBoost](https://xgboost.readthedocs.io/) training on Kubernetes. + +--- + +## Prerequisites + +Before exploring this guide, make sure to follow +[the Getting Started guide](/docs/components/trainer/getting-started/) +to understand the basics of Kubeflow Trainer. + +--- + +## XGBoost Distributed Overview + +XGBoost supports distributed training through the +[Collective](https://xgboost.readthedocs.io/en/latest/tutorials/kubernetes.html) +communication protocol (historically known as Rabit). 
In a distributed setting, +multiple worker processes each operate on a shard of the data and synchronize +histogram bin statistics via AllReduce to agree on the best tree splits. + +Kubeflow Trainer integrates with XGBoost by: + +- Deploying worker pods as a [JobSet](https://github.com/kubernetes-sigs/jobset). +- Automatically injecting the `DMLC_*` environment variables required by XGBoost's + Collective communication layer (`DMLC_TRACKER_URI`, `DMLC_TRACKER_PORT`, + `DMLC_TASK_ID`, `DMLC_NUM_WORKER`). +- Providing the rank-0 pod with the tracker address so user code can start a + `RabitTracker` for worker coordination. +- Supporting both CPU and GPU training workloads. + +The built-in runtime is called `xgboost-distributed` and uses the container image +`ghcr.io/kubeflow/trainer/xgboost-runtime:latest`, which includes XGBoost with +CUDA 12 support, NumPy, and scikit-learn. + +### Worker Count + +The total number of XGBoost workers is calculated as: + +```text +DMLC_NUM_WORKER = numNodes × workersPerNode +``` + +- **CPU training**: 1 worker per node. Each worker uses OpenMP to parallelize + across all available CPU cores. +- **GPU training**: 1 worker per GPU. The GPU count is derived from + `resourcesPerNode` limits in the TrainJob. 
+ +--- + +## Further Information + +For comprehensive documentation including complete training examples (Python SDK +and kubectl YAML), best practices (`QuantileDMatrix`, early stopping, +checkpointing, logging), and common issues, see the XGBoost documentation: + +**[Distributed XGBoost on Kubernetes — XGBoost Tutorial](https://xgboost.readthedocs.io/en/latest/tutorials/kubernetes.html)** From 8b0c9f7980b576a38f23541ae4dadb1503a4828c Mon Sep 17 00:00:00 2001 From: Krishna Gupta Date: Fri, 13 Mar 2026 06:18:03 +0530 Subject: [PATCH 2/4] Apply suggestions from code review Co-authored-by: Andrey Velichkevich Signed-off-by: Krishna Gupta --- content/en/docs/components/trainer/user-guides/xgboost.md | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/content/en/docs/components/trainer/user-guides/xgboost.md b/content/en/docs/components/trainer/user-guides/xgboost.md index 27eacb4fac..7783c2012d 100644 --- a/content/en/docs/components/trainer/user-guides/xgboost.md +++ b/content/en/docs/components/trainer/user-guides/xgboost.md @@ -1,14 +1,12 @@ +++ title = "XGBoost Guide" -description = "How to run distributed XGBoost on Kubernetes with Kubeflow Trainer" +description = "How to run XGBoost on Kubernetes with Kubeflow Trainer" weight = 20 +++ This guide describes how to use TrainJob to run distributed [XGBoost](https://xgboost.readthedocs.io/) training on Kubernetes. 
---- - ## Prerequisites Before exploring this guide, make sure to follow From 3bf20f532229f5f3f1be12781976d595881a3b10 Mon Sep 17 00:00:00 2001 From: krishna-kg732 Date: Fri, 13 Mar 2026 06:27:02 +0530 Subject: [PATCH 3/4] added notebook example Signed-off-by: krishna-kg732 --- content/en/docs/components/trainer/user-guides/xgboost.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/content/en/docs/components/trainer/user-guides/xgboost.md b/content/en/docs/components/trainer/user-guides/xgboost.md index 7783c2012d..3eca3c456d 100644 --- a/content/en/docs/components/trainer/user-guides/xgboost.md +++ b/content/en/docs/components/trainer/user-guides/xgboost.md @@ -59,3 +59,7 @@ and kubectl YAML), best practices (`QuantileDMatrix`, early stopping, checkpointing, logging), and common issues, see the XGBoost documentation: **[Distributed XGBoost on Kubernetes — XGBoost Tutorial](https://xgboost.readthedocs.io/en/latest/tutorials/kubernetes.html)** + +You can also use the Kubeflow Trainer distributed XGBoost notebook example: + +**[xgboost-distributed.ipynb](https://github.com/kubeflow/trainer/blob/master/examples/xgboost/distributed-training/xgboost-distributed.ipynb)** From 7965b2f742787b9fb2784766b5a26d11c98900b1 Mon Sep 17 00:00:00 2001 From: krishna-kg732 Date: Fri, 13 Mar 2026 07:23:13 +0530 Subject: [PATCH 4/4] added : next steps section Signed-off-by: krishna-kg732 --- .../components/trainer/user-guides/xgboost.md | 20 +++++-------------- 1 file changed, 5 insertions(+), 15 deletions(-) diff --git a/content/en/docs/components/trainer/user-guides/xgboost.md b/content/en/docs/components/trainer/user-guides/xgboost.md index 3eca3c456d..eddd751381 100644 --- a/content/en/docs/components/trainer/user-guides/xgboost.md +++ b/content/en/docs/components/trainer/user-guides/xgboost.md @@ -13,9 +13,8 @@ Before exploring this guide, make sure to follow [the Getting Started guide](/docs/components/trainer/getting-started/) to understand the basics of Kubeflow Trainer. 
---- -## XGBoost Distributed Overview +## XGBoost Overview XGBoost supports distributed training through the [Collective](https://xgboost.readthedocs.io/en/latest/tutorials/kubernetes.html) communication protocol (historically known as Rabit). @@ -50,16 +49,7 @@ DMLC_NUM_WORKER = numNodes × workersPerNode - **GPU training**: 1 worker per GPU. The GPU count is derived from `resourcesPerNode` limits in the TrainJob. ---- - -## Further Information - -For comprehensive documentation including complete training examples (Python SDK -and kubectl YAML), best practices (`QuantileDMatrix`, early stopping, -checkpointing, logging), and common issues, see the XGBoost documentation: - -**[Distributed XGBoost on Kubernetes — XGBoost Tutorial](https://xgboost.readthedocs.io/en/latest/tutorials/kubernetes.html)** - -You can also use the Kubeflow Trainer distributed XGBoost notebook example: - -**[xgboost-distributed.ipynb](https://github.com/kubeflow/trainer/blob/master/examples/xgboost/distributed-training/xgboost-distributed.ipynb)** +## Next Steps +- Check out the [XGBoost example](https://github.com/kubeflow/trainer/blob/master/examples/xgboost/distributed-training/xgboost-distributed.ipynb). +- Learn more about the `TrainerClient()` APIs in the [Kubeflow SDK](https://github.com/kubeflow/sdk/blob/main/kubeflow/trainer/api/trainer_client.py). +- Explore the **[XGBoost documentation](https://xgboost.readthedocs.io/en/latest/tutorials/kubernetes.html)** for advanced configuration options. \ No newline at end of file