diff --git a/content/en/docs/components/trainer/user-guides/xgboost.md b/content/en/docs/components/trainer/user-guides/xgboost.md
new file mode 100644
index 0000000000..eddd751381
--- /dev/null
+++ b/content/en/docs/components/trainer/user-guides/xgboost.md
@@ -0,0 +1,55 @@
++++
+title = "XGBoost Guide"
+description = "How to run XGBoost on Kubernetes with Kubeflow Trainer"
+weight = 20
++++
+
+This guide describes how to use TrainJob to run distributed
+[XGBoost](https://xgboost.readthedocs.io/) training on Kubernetes.
+
+## Prerequisites
+
+Before exploring this guide, make sure to follow
+[the Getting Started guide](/docs/components/trainer/getting-started/)
+to understand the basics of Kubeflow Trainer.
+
+
+## XGBoost Overview
+
+XGBoost supports distributed training through the
+[Collective](https://xgboost.readthedocs.io/en/latest/tutorials/kubernetes.html)
+communication protocol (historically known as Rabit). In a distributed setting,
+each worker process trains on its own shard of the data, and the workers combine
+their per-bin gradient and hessian histograms with AllReduce so that every worker
+agrees on the same best tree splits.
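+
+To make the AllReduce step concrete, here is a minimal single-process sketch. It
+assumes each worker has already bucketed one feature into the same bins and summed
+the gradient (G) and hessian (H) of its rows per bin. An element-wise sum stands in
+for the AllReduce, and the split gain is the standard XGBoost formula
+`0.5 * (G_L²/(H_L+λ) + G_R²/(H_R+λ) - G²/(H+λ)) - γ`. The helper names
+(`allreduce_sum`, `best_split`) and the `reg_lambda`/`gamma` defaults are
+illustrative only, not XGBoost APIs.
+
+```python
+from typing import List, Tuple
+
+Histogram = List[Tuple[float, float]]  # per-bin (gradient_sum, hessian_sum)
+
+
+def allreduce_sum(worker_hists: List[Histogram]) -> Histogram:
+    """Simulate AllReduce(sum): combine every worker's histogram into a global one."""
+    n_bins = len(worker_hists[0])
+    return [
+        (sum(h[b][0] for h in worker_hists), sum(h[b][1] for h in worker_hists))
+        for b in range(n_bins)
+    ]
+
+
+def best_split(hist: Histogram, reg_lambda: float = 1.0,
+               gamma: float = 0.0) -> Tuple[int, float]:
+    """Return (bin after which to split, gain) using the XGBoost split-gain formula."""
+    g_total = sum(g for g, _ in hist)
+    h_total = sum(h for _, h in hist)
+
+    def score(g: float, h: float) -> float:
+        return g * g / (h + reg_lambda)
+
+    best_bin, best_gain = -1, float("-inf")
+    g_left = h_left = 0.0
+    # Try every split point between bin b and bin b + 1.
+    for b in range(len(hist) - 1):
+        g_left += hist[b][0]
+        h_left += hist[b][1]
+        g_right, h_right = g_total - g_left, h_total - h_left
+        gain = 0.5 * (score(g_left, h_left) + score(g_right, h_right)
+                      - score(g_total, h_total)) - gamma
+        if gain > best_gain:
+            best_bin, best_gain = b, gain
+    return best_bin, best_gain
+
+
+# Two workers, each holding a different shard of rows, bucketed into the same 3 bins.
+worker_0 = [(-4.0, 2.0), (1.0, 1.0), (3.0, 2.0)]
+worker_1 = [(-2.0, 1.0), (0.0, 1.0), (2.0, 1.0)]
+global_hist = allreduce_sum([worker_0, worker_1])
+print(global_hist)               # [(-6.0, 3.0), (1.0, 2.0), (5.0, 3.0)]
+print(best_split(global_hist))   # (0, 7.5)
+```
+
+Because every worker evaluates the same summed histogram, they all pick the same
+split without ever exchanging raw rows.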
+
+Kubeflow Trainer integrates with XGBoost by:
+
+- Deploying worker pods as a [JobSet](https://github.com/kubernetes-sigs/jobset).
+- Automatically injecting the `DMLC_*` environment variables required by XGBoost's
+  Collective communication layer (`DMLC_TRACKER_URI`, `DMLC_TRACKER_PORT`,
+  `DMLC_TASK_ID`, `DMLC_NUM_WORKER`).
+- Providing the rank-0 pod with the tracker address so user code can start a
+  `RabitTracker` for worker coordination.
+- Supporting both CPU and GPU training workloads.
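+
+The injected `DMLC_*` values are ordinary environment variables, so training code
+can read them directly. The sketch below parses them into a small config object,
+assuming the values arrive as strings. The `DmlcConfig` dataclass and
+`read_dmlc_env` helper are hypothetical names, not part of XGBoost or Kubeflow; in
+a real job, the rank-0 worker would start XGBoost's `RabitTracker` at this address
+and every worker would join the collective with these values.
+
+```python
+import os
+from dataclasses import dataclass
+from typing import Mapping, Optional
+
+
+@dataclass(frozen=True)
+class DmlcConfig:
+    tracker_uri: str   # host of the tracker, reachable from every worker
+    tracker_port: int  # port the tracker listens on
+    task_id: int       # this worker's rank, 0 .. num_worker - 1
+    num_worker: int    # total number of XGBoost workers
+
+    @property
+    def is_rank0(self) -> bool:
+        # Per this guide, the rank-0 worker is the one that starts the RabitTracker.
+        return self.task_id == 0
+
+
+def read_dmlc_env(environ: Optional[Mapping[str, str]] = None) -> DmlcConfig:
+    """Parse the DMLC_* variables injected by Kubeflow Trainer (hypothetical helper)."""
+    env = os.environ if environ is None else environ
+    required = ("DMLC_TRACKER_URI", "DMLC_TRACKER_PORT", "DMLC_TASK_ID", "DMLC_NUM_WORKER")
+    missing = [k for k in required if k not in env]
+    if missing:
+        raise KeyError(f"missing environment variables: {', '.join(missing)}")
+    cfg = DmlcConfig(
+        tracker_uri=env["DMLC_TRACKER_URI"],
+        tracker_port=int(env["DMLC_TRACKER_PORT"]),
+        task_id=int(env["DMLC_TASK_ID"]),
+        num_worker=int(env["DMLC_NUM_WORKER"]),
+    )
+    # Sanity check: ranks must lie in [0, num_worker).
+    if not 0 <= cfg.task_id < cfg.num_worker:
+        raise ValueError(f"DMLC_TASK_ID={cfg.task_id} out of range for {cfg.num_worker} workers")
+    return cfg
+
+
+# Example values only; the real ones are set on each pod by the TrainJob.
+example = {
+    "DMLC_TRACKER_URI": "xgboost-job-node-0-0.xgboost-job",  # hypothetical host name
+    "DMLC_TRACKER_PORT": "9091",                              # hypothetical port
+    "DMLC_TASK_ID": "0",
+    "DMLC_NUM_WORKER": "4",
+}
+cfg = read_dmlc_env(example)
+print(cfg.is_rank0, cfg.num_worker)  # True 4
+```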
+
+The built-in runtime is called `xgboost-distributed` and uses the container image
+`ghcr.io/kubeflow/trainer/xgboost-runtime:latest`, which includes XGBoost with
+CUDA 12 support, NumPy, and scikit-learn.
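+
+As a rough illustration of selecting this runtime, the sketch below builds a
+TrainJob manifest as a Python dict. The `trainer.kubeflow.org/v1alpha1` API version
+and the `runtimeRef` and `numNodes` field names are assumptions based on the
+TrainJob CRD, so check them against the version installed in your cluster. The
+`xgboost_trainjob` helper is hypothetical, and the manifest leaves out the training
+command a real job would supply. The JSON output can be passed to `kubectl apply -f`,
+which accepts JSON as well as YAML.
+
+```python
+import json
+from typing import Any, Dict
+
+
+def xgboost_trainjob(name: str, num_nodes: int, gpus_per_node: int = 0,
+                     namespace: str = "default") -> Dict[str, Any]:
+    """Build a TrainJob manifest that references the xgboost-distributed runtime.
+
+    Field names are assumed from the trainer.kubeflow.org/v1alpha1 TrainJob API.
+    """
+    trainer: Dict[str, Any] = {"numNodes": num_nodes}
+    if gpus_per_node > 0:
+        # GPU limits drive the per-node worker count (1 worker per GPU).
+        trainer["resourcesPerNode"] = {"limits": {"nvidia.com/gpu": gpus_per_node}}
+    return {
+        "apiVersion": "trainer.kubeflow.org/v1alpha1",
+        "kind": "TrainJob",
+        "metadata": {"name": name, "namespace": namespace},
+        "spec": {
+            "runtimeRef": {"name": "xgboost-distributed"},
+            "trainer": trainer,
+        },
+    }
+
+
+manifest = xgboost_trainjob("xgboost-example", num_nodes=2, gpus_per_node=1)
+print(json.dumps(manifest, indent=2))
+```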
+
+### Worker Count
+
+The total number of XGBoost workers is calculated as:
+
+```text
+DMLC_NUM_WORKER = numNodes × workersPerNode
+```
+
+- **CPU training**: 1 worker per node. Each worker uses OpenMP to parallelize
+  across all available CPU cores.
+- **GPU training**: 1 worker per GPU. The GPU count is derived from
+  `resourcesPerNode` limits in the TrainJob.
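+
+The worker-count rule can be sketched as a small function. This assumes GPUs are
+requested under the NVIDIA device-plugin resource name `nvidia.com/gpu`; other GPU
+resource names may be handled differently by the runtime. The function names are
+hypothetical.
+
+```python
+from typing import Mapping, Union
+
+Quantity = Union[int, str]  # Kubernetes limits may be given as ints or strings
+
+
+def workers_per_node(limits: Mapping[str, Quantity],
+                     gpu_resource: str = "nvidia.com/gpu") -> int:
+    """1 worker per GPU if GPUs are requested, otherwise 1 worker per node (CPU)."""
+    gpus = int(limits.get(gpu_resource, 0))
+    return gpus if gpus > 0 else 1
+
+
+def dmlc_num_worker(num_nodes: int, limits: Mapping[str, Quantity]) -> int:
+    """DMLC_NUM_WORKER = numNodes x workersPerNode."""
+    if num_nodes < 1:
+        raise ValueError("numNodes must be at least 1")
+    return num_nodes * workers_per_node(limits)
+
+
+print(dmlc_num_worker(3, {"cpu": "8", "memory": "16Gi"}))  # CPU: 3 x 1 = 3
+print(dmlc_num_worker(2, {"nvidia.com/gpu": 4}))           # GPU: 2 x 4 = 8
+```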
+
+## Next Steps
+
+- Check out the [XGBoost distributed training example](https://github.com/kubeflow/trainer/blob/master/examples/xgboost/distributed-training/xgboost-distributed.ipynb).
+- Learn more about the `TrainerClient()` APIs in the [Kubeflow SDK](https://github.com/kubeflow/sdk/blob/main/kubeflow/trainer/api/trainer_client.py).
+- Explore the [XGBoost documentation](https://xgboost.readthedocs.io/en/latest/tutorials/kubernetes.html) for advanced configuration options.
\ No newline at end of file