trainer: add XGBoost distributed training user guide #4342
Add a user guide for distributed XGBoost training on Kubernetes via Kubeflow Trainer at content/en/docs/components/trainer/user-guides/xgboost.md.

The guide provides:
- An overview of the XGBoost Collective protocol and how Kubeflow Trainer integrates with it (DMLC_* env vars, JobSet, built-in runtime)
- Worker count formula for CPU and GPU training
- A redirect to the comprehensive XGBoost tutorial at https://xgboost.readthedocs.io/en/latest/tutorials/kubernetes.html

Signed-off-by: krishna-kg732 <krishnagupta.kg2k6@gmail.com>
Hi @Krishna-kg732. Thanks for your PR. I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
🚫 This command cannot be processed. Only organization members or owners can use the commands.
Thanks!
andreyvelich left a comment
Thanks @Krishna-kg732, just a few nits.
You might need to fix your PR title too.
Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com> Signed-off-by: Krishna Gupta <Krishnagupta.kg2k6@gmail.com>
Signed-off-by: krishna-kg732 <krishnagupta.kg2k6@gmail.com>
Signed-off-by: krishna-kg732 <krishnagupta.kg2k6@gmail.com>
andreyvelich left a comment
Thanks @Krishna-kg732!
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: andreyvelich, terrytangyuan. The full list of commands accepted by this bot can be found here. The pull request process is described here. Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/ok-to-test
Description
Adds a user guide for distributed XGBoost training on Kubernetes via Kubeflow Trainer at content/en/docs/components/trainer/user-guides/xgboost.md.

The guide provides a concise overview of how Kubeflow Trainer integrates with XGBoost (Collective protocol, DMLC_* environment variable injection, xgboost-distributed runtime, CPU/GPU worker count formula) and redirects readers to the comprehensive tutorial in the XGBoost documentation for examples, best practices, and troubleshooting.

Related
- XGBoost official docs: XGBoost tutorial (cross-referenced)
- kubeflow/trainer: User Guide for XGBoost Runtime #3313
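
For readers skimming this thread, the worker count formula and the DMLC_* variables mentioned in the description can be sketched roughly as below. This is an illustrative sketch, not code from the PR or the guide. It assumes one worker per node for CPU training and one worker per GPU for GPU training, and it assumes the injected variables are `DMLC_TRACKER_URI`, `DMLC_TRACKER_PORT`, `DMLC_TASK_ID`, and `DMLC_NUM_WORKER`. The helper names `expected_num_workers` and `read_dmlc_env` are hypothetical; check the guide itself for the authoritative details.

```python
from typing import Dict, Mapping

# Assumed set of variables injected into each training pod (not confirmed by this thread).
DMLC_KEYS = (
    "DMLC_TRACKER_URI",
    "DMLC_TRACKER_PORT",
    "DMLC_TASK_ID",
    "DMLC_NUM_WORKER",
)


def expected_num_workers(num_nodes: int, gpus_per_node: int = 0) -> int:
    """Hypothetical helper: total worker count under the assumed formula.

    CPU training (gpus_per_node == 0): one worker per node.
    GPU training: one worker per GPU, i.e. num_nodes * gpus_per_node.
    """
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    if gpus_per_node < 0:
        raise ValueError("gpus_per_node must not be negative")
    return num_nodes * max(1, gpus_per_node)


def read_dmlc_env(environ: Mapping[str, str]) -> Dict[str, str]:
    """Hypothetical helper: collect the assumed DMLC_* settings, failing loudly if any is absent."""
    missing = [key for key in DMLC_KEYS if key not in environ]
    if missing:
        raise KeyError(f"missing DMLC variables: {', '.join(missing)}")
    return {key: environ[key] for key in DMLC_KEYS}
```

Inside a training container, `read_dmlc_env(os.environ)` would be the natural call; passing a plain mapping keeps the helper easy to test outside a cluster.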