
Commit e17f701

[backport][doc] Cleanup outdated documents for GPU. [skip ci] (dmlc#8378) (dmlc#8393)
1 parent aa30ce1 commit e17f701

File tree

1 file changed (+8 / -142 lines)

doc/gpu/index.rst

Lines changed: 8 additions & 142 deletions
@@ -4,36 +4,21 @@ XGBoost GPU Support
 
 This page contains information about GPU algorithms supported in XGBoost.
 
-.. note:: CUDA 10.1, Compute Capability 3.5 required
-
-   The GPU algorithms in XGBoost require a graphics card with compute capability 3.5 or higher, with
-   CUDA toolkits 10.1 or later.
-   (See `this list <https://en.wikipedia.org/wiki/CUDA#GPUs_supported>`_ to look up compute capability of your GPU card.)
+.. note:: CUDA 11.0, Compute Capability 5.0 required (See `this list <https://en.wikipedia.org/wiki/CUDA#GPUs_supported>`_ to look up compute capability of your GPU card.)
 
 *********************************************
 CUDA Accelerated Tree Construction Algorithms
 *********************************************
-Tree construction (training) and prediction can be accelerated with CUDA-capable GPUs.
+
+Most of the algorithms in XGBoost including training, prediction and evaluation can be accelerated with CUDA-capable GPUs.
 
 Usage
 =====
-Specify the ``tree_method`` parameter as one of the following algorithms.
-
-Algorithms
-----------
-
-+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
-| tree_method | Description |
-+=======================+=======================================================================================================================================================================+
-| gpu_hist | Equivalent to the XGBoost fast histogram algorithm. Much faster and uses considerably less memory. NOTE: May run very slowly on GPUs older than Pascal architecture. |
-+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+Specify the ``tree_method`` parameter as ``gpu_hist``. For details around the ``tree_method`` parameter, see :doc:`tree method </treemethod>`.
 
 Supported parameters
 --------------------
 
-.. |tick| unicode:: U+2714
-.. |cross| unicode:: U+2718
-
 GPU accelerated prediction is enabled by default for the above mentioned ``tree_method`` parameters but can be switched to CPU prediction by setting ``predictor`` to ``cpu_predictor``. This could be useful if you want to conserve GPU memory. Likewise when using CPU algorithms, GPU accelerated prediction can be enabled by setting ``predictor`` to ``gpu_predictor``.
 
 The device ordinal (which GPU to use if you have many of them) can be selected using the
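
The hunk above reduces the usage guidance to a single parameter. As a minimal, illustrative Python sketch of that workflow (the synthetic data, parameter values, and the optional ``predictor`` override are examples only, not part of this commit):

.. code-block:: python

   import numpy as np
   import xgboost as xgb

   # Synthetic binary classification data, purely for illustration.
   X = np.random.rand(1000, 20)
   y = np.random.randint(2, size=1000)
   dtrain = xgb.DMatrix(X, label=y)

   params = {
       "objective": "binary:logistic",
       "tree_method": "gpu_hist",       # GPU-accelerated tree construction
       "predictor": "cpu_predictor",    # optional: predict on the CPU to conserve GPU memory
   }
   booster = xgb.train(params, dtrain, num_boost_round=100)
   preds = booster.predict(dtrain)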
@@ -69,128 +54,9 @@ See examples `here
 
 Multi-node Multi-GPU Training
 =============================
-XGBoost supports fully distributed GPU training using `Dask <https://dask.org/>`_. For
-getting started see our tutorial :doc:`/tutorials/dask` and worked examples `here
-<https://github.com/dmlc/xgboost/tree/master/demo/dask>`__, also Python documentation
-:ref:`dask_api` for complete reference.
-
-
-Objective functions
-===================
-Most of the objective functions implemented in XGBoost can be run on GPU. Following table shows current support status.
-
-+----------------------+-------------+
-| Objectives | GPU support |
-+----------------------+-------------+
-| reg:squarederror | |tick| |
-+----------------------+-------------+
-| reg:squaredlogerror | |tick| |
-+----------------------+-------------+
-| reg:logistic | |tick| |
-+----------------------+-------------+
-| reg:pseudohubererror | |tick| |
-+----------------------+-------------+
-| binary:logistic | |tick| |
-+----------------------+-------------+
-| binary:logitraw | |tick| |
-+----------------------+-------------+
-| binary:hinge | |tick| |
-+----------------------+-------------+
-| count:poisson | |tick| |
-+----------------------+-------------+
-| reg:gamma | |tick| |
-+----------------------+-------------+
-| reg:tweedie | |tick| |
-+----------------------+-------------+
-| multi:softmax | |tick| |
-+----------------------+-------------+
-| multi:softprob | |tick| |
-+----------------------+-------------+
-| survival:cox | |cross| |
-+----------------------+-------------+
-| survival:aft | |tick| |
-+----------------------+-------------+
-| rank:pairwise | |tick| |
-+----------------------+-------------+
-| rank:ndcg | |tick| |
-+----------------------+-------------+
-| rank:map | |tick| |
-+----------------------+-------------+
-
-Objective will run on GPU if GPU updater (``gpu_hist``), otherwise they will run on CPU by
-default. For unsupported objectives XGBoost will fall back to using CPU implementation by
-default. Note that when using GPU ranking objective, the result is not deterministic due
-to the non-associative aspect of floating point summation.
-
-Metric functions
-===================
-Following table shows current support status for evaluation metrics on the GPU.
-
-+------------------------------+-------------+
-| Metric | GPU Support |
-+==============================+=============+
-| rmse | |tick| |
-+------------------------------+-------------+
-| rmsle | |tick| |
-+------------------------------+-------------+
-| mae | |tick| |
-+------------------------------+-------------+
-| mape | |tick| |
-+------------------------------+-------------+
-| mphe | |tick| |
-+------------------------------+-------------+
-| logloss | |tick| |
-+------------------------------+-------------+
-| error | |tick| |
-+------------------------------+-------------+
-| merror | |tick| |
-+------------------------------+-------------+
-| mlogloss | |tick| |
-+------------------------------+-------------+
-| auc | |tick| |
-+------------------------------+-------------+
-| aucpr | |tick| |
-+------------------------------+-------------+
-| ndcg | |tick| |
-+------------------------------+-------------+
-| map | |tick| |
-+------------------------------+-------------+
-| poisson-nloglik | |tick| |
-+------------------------------+-------------+
-| gamma-nloglik | |tick| |
-+------------------------------+-------------+
-| cox-nloglik | |cross| |
-+------------------------------+-------------+
-| aft-nloglik | |tick| |
-+------------------------------+-------------+
-| interval-regression-accuracy | |tick| |
-+------------------------------+-------------+
-| gamma-deviance | |tick| |
-+------------------------------+-------------+
-| tweedie-nloglik | |tick| |
-+------------------------------+-------------+
-
-Similar to objective functions, default device for metrics is selected based on tree
-updater and predictor (which is selected based on tree updater).
-
-Benchmarks
-==========
-You can run benchmarks on synthetic data for binary classification:
-
-.. code-block:: bash
-
-   python tests/benchmark/benchmark_tree.py --tree_method=gpu_hist
-   python tests/benchmark/benchmark_tree.py --tree_method=hist
-
-Training time on 1,000,000 rows x 50 columns of random data with 500 boosting iterations and 0.25/0.75 test/train split with AMD Ryzen 7 2700 8 core @3.20GHz and NVIDIA 1080ti yields the following results:
-
-+--------------+----------+
-| tree_method | Time (s) |
-+==============+==========+
-| gpu_hist | 12.57 |
-+--------------+----------+
-| hist | 36.01 |
-+--------------+----------+
+
+XGBoost supports fully distributed GPU training using `Dask <https://dask.org/>`_, ``Spark`` and ``PySpark``. For getting started with Dask see our tutorial :doc:`/tutorials/dask` and worked examples `here <https://github.com/dmlc/xgboost/tree/master/demo/dask>`__, also Python documentation :ref:`dask_api` for complete reference. For usage with ``Spark`` using Scala see :doc:`/jvm/xgboost4j_spark_gpu_tutorial`. Lastly for distributed GPU training with ``PySpark``, see :doc:`/tutorials/spark_estimator`.
+
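
As a hedged sketch of the Dask workflow referenced in the added paragraph (it assumes the optional ``dask_cuda`` package and at least one visible GPU; cluster setup, chunk sizes and data are illustrative, not part of this commit):

.. code-block:: python

   import dask.array as da
   import xgboost as xgb
   from dask.distributed import Client
   from dask_cuda import LocalCUDACluster   # assumed extra dependency: one worker per local GPU

   cluster = LocalCUDACluster()
   client = Client(cluster)

   # Illustrative random data, partitioned so each worker receives several chunks.
   X = da.random.random((100_000, 20), chunks=(10_000, 20))
   y = (da.random.random(100_000, chunks=(10_000,)) > 0.5).astype("int32")

   dtrain = xgb.dask.DaskDMatrix(client, X, y)
   output = xgb.dask.train(
       client,
       {"tree_method": "gpu_hist", "objective": "binary:logistic"},
       dtrain,
       num_boost_round=100,
   )
   booster = output["booster"]
   preds = xgb.dask.predict(client, booster, X)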
 
 Memory usage
 ============
@@ -202,7 +68,7 @@ The dataset itself is stored on device in a compressed ELLPACK format. The ELLPA
 
 Working memory is allocated inside the algorithm proportional to the number of rows to keep track of gradients, tree positions and other per row statistics. Memory is allocated for histogram bins proportional to the number of bins, number of features and nodes in the tree. For performance reasons we keep histograms in memory from previous nodes in the tree; when a certain threshold of memory usage is passed we stop doing this to conserve memory at some performance loss.
 
-If you are getting out-of-memory errors on a big dataset, try :py:class:`xgboost.DeviceQuantileDMatrix` or the :doc:`external memory version </tutorials/external_memory>`.
+If you are getting out-of-memory errors on a big dataset, try :py:class:`xgboost.QuantileDMatrix` or the :doc:`external memory version </tutorials/external_memory>`. Note that when external memory is used for GPU hist, it's best to employ gradient-based sampling as well. Last but not least, ``inplace_predict`` can be preferred over ``predict`` when data is already on GPU. Both ``QuantileDMatrix`` and ``inplace_predict`` are automatically enabled if you are using the scikit-learn interface.
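
To make the memory guidance in the new line concrete, here is a small sketch (assuming a CUDA build of XGBoost and the optional ``cupy`` package so the data is already resident on the GPU; array sizes and ``max_bin`` are illustrative):

.. code-block:: python

   import cupy as cp        # assumed: data already lives on the GPU as device arrays
   import xgboost as xgb

   X = cp.random.rand(100_000, 20)
   y = (cp.random.rand(100_000) > 0.5).astype(cp.int32)

   # QuantileDMatrix bins the data up front, avoiding a second full copy and
   # lowering peak GPU memory during gpu_hist training.
   Xy = xgb.QuantileDMatrix(X, label=y, max_bin=256)
   booster = xgb.train(
       {"tree_method": "gpu_hist", "objective": "binary:logistic"},
       Xy,
       num_boost_round=100,
   )

   # inplace_predict consumes the device array directly instead of building a DMatrix.
   preds = booster.inplace_predict(X)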
 
 Developer notes
 ===============

0 commit comments
