
Commit e211a93

carmocca authored and lantiga committed
Bitsandbytes docs improvements (#18903)
(cherry picked from commit ad93f64)
1 parent 6b87dbc commit e211a93

File tree

docs/source-fabric/api/fabric_args.rst
docs/source-fabric/fundamentals/precision.rst
docs/source-fabric/glossary/index.rst
docs/source-pytorch/common/precision_intermediate.rst

4 files changed: +37 -8 lines changed

docs/source-fabric/api/fabric_args.rst

Lines changed: 11 additions & 0 deletions
@@ -144,6 +144,17 @@ Automatic mixed precision settings are denoted by a ``"-mixed"`` suffix, while "
     fabric = Fabric(precision="64-true", devices=1)
 
 
+Precision settings can also be enabled via the plugins argument (see section below on plugins).
+An example is the weights quantization plugin Bitsandbytes for 4-bit and 8-bit:
+
+.. code-block:: python
+
+    from lightning.fabric.plugins import BitsandbytesPrecision
+
+    precision = BitsandbytesPrecision(mode="nf4-dq", dtype=torch.bfloat16)
+    fabric = Fabric(plugins=precision)
+
+
 plugins
 =======
 
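For orientation beyond the diff itself, a minimal sketch of how the snippet added above runs end to end. The throwaway torch.nn.Sequential model and the fabric.launch()/fabric.setup() calls are illustrative assumptions, not part of the changed docs.

.. code-block:: python

    import torch
    from lightning.fabric import Fabric
    from lightning.fabric.plugins import BitsandbytesPrecision

    # 4-bit NF4 storage with double quantization; compute runs in bfloat16.
    # Assumes a CUDA device and the `bitsandbytes` package are available.
    precision = BitsandbytesPrecision(mode="nf4-dq", dtype=torch.bfloat16)
    fabric = Fabric(plugins=precision, devices=1)
    fabric.launch()

    # Throwaway model for illustration; any module containing torch.nn.Linear layers works.
    model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU(), torch.nn.Linear(64, 8))
    model = fabric.setup(model)  # Linear layers are swapped for their BNB counterparts here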

docs/source-fabric/fundamentals/precision.rst

Lines changed: 6 additions & 4 deletions
@@ -214,7 +214,7 @@ See also: :doc:`../advanced/model_init`
 Quantization via Bitsandbytes
 *****************************
 
-`bitsandbytes <https://github.com/TimDettmers/bitsandbytes>`__ (BNB) is a library that supports quantizing Linear weights.
+`bitsandbytes <https://github.com/TimDettmers/bitsandbytes>`__ (BNB) is a library that supports quantizing :class:`torch.nn.Linear` weights.
 
 Both 4-bit (`paper reference <https://arxiv.org/abs/2305.14314v1>`__) and 8-bit (`paper reference <https://arxiv.org/abs/2110.02861>`__) quantization is supported.
 Specifically, we support the following modes:
@@ -228,20 +228,22 @@ Specifically, we support the following modes:
 
 While these techniques store weights in 4 or 8 bit, the computation still happens in 16 or 32-bit (float16, bfloat16, float32).
 This is configurable via the dtype argument in the plugin.
+If your model weights can fit on a single device with 16 bit precision, it's recommended that this plugin is not used as it will slow down training.
 
 Quantizing the model will dramatically reduce the weight's memory requirements but may have a negative impact on the model's performance or runtime.
 
-The :class:`~lightning.fabric.plugins.precision.bitsandbytes.BitsandbytesPrecision` a
+The :class:`~lightning.fabric.plugins.precision.bitsandbytes.BitsandbytesPrecision` automatically replaces the :class:`torch.nn.Linear` layers in your model with their BNB alternatives.
+
 .. code-block:: python
 
     from lightning.fabric.plugins import BitsandbytesPrecision
 
     # this will pick out the compute dtype automatically, by default `bfloat16`
-    precision = BitsandbytesPrecision("nf4-dq")
+    precision = BitsandbytesPrecision(mode="nf4-dq")
     fabric = Fabric(plugins=precision)
 
     # Customize the dtype, or ignore some modules
-    precision = BitsandbytesPrecision("int8-training", dtype=torch.float16, ignore_modules={"lm_head"})
+    precision = BitsandbytesPrecision(mode="int8-training", dtype=torch.float16, ignore_modules={"lm_head"})
     fabric = Fabric(plugins=precision)
 
     model = MyModel()
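Since the text above describes the automatic replacement of torch.nn.Linear layers and the ignore_modules escape hatch, here is a hedged sketch of that behaviour with a toy model; the TinyLM class and its lm_head attribute are assumptions made up for illustration, not part of the documented change.

.. code-block:: python

    import torch
    from lightning.fabric import Fabric
    from lightning.fabric.plugins import BitsandbytesPrecision


    class TinyLM(torch.nn.Module):
        # Hypothetical stand-in for MyModel(): a small body plus an `lm_head` projection.
        def __init__(self):
            super().__init__()
            self.body = torch.nn.Sequential(torch.nn.Linear(256, 256), torch.nn.GELU())
            self.lm_head = torch.nn.Linear(256, 1000)

        def forward(self, x):
            return self.lm_head(self.body(x))


    # Store the body's Linear weights in 8-bit and compute in float16, but leave `lm_head` untouched.
    precision = BitsandbytesPrecision(mode="int8-training", dtype=torch.float16, ignore_modules={"lm_head"})
    fabric = Fabric(plugins=precision, devices=1)
    model = fabric.setup(TinyLM())  # only Linear layers outside `ignore_modules` are replaced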

docs/source-fabric/glossary/index.rst

Lines changed: 16 additions & 1 deletion
@@ -28,6 +28,11 @@ Glossary
    :button_link: ../advanced/distributed_communication.html
    :col_css: col-md-4
 
+.. displayitem::
+   :header: Bfloat16
+   :button_link: ../fundamentals/precision.html
+   :col_css: col-md-4
+
 .. displayitem::
    :header: Broadcast
    :button_link: ../advanced/distributed_communication.html
@@ -89,7 +94,7 @@ Glossary
    :col_css: col-md-4
 
 .. displayitem::
-   :header: Jypyter
+   :header: Jupyter
    :button_link: ../launch/notebooks.html
    :col_css: col-md-4
 
@@ -148,6 +153,11 @@ Glossary
    :button_link: ../fundamentals/precision.html
    :col_css: col-md-4
 
+.. displayitem::
+   :header: Quantization
+   :button_link: ../fundamentals/precision.html
+   :col_css: col-md-4
+
 .. displayitem::
    :header: Reduce
    :button_link: ../advanced/distributed_communication.html
@@ -183,6 +193,11 @@ Glossary
    :button_link: ../guide/trainer_template.html
    :col_css: col-md-4
 
+.. displayitem::
+   :header: 16-bit, 8-bit, 4-bit
+   :button_link: ../fundamentals/precision.html
+   :col_css: col-md-4
+
 
 .. raw:: html

docs/source-pytorch/common/precision_intermediate.rst

Lines changed: 4 additions & 3 deletions
@@ -165,7 +165,7 @@ Under the hood, we use `transformer_engine.pytorch.fp8_autocast <https://docs.nv
 Quantization via Bitsandbytes
 *****************************
 
-`bitsandbytes <https://github.com/TimDettmers/bitsandbytes>`__ (BNB) is a library that supports quantizing Linear weights.
+`bitsandbytes <https://github.com/TimDettmers/bitsandbytes>`__ (BNB) is a library that supports quantizing :class:`torch.nn.Linear` weights.
 
 Both 4-bit (`paper reference <https://arxiv.org/abs/2305.14314v1>`__) and 8-bit (`paper reference <https://arxiv.org/abs/2110.02861>`__) quantization is supported.
 Specifically, we support the following modes:
@@ -179,6 +179,7 @@ Specifically, we support the following modes:
 
 While these techniques store weights in 4 or 8 bit, the computation still happens in 16 or 32-bit (float16, bfloat16, float32).
 This is configurable via the dtype argument in the plugin.
+If your model weights can fit on a single device with 16 bit precision, it's recommended that this plugin is not used as it will slow down training.
 
 Quantizing the model will dramatically reduce the weight's memory requirements but may have a negative impact on the model's performance or runtime.
 
@@ -189,11 +190,11 @@ The :class:`~lightning.pytorch.plugins.precision.bitsandbytes.BitsandbytesPrecis
     from lightning.pytorch.plugins import BitsandbytesPrecisionPlugin
 
     # this will pick out the compute dtype automatically, by default `bfloat16`
-    precision = BitsandbytesPrecisionPlugin("nf4-dq")
+    precision = BitsandbytesPrecisionPlugin(mode="nf4-dq")
     trainer = Trainer(plugins=precision)
 
     # Customize the dtype, or skip some modules
-    precision = BitsandbytesPrecisionPlugin("int8-training", dtype=torch.float16, ignore_modules={"lm_head"})
+    precision = BitsandbytesPrecisionPlugin(mode="int8-training", dtype=torch.float16, ignore_modules={"lm_head"})
     trainer = Trainer(plugins=precision)
 
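To ground the Trainer-side snippet, a hedged sketch of how the plugin would be wired into a minimal LightningModule run; the LitModel class, max_steps=1, and the commented-out fit call are illustrative assumptions, not part of the documented change.

.. code-block:: python

    import torch
    from lightning.pytorch import LightningModule, Trainer
    from lightning.pytorch.plugins import BitsandbytesPrecisionPlugin


    class LitModel(LightningModule):
        # Hypothetical module, only here to show where the plugin hooks in.
        def __init__(self):
            super().__init__()
            self.layer = torch.nn.Linear(32, 2)

        def training_step(self, batch, batch_idx):
            out = self.layer(batch)
            return torch.nn.functional.mse_loss(out, torch.zeros_like(out))

        def configure_optimizers(self):
            return torch.optim.Adam(self.parameters(), lr=1e-3)


    # The quantized Linear layers are created when the Trainer sets the model up.
    precision = BitsandbytesPrecisionPlugin(mode="nf4-dq")
    trainer = Trainer(plugins=precision, max_steps=1)
    # trainer.fit(LitModel(), train_dataloaders=...)  # dataloader omitted in this sketch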
