QLoRA need not work with NF4 specifically, though NF4 has been
shown to achieve competitive results compared to bf16 baselines
while significantly reducing the memory required for training.
This technique can also compose with other lower bit dtypes
such as regular INT4 or even newer `MXFP4 or NVFP4 <https://github.com/pytorch/ao/tree/main/torchao/prototype/mx_formats>`__
targeting Blackwell GPUs to reap similar memory benefits with
varying tradeoffs.
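
For a concrete sense of the mechanism, here is a minimal sketch of
quantizing a single weight tensor to NF4 using TorchAO's `to_nf4`
helper; the tensor shape and block sizes below are illustrative:

.. code::

    import torch
    from torchao.dtypes import to_nf4

    # a 512x512 bf16 weight occupies 512 * 512 * 2 bytes (~0.5 MiB)
    weight = torch.randn(512, 512, dtype=torch.bfloat16)

    # NF4 stores 4 bits per element plus per-block scale factors,
    # roughly quartering the weight's memory footprint
    nf4_weight = to_nf4(weight, block_size=64, scaler_block_size=256)
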
Option 1: TorchTune Integration
===============================

TorchTune incorporates the `NF4Tensor` in its QLoRA fine-tuning
recipe through its implementation of `LoRALinear <https://github.com/pytorch/torchtune/blob/a6290a5b40758f13bca61c386bc8756a49ef417e/torchtune/modules/peft/lora.py#L19>`__.
You can also try it out by running the following command,
or refer to the `QLoRA tutorial <https://docs.pytorch.org/torchtune/stable/tutorials/qlora_finetune.html>`__
for more details.

.. code::

    tune run lora_finetune_single_device --config llama3_2/3B_qlora_single_device.yaml
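
To see what the recipe constructs under the hood, here is a minimal
sketch of building one of these layers directly; the import path and
`quantize_base` flag follow TorchTune's `LoRALinear` API, while the
dimensions and LoRA hyperparameters are illustrative:

.. code::

    from torchtune.modules.peft import LoRALinear

    # quantize_base=True stores the frozen base weight as an NF4Tensor,
    # while the small trainable LoRA adapter stays in higher precision
    lora_linear = LoRALinear(
        in_dim=4096,
        out_dim=4096,
        rank=8,
        alpha=16,
        quantize_base=True,
    )
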
Option 2: HuggingFace PEFT Integration
======================================

HuggingFace PEFT also has a limited version of QLoRA leveraging TorchAO's INT8
quantization, though INT4 and NF4 are not supported yet. Users
can invoke this functionality by preparing their models as follows.
For full details, please refer to `this tutorial <https://huggingface.co/docs/peft/main/en/developer_guides/quantization#torchao-pytorch-architecture-optimization>`__.

.. code::

    from peft import LoraConfig, get_peft_model
    from transformers import AutoModelForCausalLM, TorchAoConfig
    from torchao.quantization import Int8WeightOnlyConfig
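
    # A minimal sketch of the remaining setup (assumes a recent
    # transformers release that accepts TorchAO config objects); the
    # model id and LoRA hyperparameters are illustrative placeholders.
    quant_config = TorchAoConfig(quant_type=Int8WeightOnlyConfig())
    model = AutoModelForCausalLM.from_pretrained(
        "meta-llama/Llama-3.2-3B",
        quantization_config=quant_config,
    )
    lora_config = LoraConfig(
        r=8,
        lora_alpha=16,
        target_modules=["q_proj", "v_proj"],
    )
    model = get_peft_model(model, lora_config)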