Labels: fabric (lightning.fabric.Fabric), feature (Is an improvement or enhancement), precision: bnb (Bitsandbytes quantization)
Description & Motivation
While the transformers library supports FSDP with bitsandbytes 4-bit quantization (https://huggingface.co/docs/bitsandbytes/main/en/fsdp_qlora), Fabric appears unable to combine the two:
mode = "nf4"
plugin = BitsandbytesPrecision(mode=mode, dtype=torch.bfloat16)
policy = partial(size_based_auto_wrap_policy, min_num_params=1E8)
# use data parallel with quantization rather than FSDP:
fabric = Fabric(accelerator="auto", devices="auto",
strategy=FSDPStrategy(
auto_wrap_policy=policy, cpu_offload=False,
state_dict_type="sharded"),
plugins=plugin)
Running this raises:

  File "/home/ubuntu/miniconda3/envs/fabric/lib/python3.10/site-packages/lightning/fabric/strategies/fsdp.py", line 242, in precision
    raise TypeError(f"The FSDP strategy can only work with the `FSDPPrecision` plugin, found {precision}")
TypeError: The FSDP strategy can only work with the `FSDPPrecision` plugin, found <lightning.fabric.plugins.precision.bitsandbytes.BitsandbytesPrecision object at 0x78b7aa999ba0>
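For comparison, the FSDP + QLoRA setup described in the linked transformers documentation looks roughly like the sketch below (the model id and dtypes are illustrative placeholders; the relevant piece is `bnb_4bit_quant_storage`, which stores the quantized weights in a dtype that FSDP can flatten and shard):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# illustrative sketch only; "google/gemma-2b" is a placeholder model id
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    # storage dtype chosen to match the dtype FSDP flattens parameters into
    bnb_4bit_quant_storage=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2b",
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
)

An equivalent path does not appear to exist on the Fabric side, since FSDPStrategy rejects any precision plugin that is not an FSDPPrecision.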
Pitch
Being able to fine-tune large models with FSDP and quantization is becoming increasingly important, as large multi-modal LLMs require a lot of GPU memory per batch (e.g., multiple images per example and batch size > 1).
Alternatives
No response
Additional context
No response