0.43.3: enabling Llama 405B with 8xH/A100 + 256GB RAM
Improvements:
- FSDP: Enable loading prequantized weights with bf16/fp16/fp32 quant_storage
  - Background: This update, linked to Transformers PR #32276, allows loading prequantized weights with alternative storage formats. Metadata is tracked similarly to `Params4bit.__new__` post PR #970. It supports models exported with a non-default `quant_storage`, such as this NF4 model with BF16 storage (see the sketch after this list).
  - Special thanks to @winglian and @matthewdouglas for enabling FSDP+QLoRA finetuning of Llama 3.1 405B on a single 8xH100 or 8xA100 node with as little as 256GB system RAM.
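
A minimal sketch of what this enables, not an excerpt from the release: loading a prequantized NF4 checkpoint whose packed 4-bit weights were exported with bf16 `quant_storage`. The model id below is a placeholder; substitute any checkpoint exported with a non-default `quant_storage`.

```python
import torch
from transformers import AutoModelForCausalLM

# Placeholder repo id for an NF4 checkpoint saved with bf16 quant_storage.
model_id = "your-org/llama-3.1-405b-nf4-bf16-storage"

# The quantization metadata (quant_type, quant_storage, blocksize, ...) is read
# from the checkpoint itself, so no extra quantization config is needed here.
# Using bf16 quant_storage lets FSDP shard the packed 4-bit weights uniformly
# with the other bf16 parameters.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
```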