
Commit 5beed13: updated readme

Signed-off-by: Suguna Velury <[email protected]>

1 parent 6254cad

File tree: 3 files changed (+9 -4 lines)


examples/llm_qat/README.md

Lines changed: 5 additions & 1 deletion
````diff
@@ -354,7 +354,11 @@ To perform QLoRA training, run:
     --lora True
 ```
 
-> **_NOTE:_** QLoRA is currently an experimental feature designed to reduce the memory footprint during training. Deployment functionality is not yet available.
+After performing QLoRA training, the final checkpoint is exported ready for deployment. For more details about QLoRA deployment using vLLM, refer to the documentation [here](https://docs.vllm.ai/en/latest/features/lora.html). To deploy with vLLM, run:
+
+```sh
+vllm serve llama3-fp4-qlora/base_model --enable-lora --lora-modules adapter=llama3-fp4-qlora --port 8000 --tokenizer llama3-fp4-qlora
+```
 
 ## Pre-Quantized Checkpoints
 
````
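Once the server is up, the adapter can be queried through vLLM's OpenAI-compatible endpoint. A minimal sketch using the `openai` Python client, assuming the serve command above is running (adapter name `adapter`, port 8000); the prompt and generation settings are illustrative:

```python
# Query the QLoRA adapter served by vLLM via its OpenAI-compatible API.
# Assumes `vllm serve ... --lora-modules adapter=llama3-fp4-qlora --port 8000`
# from the README diff above is already running.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.completions.create(
    model="adapter",  # the LoRA module name registered with --lora-modules
    prompt="Explain quantization-aware training in one sentence.",
    max_tokens=64,
)
print(completion.choices[0].text)
```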

examples/llm_qat/main.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -274,7 +274,7 @@ def train():
     trainer.save_model(training_args.output_dir, **kwargs)
 
     if training_args.lora and getattr(quant_args, "compress", False):
-        trainer.export_base_model_hf_checkpoint()
+        trainer.export_base_model()
 
 
 if __name__ == "__main__":
```

modelopt/torch/quantization/plugins/transformers_trainer.py

Lines changed: 3 additions & 2 deletions
```diff
@@ -279,17 +279,18 @@ def save_model(self, *args, **kwargs):
         return outputs
 
     def _load_best_model(self, *args, **kwargs):
-        """Load the best model."""
+        """Load the best model for final evaluation."""
         is_lora = getattr(self.args, "lora", None)
         if not is_lora:
             super()._load_best_model(*args, **kwargs)
         else:
             # Custom logic for loading best model with LoRA
+            # TODO: Remove once we migrate to using get_peft_model()
             adapter_name = self.model.active_adapter()
             self.model.delete_adapter(adapter_name)
             self.model.load_adapter(self.state.best_model_checkpoint, adapter_name)
 
-    def export_base_model_hf_checkpoint(self):
+    def export_base_model(self):
         """Export the base model to a HF checkpoint for deployment."""
         # Save config.json
         if self.accelerator.is_main_process:
```
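For a quick local sanity check of the exported layout before serving, the base checkpoint and adapter can be loaded with `transformers` and `peft`. A hedged sketch: the paths reuse the example names from the README diff above, and loading a compressed FP4 base model through plain `transformers` may additionally require ModelOpt's restore utilities, so treat this as an outline rather than a verified recipe:

```python
# Sketch: load the exported base model plus LoRA adapter for a local check.
# Paths follow the README example; an FP4-compressed base checkpoint may need
# ModelOpt-specific restore hooks, so this assumes a loadable HF checkpoint.
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("llama3-fp4-qlora/base_model")
tokenizer = AutoTokenizer.from_pretrained("llama3-fp4-qlora")
model = PeftModel.from_pretrained(base, "llama3-fp4-qlora")  # attach adapter

inputs = tokenizer("Hello", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=16)[0]))
```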
