Commit 91a9505

updated README.md
Signed-off-by: Suguna Velury <[email protected]>
1 parent 803bdb8 commit 91a9505

1 file changed: +32 -0 lines changed

examples/llm_ptq/README.md

Lines changed: 32 additions & 0 deletions
@@ -235,6 +235,38 @@ with init_quantized_weights(mtq.NVFP4_DEFAULT_CFG):
mtq.calibrate(model, algorithm="max", forward_loop=calibrate_loop)
```

## Multi-Node Post-Training Quantization with FSDP2

ModelOpt can quantize LLMs across multiple GPU nodes in a range of quantization formats. It uses Hugging Face's Accelerate library and FSDP2 to shard the model and run calibration across nodes.

### Usage

For distributed execution across multiple nodes, use the `accelerate` library. A template configuration file (`fsdp2.yaml`) is provided and can be customized to your requirements.

On each node, run the following command:

```bash
accelerate launch --config_file fsdp2.yaml \
    --num_machines=<num_nodes> \
    --machine_rank=<current_node_rank> \
    --main_process_ip=<node0_ip_addr> \
    --main_process_port=<port> \
    --fsdp_transformer_layer_cls_to_wrap=<decoder_layer_name> \
    multinode-ptq.py \
    --pyt_ckpt_path <path_to_model> \
    --qformat <fp8/nvfp4/nvfp4_awq/int4_awq/int8_sq> \
    --kv_cache_quant <fp8/nvfp4/nvfp4_affine/none> \
    --batch_size <calib_batch_size> \
    --calib-size <num_calib_samples> \
    --dataset <dataset> \
    --export_path <export_path> \
    --trust_remote_code
```
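
Because the command must be repeated on every node with only `--machine_rank` changing, it can help to generate the per-node command lines from one place. The following is a minimal sketch, not part of ModelOpt: the helper `build_launch_command` is hypothetical, and the IP address, port, decoder layer name, and paths are made-up illustration values. It only assembles the flags shown in the command above and prints them; it does not launch anything.

```python
import shlex


def build_launch_command(machine_rank, num_machines, main_ip, main_port,
                         decoder_layer, script_args):
    """Assemble the `accelerate launch` argv for one node (hypothetical helper)."""
    launcher = [
        "accelerate", "launch",
        "--config_file", "fsdp2.yaml",
        f"--num_machines={num_machines}",
        f"--machine_rank={machine_rank}",
        f"--main_process_ip={main_ip}",
        f"--main_process_port={main_port}",
        f"--fsdp_transformer_layer_cls_to_wrap={decoder_layer}",
    ]
    # Launcher flags come first, then the script, then the script's own flags.
    return launcher + ["multinode-ptq.py"] + list(script_args)


# Illustration values only: replace with your own model path, format, and export path.
script_args = [
    "--pyt_ckpt_path", "/models/my-llm",
    "--qformat", "fp8",
    "--export_path", "/exports/my-llm-fp8",
]

num_nodes = 2
commands = [
    build_launch_command(rank, num_nodes, "10.0.0.1", 29500,
                         "LlamaDecoderLayer", script_args)
    for rank in range(num_nodes)
]

for rank, argv in enumerate(commands):
    print(f"node {rank}: {shlex.join(argv)}")
```

Every node receives the same rendezvous address (`--main_process_ip`, `--main_process_port`) and the same script arguments; only the rank differs.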
The exported checkpoint can be deployed with TensorRT-LLM, vLLM, or SGLang. For more details, refer to the [deployment section](#deployment) of this document.

> *Performance Note: FSDP2 is designed for training workloads and may result in longer calibration and export times. For faster calibration, use the largest batch size that fits in available GPU memory.*
>
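
One way to pick a large calibration batch size is to probe upward until a trial run no longer fits. The sketch below is an illustration only and not part of ModelOpt: `largest_working_batch_size` and `fake_run_batch` are hypothetical names, and the trial function is a stand-in that pretends memory runs out above 48 samples. In a real setup the trial would be one calibration forward pass, and the exception to catch would be whatever out-of-memory error your framework raises.

```python
def largest_working_batch_size(run_batch, start=1, limit=1024,
                               oom_error=MemoryError):
    """Double the batch size until `run_batch` raises `oom_error` or `limit` is passed.

    Returns the largest size that ran successfully, or 0 if even `start` failed.
    """
    best = 0
    size = start
    while size <= limit:
        try:
            run_batch(size)
        except oom_error:
            break
        best = size
        size *= 2
    return best


def fake_run_batch(batch_size):
    # Stand-in for one calibration forward pass (assumption for illustration):
    # anything above 48 samples is treated as not fitting in memory.
    if batch_size > 48:
        raise MemoryError(f"batch size {batch_size} does not fit")


best = largest_working_batch_size(fake_run_batch)
print(best)
```

The doubling search only tries powers of two times `start`, so the result is a safe lower bound rather than the exact maximum. The value found can then be passed as `--batch_size`.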
## Framework Scripts

### Hugging Face Example [Script](./scripts/huggingface_example.sh)
