
Commit 1980b35

Merge branch 'main' into qbmm_fix_amend
Signed-off-by: chichun-charlie-liu <[email protected]>
2 parents: 79851eb + 1e7856e

File tree: 18 files changed (+626 / -91 lines)

.spellcheck-en-custom.txt

Lines changed: 2 additions & 1 deletion
@@ -26,10 +26,11 @@ eval
 fms
 fp
 FP
+FP8Arguments
 frac
 gptq
 GPTQ
-GPTQArgs
+GPTQArguments
 graphviz
 GPTQ
 hyperparameters
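
The new wordlist entries track the argument class names that the updated READMEs now link to in `fms_mo/training_args.py`. As a rough illustration only, those dataclasses might look like the sketch below; the field names and defaults are invented for illustration and are not taken from the actual file.

```python
# Illustrative sketch of the renamed argument dataclasses referenced by the new
# spellcheck entries. Field names and defaults below are hypothetical.
from dataclasses import dataclass, field


@dataclass
class FP8Arguments:
    """FP8 quantization arguments (the name the updated READMEs link to)."""

    scheme: str = field(default="FP8_DYNAMIC", metadata={"help": "FP8 scheme to apply."})


@dataclass
class GPTQArguments:
    """GPTQ quantization arguments (replacing the old GPTQArgs spelling)."""

    bits: int = field(default=4, metadata={"help": "Weight precision in bits."})
    group_size: int = field(default=128, metadata={"help": "Quantization group size."})
```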

examples/FP8_QUANT/README.md

Lines changed: 2 additions & 2 deletions
@@ -27,7 +27,7 @@ This is an example of mature FP8, which under the hood leverages some functional
 ## QuickStart
 This end-to-end example utilizes the common set of interfaces provided by `fms_mo` for easily applying multiple quantization algorithms with FP8 being the focus of this example. The steps involved are:
 
-1. **FP8 quantization through CLI**. Other arguments could be found here [FP8Args](../../fms_mo/training_args.py#L84).
+1. **FP8 quantization through CLI**. Other arguments could be found here [FP8Arguments](../../fms_mo/training_args.py#L84).
 
 ```bash
 python -m fms_mo.run_quant \
@@ -100,7 +100,7 @@ This end-to-end example utilizes the common set of interfaces provided by `fms_m
 tokenizer = AutoTokenizer.from_pretrained(model_args.model_name_or_path)
 ```
 
-2. Quantization setting is provided using `QuantizationModifier`, additional settings can be found in [FP8Args](../../fms_mo/training_args.py#L84).
+2. Quantization setting is provided using `QuantizationModifier`, additional settings can be found in [FP8Arguments](../../fms_mo/training_args.py#L84).
 
 ```python
 recipe = QuantizationModifier(
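
The `QuantizationModifier` recipe that step 2 builds is truncated in this hunk. Below is a minimal, hedged sketch of how such a recipe is typically constructed, assuming the `llmcompressor` API that this class name comes from; the exact targets and scheme used in `examples/FP8_QUANT` are not visible in this diff and are assumptions.

```python
# Minimal sketch of an FP8 recipe built with QuantizationModifier.
# Assumes the llm-compressor API; the arguments below are illustrative and may
# differ from those used in examples/FP8_QUANT.
from llmcompressor.modifiers.quantization import QuantizationModifier

recipe = QuantizationModifier(
    targets="Linear",      # quantize Linear layers
    scheme="FP8_DYNAMIC",  # static FP8 weights, dynamic FP8 activations
    ignore=["lm_head"],    # keep the output head in higher precision
)
```

In llm-compressor-style workflows, a recipe like this is then handed to a one-shot compression call together with the model and calibration data.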

examples/GPTQ/README.md

Lines changed: 1 addition & 1 deletion
@@ -32,7 +32,7 @@ This end-to-end example utilizes the common set of interfaces provided by `fms_m
 > - Tokenized data will be saved in `<path_to_save>_train` and `<path_to_save>_test`
 > - If you have trouble downloading Llama family of models from Hugging Face ([LLama models require access](https://www.llama.com/docs/getting-the-models/hugging-face/)), you can use `ibm-granite/granite-8b-code` instead
 
-2. **Quantize the model** using the data generated above, the following command will kick off the quantization job (by invoking `auto_gptq` under the hood.) Additional acceptable arguments can be found here in [GPTQArgs](../../fms_mo/training_args.py#L127).
+2. **Quantize the model** using the data generated above, the following command will kick off the quantization job (by invoking `auto_gptq` under the hood.) Additional acceptable arguments can be found here in [GPTQArguments](../../fms_mo/training_args.py#L127).
 
 ```bash
 python -m fms_mo.run_quant \
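
The README notes that `fms_mo.run_quant` invokes `auto_gptq` under the hood. A rough sketch of that call path written directly against the `auto_gptq` API is shown below; it is illustrative rather than the actual fms_mo implementation, and the model name, calibration data, and settings are placeholders (the model id is the stand-in suggested in the README's note above).

```python
# Rough, illustrative sketch of the auto_gptq call path that fms_mo.run_quant
# wraps. Not the actual fms_mo code; data and quantization settings are placeholders.
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
from transformers import AutoTokenizer

model_id = "ibm-granite/granite-8b-code"  # stand-in model from the README's note
tokenizer = AutoTokenizer.from_pretrained(model_id)

quantize_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=False)
model = AutoGPTQForCausalLM.from_pretrained(model_id, quantize_config)

# Calibration examples: tokenized samples with input_ids / attention_mask tensors.
examples = [tokenizer("GPTQ calibrates on a small set of samples.", return_tensors="pt")]

model.quantize(examples)  # run the GPTQ algorithm layer by layer
model.save_quantized("granite-8b-gptq", use_safetensors=True)
```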

fms_mo/dq.py

Lines changed: 9 additions & 6 deletions
@@ -51,7 +51,7 @@
 logger = logging.getLogger(__name__)
 
 
-def run_dq(model_args, data_args, fms_mo_args, output_dir):
+def run_dq(model_args, data_args, opt_args, fms_mo_args):
     """
     For direct quantization LLMs without optimization:
     Models are directly quantized into INT8 or FP8 precisions using
@@ -63,12 +63,15 @@ def run_dq(model_args, data_args, fms_mo_args, output_dir):
             the model
         data_args (fms_mo.training_args.DataArguments): Data arguments to be used when loading the
             tokenized dataset
+        opt_args (fms_mo.training_args.OptArguments): Generic optimization arguments to be used
+            during DQ
         fms_mo_args (fms_mo.training_args.FMSMOArguments): Parameters to use for DQ quantization
-        output_dir (str) Output directory to write to
+
     NOTE:
         use dynamo tracing instead of torchscript by default. if torchscript is needed, change
         1) config_kwarks and 2) use_dynamo in qmodel_prep()
-    """
+
+    """
     # for attention or kv-cache quantization, need to use eager attention
     attn_bits = [
         fms_mo_args.nbits_bmm1,
@@ -225,9 +228,9 @@ def run_dq(model_args, data_args, fms_mo_args, output_dir):
         with patch_torch_bmm(qcfg):
             model(**data_mb)
 
-    logger.info(f"Saving quantized model and tokenizer to {output_dir}")
-    model.save_pretrained(output_dir, use_safetensors=True)
-    tokenizer.save_pretrained(output_dir)
+    logger.info(f"Saving quantized model and tokenizer to {opt_args.output_dir}")
+    model.save_pretrained(opt_args.output_dir, use_safetensors=True)
+    tokenizer.save_pretrained(opt_args.output_dir)
 
     if fms_mo_args.eval_ppl:
         path_test = Path(data_args.test_data_path)
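
With this change, `output_dir` travels inside the generic `OptArguments` object rather than being a separate positional parameter of `run_dq`. A hedged sketch of how a caller such as the `run_quant` entry point might wire this up is shown below; the `HfArgumentParser` setup and the `ModelArguments` class name are assumptions, not code from this commit.

```python
# Hedged sketch of calling the updated run_dq(). The HfArgumentParser wiring and
# the ModelArguments class name are assumptions; OptArguments, DataArguments, and
# FMSMOArguments are the types named in the docstring of the diff above.
from transformers import HfArgumentParser

from fms_mo.dq import run_dq
from fms_mo.training_args import (
    DataArguments,
    FMSMOArguments,
    ModelArguments,  # assumed name for the model_args dataclass
    OptArguments,
)

parser = HfArgumentParser((ModelArguments, DataArguments, OptArguments, FMSMOArguments))
model_args, data_args, opt_args, fms_mo_args = parser.parse_args_into_dataclasses()

# output_dir is now read from opt_args.output_dir inside run_dq rather than
# being passed as a separate argument.
run_dq(model_args, data_args, opt_args, fms_mo_args)
```

Grouping `output_dir` with the other run-level options presumably keeps the `run_dq` signature stable as more optimization settings are added.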
