This repository was archived by the owner on Sep 10, 2025. It is now read-only.

Commit bd7354e

Update evaluation.md (#1442)
* Update evaluation.md:
  1. Remove outdated reference to running eval.py directly.
  2. Explain how we run ET/AOTI models with eval.
  3. Add an example with quantization to show how we can use eval to determine how to process models.
* Update evaluation.md
* Update evaluation.md: highlight the ability to use different options and encourage users to experiment with them.
* Update evaluation.md: wording corrections.
* Update build_native.sh: update to the C++11 ABI for AOTI, similar to ET.
1 parent b4547fd commit bd7354e

File tree: 1 file changed (+19, -7 lines)


torchchat/utils/docs/evaluation.md

Lines changed: 19 additions & 7 deletions
````diff
@@ -23,7 +23,7 @@ The evaluation mode of `torchchat.py` script can be used to evaluate your langua
 
 ## Examples
 
-### Evaluation example with model in Python
+### Evaluation example with model in Python environment
 
 Running wikitext for 10 iterations
 ```
@@ -35,33 +35,45 @@ Running wikitext with torch.compile for 10 iterations
 python3 torchchat.py eval stories15M --compile --tasks wikitext --limit 10
 ```
 
-Running multiple tasks and calling eval.py directly (with torch.compile):
+Running multiple tasks with torch.compile for evaluation and prefill:
 ```
-python3 torchchat.py eval stories15M --compile --tasks wikitext hellaswag
+python3 torchchat.py eval stories15M --compile --compile-prefill --tasks wikitext hellaswag
 ```
 
 ### Evaluation with model exported to PTE with ExecuTorch
 
-Running an exported model with ExecuTorch (as PTE)
+Running an exported model with ExecuTorch (as PTE). Advantageously, because you can
+load an exported PTE model back into the Python environment with torchchat,
+you can run evaluation on the exported model!
 ```
 python3 torchchat.py export stories15M --output-pte-path stories15M.pte
 python3 torchchat.py eval stories15M --pte-path stories15M.pte
 ```
 
-Running multiple tasks and calling eval.py directly (with PTE):
+Running multiple tasks directly on the created PTE mobile model:
 ```
 python3 torchchat.py eval stories15M --pte-path stories15M.pte --tasks wikitext hellaswag
 ```
 
+Now let's evaluate the effect of quantization on evaluation results by exporting with quantization using `--quantize` and an exemplary quantization configuration:
+```
+python3 torchchat.py export stories15M --output-pte-path stories15M.pte --quantize torchchat/quant_config/mobile.json
+python3 torchchat.py eval stories15M --pte-path stories15M.pte --tasks wikitext hellaswag
+```
+
+Now try your own export options to explore different trade-offs between model size, evaluation speed and accuracy using model quantization!
+
 ### Evaluation with model exported to DSO with AOT Inductor (AOTI)
 
-Running an exported model with AOT Inductor (DSO model)
+Running an exported model with AOT Inductor (DSO model). Advantageously, because you can
+load an exported DSO model back into the Python environment with torchchat,
+you can run evaluation on the exported model!
 ```
 python3 torchchat.py export stories15M --dtype fast16 --output-dso-path stories15M.so
 python3 torchchat.py eval stories15M --dtype fast16 --dso-path stories15M.so
 ```
 
-Running multiple tasks and calling eval.py directly (with AOTI):
+Running multiple tasks with AOTI:
 ```
 python3 torchchat.py eval stories15M --dso-path stories15M.so --tasks wikitext hellaswag
 ```
````
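As a concrete follow-on to the "try your own export options" suggestion added in this commit, a baseline and a quantized export can be written to separate PTE files so the same tasks can be scored against both. This is a minimal sketch using only flags that appear in the diff above; the `stories15M_base.pte` and `stories15M_quant.pte` file names are hypothetical, chosen for illustration.

```
# Export a float baseline and a quantized variant to separate PTE files.
# (File names are hypothetical; all flags appear in the doc above.)
python3 torchchat.py export stories15M --output-pte-path stories15M_base.pte
python3 torchchat.py export stories15M --output-pte-path stories15M_quant.pte --quantize torchchat/quant_config/mobile.json

# Evaluate both exports on the same tasks to compare results side by side.
python3 torchchat.py eval stories15M --pte-path stories15M_base.pte --tasks wikitext hellaswag
python3 torchchat.py eval stories15M --pte-path stories15M_quant.pte --tasks wikitext hellaswag
```

Comparing the two runs, along with the sizes of the two `.pte` files, shows the accuracy cost, if any, of the chosen quantization configuration.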

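The same kind of experiment applies to the AOTI path. A sketch, assuming `--limit` (shown in the Python-environment example) composes with `--dso-path` the same way it does there, which keeps runs short while exploring options:

```
# Quick smoke test of an AOTI (DSO) export: flags are taken from the doc above,
# with --limit added to cap the number of evaluation iterations (assumed to compose).
python3 torchchat.py export stories15M --dtype fast16 --output-dso-path stories15M.so
python3 torchchat.py eval stories15M --dtype fast16 --dso-path stories15M.so --tasks wikitext --limit 10
```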