diff --git a/torchchat/utils/docs/evaluation.md b/torchchat/utils/docs/evaluation.md
index ac2aa54d3..77414eeb4 100644
--- a/torchchat/utils/docs/evaluation.md
+++ b/torchchat/utils/docs/evaluation.md
@@ -23,7 +23,7 @@ The evaluation mode of `torchchat.py` script can be used to evaluate your langua
 
 ## Examples
 
-### Evaluation example with model in Python
+### Evaluation example with model in Python environment
 
 Running wikitext for 10 iterations
 ```
@@ -35,33 +35,45 @@ Running wikitext with torch.compile for 10 iterations
 python3 torchchat.py eval stories15M --compile --tasks wikitext --limit 10
 ```
 
-Running multiple tasks and calling eval.py directly (with torch.compile):
+Running multiple tasks with torch.compile enabled for both evaluation and prefill:
 ```
-python3 torchchat.py eval stories15M --compile --tasks wikitext hellaswag
+python3 torchchat.py eval stories15M --compile --compile-prefill --tasks wikitext hellaswag
 ```
 
 ### Evaluation with model exported to PTE with ExecuTorch
 
-Running an exported model with ExecuTorch (as PTE)
+Running an exported model with ExecuTorch (as PTE). Since torchchat can load an
+exported PTE model back into the Python environment, you can run evaluation
+directly on the exported model!
 ```
 python3 torchchat.py export stories15M --output-pte-path stories15M.pte
 python3 torchchat.py eval stories15M --pte-path stories15M.pte
 ```
 
-Running multiple tasks and calling eval.py directly (with PTE):
+Running multiple tasks directly on the exported PTE mobile model:
 ```
 python3 torchchat.py eval stories15M --pte-path stories15M.pte --tasks wikitext hellaswag
 ```
 
+Now let's measure the effect of quantization on evaluation results by exporting with the `--quantize` option and an example quantization configuration:
+```
+python3 torchchat.py export stories15M --output-pte-path stories15M.pte --quantize torchchat/quant_config/mobile.json
+python3 torchchat.py eval stories15M --pte-path stories15M.pte --tasks wikitext hellaswag
+```
+
+Now try your own export and quantization options to explore the trade-offs between model size, evaluation speed, and accuracy!
+
 ### Evaluation with model exported to DSO with AOT Inductor (AOTI)
 
-Running an exported model with AOT Inductor (DSO model)
+Running an exported model with AOT Inductor (DSO model). Since torchchat can load an
+exported DSO model back into the Python environment, you can run evaluation
+directly on the exported model!
 ```
 python3 torchchat.py export stories15M --dtype fast16 --output-dso-path stories15M.so
 python3 torchchat.py eval stories15M --dtype fast16 --dso-path stories15M.so
 ```
 
-Running multiple tasks and calling eval.py directly (with AOTI):
+Running multiple tasks with AOTI:
 ```
 python3 torchchat.py eval stories15M --dso-path stories15M.so --tasks wikitext hellaswag
 ```