The evaluation mode of the `torchchat.py` script can be used to evaluate your language model.

## Examples

### Evaluation example with model in Python environment

Running wikitext for 10 iterations
```
python3 torchchat.py eval stories15M --tasks wikitext --limit 10
```

Running wikitext with torch.compile for 10 iterations
```
python3 torchchat.py eval stories15M --compile --tasks wikitext --limit 10
```

Running multiple tasks with torch.compile for evaluation and prefill:
```
python3 torchchat.py eval stories15M --compile --compile-prefill --tasks wikitext hellaswag
```

### Evaluation with model exported to PTE with ExecuTorch

Running an exported model with ExecuTorch (as PTE). Because torchchat can load an exported PTE model back into the Python environment, we can run evaluation directly on the exported model:
```
python3 torchchat.py export stories15M --output-pte-path stories15M.pte
python3 torchchat.py eval stories15M --pte-path stories15M.pte
```

Running multiple tasks with the exported PTE model:
```
python3 torchchat.py eval stories15M --pte-path stories15M.pte --tasks wikitext hellaswag
```

Now let's measure the effect of quantization on evaluation results by exporting with `--quantize` and an exemplary quantization configuration:
```
python3 torchchat.py export stories15M --output-pte-path stories15M.pte --quantize torchchat/quant_config/mobile.json
python3 torchchat.py eval stories15M --pte-path stories15M.pte --tasks wikitext hellaswag
```

Now try your own export options to explore different trade-offs between model size, evaluation speed and accuracy using model quantization!
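
For instance, a custom quantization recipe can be supplied as a small JSON file passed to `--quantize`. The snippet below is an illustrative sketch only: the scheme names and group sizes are assumptions about the configuration format, and `my_quant_config.json` and `stories15M_custom.pte` are hypothetical file names; the recipes that ship with the repo (such as `torchchat/quant_config/mobile.json`) are the authoritative reference.
```
# Illustrative sketch only: the scheme names and group sizes below are assumptions,
# not a recipe taken from the torchchat repo.
cat > my_quant_config.json <<'EOF'
{
  "embedding": {"bitwidth": 4, "groupsize": 32},
  "linear:a8w4dq": {"groupsize": 256}
}
EOF
python3 torchchat.py export stories15M --output-pte-path stories15M_custom.pte --quantize my_quant_config.json
python3 torchchat.py eval stories15M --pte-path stories15M_custom.pte --tasks wikitext hellaswag
```
Comparing the resulting wikitext and hellaswag numbers against the unquantized PTE run above shows how much accuracy a given recipe trades for a smaller, faster model.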

### Evaluation with model exported to DSO with AOT Inductor (AOTI)

Running an exported model with AOT Inductor (DSO model). Because torchchat can load an exported DSO model back into the Python environment, we can run evaluation directly on the exported model:
```
python3 torchchat.py export stories15M --dtype fast16 --output-dso-path stories15M.so
python3 torchchat.py eval stories15M --dtype fast16 --dso-path stories15M.so
```

Running multiple tasks with the exported DSO model (AOTI):
```
python3 torchchat.py eval stories15M --dso-path stories15M.so --tasks wikitext hellaswag
```