@@ -18,10 +18,10 @@ Torchchat is currently in a pre-release state and under extensive development.
 [shell default]: TORCHCHAT_ROOT=${PWD} ./torchchat/utils/scripts/install_et.sh


-This is the advanced users guide, if you're looking to get started
+This is the advanced users' guide; if you're looking to get started
 with LLMs, please refer to the README at the root directory of the
 torchchat distro.  This is an advanced user guide, so we will have
-many more concepts and options to discuss and taking advantage of them
+many more concepts and options to discuss, and taking advantage of them
 may take some effort.

 We welcome community contributions of all kinds.  If you find
@@ -41,7 +41,7 @@ While we strive to support a broad range of models, we can't test them
 all. We classify supported models as tested ✅, work in progress 🚧 or
 some restrictions ❹.

-We invite community contributions of new model suport and test results!
+We invite community contributions of new model support and test results!

 | Model | Tested | Eager | torch.compile | AOT Inductor | ExecuTorch | Fits on Mobile |
 |-----|--------|-------|-----|-----|-----|-----|
@@ -86,7 +86,7 @@ Server C++ runtime | n/a | run.cpp model.pte | ✅ |
 Mobile C++ runtime | n/a | app model.pte | ✅ |
 Mobile C++ runtime | n/a | app + AOTI | 🚧 |

-**Getting help:** Each command implements the --help option to give addititonal information about available options:
+**Getting help:** Each command implements the --help option to give additional information about available options:

 [skip default]: begin
 ```
@@ -96,8 +96,8 @@ python3 torchchat.py [ export | generate | chat | eval | ... ] --help

 Exported models can be loaded back into torchchat for chat or text
 generation, letting you experiment with the exported model and validate
-model quality. The python interface is the same in all cases and is
-used for testing nad test harnesses too.
+model quality. The Python interface is the same in all cases and is
+used for testing and test harnesses, too.

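That one Python interface looks the same regardless of where the weights come from; a minimal sketch (the `--dso-path` flag for AOTI artifacts is an assumption, mirroring the `--pte-path` example later in this guide):

```
# Eager checkpoint, exported ExecuTorch program, or AOTI shared object:
python3 torchchat.py generate --checkpoint-path ${MODEL_PATH} --prompt "Once upon a time"
python3 torchchat.py generate --checkpoint-path ${MODEL_PATH} --pte-path ${MODEL_NAME}.pte --prompt "Once upon a time"
python3 torchchat.py generate --checkpoint-path ${MODEL_PATH} --dso-path ${MODEL_NAME}.so --prompt "Once upon a time"
```
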
 Torchchat comes with server C++ runtimes to execute AOT Inductor and
 ExecuTorch models. Mobile C++ runtimes allow you to deploy
@@ -115,7 +115,7 @@ Some common models are recognized by torchchat based on their filename
 through `Model.from_name()` to perform a fuzzy match against a
 table of known model architectures. Alternatively, you can specify the
 index into that table with the option `--params-table ${INDEX}` where
-the index is the lookup key key in the [the list of known
+the index is the lookup key in the [list of known
 configurations](https://github.com/pytorch/torchchat/tree/main/torchchat/model_params).
 For example, for the stories15M model, this would be expressed as
 `--params-table stories15M`. (We use the model constructor
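Put together, a run with an explicit table lookup looks like this (a sketch; ${MODEL_PATH} is assumed to point at a local stories15M checkpoint):

```
python3 torchchat.py generate --checkpoint-path ${MODEL_PATH} --params-table stories15M --prompt "Once upon a time"
```
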
@@ -237,7 +237,7 @@ which chooses the best 16-bit floating point type.

 The virtual device fast and the virtual floating-point data types fast and
 fast16 are best used for eager/torch.compile execution.  For export,
-specify the your device choice for the target system with --device for
+specify your device choice for the target system with --device for
 AOTI-exported DSO models, and use ExecuTorch delegate selection for
 ExecuTorch-exported PTE models.

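A sketch of both paths; only --device appears verbatim above, while the `--dtype` and `--output-dso-path` flags are assumptions based on the rest of the torchchat docs:

```
# Eager/compiled execution can use the virtual device and dtype:
python3 torchchat.py generate --checkpoint-path ${MODEL_PATH} --device fast --dtype fast16 --prompt "Once upon a time"
# AOTI export names the concrete target device instead:
python3 torchchat.py export --checkpoint-path ${MODEL_PATH} --device cuda --output-dso-path ${MODEL_NAME}.so
```
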
@@ -250,8 +250,7 @@ python3 torchchat.py generate [--compile] --checkpoint-path ${MODEL_PATH} --prom
 To improve performance, you can compile the model with `--compile`,
 trading off time to first token against time per token.  To
 improve performance further, you may also compile the prefill with
-`--compile_prefill`. This will increase further compilation times though. The
-`--compile-prefill` option is not compatible with `--prefill-prefill`.
+`--compile-prefill`. This will further increase compilation time, though.

 Parallel prefill is not yet supported by exported models, and may be
 supported in a future release.
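Both compile flags combine on a single eager run, e.g.:

```
python3 torchchat.py generate --compile --compile-prefill --checkpoint-path ${MODEL_PATH} --prompt "Once upon a time"
```
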
@@ -265,7 +264,7 @@ the introductory README.
 In addition to running eval on models in eager mode and JIT-compiled
 mode with `torch.compile()`, you can also load dso and pte models back
 into PyTorch to evaluate the accuracy of exported model objects
-(e.g., after applying quantization or other traqnsformations to
+(e.g., after applying quantization or other transformations to
 improve speed or reduce model size).
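A sketch of that round trip (assuming eval accepts the same artifact flags as generate):

```
python3 torchchat.py eval --checkpoint-path ${MODEL_PATH} --pte-path ${MODEL_NAME}.pte
```
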
 Loading exported models back into a Python-based PyTorch allows you to
@@ -297,14 +296,14 @@ for ExecuTorch.)

 We export the stories15M model with the following command for
 execution with the ExecuTorch runtime (and enabling execution on a
-wide range of community and vendor supported backends):
+wide range of community and vendor-supported backends):

 ```
 python3 torchchat.py export --checkpoint-path ${MODEL_PATH} --output-pte-path ${MODEL_NAME}.pte
 ```

 Alternatively, we may generate a native instruction stream binary
-using AOT Inductor for CPU oor GPUs (the latter using Triton for
+using AOT Inductor for CPU or GPUs (the latter using Triton for
 optimizations such as operator fusion):

 ```
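# A sketch of the AOTI export path elided by this hunk; the
# --output-dso-path flag is an assumption mirroring --output-pte-path above:
python3 torchchat.py export --checkpoint-path ${MODEL_PATH} --output-dso-path ${MODEL_NAME}.so
```
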
@@ -319,10 +318,10 @@ the exported model artifact back into a model container with a
 compatible API surface for the `model.forward()` function.  This
 enables users to test, evaluate, and exercise the exported model
 artifact with familiar interfaces, and in conjunction with
-pre-exiisting Python model unit tests and common environments such as
+pre-existing Python model unit tests and common environments such as
 Jupyter notebooks and/or Google Colab.

-Here is how to load an exported model into the python environment on the example of using an exported model with `generate.oy`.
+Here is how to load an exported model back into the Python environment, using the `generate` command as an example.

 ```
 python3 torchchat.py generate --checkpoint-path ${MODEL_PATH} --pte-path ${MODEL_NAME}.pte --device cpu --prompt "Once upon a time"
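# The AOTI analogue would load the shared object instead; the --dso-path
# flag is an assumption mirroring --pte-path:
python3 torchchat.py generate --checkpoint-path ${MODEL_PATH} --dso-path ${MODEL_NAME}.so --device cpu --prompt "Once upon a time"
```
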
@@ -452,7 +451,7 @@ strategies:
 You can find instructions for quantizing models in
 [docs/quantization.md](quantization.md).  Advantageously,
 quantization is available in eager mode as well as during export,
-enabling you to do an early exploration of your quantization setttings
+enabling you to do an early exploration of your quantization settings
 in eager mode.  However, final accuracy should always be confirmed on
 the actual execution target, since all targets have different build
 processes, compilers, and kernel implementations with potentially
@@ -464,9 +463,8 @@ significant impact on accuracy.
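As a sketch of the eager-mode quantization exploration described above (the inline-JSON form of the `--quantize` flag and this particular scheme are assumptions based on the top-level torchchat README):

```
python3 torchchat.py generate --checkpoint-path ${MODEL_PATH} --quantize '{"embedding": {"bitwidth": 4, "groupsize": 32}}' --prompt "Once upon a time"
```
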
 ## Native (Stand-Alone) Execution of Exported Models

-Refer to the [README](README.md] for an introduction toNative
-execution on servers, desktops and laptops is described under
-[runner-build.md].  Mobile and Edge executipon for Android and iOS are
+Refer to the [README](README.md) for an introduction to native
+execution on servers, desktops, and laptops.  Mobile and Edge execution for Android and iOS are
 described under [torchchat/edge/docs/Android.md] and [torchchat/edge/docs/iOS.md], respectively.


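For a flavor of the native flow (a sketch; the build_native.sh helper, the aoti_run binary, and ${TOKENIZER_PATH} are assumptions based on the torchchat scripts directory referenced earlier):

```
# Build the native AOTI runner, then run the exported shared object:
torchchat/utils/scripts/build_native.sh aoti
cmake-out/aoti_run ${MODEL_NAME}.so -z ${TOKENIZER_PATH} -i "Once upon a time"
```
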
@@ -475,7 +473,7 @@ described under [torchchat/edge/docs/Android.md] and [torchchat/edge/docs/iOS.md

 PyTorch and ExecuTorch support a broad range of devices for running
 PyTorch with Python (using either eager or eager + `torch.compile`) or
-in a python-free environment with AOT Inductor and ExecuTorch.
+in a Python-free environment with AOT Inductor and ExecuTorch.


 | Hardware | OS | Eager | Eager + Compile | AOT Compile | ET Runtime |
@@ -499,58 +497,6 @@ in a python-free environment with AOT Inductor and ExecuTorch.
 *Key*: n/t -- not tested


-## Runtime performance with Llama 7B, in tokens per second (4b quantization)
-
-| Hardware | OS | eager | eager + compile | AOT compile | ET Runtime |
-|-----|------|-----|-----|-----|-----|
-| x86 | Linux | ? | ? | ? | ? |
-| x86 | macOS | ? | ? | ? | ? |
-| aarch64 | Linux | ? | ? | ? | ? |
-| aarch64 | macOS | ? | ? | ? | ? |
-| AMD GPU | Linux | ? | ? | ? | ? |
-| Nvidia GPU | Linux | ? | ? | ? | ? |
-| MPS | macOS | ? | ? | ? | ? |
-| MPS | iOS | ? | ? | ? | ? |
-| aarch64 | Android | ? | ? | ? | ? |
-| Mobile GPU (Vulkan) | Android | ? | ? | ? | ? |
-| CoreML | iOS | | ? | ? | ? | ? |
-| Hexagon DSP | Android | | ? | ? | ? | ? |
-| Raspberry Pi 4/5 | Raspbian | ? | ? | ? | ? |
-| Raspberry Pi 4/5 | Android | ? | ? | ? | ? |
-| ARM 32b (up to v7) | any | | ? | ? | ? | ? |
-
-
-## Runtime performance with Llama3, in tokens per second (4b quantization)
-
-| Hardware | OS | eager | eager + compile | AOT compile | ET Runtime |
-|-----|------|-----|-----|-----|-----|
-| x86 | Linux | ? | ? | ? | ? |
-| x86 | macOS | ? | ? | ? | ? |
-| aarch64 | Linux | ? | ? | ? | ? |
-| aarch64 | macOS | ? | ? | ? | ? |
-| AMD GPU | Linux | ? | ? | ? | ? |
-| Nvidia GPU | Linux | ? | ? | ? | ? |
-| MPS | macOS | ? | ? | ? | ? |
-| MPS | iOS | ? | ? | ? | ? |
-| aarch64 | Android | ? | ? | ? | ? |
-| Mobile GPU (Vulkan) | Android | ? | ? | ? | ? |
-| CoreML | iOS | | ? | ? | ? | ? |
-| Hexagon DSP | Android | | ? | ? | ? | ? |
-| Raspberry Pi 4/5 | Raspbian | ? | ? | ? | ? |
-| Raspberry Pi 4/5 | Android | ? | ? | ? | ? |
-| ARM 32b (up to v7) | any | | ? | ? | ? | ? |
-
-
-
-
-# CONTRIBUTING to torchchat
-
-We welcome any feature requests, bug reports, or pull requests from
-the community. See the [CONTRIBUTING](CONTRIBUTING.md) for
-instructions how to contribute to torchchat.
-
-
-
 # LICENSE

 Torchchat is released under the [BSD 3 license](./LICENSE). However