Pre-downloaded model weights
----------------------------

Raw model weights
~~~~~~~~~~~~~~~~~

We have downloaded the following model weights (PyTorch model checkpoint directories):

.. list-table::

Each module will set the following environment variables:

- ``MODEL_ROOT`` - Folder where model weights are stored, i.e., the PyTorch model checkpoint directory.
- ``TOKENIZER_PATH`` - File path to the ``tokenizer.model`` file.

Here is an example Slurm script using the raw weights to do batch inference. For detailed environment setup, example prompts, and Python code, please check out `this repo <https://github.com/AaltoSciComp/llm-examples/tree/main/batch-inference-llama2>`__.

.. code-block:: slurm
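
   #!/bin/bash
   # Sketch of a batch-inference job: the resource requests, module name,
   # and script name below are illustrative assumptions, not fixed values.
   #SBATCH --time=01:00:00
   #SBATCH --mem=40G
   #SBATCH --gpus=1
   #SBATCH --output=llama2-inference.%j.out

   # Loading a model module sets MODEL_ROOT and TOKENIZER_PATH
   # (hypothetical module name shown).
   module load model-llama2/7b

   # The raw checkpoints are loaded via torch.distributed, so the script
   # is launched with torchrun (one process per model shard).
   torchrun --nproc_per_node=1 batch_inference.py \
       --ckpt_dir "$MODEL_ROOT" \
       --tokenizer_path "$TOKENIZER_PATH"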

Model weight conversions
------------------------

Usually models produced in research are stored as weights from PyTorch or other
frameworks. When doing inference, it is often useful to convert the weights
into a format better suited to the inference framework.

We also have models that are already converted to different formats.

Huggingface Models
~~~~~~~~~~~~~~~~~~

Here is a Python script using a Huggingface model.
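
A minimal sketch of such a script, assuming the loaded module sets
``MODEL_ROOT`` to the checkpoint directory (the prompt and generation
settings below are illustrative):

.. code-block:: python

   import os

   import torch
   from transformers import AutoModelForCausalLM, AutoTokenizer

   # MODEL_ROOT is assumed to point at a Huggingface-format checkpoint.
   model_dir = os.environ["MODEL_ROOT"]

   tokenizer = AutoTokenizer.from_pretrained(model_dir)
   model = AutoModelForCausalLM.from_pretrained(
       model_dir,
       torch_dtype=torch.float16,  # half precision to fit in GPU memory
       device_map="auto",          # place layers on the available GPU(s)
   )

   prompt = "What is a neural network?"
   inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
   outputs = model.generate(**inputs, max_new_tokens=100)
   print(tokenizer.decode(outputs[0], skip_special_tokens=True))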

llama.cpp and GGUF
~~~~~~~~~~~~~~~~~~

`llama.cpp <https://github.com/ggerganov/llama.cpp>`__ is a popular framework
for running inference on LLMs with CPUs or GPUs. llama.cpp uses a format
called GGUF to store model weights.

Each module will set the following environment variables:

- ``MODEL_ROOT`` - Folder where model weights are stored.
- ``MODEL_WEIGHTS`` - Path to the model weights in GGUF format.

This Python code snippet is part of a 'Chat with Your PDF Documents' example that uses LangChain with model weights stored in a ``.gguf`` file. For detailed environment setup and Python code, please check out `this repo <https://github.com/AaltoSciComp/llm-examples/tree/main/chat-with-pdf>`__.

.. code-block:: python
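
   # Sketch of the core idea only (the full example in the repo also loads
   # and indexes the PDF); the parameter values here are illustrative.
   import os

   from langchain.llms import LlamaCpp

   llm = LlamaCpp(
       model_path=os.environ["MODEL_WEIGHTS"],  # .gguf file set by the module
       n_ctx=2048,        # context window size
       n_gpu_layers=-1,   # offload all layers to the GPU when available
   )

   print(llm("Summarize the main idea of the document in one sentence."))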

With the pre-downloaded model weights, you can also create an API endpoint locally and start an interactive chat interface directly from your shell or command-line environment. For detailed setup instructions, you can check out `this repo <https://github.com/AaltoSciComp/llm-examples/tree/main/gpt4all-api>`__.
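
As a rough illustration, querying such a local endpoint from Python could look like the sketch below; the host, port, route, and payload are assumptions here, so check the repo for the server's actual configuration:

.. code-block:: python

   import requests

   # Hypothetical request against a locally running, OpenAI-compatible
   # completion endpoint (the port and route are assumptions).
   response = requests.post(
       "http://localhost:4891/v1/completions",
       json={"model": "gpt4all", "prompt": "Hello!", "max_tokens": 50},
   )
   print(response.json())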