Large-language models are AI models that can understand and generate text,
primarily using transformer architectures.

Because the model weights are typically very large and interest in the models
is high, we provide our users with pre-downloaded model weights and
instructions on how to load these weights for inference or for retraining and
fine-tuning the models.

Pre-downloaded model weights
----------------------------

Raw model weights
~~~~~~~~~~~~~~~~~

We have downloaded the following raw model weights (PyTorch model checkpoints):

.. list-table::
   :header-rows: 1

   * * Model type
     * Model version
     * Module command to load
     * Description

   * * Llama 2
     * Raw Data
     * ``module load model-llama2/raw-data``
     * Raw weights of `Llama 2 <https://ai.meta.com/llama/>`__.

   * * Llama 2
     * 7b
     * ``module load model-llama2/7b``
     * Raw weights of 7B parameter version of `Llama 2 <https://ai.meta.com/llama/>`__.

   * * Llama 2
     * 7b-chat
     * ``module load model-llama2/7b-chat``
     * Raw weights of 7B parameter chat optimized version of `Llama 2 <https://ai.meta.com/llama/>`__.

   * * Llama 2
     * 13b
     * ``module load model-llama2/13b``
     * Raw weights of 13B parameter version of `Llama 2 <https://ai.meta.com/llama/>`__.

   * * Llama 2
     * 13b-chat
     * ``module load model-llama2/13b-chat``
     * Raw weights of 13B parameter chat optimized version of `Llama 2 <https://ai.meta.com/llama/>`__.

   * * Llama 2
     * 70b
     * ``module load model-llama2/70b``
     * Raw weights of 70B parameter version of `Llama 2 <https://ai.meta.com/llama/>`__.

   * * Llama 2
     * 70b-chat
     * ``module load model-llama2/70b-chat``
     * Raw weights of 70B parameter chat optimized version of `Llama 2 <https://ai.meta.com/llama/>`__.

   * * CodeLlama
     * Raw Data
     * ``module load model-codellama/raw-data``
     * Raw weights of `CodeLlama <https://ai.meta.com/blog/code-llama-large-language-model-coding/>`__.

   * * CodeLlama
     * 7b
     * ``module load model-codellama/7b``
     * Raw weights of 7B parameter version of `CodeLlama <https://ai.meta.com/blog/code-llama-large-language-model-coding/>`__.

   * * CodeLlama
     * 7b-Python
     * ``module load model-codellama/7b-python``
     * Raw weights of 7B parameter version of `CodeLlama <https://ai.meta.com/blog/code-llama-large-language-model-coding/>`__, specifically designed for Python.

   * * CodeLlama
     * 7b-Instruct
     * ``module load model-codellama/7b-instruct``
     * Raw weights of 7B parameter version of `CodeLlama <https://ai.meta.com/blog/code-llama-large-language-model-coding/>`__, designed for instruction following.

   * * CodeLlama
     * 13b
     * ``module load model-codellama/13b``
     * Raw weights of 13B parameter version of `CodeLlama <https://ai.meta.com/blog/code-llama-large-language-model-coding/>`__.

   * * CodeLlama
     * 13b-Python
     * ``module load model-codellama/13b-python``
     * Raw weights of 13B parameter version of `CodeLlama <https://ai.meta.com/blog/code-llama-large-language-model-coding/>`__, specifically designed for Python.

   * * CodeLlama
     * 13b-Instruct
     * ``module load model-codellama/13b-instruct``
     * Raw weights of 13B parameter version of `CodeLlama <https://ai.meta.com/blog/code-llama-large-language-model-coding/>`__, designed for instruction following.

   * * CodeLlama
     * 34b
     * ``module load model-codellama/34b``
     * Raw weights of 34B parameter version of `CodeLlama <https://ai.meta.com/blog/code-llama-large-language-model-coding/>`__.

   * * CodeLlama
     * 34b-Python
     * ``module load model-codellama/34b-python``
     * Raw weights of 34B parameter version of `CodeLlama <https://ai.meta.com/blog/code-llama-large-language-model-coding/>`__, specifically designed for Python.

   * * CodeLlama
     * 34b-Instruct
     * ``module load model-codellama/34b-instruct``
     * Raw weights of 34B parameter version of `CodeLlama <https://ai.meta.com/blog/code-llama-large-language-model-coding/>`__, designed for instruction following.

Each module will set the following environment variables:

- ``MODEL_ROOT`` - Folder where model weights are stored, i.e., PyTorch model checkpoint directory.
- ``TOKENIZER_PATH`` - File path to the tokenizer.model.

Here is an example `Slurm <https://scicomp.aalto.fi/triton/tut/slurm/>`__ script using the raw weights to do batch inference. For detailed instructions on setting up the environment, example prompts, and Python code, please check out `this repo <https://github.com/AaltoSciComp/llm-examples/tree/main/batch-inference-llama2>`__.

.. code-block:: slurm
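
    #!/bin/bash
    # Illustrative sketch of a batch-inference job. The resource requests, the
    # module name, and the script name batch_inference.py are placeholders;
    # see the linked repo for the full, tested example.
    #SBATCH --time=01:00:00
    #SBATCH --mem=80G
    #SBATCH --gpus=1
    #SBATCH --output=llama2-inference.%j.out

    # Load the raw weights you want to use (sets MODEL_ROOT and TOKENIZER_PATH).
    module load model-llama2/7b-chat

    # Run the inference script from your own Python environment; torchrun ships
    # with PyTorch. --nproc_per_node should match the number of GPUs requested.
    torchrun --nproc_per_node=1 batch_inference.py \
        --ckpt_dir "$MODEL_ROOT" \
        --tokenizer_path "$TOKENIZER_PATH"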

Model weight conversions
------------------------

Usually, models produced in research are stored as weights from PyTorch or other
frameworks. For inference, we also have models that are already converted to different formats.

Huggingface Models
~~~~~~~~~~~~~~~~~~

Currently, we have the following Huggingface models stored on Triton. Please contact us if you need any other models.

.. list-table::
   :header-rows: 1
   :widths: 1 1

   * * Model type
     * Huggingface model identifier

   * * Text Generation
     * mistralai/Mistral-7B-v0.1

   * * Text Generation
     * mistralai/Mistral-7B-Instruct-v0.1
168
126
-
All Huggingface models can be loaded with: ``module load model-huggingface/all``,
127
-
Here is a python script using huggingface model.
169
+
* * Text Generation
170
+
* tiiuae/falcon-7b
171
+
172
+
* * Text Generation
173
+
* tiiuae/falcon-7b-instruct
174
+
175
+
* * Text Generation
176
+
* tiiuae/falcon-40b
177
+
178
+
* * Text Generation
179
+
* tiiuae/falcon-40b-instruct
180
+
181
+
* * Text Generation
182
+
* meta-llama/Llama-2-7b-hf
183
+
184
+
* * Text Generation
185
+
* meta-llama/Llama-2-13b-hf
186
+
187
+
* * Text Generation
188
+
* meta-llama/Llama-2-70b-hf
189
+
190
+
* * Text Generation
191
+
* codellama/CodeLlama-7b-hf
192
+
193
+
* * Text Generation
194
+
* codellama/CodeLlama-13b-hf
195
+
196
+
* * Text Generation
197
+
* codellama/CodeLlama-34b-hf
198
+
199
+
* * Translation
200
+
* Helsinki-NLP/opus-mt-en-fi
201
+
202
+
* * Translation
203
+
* Helsinki-NLP/opus-mt-fi-en
204
+
205
+
* * Translation
206
+
* t5-base
207
+
208
+
* * Fill Mask
209
+
* bert-base-uncased
210
+
211
+
* * Fill Mask
212
+
* bert-base-cased
213
+
214
+
* * Fill Mask
215
+
* distilbert-base-uncased
216
+
217
+
* * Text to Speech
218
+
* microsoft/speecht5_hifigan
219
+
220
+
* * Text to Speech
221
+
* facebook/hf-seamless-m4t-large
222
+
223
+
* * Automatic Speech Recognition
224
+
* openai/whisper-large-v3
225
+
226
+
* * Token Classification
227
+
* dslim/bert-base-NER-uncased
228
+
229
+
230
+
231
+

All Huggingface models can be loaded with ``module load model-huggingface/all``.
Here is a Python script using a Huggingface model.

.. code-block:: python

    # Force transformers to load models from the local cache instead of
    # downloading them from the remote hub.
    # NOTE: this must be set before importing transformers.
    import os
    os.environ['TRANSFORMERS_OFFLINE'] = '1'

    from transformers import AutoModelForCausalLM, AutoTokenizer
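
    # Illustrative continuation (an assumed sketch, not the full example):
    # load one of the models listed above from the local cache and run a
    # short generation. device_map="auto" requires the accelerate package.
    model_name = "meta-llama/Llama-2-7b-hf"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

    inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=50)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))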

llama.cpp models
~~~~~~~~~~~~~~~~

`llama.cpp <https://github.com/ggerganov/llama.cpp>`__ is a C/C++ implementation
for running inference on LLM models with CPUs or GPUs. llama.cpp uses a format
called GGUF as its storage format.

We have llama.cpp conversions of all Llama 2 and CodeLlama models with multiple quantization levels.

NOTE: Before loading the following modules, you must first load a module for the raw model weights. For example, run ``module load model-codellama/34b`` first, and then run ``module load codellama.cpp/q8_0-2023-12-04`` to get the 8-bit integer version of the CodeLlama weights in a .gguf file.

.. list-table::
   :header-rows: 1

   * * Model type
     * Model version
     * Module command to load
     * Description

   * * Llama 2
     * f16-2023-12-04
     * ``module load model-llama.cpp/f16-2023-12-04`` (after loading a Llama 2 model for some raw weights)
     * Half precision version of Llama 2 weights done with llama.cpp on 4th of Dec 2023.

   * * Llama 2
     * q4_0-2023-12-04
     * ``module load model-llama.cpp/q4_0-2023-12-04`` (after loading a Llama 2 model for some raw weights)
     * 4-bit integer version of Llama 2 weights done with llama.cpp on 4th of Dec 2023.

   * * Llama 2
     * q4_1-2023-12-04
     * ``module load model-llama.cpp/q4_1-2023-12-04`` (after loading a Llama 2 model for some raw weights)
     * 4-bit integer version of Llama 2 weights done with llama.cpp on 4th of Dec 2023.

   * * Llama 2
     * q8_0-2023-12-04
     * ``module load model-llama.cpp/q8_0-2023-12-04`` (after loading a Llama 2 model for some raw weights)
     * 8-bit integer version of Llama 2 weights done with llama.cpp on 4th of Dec 2023.

   * * CodeLlama
     * f16-2023-12-04
     * ``module load codellama.cpp/f16-2023-12-04`` (after loading a CodeLlama model for some raw weights)
     * Half precision version of CodeLlama weights done with llama.cpp on 4th of Dec 2023.

   * * CodeLlama
     * q4_0-2023-12-04
     * ``module load codellama.cpp/q4_0-2023-12-04`` (after loading a CodeLlama model for some raw weights)
     * 4-bit integer version of CodeLlama weights done with llama.cpp on 4th of Dec 2023.

   * * CodeLlama
     * q8_0-2023-12-04
     * ``module load codellama.cpp/q8_0-2023-12-04`` (after loading a CodeLlama model for some raw weights)
     * 8-bit integer version of CodeLlama weights done with llama.cpp on 4th of Dec 2023.

Each module will set the following environment variables:

- ``MODEL_ROOT`` - Folder where model weights are stored.
- ``MODEL_WEIGHTS`` - Path to the model weights in GGUF file format.
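
For example, a minimal sketch of what these modules provide, using the module names from the note above:

.. code-block:: bash

    module load model-codellama/34b            # raw weights first
    module load codellama.cpp/q8_0-2023-12-04  # then the GGUF conversion
    echo "$MODEL_WEIGHTS"                      # path to the .gguf file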

This Python code snippet is part of a 'Chat with Your PDF Documents' example, utilizing LangChain and leveraging model weights stored in a .gguf file. For detailed instructions on setting up the environment and the Python code, please check out `this repo <https://github.com/AaltoSciComp/llm-examples/tree/main/chat-with-pdf>`__.

.. code-block:: python
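
    # Illustrative sketch (not the full example from the repo): load the .gguf
    # weights pointed to by MODEL_WEIGHTS with LangChain's LlamaCpp wrapper.
    import os
    from langchain.llms import LlamaCpp

    llm = LlamaCpp(
        model_path=os.environ["MODEL_WEIGHTS"],  # set by the llama.cpp modules
        n_ctx=2048,        # context window size
        n_gpu_layers=-1,   # offload all layers to the GPU, if one is available
    )
    print(llm("Summarize what a GGUF file is."))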

With the pre-downloaded model weights, you can also create an API endpoint locally and initiate an interactive chat interface directly from your shell or command line environment. For detailed setup instructions, check out `this repo <https://github.com/AaltoSciComp/llm-examples/tree/main/gpt4all-api>`__. Further examples are available in the `llm-examples repository <https://github.com/AaltoSciComp/llm-examples/tree/main/>`__.
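
As a sketch, if your local endpoint exposes an OpenAI-compatible completions API (the port, route, and model name below are assumptions; check the repo for the actual values), you could query it like this:

.. code-block:: bash

    curl -X POST http://localhost:4891/v1/completions \
        -H "Content-Type: application/json" \
        -d '{"model": "llama-2-7b-chat", "prompt": "Hello!", "max_tokens": 50}'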