@@ -13,7 +13,7 @@ instructions on how to run inference and training on the models.
 
 
 Pre-downloaded model weights
-****************************
+----------------------------
 
 We have downloaded the following model weights (PyTorch model checkpoint directories):
 
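Since the checkpoints above are raw PyTorch shards, it can help to sanity-check a pre-downloaded directory before launching anything heavier. A minimal sketch, assuming a hypothetical checkpoint path and ``.pth`` shard files::

    # Minimal sketch: inspect a pre-downloaded PyTorch checkpoint directory.
    # The directory path is hypothetical; substitute one of the paths listed above.
    from pathlib import Path

    import torch

    ckpt_dir = Path("/path/to/model_weights")      # hypothetical location
    for shard in sorted(ckpt_dir.glob("*.pth")):   # raw Llama shards ship as .pth files
        state_dict = torch.load(shard, map_location="cpu")
        print(shard.name, "->", len(state_dict), "tensors")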
@@ -97,7 +97,7 @@ Here is an example slurm script using the raw weights to do batch inference. For
     --max_seq_len 512 --max_batch_size 16
 
 Model weight conversions
-************************
+------------------------
 
 Usually models produced in research are stored as weights from PyTorch or other
 frameworks. When doing inference,
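For the conversion step this section introduces, one common route from raw Llama checkpoints to the Hugging Face layout is the conversion script that ships in the ``transformers`` source tree. A hedged sketch, assuming the script path and flags of recent ``transformers`` releases (both may differ by version) and hypothetical input and output directories::

    # Hedged sketch: convert raw Llama checkpoints to the Hugging Face layout
    # by invoking the conversion script bundled with the transformers sources.
    # The script location and flags are assumptions; check your installed version.
    import subprocess

    subprocess.run(
        [
            "python",
            "transformers/src/transformers/models/llama/convert_llama_weights_to_hf.py",
            "--input_dir", "/path/to/raw_weights",   # hypothetical raw checkpoint dir
            "--model_size", "7B",
            "--output_dir", "/path/to/hf_weights",   # hypothetical output dir
        ],
        check=True,
    )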
@@ -106,7 +106,7 @@ We also have models that are already converted to different formats.
 
 
 Huggingface
-----------
+~~~~~~~~~~~
 
 
 
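Once weights are in the Hugging Face layout, loading them locally takes two calls. A minimal sketch, assuming a hypothetical converted-checkpoint path (``device_map="auto"`` additionally assumes ``accelerate`` is installed)::

    # Minimal sketch: load a locally converted checkpoint with transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_dir = "/path/to/hf_weights"   # hypothetical local checkpoint
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto")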
@@ -145,6 +145,8 @@ Here is a Python script using a Hugging Face model.
     generated_ids = model.generate(**model_inputs, max_new_tokens=20)
     print(tokenizer.batch_decode(generated_ids[:, input_length:], skip_special_tokens=True)[0])
 
+
+
 llama.cpp and GGUF
 ------------------
 
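The GGUF path this new heading introduces is often driven from Python through the ``llama-cpp-python`` bindings rather than the raw llama.cpp CLI. A hedged sketch, assuming a hypothetical GGUF file path::

    # Hedged sketch: local inference on a GGUF file via llama-cpp-python.
    from llama_cpp import Llama

    llm = Llama(model_path="/path/to/model.gguf", n_ctx=512)  # hypothetical path
    out = llm("Q: What is quantization? A:", max_tokens=64)
    print(out["choices"][0]["text"])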
@@ -201,15 +203,15 @@ This Python code snippet is part of a 'Chat with Your PDF Documents' example, ut
     llm = LlamaCpp(model_path=model_path, verbose=False)
 
 Ollama models
--------------
+~~~~~~~~~~~~~
 
 
 Doing inference with LLMs
-*************************
+-------------------------
 
 Running an interactive chat with Ollama
----------------------------------------
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 Running inference with LangChain
---------------------------------
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
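The Ollama and LangChain sections retitled above pair naturally: LangChain's community wrapper can drive a local Ollama server. A minimal sketch, assuming ``ollama serve`` is already running and that a model named ``llama2`` has been pulled (both are assumptions about the local setup)::

    # Minimal sketch: run a prompt through a local Ollama server via LangChain.
    # Assumes `ollama serve` is up and `ollama pull llama2` has been run.
    from langchain_community.llms import Ollama

    llm = Ollama(model="llama2")  # hypothetical local model name
    print(llm.invoke("Why is the sky blue?"))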