Commit 7e5f883
Author: Yu Tian
Commit message: update example links
1 parent cb87057, commit 7e5f883

1 file changed, 16 additions, 13 deletions


triton/apps/llms.rst

Lines changed: 16 additions & 13 deletions
@@ -14,7 +14,8 @@ instructions on how to run inference and training on the models.
 
 Pre-downloaded model weights
 ----------------------------
-
+Raw model weights
+~~~~~~~~~~~~~~~~~
 We have downloaded the following models weights (PyTorch model checkpoint directories):
 
 .. list-table::
@@ -66,7 +67,7 @@ Each module will set the following environment variables:
 - ``MODEL_ROOT`` - Folder where model weights are stored, i.e., PyTorch model checkpoint directory.
 - ``TOKENIZER_PATH`` - File path to the tokenizer.model.
 
-Here is an example slurm script using the raw weights to do batch inference. For detailed environment setting up, example prompts and python code, please check out `this repo <>`__.
+Here is an example slurm script using the raw weights to do batch inference. For detailed instructions on setting up the environment, example prompts, and Python code, please check out `this repo <https://github.com/AaltoSciComp/llm-examples/tree/main/batch-inference-llama2>`__.
 
 .. code-block:: slurm
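The full slurm script lives in the linked repo; as a rough sketch of the pattern it describes, something like the following could be submitted with ``sbatch``. The module name, resource requests, and ``batch_inference.py`` script name here are all assumptions for illustration, not the repo's actual contents.

```shell
#!/bin/bash
#SBATCH --time=01:00:00        # adjust to the expected inference time
#SBATCH --gpus=1
#SBATCH --mem=64G
#SBATCH --output=llama2-batch.%j.out

# Hypothetical module name; whichever model module you load exports
# MODEL_ROOT (the checkpoint directory) and TOKENIZER_PATH (tokenizer.model).
module load model-llama2/7b

# batch_inference.py stands in for the inference script from the linked repo.
srun torchrun --nproc_per_node=1 batch_inference.py \
    --ckpt_dir "$MODEL_ROOT" \
    --tokenizer_path "$TOKENIZER_PATH"
```

The environment variables come from the module, so the same script works for any of the pre-downloaded model sizes without editing paths by hand.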
@@ -98,15 +99,14 @@ Here is an example slurm script using the raw weights to do batch inference. For
 
 Model weight conversions
 ------------------------
-
 Usually models produced in research are stored as weights from PyTorch or other
 frameworks. When doing inference,
 
 We also have models that are already converted to different formats.
 
 
-Huggingface
-~~~~~~~~~~~
+Huggingface Models
+~~~~~~~~~~~~~~~~~~~
 
 
 
@@ -148,7 +148,7 @@ Here is a python script using huggingface model.
 
 
 llama.cpp and GGUF
-------------------
+~~~~~~~~~~~~~~~~~~~
 
 `llama.cpp <https://github.com/ggerganov/llama.cpp>`__ is a popular framework
 for running inference on LLM models with CPUs or GPUs. llama.cpp uses a format
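The GGUF container that llama.cpp reads starts with a fixed 4-byte ASCII magic, ``GGUF``, which makes a cheap sanity check on a downloaded weights file easy to write. This helper is an illustrative sketch of ours, not part of llama.cpp or the linked repo.

```python
def looks_like_gguf(path: str) -> bool:
    """Return True if the file begins with GGUF's 4-byte ASCII magic."""
    with open(path, "rb") as f:
        return f.read(4) == b"GGUF"
```

A check like this catches truncated downloads or weights accidentally left in another format before a long batch job is queued.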
@@ -192,7 +192,7 @@ Each module will set the following environment variables:
 - ``MODEL_ROOT`` - Folder where model weights are stored.
 - ``MODEL_WEIGHTS`` - Path to the model weights in GGUF format.
 
-This Python code snippet is part of a 'Chat with Your PDF Documents' example, utilizing LangChain and leveraging model weights stored in a .gguf file. For detailed environment setting up and python code, please check out `this repo <>`__.
+This Python code snippet is part of a 'Chat with Your PDF Documents' example, built with LangChain and using model weights stored in a .gguf file. For detailed instructions on setting up the environment and the Python code, please check out `this repo <https://github.com/AaltoSciComp/llm-examples/tree/main/chat-with-pdf>`__.
 
 .. code-block:: python
@@ -202,15 +202,18 @@ This Python code snippet is part of a 'Chat with Your PDF Documents' example, ut
     model_path = os.environ.get('MODEL_WEIGHTS')
     llm = LlamaCpp(model_path=model_path, verbose=False)
 
-Ollama models
-~~~~~~~~~~~~~
+
+More examples
+------------------------------------------------------------
+
+Running an interactive chat via a local API
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+With the pre-downloaded model weights, you can also create an API endpoint locally and start an interactive chat interface directly from your shell or command-line environment. For detailed setup instructions, you can check out `this repo <https://github.com/AaltoSciComp/llm-examples/tree/main/gpt4all-api>`__.
 
 
-Doing inference with LLMs
--------------------------
+Running llama with huggingface
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-Running an interactive chat with Ollama
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 Running inference with LangChain
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
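The two snippet lines kept in the last hunk assume a model module has already exported ``MODEL_WEIGHTS``. A self-contained sketch of that pattern is below; the helper names are ours, and the ``langchain_community`` import path is an assumption based on where recent LangChain releases place the ``LlamaCpp`` wrapper.

```python
import os


def resolve_model_path(env_var: str = "MODEL_WEIGHTS") -> str:
    """Return the GGUF weights path exported by the loaded model module."""
    path = os.environ.get(env_var)
    if path is None:
        raise RuntimeError(f"{env_var} is not set; `module load` a model first")
    return path


def build_llm(model_path: str):
    # Import lazily so resolve_model_path() stays usable even where
    # llama-cpp-python / langchain_community are not installed.
    from langchain_community.llms import LlamaCpp
    return LlamaCpp(model_path=model_path, verbose=False)


# On a compute node one would run, e.g.:
#   llm = build_llm(resolve_model_path())
#   llm.invoke("Summarize the first page of my PDF.")
```

Failing fast with a clear error when the variable is missing is friendlier than letting the model loader crash on a ``None`` path deep inside a batch job.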
