
Commit 6c6db52

Add untracked docs and change display names

1 parent 038053b commit 6c6db52

23 files changed, +250 -74 lines changed
File renamed without changes.
Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
# LlamaCpp

## Prerequisites

Install Llama Cpp Python by following the instructions provided in the [Llama Cpp Python repository](https://github.com/abetlen/llama-cpp-python).

```shell
pip install llama-cpp-python
```

Alternatively, to install with CUDA support:

```shell
CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python
```
### Initializing the Llama model

Initialize the model within your program with the desired parameters.

```python
from llama_cpp import Llama

llm = Llama(
    model_path="./sppo_finetuned_llama_3_8b.gguf",
    n_gpu_layers=-1,  # offload all layers to the GPU
    n_ctx=0,          # use the context length from the model
    verbose=False,
)
```
### Sending requests to the model

After initializing the Llama model, you can interact with it using the `LlamaCpp` client.

```python
import dspy

llamalm = dspy.LlamaCpp(model="llama", llama_model=llm, model_type="chat", temperature=0.4)
dspy.settings.configure(lm=llamalm)


# Define a simple signature for basic question answering
class BasicQA(dspy.Signature):
    """Answer questions with short factoid answers."""

    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 5 words")


# Pass the signature to the Predict module
generate_answer = dspy.Predict(BasicQA)

# Call the predictor on a particular input.
question = "What is the color of the sky?"
pred = generate_answer(question=question)

print(f"Question: {question}")
print(f"Predicted Answer: {pred.answer}")
```
Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
# ChatModuleClient

## Prerequisites

1. Install the required packages using the following commands:

   ```shell
   pip install --no-deps --pre --force-reinstall mlc-ai-nightly-cu118 mlc-chat-nightly-cu118 -f https://mlc.ai/wheels
   pip install transformers
   git lfs install
   ```

   Adjust the pip wheels according to your OS/platform by referring to the provided commands in [MLC packages](https://mlc.ai/package/).
## Running MLC Llama-2 models

1. Create a directory for prebuilt models:

   ```shell
   mkdir -p dist/prebuilt
   ```

2. Clone the necessary libraries from the repository:

   ```shell
   git clone https://github.com/mlc-ai/binary-mlc-llm-libs.git dist/prebuilt/lib
   cd dist/prebuilt
   ```

3. Choose a Llama-2 model from [MLC LLMs](https://huggingface.co/mlc-ai) and clone the model repository:

   ```shell
   git clone https://huggingface.co/mlc-ai/mlc-chat-Llama-2-7b-chat-hf-q4f16_1
   ```

4. Initialize the `ChatModuleClient` within your program with the desired parameters. Here's an example call (a usage sketch follows the list):

   ```python
   llama = dspy.ChatModuleClient(
       model='dist/prebuilt/mlc-chat-Llama-2-7b-chat-hf-q4f16_1',
       model_path='dist/prebuilt/lib/Llama-2-7b-chat-hf-q4f16_1-cuda.so',
   )
   ```
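Once the client is initialized, you can use it from DSPy like the other local clients. Below is a minimal usage sketch, mirroring the pattern from the LlamaCpp guide; the signature and question are illustrative and not part of the MLC documentation.

```python
import dspy

# Use the ChatModuleClient instance created in step 4 as the default LM.
dspy.settings.configure(lm=llama)

# A simple question-answering signature (illustrative).
class BasicQA(dspy.Signature):
    """Answer questions with short factoid answers."""

    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 5 words")

generate_answer = dspy.Predict(BasicQA)
pred = generate_answer(question="What is the color of the sky?")
print(pred.answer)
```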
41+
Please refer to the [official MLC repository](https://github.com/mlc-ai/mlc-llm) for more detailed information and [documentation](https://mlc.ai/mlc-llm/docs/get_started/try_out.html).
Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
# OllamaLocal

:::note
Adapted from documentation provided by https://github.com/insop
:::

Ollama is a tool that allows you to run LLMs locally, such as Mistral, Llama2, and Phi.
The following instructions cover installing and running Ollama.

### Prerequisites

Install Ollama by following the instructions from this page:

- https://ollama.ai

Download model: `ollama pull`

Download a model by running the `ollama pull` command. You can download Mistral, Llama2, and Phi.

```bash
# download mistral
ollama pull mistral
```

Here is the list of other models you can download:

- https://ollama.ai/library
### Running the Ollama model

Run model: `ollama run`

Start the model server with the `ollama run` command.

```bash
# run mistral
ollama run mistral
```

### Sending requests to the server

Here is the code to load a model through Ollama:

```python
lm = dspy.OllamaLocal(model='mistral')
```
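With the model loaded, a minimal DSPy usage sketch might look like the following; the inline signature and question are illustrative rather than part of the Ollama documentation.

```python
import dspy

# Point DSPy at the locally running Ollama model.
lm = dspy.OllamaLocal(model='mistral')
dspy.settings.configure(lm=lm)

# Ask a question through a simple Predict module (illustrative signature).
generate_answer = dspy.Predict('question -> answer')
pred = generate_answer(question='What is the color of the sky?')
print(pred.answer)
```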
Lines changed: 82 additions & 0 deletions
@@ -0,0 +1,82 @@
# TensorRTModel

TensorRT-LLM by NVIDIA is one of the most optimized inference engines for running open-source large language models locally or in production.

### Prerequisites

Install TensorRT-LLM by following the instructions [here](https://nvidia.github.io/TensorRT-LLM/installation/linux.html). You need to install `dspy` inside the same Docker environment in which `tensorrt` is installed.

To use this module, you need the model weights in TensorRT engine format. To see how Torch weights (from Hugging Face models) are converted to the TensorRT engine format, check out [this documentation](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/llama#build-tensorrt-engines).

### Running the TensorRT model inside dspy

```python
from dspy import TensorRTModel

engine_dir = "<your-path-to-engine-dir>"
model_name_or_path = "<hf-model-id-or-path-to-tokenizer>"

model = TensorRTModel(engine_dir=engine_dir, model_name_or_path=model_name_or_path)
```
You can customize model loading further. Below is a list of optional parameters supported while initializing the `dspy` TensorRT model; a short initialization sketch using some of them follows the note below.

- **use_py_session** (`bool`, optional): Whether to use a Python session or not. Defaults to `False`.
- **lora_dir** (`str`): The directory of LoRA adapter weights.
- **lora_task_uids** (`List[str]`): List of LoRA task UIDs; use `-1` to disable the LoRA module.
- **lora_ckpt_source** (`str`): The source of the LoRA checkpoint.

If `use_py_session` is set to `False`, the following kwargs are supported (these run in the C++ runtime):

- **max_batch_size** (`int`, optional): The maximum batch size. Defaults to `1`.
- **max_input_len** (`int`, optional): The maximum input context length. Defaults to `1024`.
- **max_output_len** (`int`, optional): The maximum output context length. Defaults to `1024`.
- **max_beam_width** (`int`, optional): The maximum beam width, similar to `n` in the OpenAI API. Defaults to `1`.
- **max_attention_window_size** (`int`, optional): The attention window size that controls the sliding-window attention / cyclic KV cache behavior. Defaults to `None`.
- **sink_token_length** (`int`, optional): The sink token length. Defaults to `1`.

> Please note that you need to complete the build process properly before applying these customizations, because much of the customization depends on how the model engine was built. You can learn more [here](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/llama#build-tensorrt-engines).
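For illustration, a customized initialization might look like the sketch below. The values are placeholders rather than recommendations, only parameters listed above are used, and they must be compatible with how your engine was built.

```python
from dspy import TensorRTModel

# Illustrative values only; adjust them to match your engine build.
model = TensorRTModel(
    engine_dir="<your-path-to-engine-dir>",
    model_name_or_path="<hf-model-id-or-path-to-tokenizer>",
    use_py_session=False,  # use the C++ runtime
    max_batch_size=1,
    max_input_len=2048,
    max_output_len=512,
    max_beam_width=1,
)
```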
Now, to run the model, add the following code:

```python
response = model("hello")
```

This produces a result like:

```
["nobody is perfect, and we all have our own unique struggles and challenges. But what sets us apart is how we respond to those challenges. Do we let them define us, or do we use them as opportunities to grow and learn?\nI know that I have my own personal struggles, and I'm sure you do too. But I also know that we are capable of overcoming them, and becoming the best versions of ourselves. So let's embrace our imperfections, and use them to fuel our growth and success.\nRemember, nobody is perfect, but everybody has the potential to be amazing. So let's go out there and make it happen!"]
```

You can also invoke chat mode by changing the prompt to the chat format:

```python
prompt = [{"role": "user", "content": "hello"}]
response = model(prompt)

print(response)
```

Output:

```
[" Hello! It's nice to meet you. Is there something I can help you with or would you like to chat?"]
```

Here are some optional parameters that are supported during generation (a short sketch follows the list):

- **max_new_tokens** (`int`): The maximum number of tokens to output. Defaults to `1024`.
- **max_attention_window_size** (`int`): Defaults to `None`.
- **sink_token_length** (`int`): Defaults to `None`.
- **end_id** (`int`): The end-of-sequence ID of the tokenizer; defaults to the tokenizer's default end ID.
- **pad_id** (`int`): The pad-sequence ID of the tokenizer; defaults to the tokenizer's default end ID.
- **temperature** (`float`): The temperature used to control probabilistic behavior in generation. Defaults to `1.0`.
- **top_k** (`int`): Defaults to `1`.
- **top_p** (`float`): Defaults to `1`.
- **num_beams** (`int`): The number of responses to generate. Defaults to `1`.
- **length_penalty** (`float`): Defaults to `1.0`.
- **repetition_penalty** (`float`): Defaults to `1.0`.
- **presence_penalty** (`float`): Defaults to `0.0`.
- **frequency_penalty** (`float`): Defaults to `0.0`.
- **early_stopping** (`int`): Use this only when `num_beams` > 1. Defaults to `1`.
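As a sketch, a call that overrides a few of these generation parameters could look like this; the specific values are illustrative only.

```python
# Generation-time overrides; any of the parameters listed above can be
# passed as keyword arguments when calling the model.
response = model(
    "hello",
    max_new_tokens=64,
    temperature=0.7,
    top_k=50,
    top_p=0.9,
)
print(response)
```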
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
{
  "label": "Local Language Model Clients",
  "position": 1,
  "link": {
    "type": "generated-index",
    "description": "Local Language Model Clients in DSPy"
  }
}

docs/docs/deep-dive/retrieval_models_clients/Azure.mdx

Lines changed: 0 additions & 4 deletions
@@ -1,7 +1,3 @@
----
-sidebar_position: 2
----
-
 import AuthorDetails from '@site/src/components/AuthorDetails';
 
 # AzureAISearch

docs/docs/deep-dive/retrieval_models_clients/ChromadbRM.mdx

Lines changed: 4 additions & 6 deletions
@@ -1,11 +1,9 @@
----
-sidebar_position: 1
----
-
-#### Adapted from documentation provided by https://github.com/animtel
-
 # ChromadbRM
 
+:::note
+Adapted from documentation provided by https://github.com/animtel
+:::
+
 ChromadbRM has the flexibility to work with a variety of embedding functions, as outlined in the [chromadb embeddings documentation](https://docs.trychroma.com/embeddings). While different options are available, this example demonstrates how to utilize OpenAI embeddings specifically.
docs/docs/deep-dive/retrieval_models_clients/ColBERTv2.mdx

Lines changed: 0 additions & 4 deletions
@@ -1,7 +1,3 @@
----
-sidebar_position: 1
----
-
 import AuthorDetails from '@site/src/components/AuthorDetails';
 
 # ColBERTv2
