
Commit 6c6db52

Add untracked docs and change display names

1 parent 038053b commit 6c6db52

23 files changed, +250 -74 lines changed
File renamed without changes.
Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
# LlamaCpp

## Prerequisites

Install Llama Cpp Python by following the instructions provided in the [Llama Cpp Python repository](https://github.com/abetlen/llama-cpp-python).

```shell
pip install llama-cpp-python
```

Alternatively, to install with CUDA support:

```shell
CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python
```
### Initializing the Llama model

Initialize the model within your program with the desired parameters.

```python
from llama_cpp import Llama

llm = Llama(
    model_path="./sppo_finetuned_llama_3_8b.gguf",
    n_gpu_layers=-1,  # offload all layers to the GPU
    n_ctx=0,          # use the context length from the model
    verbose=False,
)
```
### Sending requests to the model

After initializing the Llama model, you can interact with it using the `LlamaCpp` client.

```python
import dspy

llamalm = dspy.LlamaCpp(model="llama", llama_model=llm, model_type="chat", temperature=0.4)
dspy.settings.configure(lm=llamalm)


# Define a simple signature for basic question answering
class BasicQA(dspy.Signature):
    """Answer questions with short factoid answers."""

    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 5 words")


# Pass the signature to the Predict module
generate_answer = dspy.Predict(BasicQA)

# Call the predictor on a particular input.
question = "What is the color of the sky?"
pred = generate_answer(question=question)

print(f"Question: {question}")
print(f"Predicted Answer: {pred.answer}")
```
Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
# ChatModuleClient

## Prerequisites

1. Install the required packages using the following commands:

   ```shell
   pip install --no-deps --pre --force-reinstall mlc-ai-nightly-cu118 mlc-chat-nightly-cu118 -f https://mlc.ai/wheels
   pip install transformers
   git lfs install
   ```

   Adjust the pip wheels according to your OS/platform by referring to the provided commands in [MLC packages](https://mlc.ai/package/).
## Running MLC Llama-2 models

1. Create a directory for prebuilt models:

   ```shell
   mkdir -p dist/prebuilt
   ```

2. Clone the necessary libraries from the repository:

   ```shell
   git clone https://github.com/mlc-ai/binary-mlc-llm-libs.git dist/prebuilt/lib
   cd dist/prebuilt
   ```

3. Choose a Llama-2 model from [MLC LLMs](https://huggingface.co/mlc-ai) and clone the model repository:

   ```shell
   git clone https://huggingface.co/mlc-ai/mlc-chat-Llama-2-7b-chat-hf-q4f16_1
   ```

4. Initialize the `ChatModuleClient` within your program with the desired parameters. Here's an example call (a usage sketch follows the list):

   ```python
   llama = dspy.ChatModuleClient(
       model='dist/prebuilt/mlc-chat-Llama-2-7b-chat-hf-q4f16_1',
       model_path='dist/prebuilt/lib/Llama-2-7b-chat-hf-q4f16_1-cuda.so',
   )
   ```
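Once the client is initialized, you can use it from DSPy like the other local clients. Below is a minimal usage sketch, mirroring the pattern from the LlamaCpp guide; the signature and question are illustrative and not part of the MLC documentation.

```python
import dspy

# Use the ChatModuleClient instance created in step 4 as the default LM.
dspy.settings.configure(lm=llama)

# A simple question-answering signature (illustrative).
class BasicQA(dspy.Signature):
    """Answer questions with short factoid answers."""

    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 5 words")

generate_answer = dspy.Predict(BasicQA)
pred = generate_answer(question="What is the color of the sky?")
print(pred.answer)
```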
41+
Please refer to the [official MLC repository](https://github.com/mlc-ai/mlc-llm) for more detailed information and [documentation](https://mlc.ai/mlc-llm/docs/get_started/try_out.html).
Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
# OllamaLocal

:::note
Adapted from documentation provided by https://github.com/insop
:::

Ollama is a tool that allows you to run LLMs locally, such as Mistral, Llama2, and Phi.
The following instructions cover installing and running Ollama.

### Prerequisites

Install Ollama by following the instructions from this page:

- https://ollama.ai

Download model: `ollama pull`

Download a model by running the `ollama pull` command. You can download Mistral, Llama2, and Phi.

```bash
# download mistral
ollama pull mistral
```

Here is the list of other models you can download:

- https://ollama.ai/library
### Running the Ollama model

Run model: `ollama run`

Start the model server with the `ollama run` command.

```bash
# run mistral
ollama run mistral
```

### Sending requests to the server

Here is the code to load a model through Ollama:

```python
lm = dspy.OllamaLocal(model='mistral')
```
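With the model loaded, a minimal DSPy usage sketch might look like the following; the inline signature and question are illustrative rather than part of the Ollama documentation.

```python
import dspy

# Point DSPy at the locally running Ollama model.
lm = dspy.OllamaLocal(model='mistral')
dspy.settings.configure(lm=lm)

# Ask a question through a simple Predict module (illustrative signature).
generate_answer = dspy.Predict('question -> answer')
pred = generate_answer(question='What is the color of the sky?')
print(pred.answer)
```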
Lines changed: 82 additions & 0 deletions
@@ -0,0 +1,82 @@
# TensorRTModel

TensorRT-LLM by NVIDIA is one of the most optimized inference engines for running open-source large language models locally or in production.

### Prerequisites

Install TensorRT-LLM by following the instructions [here](https://nvidia.github.io/TensorRT-LLM/installation/linux.html). You need to install `dspy` inside the same Docker environment in which `tensorrt` is installed.

To use this module, you need the model weights in TensorRT engine format. To see how Torch weights (from Hugging Face models) are converted to the TensorRT engine format, check out [this documentation](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/llama#build-tensorrt-engines).

### Running the TensorRT model inside dspy

```python
from dspy import TensorRTModel

engine_dir = "<your-path-to-engine-dir>"
model_name_or_path = "<hf-model-id-or-path-to-tokenizer>"

model = TensorRTModel(engine_dir=engine_dir, model_name_or_path=model_name_or_path)
```
You can customize model loading further. Below is a list of optional parameters supported while initializing the `dspy` TensorRT model; a short initialization sketch using some of them follows the note below.

- **use_py_session** (`bool`, optional): Whether to use a Python session or not. Defaults to `False`.
- **lora_dir** (`str`): The directory of LoRA adapter weights.
- **lora_task_uids** (`List[str]`): List of LoRA task UIDs; use `-1` to disable the LoRA module.
- **lora_ckpt_source** (`str`): The source of the LoRA checkpoint.

If `use_py_session` is set to `False`, the following kwargs are supported (these run in the C++ runtime):

- **max_batch_size** (`int`, optional): The maximum batch size. Defaults to `1`.
- **max_input_len** (`int`, optional): The maximum input context length. Defaults to `1024`.
- **max_output_len** (`int`, optional): The maximum output context length. Defaults to `1024`.
- **max_beam_width** (`int`, optional): The maximum beam width, similar to `n` in the OpenAI API. Defaults to `1`.
- **max_attention_window_size** (`int`, optional): The attention window size that controls the sliding-window attention / cyclic KV cache behavior. Defaults to `None`.
- **sink_token_length** (`int`, optional): The sink token length. Defaults to `1`.

> Please note that you need to complete the build process properly before applying these customizations, because much of the customization depends on how the model engine was built. You can learn more [here](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/llama#build-tensorrt-engines).
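For illustration, a customized initialization might look like the sketch below. The values are placeholders rather than recommendations, only parameters listed above are used, and they must be compatible with how your engine was built.

```python
from dspy import TensorRTModel

# Illustrative values only; adjust them to match your engine build.
model = TensorRTModel(
    engine_dir="<your-path-to-engine-dir>",
    model_name_or_path="<hf-model-id-or-path-to-tokenizer>",
    use_py_session=False,  # use the C++ runtime
    max_batch_size=1,
    max_input_len=2048,
    max_output_len=512,
    max_beam_width=1,
)
```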
Now, to run the model, add the following code:

```python
response = model("hello")
```

This produces a result like:

```
["nobody is perfect, and we all have our own unique struggles and challenges. But what sets us apart is how we respond to those challenges. Do we let them define us, or do we use them as opportunities to grow and learn?\nI know that I have my own personal struggles, and I'm sure you do too. But I also know that we are capable of overcoming them, and becoming the best versions of ourselves. So let's embrace our imperfections, and use them to fuel our growth and success.\nRemember, nobody is perfect, but everybody has the potential to be amazing. So let's go out there and make it happen!"]
```

You can also invoke chat mode by changing the prompt to the chat format:

```python
prompt = [{"role": "user", "content": "hello"}]
response = model(prompt)

print(response)
```

Output:

```
[" Hello! It's nice to meet you. Is there something I can help you with or would you like to chat?"]
```

Here are some optional parameters that are supported during generation (a short sketch follows the list):

- **max_new_tokens** (`int`): The maximum number of tokens to output. Defaults to `1024`.
- **max_attention_window_size** (`int`): Defaults to `None`.
- **sink_token_length** (`int`): Defaults to `None`.
- **end_id** (`int`): The end-of-sequence ID of the tokenizer; defaults to the tokenizer's default end ID.
- **pad_id** (`int`): The pad-sequence ID of the tokenizer; defaults to the tokenizer's default end ID.
- **temperature** (`float`): The temperature used to control probabilistic behavior in generation. Defaults to `1.0`.
- **top_k** (`int`): Defaults to `1`.
- **top_p** (`float`): Defaults to `1`.
- **num_beams** (`int`): The number of responses to generate. Defaults to `1`.
- **length_penalty** (`float`): Defaults to `1.0`.
- **repetition_penalty** (`float`): Defaults to `1.0`.
- **presence_penalty** (`float`): Defaults to `0.0`.
- **frequency_penalty** (`float`): Defaults to `0.0`.
- **early_stopping** (`int`): Use this only when `num_beams` > 1. Defaults to `1`.
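As a sketch, a call that overrides a few of these generation parameters could look like this; the specific values are illustrative only.

```python
# Generation-time overrides; any of the parameters listed above can be
# passed as keyword arguments when calling the model.
response = model(
    "hello",
    max_new_tokens=64,
    temperature=0.7,
    top_k=50,
    top_p=0.9,
)
print(response)
```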
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
{
  "label": "Local Language Model Clients",
  "position": 1,
  "link": {
    "type": "generated-index",
    "description": "Local Language Model Clients in DSPy"
  }
}

docs/docs/deep-dive/retrieval_models_clients/Azure.mdx

Lines changed: 0 additions & 4 deletions
@@ -1,7 +1,3 @@
----
-sidebar_position: 2
----
-
 import AuthorDetails from '@site/src/components/AuthorDetails';
 
 # AzureAISearch

docs/docs/deep-dive/retrieval_models_clients/ChromadbRM.mdx

Lines changed: 4 additions & 6 deletions
@@ -1,11 +1,9 @@
----
-sidebar_position: 1
----
-
-#### Adapted from documentation provided by https://github.com/animtel
-
 # ChromadbRM
 
+:::note
+Adapted from documentation provided by https://github.com/animtel
+:::
+
 ChromadbRM has the flexibility to work with a variety of embedding functions, as outlined in the [chromadb embeddings documentation](https://docs.trychroma.com/embeddings). While different options are available, this example demonstrates how to utilize OpenAI embeddings specifically.
docs/docs/deep-dive/retrieval_models_clients/ColBERTv2.mdx

Lines changed: 0 additions & 4 deletions
@@ -1,7 +1,3 @@
----
-sidebar_position: 1
----
-
 import AuthorDetails from '@site/src/components/AuthorDetails';
 
 # ColBERTv2
