Commit aabc175

Merge pull request #4 from stackhpc/feat/mistral

Fix Mistral 7B support

2 parents 708dfc5 + 445ec79

File tree: 3 files changed (+30 −7 lines)

README.md

Lines changed: 5 additions & 3 deletions

````diff
@@ -4,7 +4,7 @@ This repository contains a Helm chart for deploying Large Language Models (LLMs)
 
 ## Azimuth App
 
-This app ~~is~~ will soon be provided as part of a standard deployment Azimuth so no specific steps are required to use this app other than access to an up to date Azimuth deployment.
+This app ~~is~~ will soon be provided as part of a standard Azimuth deployment, so no specific steps are required to use this app other than access to an up-to-date Azimuth deployment.
 
 ## Manual Deployment
 
@@ -16,7 +16,7 @@ helm repo update
 helm install <installation-name> <chosen-repo-name>/azimuth-llm --version <version>
 ```
 
-where version is the full published version for the specified commit (e.g. `0.1.0-dev.0.main.125`). To see the latest published version, see [this page](https://github.com/stackhpc/azimuth-llm/tree/gh-pages).
+where `version` is the full name of the published version for the specified commit (e.g. `0.1.0-dev.0.main.125`). To see the latest published version, see [this page](https://github.com/stackhpc/azimuth-llm/tree/gh-pages).
 
 ### Customisation
 
@@ -39,8 +39,10 @@ The following is a non-exhaustive list of models which have been tested with thi
 - [Llama 2 7B chat](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf)
 - [AWQ Quantized Llama 2 70B](https://huggingface.co/TheBloke/Llama-2-70B-Chat-AWQ)
 - [Magicoder 6.7B](https://huggingface.co/ise-uiuc/Magicoder-S-DS-6.7B)
+- [Mistral 7B Instruct v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)
+<!-- - [AWQ Quantized Mixtral 8x7B Instruct v0.1](https://huggingface.co/TheBloke/Mixtral-8x7B-Instruct-v0.1-AWQ) (Not producing output properly) -->
 
-Due to the combination of [components](##Components) used in this app, some Huggingface models may not work as expected (usually due to the way in which LangChain formats the prompt messages). Any errors when using a new model will appear in the pod logs for either the web-app deployment the backend API deployment.
+Due to the combination of [components](##Components) used in this app, some HuggingFace models may not work as expected (usually due to the way in which LangChain formats the prompt messages). Any errors when using a new model will appear in the pod logs for either the web-app deployment or the backend API deployment.
 
 
 ## Components
````

chart/templates/NOTES.txt

Lines changed: 1 addition & 1 deletion

```diff
@@ -6,6 +6,6 @@ On deployment of a new model, the app must first download the model's weights fr
 This can take a significant amount of time depending on model choice and network speeds.
 Download progress can be monitored by inspecting the logs for the LLM API pod(s) via the Kubernetes Dashboard for the target cluster.
 
-The app uses [vLLM](https://docs.vllm.ai/en/latest/) as a model serving backend and [gradio](https://github.com/gradio-app/gradio) + [LangChain](https://python.langchain.com/docs/get_started/introduction) to provide the web interface.
+The app uses [vLLM](https://docs.vllm.ai/en/latest/) as a model serving backend and [Gradio](https://github.com/gradio-app/gradio) + [LangChain](https://python.langchain.com/docs/get_started/introduction) to provide the web interface.
 The official list of HuggingFace models supported by vLLM can be found [here](https://docs.vllm.ai/en/latest/models/supported_models.html), though some of these may not be compatible with the LangChain prompt format.
 See [this documentation](https://github.com/stackhpc/azimuth-llm/) for a non-exhaustive list of language models against which the app has been tested.
```
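For users who prefer the command line over the Kubernetes Dashboard, the same pod logs can be followed with `kubectl`. This is a hypothetical invocation, not part of the chart: the namespace and label selector are assumptions and will vary with how the chart was installed.

```shell
# Assumed names: replace <namespace> and the label selector with the
# values actually used by your azimuth-llm release.
kubectl logs --namespace <namespace> \
    --selector app.kubernetes.io/name=azimuth-llm \
    --follow
```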

chart/web-app/app.py

Lines changed: 24 additions & 3 deletions

```diff
@@ -1,5 +1,6 @@
 import requests
 import warnings
+import re
 import rich
 import gradio as gr
 from urllib.parse import urljoin
@@ -17,6 +18,18 @@
 backend_health_endpoint = urljoin(backend_url, "/health")
 backend_initialised = False
 
+# NOTE(sd109): The Mistral family of models explicitly require a chat
+# history of the form user -> ai -> user -> ... and so don't like having
+# a SystemPrompt at the beginning. Since these models seem to be the
+# best around right now, it makes sense to treat them as special and make
+# sure the web app works correctly with them. To do so, we detect when a
+# Mistral model is specified using this regex and then handle it explicitly
+# when constructing the `context` list in the `inference` function below.
+MISTRAL_REGEX = re.compile(r".*mi(s|x)tral.*", re.IGNORECASE)
+IS_MISTRAL_MODEL = (MISTRAL_REGEX.match(settings.model_name) is not None)
+if IS_MISTRAL_MODEL:
+    print("Detected Mistral model - will alter LangChain conversation format appropriately.")
+
 llm = ChatOpenAI(
     base_url=urljoin(backend_url, "v1"),
     model = settings.model_name,
```
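The detection logic added above can be exercised in isolation. A minimal sketch using the same pattern as the diff — the model names below are illustrative inputs only, not part of the commit:

```python
import re

# Same pattern as the diff: matches "mistral" or "mixtral" anywhere
# in the model name, case-insensitively.
MISTRAL_REGEX = re.compile(r".*mi(s|x)tral.*", re.IGNORECASE)

def is_mistral_model(model_name: str) -> bool:
    """Return True when the given model name belongs to the Mistral family."""
    return MISTRAL_REGEX.match(model_name) is not None

print(is_mistral_model("mistralai/Mistral-7B-Instruct-v0.2"))       # True
print(is_mistral_model("TheBloke/Mixtral-8x7B-Instruct-v0.1-AWQ"))  # True
print(is_mistral_model("meta-llama/Llama-2-7b-chat-hf"))            # False
```

Note that `re.match` anchors at the start of the string, so the leading `.*` in the pattern is what lets the family name appear anywhere in the HuggingFace repo path.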
```diff
@@ -57,9 +70,17 @@ def inference(latest_message, history):
 
 
     try:
-        context = [SystemMessage(content=settings.model_instruction)]
-        for human, ai in history:
-            context.append(HumanMessage(content=human))
+        # To handle Mistral models we have to add the model instruction to
+        # the first user message since Mistral requires user -> ai -> user
+        # chat format and therefore doesn't allow system prompts.
+        context = []
+        if not IS_MISTRAL_MODEL:
+            context.append(SystemMessage(content=settings.model_instruction))
+        for i, (human, ai) in enumerate(history):
+            if IS_MISTRAL_MODEL and i == 0:
+                context.append(HumanMessage(content=f"{settings.model_instruction}\n\n{human}"))
+            else:
+                context.append(HumanMessage(content=human))
             context.append(AIMessage(content=ai))
         context.append(HumanMessage(content=latest_message))
```
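The new message-assembly logic can be sketched without the LangChain dependency, using plain `(role, content)` tuples in place of `SystemMessage`/`HumanMessage`/`AIMessage` — the tuple representation is an assumption for illustration, not the app's actual types:

```python
def build_context(history, latest_message, model_instruction, is_mistral):
    """Assemble the chat context, folding the system instruction into the
    first user message for Mistral-family models, which reject a separate
    system role and expect strict user -> ai -> user alternation."""
    context = []
    if not is_mistral:
        context.append(("system", model_instruction))
    for i, (human, ai) in enumerate(history):
        if is_mistral and i == 0:
            # Prepend the instruction to the very first user turn.
            context.append(("user", f"{model_instruction}\n\n{human}"))
        else:
            context.append(("user", human))
        context.append(("ai", ai))
    context.append(("user", latest_message))
    return context

# Mistral-style: the instruction is merged into the first user message.
print(build_context([("Hi", "Hello!")], "Bye", "Be concise.", True))
# Non-Mistral: the instruction stays in a dedicated system message.
print(build_context([("Hi", "Hello!")], "Bye", "Be concise.", False))
```

One observable edge in this logic (mirrored from the diff): when the history is empty and a Mistral model is in use, the loop never runs, so the instruction is not injected anywhere; whether that is intentional isn't clear from the commit.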
