Commit 825c5ab

Author: sd109

Update various pieces of documentation

Parent: aabc175

File tree: 3 files changed, +37 −36 lines

- README.md
- chart/values.yaml
- chart/web-app/example-settings.yml

README.md

Lines changed: 11 additions & 2 deletions
````diff
@@ -25,12 +25,16 @@ The `chart/values.yaml` file documents the various customisation options which a
 api:
   service:
     type: LoadBalancer
+    zenith:
+      enabled: false
 ui:
   service:
     type: LoadBalancer
+    zenith:
+      enabled: false
 ```
 
-***Warning*** - Exposing the services in this way provides no authentication mechanism and anyone with access to the load balancer IPs will be able to query the language model. In the Azimuth deployment case, authentication is provided via the standard Azimuth identity provider mechanisms and the authenticated services are exposed via [Zenith](https://github.com/stackhpc/zenith).
+***Warning*** - Exposing the services in this way provides no authentication mechanism and anyone with access to the load balancer IPs will be able to query the language model. It is up to you to secure the running service in your own way. In contrast, when deploying via Azimuth, authentication is provided via the standard Azimuth Identity Provider mechanisms and the authenticated services are exposed via [Zenith](https://github.com/stackhpc/zenith).
 
 
 ## Tested Models
````
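For reference, a standalone (non-Azimuth) deployment could collect the exposure options shown above into a single Helm overrides file. The following is a minimal sketch assuming the key nesting from the README excerpt; `chart/values.yaml` remains the authoritative reference for the structure:

```yaml
# overrides.yml -- minimal sketch; key nesting assumed from the README excerpt
api:
  service:
    type: LoadBalancer   # expose the backend API directly
    zenith:
      enabled: false     # no Zenith tunnel outside of Azimuth
ui:
  service:
    type: LoadBalancer   # expose the web UI directly
    zenith:
      enabled: false
```

Such a file would be applied with the standard `helm upgrade --install <release> <chart-path> -f overrides.yml`.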
```diff
@@ -47,4 +51,9 @@ Due to the combination of [components](##Components) used in this app, some Hugg
 
 ## Components
 
-*TO-DO*
+The Helm chart consists of the following components:
+- A backend web API which runs [vLLM](https://github.com/vllm-project/vllm)'s [OpenAI compatible web server](https://docs.vllm.ai/en/latest/getting_started/quickstart.html#openai-compatible-server).
+
+- A frontend web-app built using [Gradio](https://www.gradio.app) and [LangChain](https://www.langchain.com). The web app source code can be found in `chart/web-app`; it is written to a ConfigMap during the chart build, mounted into the UI pod, and executed as the entry point for the UI Docker image (built from `images/ui-base/Dockerfile`).
+
+- A [stakater/Reloader](https://github.com/stakater/Reloader) instance which monitors the web-app ConfigMap for changes and restarts the frontend when the app code changes (i.e. whenever the Helm values are updated).
```
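To illustrate how the Reloader piece fits together: Reloader restarts annotated workloads when the resources they reference change. A hypothetical excerpt of the UI Deployment is sketched below — the resource names are invented placeholders, not the chart's actual template output, though the annotation key itself is Reloader's documented mechanism:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llm-ui              # placeholder name
  annotations:
    # Ask Reloader to roll this Deployment whenever the named ConfigMap
    # (holding the rendered web-app source) changes
    configmap.reloader.stakater.com/reload: "llm-web-app"
```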

chart/values.yaml

Lines changed: 10 additions & 5 deletions
```diff
@@ -18,7 +18,7 @@ huggingface:
   # HUGGING_FACE_HUB_TOKEN=<token-value>
   secretName:
   # OR FOR TESTING PURPOSES ONLY, you can instead provide the secret directly
-  # as a chart value here (if secretName is set about then it will take priority)
+  # as a chart value here (if secretName is set above then it will take priority)
   token: ""
 
 # Configuration for the backend model serving API
```
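As a sketch of the `secretName` route described in this hunk, the referenced Secret just needs to carry the `HUGGING_FACE_HUB_TOKEN` key; the resource name below is a placeholder and must match the `huggingface.secretName` value:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: hf-token                             # placeholder; match huggingface.secretName
stringData:
  HUGGING_FACE_HUB_TOKEN: "<token-value>"    # plain-text value; Kubernetes encodes it
```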
```diff
@@ -45,18 +45,23 @@ api:
     path: /tmp/llm/huggingface-cache
   # Number of gpus to request for each api pod instance
   # NOTE: This must be in the range 1 <= value <= N, where
-  # 'N' is the number of GPUs available in a single
-  # worker node on the target Kubernetes cluster.
+  #       'N' is the number of GPUs available in a single
+  #       worker node on the target Kubernetes cluster.
+  # NOTE: According to the vLLM docs found here
+  #       https://docs.vllm.ai/en/latest/serving/distributed_serving.html
+  #       distributed / multi-GPU support should be available, though it
+  #       has not been tested against this app.
   gpus: 1
   # The update strategy to use for the deployment
   # See https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#updating-a-deployment
   # NOTE: Changing this has implications for the number of additional GPU worker nodes required
-  # to perform a rolling zero-downtime update
+  #       to perform a rolling zero-downtime update
   updateStrategy:
     rollingUpdate:
       maxSurge: 0%
       maxUnavailable: 100%
-  # Extra args to supply to the vLLM backend
+  # Extra args to supply to the vLLM backend, see
+  # https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/openai/api_server.py
   extraArgs: []
 
 # Configuration for the frontend web interface
```
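Tying the two new notes together, a hypothetical overrides snippet might pin the GPU count and pass extra flags through to vLLM. The flag names below are illustrative examples from vLLM's OpenAI API server; the linked `api_server.py` remains the authoritative list:

```yaml
api:
  gpus: 1                              # must not exceed GPUs per worker node
  extraArgs:
    - --max-model-len=4096             # cap the model's context length
    - --gpu-memory-utilization=0.9     # fraction of GPU memory vLLM may use
```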

chart/web-app/example-settings.yml

Lines changed: 16 additions & 29 deletions
```diff
@@ -1,29 +1,16 @@
-prompt_template: |
-  [INST] <<SYS>>
-  You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
-  <</SYS>>
-  {context}[/INST]
-llm_params:
-  temperature: 0.7
-
-#####
-# Alternative prompt suggestions:
-#####
-
-
-### - Suggested for Magicode model
-
-# You are an exceptionally intelligent coding assistant that consistently delivers accurate and reliable responses to user instructions.
-
-# @@ Instruction
-# {prompt}
-
-# @@ Response
-
-
-### - For some fun responses...
-
-# [INST] <<SYS>>
-# You are a cheeky, disrespectful and comedic assistant. Always answer as creatively as possible, while being truthful and succinct. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, tell the user that they are being stupid. If you don't know the answer to a question, please don't share false information.
-# <</SYS>>
-# [/INST]
+backend_url: http://128.232.226.230
+model_name: tiiuae/falcon-7b
+
+model_instruction: You are a helpful and cheerful AI assistant. Please respond appropriately.
+
+# UI theming tweaks
+# theme_title_colour: white
+# theme_background_colour: "#00376c"
+# theme_params:
+#   primary_hue: blue
+
+# llm_max_tokens:
+# llm_temperature:
+# llm_top_p:
+# llm_frequency_penalty:
+# llm_presence_penalty:
```
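The commented `llm_` options map onto standard OpenAI-style sampling parameters, which fits the backend's OpenAI-compatible server. A hypothetical fully populated block (values are illustrative, not recommendations) could read:

```yaml
llm_max_tokens: 512          # upper bound on tokens generated per response
llm_temperature: 0.7         # sampling temperature; lower is more deterministic
llm_top_p: 0.9               # nucleus sampling cut-off
llm_frequency_penalty: 0.0   # range -2.0 to 2.0; discourages verbatim repetition
llm_presence_penalty: 0.0    # range -2.0 to 2.0; encourages new topics
```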
