Commit 825c5ab

Author: sd109

Update various pieces of documentation

Parent: aabc175

File tree: 3 files changed, +37 −36 lines

- README.md
- chart/values.yaml
- chart/web-app/example-settings.yml

README.md

Lines changed: 11 additions & 2 deletions
````diff
@@ -25,12 +25,16 @@ The `chart/values.yaml` file documents the various customisation options which a
 api:
   service:
     type: LoadBalancer
+    zenith:
+      enabled: false
 ui:
   service:
     type: LoadBalancer
+    zenith:
+      enabled: false
 ```
 
-***Warning*** - Exposing the services in this way provides no authentication mechanism and anyone with access to the load balancer IPs will be able to query the language model. In the Azimuth deployment case, authentication is provided via the standard Azimuth identity provider mechanisms and the authenticated services are exposed via [Zenith](https://github.com/stackhpc/zenith).
+***Warning*** - Exposing the services in this way provides no authentication mechanism and anyone with access to the load balancer IPs will be able to query the language model. It is up to you to secure the running service in your own way. In contrast, when deploying via Azimuth, authentication is provided via the standard Azimuth Identity Provider mechanisms and the authenticated services are exposed via [Zenith](https://github.com/stackhpc/zenith).
 
 
 ## Tested Models
````
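For reference, a standalone (non-Azimuth) deployment could collect the exposure options shown above into a single Helm overrides file. The following is a minimal sketch assuming the key nesting from the README excerpt; `chart/values.yaml` remains the authoritative reference for the structure:

```yaml
# overrides.yml -- minimal sketch; key nesting assumed from the README excerpt
api:
  service:
    type: LoadBalancer   # expose the backend API directly
    zenith:
      enabled: false     # no Zenith tunnel outside of Azimuth
ui:
  service:
    type: LoadBalancer   # expose the web UI directly
    zenith:
      enabled: false
```

Such a file would be applied with the standard `helm upgrade --install <release> <chart-path> -f overrides.yml`.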
```diff
@@ -47,4 +51,9 @@ Due to the combination of [components](##Components) used in this app, some Hugg
 
 ## Components
 
-*TO-DO*
+The Helm chart consists of the following components:
+- A backend web API which runs [vLLM](https://github.com/vllm-project/vllm)'s [OpenAI compatible web server](https://docs.vllm.ai/en/latest/getting_started/quickstart.html#openai-compatible-server).
+
+- A frontend web-app built using [Gradio](https://www.gradio.app) and [LangChain](https://www.langchain.com). The web app source code can be found in `chart/web-app`; it is written to a ConfigMap during the chart build, mounted into the UI pod, and executed as the entry point for the UI Docker image (built from `images/ui-base/Dockerfile`).
+
+- A [stakater/Reloader](https://github.com/stakater/Reloader) instance which monitors the web-app ConfigMap for changes and restarts the frontend when the app code changes (i.e. whenever the Helm values are updated).
```
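To illustrate how the Reloader piece fits together: Reloader restarts annotated workloads when the resources they reference change. A hypothetical excerpt of the UI Deployment is sketched below — the resource names are invented placeholders, not the chart's actual template output, though the annotation key itself is Reloader's documented mechanism:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llm-ui              # placeholder name
  annotations:
    # Ask Reloader to roll this Deployment whenever the named ConfigMap
    # (holding the rendered web-app source) changes
    configmap.reloader.stakater.com/reload: "llm-web-app"
```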

chart/values.yaml

Lines changed: 10 additions & 5 deletions
```diff
@@ -18,7 +18,7 @@ huggingface:
   # HUGGING_FACE_HUB_TOKEN=<token-value>
   secretName:
   # OR FOR TESTING PURPOSES ONLY, you can instead provide the secret directly
-  # as a chart value here (if secretName is set about then it will take priority)
+  # as a chart value here (if secretName is set above then it will take priority)
   token: ""
 
 # Configuration for the backend model serving API
```
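As a sketch of the `secretName` route described in this hunk, the referenced Secret just needs to carry the `HUGGING_FACE_HUB_TOKEN` key; the resource name below is a placeholder and must match the `huggingface.secretName` value:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: hf-token                             # placeholder; match huggingface.secretName
stringData:
  HUGGING_FACE_HUB_TOKEN: "<token-value>"    # plain-text value; Kubernetes encodes it
```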
```diff
@@ -45,18 +45,23 @@ api:
     path: /tmp/llm/huggingface-cache
   # Number of gpus to request for each api pod instance
   # NOTE: This must be in the range 1 <= value <= N, where
-  # 'N' is the number of GPUs available in a single
-  # worker node on the target Kubernetes cluster.
+  #       'N' is the number of GPUs available in a single
+  #       worker node on the target Kubernetes cluster.
+  # NOTE: According to the vLLM docs found here
+  #       https://docs.vllm.ai/en/latest/serving/distributed_serving.html
+  #       distributed / multi-GPU support should be available, though it
+  #       has not been tested against this app.
   gpus: 1
   # The update strategy to use for the deployment
   # See https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#updating-a-deployment
   # NOTE: Changing this has implications for the number of additional GPU worker nodes required
-  # to perform a rolling zero-downtime update
+  #       to perform a rolling zero-downtime update
   updateStrategy:
     rollingUpdate:
       maxSurge: 0%
       maxUnavailable: 100%
-  # Extra args to supply to the vLLM backend
+  # Extra args to supply to the vLLM backend, see
+  # https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/openai/api_server.py
   extraArgs: []
 
 # Configuration for the frontend web interface
```
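Tying the two new notes together, a hypothetical overrides snippet might pin the GPU count and pass extra flags through to vLLM. The flag names below are illustrative examples from vLLM's OpenAI API server; the linked `api_server.py` remains the authoritative list:

```yaml
api:
  gpus: 1                              # must not exceed GPUs per worker node
  extraArgs:
    - --max-model-len=4096             # cap the model's context length
    - --gpu-memory-utilization=0.9     # fraction of GPU memory vLLM may use
```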

chart/web-app/example-settings.yml

Lines changed: 16 additions & 29 deletions
```diff
@@ -1,29 +1,16 @@
-prompt_template: |
-  [INST] <<SYS>>
-  You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
-  <</SYS>>
-  {context}[/INST]
-llm_params:
-  temperature: 0.7
-
-#####
-# Alternative prompt suggestions:
-#####
-
-
-### - Suggested for Magicode model
-
-# You are an exceptionally intelligent coding assistant that consistently delivers accurate and reliable responses to user instructions.
-
-# @@ Instruction
-# {prompt}
-
-# @@ Response
-
-
-### - For some fun responses...
-
-# [INST] <<SYS>>
-# You are a cheeky, disrespectful and comedic assistant. Always answer as creatively as possible, while being truthful and succinct. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, tell the user that they are being stupid. If you don't know the answer to a question, please don't share false information.
-# <</SYS>>
-# [/INST]
+backend_url: http://128.232.226.230
+model_name: tiiuae/falcon-7b
+
+model_instruction: You are a helpful and cheerful AI assistant. Please respond appropriately.
+
+# UI theming tweaks
+# theme_title_colour: white
+# theme_background_colour: "#00376c"
+# theme_params:
+#   primary_hue: blue
+
+# llm_max_tokens:
+# llm_temperature:
+# llm_top_p:
+# llm_frequency_penalty:
+# llm_presence_penalty:
```
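The commented `llm_` options map onto standard OpenAI-style sampling parameters, which fits the backend's OpenAI-compatible server. A hypothetical fully populated block (values are illustrative, not recommendations) could read:

```yaml
llm_max_tokens: 512          # upper bound on tokens generated per response
llm_temperature: 0.7         # sampling temperature; lower is more deterministic
llm_top_p: 0.9               # nucleus sampling cut-off
llm_frequency_penalty: 0.0   # range -2.0 to 2.0; discourages verbatim repetition
llm_presence_penalty: 0.0    # range -2.0 to 2.0; encourages new topics
```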
