
Commit 5920cfa

Merge pull request #51 from stackhpc/feat/image-models

Add image-analysis UI and refactor into multiple Helm charts

2 parents: 0f07c01 + 717e582

Note: this is a large commit, so some of its file diffs (including some file names) are hidden by default and do not appear below.

55 files changed (+847, −408 lines)

.github/workflows/build-push-artifacts.yml

Lines changed: 15 additions & 7 deletions

```diff
@@ -28,18 +28,24 @@ jobs:
             images:
               - 'web-apps/**'
             chart:
-              - 'chart/**'
+              - 'charts/**'

   # Job to build container images
   build_push_images:
     name: Build and push images
     runs-on: ubuntu-latest
+    permissions:
+      contents: read
+      id-token: write # needed for signing the images with GitHub OIDC Token
+      packages: write # required for pushing container images
+      security-events: write # required for pushing SARIF files
     needs: changes
-    if: ${{ needs.changes.outputs.images == 'true' || github.ref_type == 'tag' }}
+    if: ${{ github.ref_type == 'tag' || needs.changes.outputs.images == 'true' }}
     strategy:
       matrix:
         include:
-          - component: chat-interface
+          - component: chat
+          - component: image-analysis
     steps:
       - name: Check out the repository
         uses: actions/checkout@v4
@@ -55,18 +61,19 @@ jobs:
         id: image-meta
         uses: docker/metadata-action@v5
         with:
-          images: ghcr.io/stackhpc/azimuth-llm-${{ matrix.component }}
+          images: ghcr.io/stackhpc/azimuth-llm-${{ matrix.component }}-ui
           # Produce the branch name or tag and the SHA as tags
           tags: |
             type=ref,event=branch
             type=ref,event=tag
             type=sha,prefix=

       - name: Build and push image
-        uses: azimuth-cloud/github-actions/docker-multiarch-build-push@update-trivy-action
+        uses: azimuth-cloud/github-actions/docker-multiarch-build-push@master
         with:
           cache-key: ${{ matrix.component }}
-          context: ./web-apps/${{ matrix.component }}
+          context: ./web-apps/
+          file: ./web-apps/${{ matrix.component }}/Dockerfile
           platforms: linux/amd64,linux/arm64
           push: true
           tags: ${{ steps.image-meta.outputs.tags }}
@@ -78,7 +85,7 @@ jobs:
     runs-on: ubuntu-latest
     # Only build and push the chart if chart files have changed
     needs: [changes]
-    if: ${{ needs.changes.outputs.chart == 'true' || github.ref_type == 'tag' }}
+    if: ${{ github.ref_type == 'tag' || needs.changes.outputs.chart == 'true' }}
     steps:
       - name: Check out the repository
         uses: actions/checkout@v4
@@ -94,6 +101,7 @@ jobs:
       - name: Publish Helm charts
         uses: azimuth-cloud/github-actions/helm-publish@master
         with:
+          directory: charts
           token: ${{ secrets.GITHUB_TOKEN }}
           version: ${{ steps.semver.outputs.version }}
           app-version: ${{ steps.semver.outputs.short-sha }}
```
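The `images:` and `chart:` path filters edited in the hunk above feed a `changes` job whose outputs gate the later build jobs via `needs.changes.outputs.images` and `needs.changes.outputs.chart`. The `changes` job itself is not shown in this diff; a minimal sketch of what such a job typically looks like, assuming the commonly used `dorny/paths-filter` action (an assumption, not confirmed by this commit):

```yaml
# Hypothetical sketch of the upstream `changes` job (not part of this diff).
jobs:
  changes:
    runs-on: ubuntu-latest
    # Expose the filter results so later jobs can gate on
    # `needs.changes.outputs.images` / `needs.changes.outputs.chart`.
    outputs:
      images: ${{ steps.filter.outputs.images }}
      chart: ${{ steps.filter.outputs.chart }}
    steps:
      - uses: actions/checkout@v4
      - id: filter
        uses: dorny/paths-filter@v3  # assumed action; any path-filter equivalent works
        with:
          filters: |
            images:
              - 'web-apps/**'
            chart:
              - 'charts/**'
```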

.github/workflows/test-pr.yml

Lines changed: 0 additions & 4 deletions

```diff
@@ -28,10 +28,6 @@ jobs:
       - name: Run chart linting
         run: ct lint --config ct.yaml

-      - name: Run helm template with default values
-        run: helm template ci-test .
-        working-directory: chart
-
       - name: Create Kind Cluster
         uses: helm/kind-action@v1
         with:
```

.gitignore

Lines changed: 8 additions & 2 deletions

```diff
@@ -11,5 +11,11 @@ test-values.y[a]ml
 **venv*/

 # Helm chart stuff
-chart/Chart.lock
-chart/charts
+charts/*/Chart.lock
+charts/*/charts
+
+# Python stuff
+**/build/
+**/*.egg-info/
+**/flagged/
+web-apps/**/overrides.yml
```
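The switch from `chart/...` to `charts/*/...` patterns ignores the lock files of every chart under the new `charts/` directory, one wildcard path component deep. A quick way to sanity-check such single-level glob patterns, using Python's `PurePath.match` as an approximation (note that `.gitignore` matching has its own rules, e.g. `**` crossing directory boundaries):

```python
from pathlib import PurePath

# The new ignore pattern from the diff above: one wildcard path component,
# so it matches a Chart.lock in any immediate subdirectory of charts/.
pattern = "charts/*/Chart.lock"

print(PurePath("charts/azimuth-chat/Chart.lock").match(pattern))  # True
print(PurePath("chart/Chart.lock").match(pattern))                # False (old layout)
```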

README.md

Lines changed: 18 additions & 16 deletions

````diff
@@ -34,38 +34,36 @@ ui:
   enabled: false
 ```

-***Warning*** - Exposing the services in this way provides no authentication mechanism and anyone with access to the load balancer IPs will be able to query the language model. It is up to you to secure the running service in your own way. In contrast, when deploying via Azimuth, authentication is provided via the standard Azimuth Identity Provider mechanisms and the authenticated services are exposed via [Zenith](https://github.com/stackhpc/zenith).
+[!WARNING] Exposing the services in this way provides no authentication mechanism and anyone with access to the load balancer IPs will be able to query the language model. It is up to you to secure the running service as appropriate for your use case. In contrast, when deployed via Azimuth, authentication is provided via the standard Azimuth Identity Provider mechanisms and the authenticated services are exposed via [Zenith](https://github.com/stackhpc/zenith).

-The UI can also optionally be exposed using a Kubernetes Ingress resource. See the `ui.ingress` section in `values.yml` for available config options.
+The both the web-based interface and the backend OpenAI-compatible vLLM API server can also optionally be exposed using [Kubernetes Ingress](https://kubernetes.io/docs/concepts/services-networking/ingress/). See the `ingress` section in `values.yml` for available config options.

 ## Tested Models

-The following is a non-exhaustive list of models which have been tested with this app:
-- [Llama 2 7B chat](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf)
-- [AWQ Quantized Llama 2 70B](https://huggingface.co/TheBloke/Llama-2-70B-Chat-AWQ)
-- [Magicoder 6.7B](https://huggingface.co/ise-uiuc/Magicoder-S-DS-6.7B)
-- [Mistral 7B Instruct v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)
-- [WizardCoder Python 34B](https://huggingface.co/WizardLM/WizardCoder-Python-34B-V1.0)
-- [AWQ Quantized Mixtral 8x7B Instruct v0.1](https://huggingface.co/TheBloke/Mixtral-8x7B-Instruct-v0.1-AWQ)
+The application uses [vLLM](https://docs.vllm.ai/en/latest/index.html) for model serving, therefore any of the vLLM [supported models](https://docs.vllm.ai/en/latest/models/supported_models.html) should work. Since vLLM pulls the model files directly from [HuggingFace](https://huggingface.co/models) it is likely that some other models will also be compatible with vLLM but mileage may vary between models and model architectures. If a model is incompatible with vLLM then the API pod will likely enter a `CrashLoopBackoff` state and any relevant error information will be found in the API pod logs. These logs can be viewed with

-Due to the combination of [components](##Components) used in this app, some HuggingFace models may not work as expected (usually due to the way in which LangChain formats the prompt messages). Any errors when using a new model will appear in the logs for either the web-app pod or the backend API pod. Please open an issue if you would like explicit support for a specific model that is not in the above list.
+```
+kubectl (-n <helm-release-namespace>) logs deploy/<helm-release-name>-api
+```
+
+If you suspect that a given error is not caused by the upstream vLLM support and a problem with this Helm chart then please [open an issue](https://github.com/stackhpc/azimuth-llm/issues).

 ## Monitoring

-The LLM chart integrates with [kube-prometheus-stack](https://artifacthub.io/packages/helm/prometheus-community/kube-prometheus-stack) by creating a `ServiceMonitor` resource and installing a custom Grafana dashboard as a Kubernetes `ConfigMap`. If the target cluster has an existing `kube-prometheus-stack` deployment which is appropriately configured to watch all namespaces for new Grafana dashboards, the custom LLM dashboard provided here will automatically picked up by Grafana. It will appear in the Grafana dashboard list with the name 'LLM dashboard'.
+The LLM chart integrates with [kube-prometheus-stack](https://artifacthub.io/packages/helm/prometheus-community/kube-prometheus-stack) by creating a `ServiceMonitor` resource and installing two custom Grafana dashboard as Kubernetes `ConfigMap`s. If the target cluster has an existing `kube-prometheus-stack` deployment which is appropriately configured to watch all namespaces for new Grafana dashboards, the LLM dashboards will automatically appear in Grafana's dashboard list.

 To disable the monitoring integrations, set the `api.monitoring.enabled` value to `false`.

 ## Components

 The Helm chart consists of the following components:
-- A backend web API which runs [vLLM](https://github.com/vllm-project/vllm)'s [OpenAI compatible web server](https://docs.vllm.ai/en/latest/getting_started/quickstart.html#openai-compatible-server).
+- A backend web API which runs [vLLM](https://github.com/vllm-project/vllm)'s [OpenAI compatible web server](https://docs.vllm.ai/en/stable/getting_started/quickstart.html#openai-compatible-server).

-- A frontend web-app built using [Gradio](https://www.gradio.app) and [LangChain](https://www.langchain.com). The web app source code can be found in `chart/web-app` and gets written to a ConfigMap during the chart build and is then mounted into the UI pod and executed as the entry point for the UI docker image (built from `images/ui-base/Dockerfile`).
+- A choice of frontend web-apps built using [Gradio](https://www.gradio.app) (see [web-apps](./web-apps/)). Each web interface is available as a pre-built container image [hosted on ghcr.io](https://github.com/orgs/stackhpc/packages?repo_name=azimuth-llm) and be configured for each Helm release by changing the `ui.image` section of the chart values.

-- A [stakater/Reloader](https://github.com/stakater/Reloader) instance which monitors the web-app ConfigMap for changes and restarts the frontend when the app code changes (i.e. whenever the Helm values are updated).
+<!-- ## Development

-## Development
+TODO: Update this

 The GitHub repository includes a [tilt](https://tilt.dev) file for easier development. After installing tilt locally, simply run `tilt up` from the repo root to get started with development. This will trigger the following:
@@ -77,4 +75,8 @@ The GitHub repository includes a [tilt](https://tilt.dev) file for easier develo

 - Launch the frontend web app locally on `127.0.0.1:7860`, configured to use `localhost:8080` as the backend API

-- Watch all components and only reload the minimal set of components needed when a file in the repo changes (e.g. modifying `chart/web-app/app.py` will restart the local web app instance only)
+- Watch all components and only reload the minimal set of components needed when a file in the repo changes (e.g. modifying `chart/web-app/app.py` will restart the local web app instance only) -->
+
+<!-- ## Adding a new web interface
+
+TODO: Write these docs... -->
````
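The README changes above describe the backend as vLLM's OpenAI-compatible web server. As a minimal sketch (not part of this commit) of the request shape a client would send to it, assuming a hypothetical local endpoint (e.g. a `kubectl port-forward` to the API service) and an illustrative model name; the sampling keys mirror the `llm_params` names used in the chart values:

```python
import json

# Assumed endpoint and model name, for illustration only.
API_URL = "http://localhost:8000/v1/chat/completions"
MODEL = "mistralai/Mistral-7B-Instruct-v0.2"

def build_chat_request(prompt: str) -> dict:
    """Build an OpenAI-style chat-completions request body accepted by vLLM."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        # Sampling parameters, named as in the chart's llm_params values
        "max_tokens": 101,
        "temperature": 0.1,
    }

payload = build_chat_request("Briefly describe vLLM.")
print(json.dumps(payload, indent=2))
# To actually send it (requires a running deployment):
#   import requests; requests.post(API_URL, json=payload, timeout=60)
```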

chart/azimuth-ui.schema.yaml

Lines changed: 0 additions & 34 deletions
This file was deleted.

chart/values.schema.json

Lines changed: 0 additions & 124 deletions
This file was deleted.

charts/azimuth-chat/Chart.yaml

Lines changed: 22 additions & 0 deletions

```diff
@@ -0,0 +1,22 @@
+apiVersion: v2
+name: azimuth-llm-chat
+description: HuggingFace vision model serving along with a simple web interface.
+maintainers:
+  - name: "Scott Davidson"
+
+
+type: application
+
+version: 0.1.0
+
+appVersion: "0.1.0"
+
+icon: https://huggingface.co/datasets/huggingface/brand-assets/resolve/main/hf-logo.svg
+
+annotations:
+  azimuth.stackhpc.com/label: HuggingFace Image Analysis
+
+dependencies:
+  - name: azimuth-llm
+    version: ">=0-0"
+    repository: "file://../azimuth-llm/"
```
(file name hidden)

Lines changed: 33 additions & 0 deletions

```diff
@@ -0,0 +1,33 @@
+controls:
+  /azimuth-llm/huggingface/model:
+    type: TextControl
+    required: true
+  /azimuth-llm/huggingface/token:
+    type: TextControl
+    secret: true
+  # Use mirror to mimic yaml anchor in base Helm chart
+  /azimuth-llm/ui/appSettings/model_name:
+    type: MirrorControl
+    path: /azimuth-llm/huggingface/model
+    visuallyHidden: true
+  # Azimuth UI doesn't handle json type ["integer","null"]
+  # properly so we allow any type in JSON schema then
+  # constrain to (optional) integer here.
+  /azimuth-llm/api/modelMaxContextLength:
+    type: IntegerControl
+    minimum: 100
+    required: false
+
+sortOrder:
+  - /azimuth-llm/huggingface/model
+  - /azimuth-llm/huggingface/token
+  - /azimuth-llm/ui/appSettings/model_instruction
+  - /azimuth-llm/ui/appSettings/page_title
+  - /azimuth-llm/api/image/version
+  - /azimuth-llm/ui/appSettings/llm_params/temperature
+  - /azimuth-llm/ui/appSettings/llm_params/max_tokens
+  - /azimuth-llm/ui/appSettings/llm_params/frequency_penalty
+  - /azimuth-llm/ui/appSettings/llm_params/presence_penalty
+  - /azimuth-llm/ui/appSettings/llm_params/top_p
+  - /azimuth-llm/ui/appSettings/llm_params/top_k
+  - /azimuth-llm/api/modelMaxContextLength
```
(file name hidden)

Lines changed: 16 additions & 0 deletions

```diff
@@ -0,0 +1,16 @@
+azimuth-llm:
+  api:
+    enabled: false
+  ui:
+    service:
+      zenith:
+        enabled: false
+    appSettings:
+      # Verify that we can set non-standard LLM params
+      llm_params:
+        max_tokens: 101
+        temperature: 0.1
+        top_p: 0.15
+        top_k: 1
+        presence_penalty: 0.9
+        frequency_penalty: 1
```
