Commit 846e74e

[add] secrets to configure HF_TOKEN and [update] tutorial (#22)

Signed-off-by: ApostaC <jc4xvyp@outlook.com>
1 parent: 8781b28

14 files changed: 45 additions, 38 deletions

.gitignore (1 addition, 0 deletions)

```diff
@@ -93,3 +93,4 @@ perf-test.py
 /try
 
 values-*.yaml
+helm/examples
```

helm/Chart.yaml (1 addition, 1 deletion)

```diff
@@ -15,7 +15,7 @@ type: application
 # This is the chart version. This version number should be incremented each time you make changes
 # to the chart and its templates, including the app version.
 # Versions are expected to follow Semantic Versioning (https://semver.org/)
-version: 0.0.1
+version: 0.0.2
 
 maintainers:
 - name: apostac
```

helm/templates/deployment-vllm-multi.yaml (7 additions, 0 deletions)

```diff
@@ -65,6 +65,13 @@ spec:
         env:
           - name: HF_HOME
            value: /data
+          {{- if $modelSpec.hf_token }}
+          - name: HF_TOKEN
+            valueFrom:
+              secretKeyRef:
+                name: {{ .Release.Name }}-secrets
+                key: hf_token_{{ $modelSpec.name }}
+          {{- end }}
          {{- with $modelSpec.env }}
          {{- toYaml . | nindent 10 }}
          {{- end }}
```

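When a model entry defines `hf_token`, the template above injects `HF_TOKEN` into the container from the chart-managed Secret. As a sketch of what this renders to (the release name `vllm` and model name `llama3` are illustrative placeholders, not values from the commit):

```yaml
# Hypothetical rendered container env for a release named "vllm"
# and a modelSpec entry named "llama3" (both placeholder names):
env:
  - name: HF_HOME
    value: /data
  - name: HF_TOKEN
    valueFrom:
      secretKeyRef:
        name: vllm-secrets     # from {{ .Release.Name }}-secrets
        key: hf_token_llama3   # from hf_token_{{ $modelSpec.name }}
```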
helm/templates/secrets.yaml (14 additions, 0 deletions)

```diff
@@ -0,0 +1,14 @@
+apiVersion: v1
+kind: Secret
+metadata:
+  name: "{{ .Release.Name }}-secrets"
+  namespace: {{ .Release.Namespace }}
+type: Opaque
+data:
+{{- range $modelSpec := .Values.servingEngineSpec.modelSpec }}
+{{- with $ -}}
+{{- if $modelSpec.hf_token }}
+  hf_token_{{ $modelSpec.name }}: {{ $modelSpec.hf_token | b64enc | quote }}
+{{- end }}
+{{- end }}
+{{- end }}
```

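The `b64enc` filter stores the token base64-encoded, as Kubernetes requires for `Secret.data`. A quick local sanity check of that encoding, using a made-up placeholder token (not a real credential):

```shell
# Placeholder token, purely illustrative.
token='hf_example_token'

# Encode the same way Helm's b64enc filter does.
encoded=$(printf '%s' "$token" | base64)
echo "$encoded"

# Decoding must round-trip back to the original value.
decoded=$(printf '%s' "$encoded" | base64 -d)
echo "$decoded"
```

Kubernetes performs the same decoding when it resolves the `secretKeyRef` into the pod's `HF_TOKEN` environment variable.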
helm/values.schema.json (3 additions, 0 deletions)

```diff
@@ -53,6 +53,9 @@
         },
         "required": ["enabled", "cpuOffloadingBufferSize"]
       },
+      "hf_token": {
+        "type": "string"
+      },
       "env": {
         "type": "array",
         "items": {
```

helm/values.yaml (2 additions, 0 deletions)

```diff
@@ -35,6 +35,8 @@ servingEngineSpec:
   # - enabled: (optional, bool) Enable LMCache, e.g., true
   # - cpuOffloadingBufferSize: (optional, string) The CPU offloading buffer size, e.g., "30"
   #
+  # - hf_token: (optional, string) the Huggingface tokens for this model
+  #
   # - env: (optional, list) The environment variables to set in the container, e.g., your HF_TOKEN
   #
   # - nodeSelectorTerms: (optional, list) The node selector terms to match the nodes
```

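Putting the comment above into practice, a minimal `modelSpec` entry carrying a token might look like the sketch below (the model name is a placeholder, the image repository is taken from the tutorials, and the token is the usual stand-in):

```yaml
servingEngineSpec:
  modelSpec:
  - name: "llama3"                 # placeholder model name
    repository: "vllm/vllm-openai"
    vllmConfig:
      maxModelLen: 4096
    hf_token: <YOUR HF TOKEN>      # consumed by the chart's Secret template
```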
tutorials/02-basic-vllm-config.md (5 additions, 7 deletions)

````diff
@@ -12,13 +12,13 @@ This tutorial guides you through the basic configurations required to deploy a v
 ## Prerequisites
 - A Kubernetes environment with GPU support, as set up in the [00-install-kubernetes-env tutorial](00-install-kubernetes-env.md).
 - Helm installed on your system.
-- Access to a Hugging Face token (`HF_TOKEN`).
+- Access to a HuggingFace token (`HF_TOKEN`).
 
 ## Step 1: Preparing the Configuration File
 
 1. Locate the example configuration file `tutorials/assets/values-02-basic-config.yaml`.
 2. Open the file and update the following fields:
-   - Replace `<USERS SHOULD PUT THEIR HF_TOKEN HERE>` with your actual Hugging Face token.
+   - Write your actual huggingface token in `hf_token: <YOUR HF TOKEN>` in the yaml file.
 
 ### Explanation of Key Items in `values-02-basic-config.yaml`
 
@@ -37,7 +37,8 @@ This tutorial guides you through the basic configurations required to deploy a v
 - `maxModelLen`: The maximum sequence length the model can handle.
 - `dtype`: Data type for computations, e.g., `bfloat16` for faster performance on modern GPUs.
 - `extraArgs`: Additional arguments passed to the vLLM engine for fine-tuning behavior.
-- **`env`**: Environment variables such as `HF_TOKEN` for authentication with Hugging Face.
+- **`hf_token`**: The Hugging Face token for authenticating with the Hugging Face model hub.
+- **`env`**: Extra environment variables to pass to the model-serving engine.
 
 ### Example Snippet
 ```yaml
@@ -62,10 +63,7 @@ servingEngineSpec:
     dtype: "bfloat16"
     extraArgs: ["--disable-log-requests", "--gpu-memory-utilization", "0.8"]
 
-    env:
-    - name: HF_TOKEN
-      value: <YOUR_HF_TOKEN>
-
+    hf_token: <YOUR HF TOKEN>
 ```
 
 ## Step 2: Applying the Configuration
````

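After installing the chart, one way to confirm the token was wired through is to read it back from the release's Secret with `kubectl` (the release name `vllm` and model name `llama3` below are placeholders; substitute your own, and run this only against a live cluster):

```shell
# The chart names the Secret "<release>-secrets" and keys each token
# as "hf_token_<model name>"; "vllm" and "llama3" are placeholders.
kubectl get secret vllm-secrets \
  -o jsonpath='{.data.hf_token_llama3}' | base64 -d
```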
tutorials/03-load-model-from-pv.md (2 additions, 4 deletions)

````diff
@@ -84,14 +84,12 @@ servingEngineSpec:
   vllmConfig:
     maxModelLen: 4096
 
-  env:
-  - name: HF_TOKEN
-    value: <YOUR_HF_TOKEN>
+  hf_token: <YOUR HF TOKEN>
 ```
 
 > **Explanation:** The `pvcMatchLabels` field specifies the labels to match an existing Persistent Volume. In this example, it ensures that the deployment uses the PV with the label `model: "llama3-pv"`. This provides a way to link a specific PV to your application.
 
-> **Note:** Make sure to replace `<YOUR_HF_TOKEN>` with your actual Hugging Face token in the `env` section.
+> **Note:** Make sure to replace `<YOUR_HF_TOKEN>` with your actual Hugging Face token in the yaml.
 
 2. Deploy the Helm chart:
````

tutorials/04-launch-multiple-model.md (3 additions, 7 deletions)

````diff
@@ -36,9 +36,7 @@ servingEngineSpec:
     pvcStorage: "50Gi"
     vllmConfig:
       maxModelLen: 4096
-    env:
-    - name: HF_TOKEN
-      value: <YOUR_HF_TOKEN_FOR_LLAMA3.1>
+    hf_token: <YOUR HF TOKEN FOR LLAMA 3.1>
 
   - name: "mistral"
     repository: "vllm/vllm-openai"
@@ -51,12 +49,10 @@ servingEngineSpec:
     pvcStorage: "50Gi"
     vllmConfig:
       maxModelLen: 4096
-    env:
-    - name: HF_TOKEN
-      value: <YOUR_HF_TOKEN_FOR_MISTRAL>
+    hf_token: <YOUR HF TOKEN FOR MISTRAL>
 ```
 
-> **Note:** Replace `<YOUR_HF_TOKEN_FOR_LLAMA3.1>` and `<YOUR_HF_TOKEN_FOR_MISTRAL>` with your Hugging Face tokens.
+> **Note:** Replace `<YOUR HF TOKEN FOR LLAMA 3.1>` and `<YOUR HF TOKEN FOR MISTRAL>` with your Hugging Face tokens.
 
 
 ## Step 2: Deploying the Helm Chart
````

tutorials/05-offload-kv-cache.md (2 additions, 4 deletions)

````diff
@@ -44,12 +44,10 @@ servingEngineSpec:
       enabled: true
       cpuOffloadingBufferSize: "20"
 
-    env:
-    - name: HF_TOKEN
-      value: <YOUR_HF_TOKEN_HERE>
+    hf_token: <YOUR HF TOKEN>
 ```
 
-> **Note:** Replace `<YOUR_HF_TOKEN_HERE>` with your actual Hugging Face token.
+> **Note:** Replace `<YOUR HF TOKEN>` with your actual Hugging Face token.
 
 The `lmcacheConfig` field enables LMCache and sets the CPU offloading buffer size to `20`GB. You can adjust this value based on your workload.
````