
Commit ddc0195

committed
network config
Signed-off-by: JaredforReal <[email protected]>
1 parent 113c00f commit ddc0195

File tree

6 files changed: +565 -48 lines changed

deploy/kubernetes/kustomization.yaml

Lines changed: 11 additions & 10 deletions
@@ -5,21 +5,22 @@ metadata:
   name: semantic-router
 
 resources:
-  - namespace.yaml
-  - pvc.yaml
-  - deployment.yaml
-  - service.yaml
+  - namespace.yaml
+  - pvc.yaml
+  - deployment.yaml
+  - service.yaml
 
 # Generate ConfigMap
 configMapGenerator:
-  - name: semantic-router-config
-    files:
-      - config.yaml
-      - tools_db.json
+  - name: semantic-router-config
+    files:
+      - config.yaml
+      - tools_db.json
 
 # Namespace for all resources
 namespace: vllm-semantic-router-system
 
 images:
-  - name: ghcr.io/vllm-project/semantic-router/extproc
-    newTag: latest
+  - name: ghcr.io/vllm-project/semantic-router/extproc
+    newName: semantic-router-extproc
+    newTag: local
Lines changed: 235 additions & 0 deletions
# Kind Cluster Image Pull Issues and Solutions (Mainland China)

## Problem

When deploying with Kind from mainland China, the container runtime inside the Kind cluster cannot pull images from GitHub Container Registry (ghcr.io), even when a VPN is available on the host.

## Solutions

### Option 1: Build and load the image locally (recommended) ⭐

This is the most reliable approach: build the image over your local VPN connection, then load it into the Kind cluster.

#### Steps:

1. **Make sure you are in the vllm environment**

```bash
conda activate vllm  # if not already activated
cd /home/jared/vllm-project/semantic-router
```

2. **Build the image (over your local VPN)**

```bash
docker build -t semantic-router-extproc:local -f Dockerfile.extproc .
```

3. **Load the image into the Kind cluster**

```bash
kind load docker-image semantic-router-extproc:local --name semantic-router-cluster
```

4. **Update the Kubernetes config to use the local image**

Edit `deploy/kubernetes/kustomization.yaml` and change the images section:

```yaml
images:
  - name: ghcr.io/vllm-project/semantic-router/extproc
    newName: semantic-router-extproc
    newTag: local
```
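Before redeploying, you can sanity-check that the override took effect by rendering the manifests locally:

```bash
# Should print the local image, e.g. semantic-router-extproc:local
kubectl kustomize deploy/kubernetes/ | grep "image:"
```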
5. **Redeploy**:

```bash
# Delete the old deployment
kubectl delete deployment semantic-router -n vllm-semantic-router-system

# Reapply the manifests
kubectl apply -k deploy/kubernetes/

# Watch the rollout
kubectl get pods -n vllm-semantic-router-system -w
```

---

### Option 2: Use the automation script

I have created an automation script that performs the steps above:

```bash
conda activate vllm
cd /home/jared/vllm-project/semantic-router
./tools/kind/build-and-load-image.sh
```

**Note**: the script's cluster-name detection still needs fixing (one possible hardening is sketched below). If the script fails, run the Option 1 steps manually.
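If the failure is in the cluster-existence check, a defensive tweak (a sketch only; the exact detection issue isn't documented here) is to match the cluster name exactly and make it overridable:

```bash
# Exact-line match (grep -qx) avoids false positives on substring matches,
# and an env override lets the script work with differently named clusters.
CLUSTER_NAME="${CLUSTER_NAME:-semantic-router-cluster}"
if ! kind get clusters 2>/dev/null | grep -qx "$CLUSTER_NAME"; then
    echo "Cluster '$CLUSTER_NAME' not found. Existing clusters:" >&2
    kind get clusters >&2
    exit 1
fi
```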
---

### Option 3: Configure the Kind nodes to use a proxy

This approach lets the Kind cluster nodes reach registries through your proxy server.

#### Steps:

1. **Get the host IP**

```bash
HOST_IP=$(hostname -I | awk '{print $1}')
echo "Host IP: $HOST_IP"
```

2. **Configure the proxy on each Kind node**

Run the following for each node (control-plane and worker). Kind nodes run containerd rather than Docker, so the systemd drop-in goes under `containerd.service.d`, and containerd is the service restarted afterwards:

```bash
# Control plane
docker exec semantic-router-cluster-control-plane bash -c "mkdir -p /etc/systemd/system/containerd.service.d"
docker exec semantic-router-cluster-control-plane bash -c "cat > /etc/systemd/system/containerd.service.d/http-proxy.conf << 'EOF'
[Service]
Environment=\"HTTP_PROXY=http://${HOST_IP}:7897\"
Environment=\"HTTPS_PROXY=http://${HOST_IP}:7897\"
Environment=\"NO_PROXY=localhost,127.0.0.1,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16,.svc,.svc.cluster.local\"
EOF"

# Worker
docker exec semantic-router-cluster-worker bash -c "mkdir -p /etc/systemd/system/containerd.service.d"
docker exec semantic-router-cluster-worker bash -c "cat > /etc/systemd/system/containerd.service.d/http-proxy.conf << 'EOF'
[Service]
Environment=\"HTTP_PROXY=http://${HOST_IP}:7897\"
Environment=\"HTTPS_PROXY=http://${HOST_IP}:7897\"
Environment=\"NO_PROXY=localhost,127.0.0.1,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16,.svc,.svc.cluster.local\"
EOF"

# Restart containerd
docker exec semantic-router-cluster-control-plane systemctl daemon-reload
docker exec semantic-router-cluster-control-plane systemctl restart containerd
docker exec semantic-router-cluster-worker systemctl daemon-reload
docker exec semantic-router-cluster-worker systemctl restart containerd
```

3. **Make sure the proxy is reachable from containers**

Your proxy (localhost:7897) must accept connections from Docker containers; you may need to change its settings to allow connections from the Docker network rather than from loopback only.
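A quick end-to-end test of steps 2 and 3 is to pull the image from inside a node with crictl (included in the Kind node image); if this succeeds, containerd is going through the proxy correctly:

```bash
docker exec semantic-router-cluster-control-plane \
  crictl pull ghcr.io/vllm-project/semantic-router/extproc:latest
```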
4. **Restart the deployment**

```bash
kubectl rollout restart deployment/semantic-router -n vllm-semantic-router-system
kubectl get pods -n vllm-semantic-router-system -w
```

---

### Option 4: Use a domestic mirror registry

If the image has already been pushed to a registry inside China (e.g., Alibaba Cloud or Tencent Cloud), point the config at that mirror.

Edit `deploy/kubernetes/kustomization.yaml`:

```yaml
images:
  - name: ghcr.io/vllm-project/semantic-router/extproc
    newName: registry.cn-hangzhou.aliyuncs.com/your-namespace/semantic-router-extproc
    newTag: latest
```
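If you have push access to such a registry, a minimal sketch of publishing the locally built image there (the registry path and namespace are placeholders):

```bash
docker tag semantic-router-extproc:local \
  registry.cn-hangzhou.aliyuncs.com/your-namespace/semantic-router-extproc:latest
docker push registry.cn-hangzhou.aliyuncs.com/your-namespace/semantic-router-extproc:latest
```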
---

## Recommended workflow

**The simplest and most reliable path is Option 1 (local build)**:

```bash
# 1. Switch environments
conda activate vllm

# 2. Enter the project directory
cd /home/jared/vllm-project/semantic-router

# 3. Build the image (uses your VPN)
docker build -t semantic-router-extproc:local -f Dockerfile.extproc .

# 4. Load it into Kind
kind load docker-image semantic-router-extproc:local --name semantic-router-cluster

# 5. Update the config file
# Edit deploy/kubernetes/kustomization.yaml and change the image to:
# newName: semantic-router-extproc
# newTag: local

# 6. Redeploy
kubectl delete deployment semantic-router -n vllm-semantic-router-system
kubectl apply -k deploy/kubernetes/
kubectl get pods -n vllm-semantic-router-system -w
```

---

## Verifying the deployment

```bash
# Check pod status
kubectl get pods -n vllm-semantic-router-system

# Show details
kubectl describe pod -n vllm-semantic-router-system -l app=semantic-router

# Follow the logs
kubectl logs -f deployment/semantic-router -n vllm-semantic-router-system

# Check which image is in use
kubectl get pods -n vllm-semantic-router-system -o jsonpath='{.items[*].spec.containers[*].image}'
```

---

## FAQ

### Q: The init container still fails because it cannot pull python:3.11-slim

A: Pre-pull that base image the same way:

```bash
# Pull locally
docker pull python:3.11-slim

# Load into Kind
kind load docker-image python:3.11-slim --name semantic-router-cluster
```

### Q: Hugging Face model downloads fail

A: Use a Hugging Face mirror site:

```bash
export HF_ENDPOINT=https://hf-mirror.com
```

Then add the environment variable to the init container in deployment.yaml:

```yaml
env:
  - name: HF_ENDPOINT
    value: "https://hf-mirror.com"
```

---

## Other notes

1. **Model downloads in the init container**: the deployment's init container downloads models from Hugging Face, which can also fail for network reasons. Consider downloading the models locally over your VPN first and mounting them into the container (see the sketch after this list).

2. **Resource limits**: the current config asks for fairly large resources (6Gi of memory). If your machine is constrained, adjust the limits in `deploy/kubernetes/deployment.yaml` (see the resources sketch after this list).

3. **Persistent storage**: models live on a PVC, so make sure the Kind cluster has a usable storage class:

```bash
kubectl get storageclass
```
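For note 1, a sketch of pre-fetching models on the host with `huggingface-cli` (shipped with the `huggingface_hub` Python package); the model id and target directory below are illustrative placeholders, not necessarily what deployment.yaml expects:

```bash
# Optionally route the download through the mirror
export HF_ENDPOINT=https://hf-mirror.com

# Download a model snapshot to a local directory (placeholder repo id)
huggingface-cli download some-org/some-model --local-dir ./models/some-model
```

Getting the files into the cluster then depends on your setup; Kind can mount host directories into nodes via `extraMounts` in the cluster config, from which a hostPath volume can serve them, or you can copy them onto the PVC.

For note 2, this is the shape of the block to tune in `deploy/kubernetes/deployment.yaml`; the request values are illustrative, only the 6Gi limit is mentioned above:

```yaml
resources:
  requests:
    memory: "3Gi"
    cpu: "1"
  limits:
    memory: "6Gi"
    cpu: "2"
```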

tools/kind/build-and-load-image.sh

Lines changed: 112 additions & 0 deletions
#!/bin/bash
# Build and load the semantic-router image locally into a Kind cluster.
# This is the most reliable workaround for China mainland network issues.

set -e

# Color codes for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m' # No Color

CLUSTER_NAME="semantic-router-cluster"
IMAGE_NAME="semantic-router-extproc"
IMAGE_TAG="local"

echo -e "${GREEN}=== Building and Loading Semantic Router Image into Kind ===${NC}"
echo ""

# Check if the Kind cluster exists
if ! kind get clusters 2>/dev/null | grep -q "$CLUSTER_NAME"; then
    echo -e "${RED}Error: Cluster '${CLUSTER_NAME}' does not exist.${NC}"
    echo -e "${YELLOW}Please create it first using:${NC}"
    echo -e "  kind create cluster --name ${CLUSTER_NAME} --config tools/kind/kind-config.yaml"
    exit 1
fi

echo -e "${GREEN}Step 1: Building Docker image locally...${NC}"
echo -e "${YELLOW}This will use your local VPN connection${NC}"
echo ""

# Build the image locally. With `set -e`, a bare failure would exit before a
# later `$?` check could run, so test the command directly.
if ! docker build -t "${IMAGE_NAME}:${IMAGE_TAG}" -f Dockerfile.extproc .; then
    echo -e "${RED}Failed to build Docker image${NC}"
    exit 1
fi

echo ""
echo -e "${GREEN}Step 2: Loading image into Kind cluster...${NC}"
echo ""

# Load the image into Kind
if ! kind load docker-image "${IMAGE_NAME}:${IMAGE_TAG}" --name "${CLUSTER_NAME}"; then
    echo -e "${RED}Failed to load image into Kind cluster${NC}"
    exit 1
fi

echo ""
echo -e "${GREEN}Step 3: Updating Kubernetes manifests...${NC}"
echo ""

# Rewrite kustomization.yaml to use the local image
cat > deploy/kubernetes/kustomization.yaml.new << EOF
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

metadata:
  name: semantic-router

resources:
  - namespace.yaml
  - pvc.yaml
  - deployment.yaml
  - service.yaml

# Generate ConfigMap
configMapGenerator:
  - name: semantic-router-config
    files:
      - config.yaml
      - tools_db.json

# Namespace for all resources
namespace: vllm-semantic-router-system

images:
  - name: ghcr.io/vllm-project/semantic-router/extproc
    newName: ${IMAGE_NAME}
    newTag: ${IMAGE_TAG}
EOF

mv deploy/kubernetes/kustomization.yaml.new deploy/kubernetes/kustomization.yaml

echo -e "${GREEN}Updated kustomization.yaml to use local image${NC}"
echo ""

echo -e "${GREEN}Step 4: Applying changes to cluster...${NC}"
echo ""

# Delete the existing deployment if it exists
kubectl delete deployment semantic-router -n vllm-semantic-router-system --ignore-not-found=true

# Wait a moment for cleanup
sleep 5

# Reapply the manifests
kubectl apply -k deploy/kubernetes/

echo ""
echo -e "${GREEN}=== Setup Complete! ===${NC}"
echo ""
echo -e "${YELLOW}Monitor the deployment with:${NC}"
echo -e "  ${GREEN}kubectl get pods -n vllm-semantic-router-system -w${NC}"
echo ""
echo -e "${YELLOW}Check logs with:${NC}"
echo -e "  ${GREEN}kubectl logs -f deployment/semantic-router -n vllm-semantic-router-system${NC}"
echo ""
