---
title: Network Tips
sidebar_label: Network Tips
---

This guide shows how to build and run the router in restricted or slow network environments without modifying files tracked in the repo. You’ll add a few small local override files (alternative Dockerfiles and a Compose override) so the codebase stays clean.

What you’ll solve:

- Hugging Face model downloads that are blocked or slow
- Go module fetches that are blocked during the Docker build
- PyPI access for the mock-vLLM test image

## TL;DR: Choose your path

- Fastest and most reliable: use local models in `./models` and skip Hugging Face network access entirely.
- Otherwise: mount a Hugging Face cache and set mirror environment variables via a Compose override.
- For building: use an override Dockerfile that sets Go module mirrors (example in section 2).
- For mock-vllm: use an override Dockerfile that sets a pip mirror (example in section 3).

You can mix these based on your situation.

## 1. Hugging Face models

The router downloads its embedding models on first run unless you provide them locally. Prefer Option A if possible.

### Option A: Use local models (no external network)

1) Download the required model(s) into the repo’s `./models` folder by any method that works for you, such as over a VPN or by copying from another machine. Example layout:

   - `models/all-MiniLM-L12-v2/`
   - `models/category_classifier_modernbert-base_model/`

2) In `config/config.yaml`, point `model_id` at the local path. Example:

   ```yaml
   bert_model:
     # Local folder under /app/models (already mounted by compose)
     model_id: /app/models/all-MiniLM-L12-v2
   ```

3) No extra environment variables are required: `docker-compose.yml` already mounts `./models:/app/models:ro`.

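Before starting the stack, you can sanity-check that each folder under `./models` looks like a complete Hugging Face model directory. This is a minimal sketch. `check_model_dir` is a hypothetical helper. The required files are an assumption about what the router’s loader reads, so adjust them to your version: `config.json`, `tokenizer.json`, and either `model.safetensors` or `pytorch_model.bin`.

```bash
#!/usr/bin/env bash
# Sanity-check local model folders before starting the stack.
# ASSUMPTION: the router needs config.json, tokenizer.json and one weights file;
# adjust REQUIRED_FILES and the weights check to match your router version.

REQUIRED_FILES=(config.json tokenizer.json)

# check_model_dir DIR: print each missing file; return non-zero if DIR looks incomplete.
check_model_dir() {
  local dir="$1" missing=0 f
  for f in "${REQUIRED_FILES[@]}"; do
    if [[ ! -f "$dir/$f" ]]; then
      echo "missing: $dir/$f"
      missing=1
    fi
  done
  # Accept either common weights format.
  if [[ ! -f "$dir/model.safetensors" && ! -f "$dir/pytorch_model.bin" ]]; then
    echo "missing: $dir/model.safetensors (or pytorch_model.bin)"
    missing=1
  fi
  return "$missing"
}

# Check every folder under ./models (skipped if ./models has no subfolders yet).
for d in models/*/; do
  [[ -d "$d" ]] || continue
  if check_model_dir "${d%/}"; then
    echo "ok: ${d%/}"
  fi
done
```

To fetch the embedding model on a machine that can reach Hugging Face (or with `HF_ENDPOINT` pointing at a mirror), you can use `huggingface-cli download sentence-transformers/all-MiniLM-L12-v2 --local-dir models/all-MiniLM-L12-v2` from the `huggingface_hub` package. It writes the files directly into that folder.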
### Option B: Use a Hugging Face cache and mirror

Create a Compose override that persists the cache on the host and downloads through a regional mirror (the example below uses a mirror for mainland China). Save it as `docker-compose.override.yml` in the repo root:

```yaml
services:
  semantic-router:
    volumes:
      - ~/.cache/huggingface:/root/.cache/huggingface
    environment:
      # Hugging Face home directory; the hub cache lives in $HF_HOME/hub,
      # matching the host layout under ~/.cache/huggingface/hub
      - HF_HOME=/root/.cache/huggingface
      # Python huggingface_hub only; requires the hf_transfer package
      - HF_HUB_ENABLE_HF_TRANSFER=1
      - HF_ENDPOINT=https://hf-mirror.com # example mirror endpoint (China)
```

Optional: pre-warm the cache on the host. This step requires Python and installs `huggingface_hub` with pip:

```bash
python -m pip install -U huggingface_hub
# Route host downloads through the same mirror if huggingface.co is unreachable
export HF_ENDPOINT=https://hf-mirror.com
python - <<'PY'
from huggingface_hub import snapshot_download

# Without local_dir, files go to the default cache (~/.cache/huggingface/hub),
# which the override above mounts into the container.
snapshot_download(repo_id="sentence-transformers/all-MiniLM-L12-v2")
PY
```

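After pre-warming, you can confirm that the model landed where the container will look for it. The `huggingface_hub` cache stores each repo as `models--<org>--<name>/snapshots/<revision>/` under `$HF_HOME/hub` (default `~/.cache/huggingface/hub`). The sketch below relies only on that layout. `hf_cached` is a hypothetical helper, and treating a snapshot that contains `config.json` as "cached" is an assumption.

```bash
#!/usr/bin/env bash
# Check whether a model repo is present in a Hugging Face hub cache.
# Cache layout: <cache>/models--<org>--<name>/snapshots/<revision>/<files>

# hf_cached REPO_ID [CACHE_DIR]
# Succeeds if any snapshot of REPO_ID contains config.json (an assumed marker of a
# usable download). CACHE_DIR defaults to $HF_HOME/hub, i.e. ~/.cache/huggingface/hub.
hf_cached() {
  local repo_id="$1"
  local cache_dir="${2:-${HF_HOME:-$HOME/.cache/huggingface}/hub}"
  # "org/name" -> "models--org--name"
  local folder="models--${repo_id//\//--}"
  local snap
  for snap in "$cache_dir/$folder"/snapshots/*/; do
    if [[ -f "${snap}config.json" ]]; then
      return 0
    fi
  done
  return 1
}

repo="sentence-transformers/all-MiniLM-L12-v2"
if hf_cached "$repo"; then
  echo "cached: $repo"
else
  echo "not cached yet: $repo"
fi
```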
## 2. Build with Go mirrors (Dockerfile override)

When building `Dockerfile.extproc`, the Go stage may hang while fetching modules from `proxy.golang.org`. Create an override Dockerfile that points Go at mirrors and leaves the original untouched.

1) Create `Dockerfile.extproc.cn` in the repo root with this content:

```Dockerfile
# syntax=docker/dockerfile:1

# Stage 1: build the Rust candle binding (libcandle_semantic_router.so)
FROM rust:1.85 AS rust-builder
RUN apt-get update && apt-get install -y make build-essential pkg-config && rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY tools/make/ tools/make/
COPY Makefile ./
COPY candle-binding/Cargo.toml candle-binding/
COPY candle-binding/src/ candle-binding/src/
RUN make rust

# Stage 2: build the Go router with cgo against the Rust library
FROM golang:1.24 AS go-builder
WORKDIR /app

# Go module mirrors (example: goproxy.cn)
ENV GOPROXY=https://goproxy.cn,direct
ENV GOSUMDB=sum.golang.google.cn

RUN mkdir -p src/semantic-router
COPY src/semantic-router/go.mod src/semantic-router/go.sum src/semantic-router/
COPY candle-binding/go.mod candle-binding/semantic-router.go candle-binding/

# Pre-download modules to fail fast if mirrors are unreachable
RUN cd src/semantic-router && go mod download && \
    cd /app/candle-binding && go mod download

COPY src/semantic-router/ src/semantic-router/
COPY --from=rust-builder /app/candle-binding/target/release/libcandle_semantic_router.so /app/candle-binding/target/release/

ENV CGO_ENABLED=1
ENV LD_LIBRARY_PATH=/app/candle-binding/target/release
RUN mkdir -p bin && cd src/semantic-router && go build -o ../../bin/router cmd/main.go

# Stage 3: runtime image with the router binary and shared library
FROM quay.io/centos/centos:stream9
WORKDIR /app
COPY --from=go-builder /app/bin/router /app/extproc-server
COPY --from=go-builder /app/candle-binding/target/release/libcandle_semantic_router.so /app/lib/
COPY config/config.yaml /app/config/
ENV LD_LIBRARY_PATH=/app/lib
EXPOSE 50051
COPY scripts/entrypoint.sh /app/entrypoint.sh
RUN chmod +x /app/entrypoint.sh
ENTRYPOINT ["/app/entrypoint.sh"]
```

2) Point Compose at the override Dockerfile by adding a `build` entry for `semantic-router` in `docker-compose.override.yml`. If the file already holds the Option B settings, merge this into the existing `semantic-router` service rather than adding a second `services:` key:

```yaml
services:
  semantic-router:
    build:
      dockerfile: Dockerfile.extproc.cn
```

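If the build still stalls, it can help to check from the host whether a module proxy answers at all. The sketch below uses the Go module proxy protocol, in which `GET <proxy>/<module>/@v/list` returns the known versions of a module (see `go help goproxy`). `goproxy_ok` is a hypothetical helper, and `golang.org/x/text` is just an arbitrary public module used as the probe.

```bash
#!/usr/bin/env bash
# Probe Go module proxies with the GOPROXY protocol: GET <proxy>/<module>/@v/list
# returns the known versions of <module> (see `go help goproxy`).
# Module paths with uppercase letters need "!"-escaping, which this sketch skips.

# goproxy_ok PROXY_URL [MODULE]: succeed if PROXY_URL serves the version list.
goproxy_ok() {
  local proxy="${1%/}"                   # drop a trailing slash
  local module="${2:-golang.org/x/text}" # any public, all-lowercase module works
  curl -fsS --connect-timeout 5 --max-time 10 -o /dev/null "$proxy/$module/@v/list"
}

for proxy in https://goproxy.cn https://proxy.golang.org; do
  if goproxy_ok "$proxy" 2>/dev/null; then
    echo "reachable:   $proxy"
  else
    echo "unreachable: $proxy"
  fi
done
```

If only the mirror is reachable, the `GOPROXY` setting in `Dockerfile.extproc.cn` is the right choice. If neither responds, the problem is more likely DNS or a proxy on your network than the Dockerfile.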
## 3. Mock vLLM (PyPI mirror via Dockerfile override)

For the optional `testing` profile, create an override Dockerfile that configures a pip mirror.

1) Create `tools/mock-vllm/Dockerfile.cn`:

```Dockerfile
FROM python:3.11-slim
WORKDIR /app
RUN apt-get update && apt-get install -y --no-install-recommends curl && rm -rf /var/lib/apt/lists/*

# Pip mirror (example: TUNA mirror in China)
RUN python -m pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple && \
    python -m pip config set global.trusted-host pypi.tuna.tsinghua.edu.cn

COPY requirements.txt /app/requirements.txt
RUN pip install --no-cache-dir -r requirements.txt

COPY app.py /app/app.py
EXPOSE 8000
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
```

2) Extend `docker-compose.override.yml` so that `mock-vllm` builds from the override Dockerfile, adding it under the same top-level `services:` key. The `dockerfile` path is resolved relative to that service’s build context:

```yaml
services:
  mock-vllm:
    build:
      dockerfile: Dockerfile.cn
```

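All three snippets target the same `docker-compose.override.yml`, so their entries must end up under a single top-level `services:` key. Pasting the snippets one after another would repeat that key. The script below is a sketch of one merged file, assuming you want every override at once and that the example mirrors suit your network. It sets `HF_HOME` for the cache location. It leaves out `HF_HUB_ENABLE_HF_TRANSFER`, which only matters for Python `huggingface_hub` with the `hf_transfer` package installed.

```bash
#!/usr/bin/env bash
# Write one docker-compose.override.yml that merges the HF cache/mirror settings
# with both Dockerfile overrides. Run from the repo root; refuses to overwrite.
set -euo pipefail

out="${1:-docker-compose.override.yml}"
if [[ -e "$out" ]]; then
  echo "refusing to overwrite existing $out" >&2
  exit 1
fi

# Quoted delimiter: the YAML is written verbatim (no $ or ~ expansion by bash).
cat > "$out" <<'YAML'
services:
  semantic-router:
    build:
      dockerfile: Dockerfile.extproc.cn
    volumes:
      - ~/.cache/huggingface:/root/.cache/huggingface
    environment:
      - HF_HOME=/root/.cache/huggingface
      - HF_ENDPOINT=https://hf-mirror.com
  mock-vllm:
    build:
      dockerfile: Dockerfile.cn
YAML

echo "wrote $out"
```

Running `docker compose config` afterwards prints the configuration that Compose produces after merging this file with `docker-compose.yml`.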
## 4. Build and run

With the overrides in place, build and run as usual. Compose merges `docker-compose.override.yml` automatically when you don’t pass `-f`. The commands below list both files explicitly, which has the same effect:

```bash
# Build images with the overrides (services in the testing profile, such as
# mock-vllm, are only built when that profile is enabled)
docker compose -f docker-compose.yml -f docker-compose.override.yml build

# Run router + envoy
docker compose -f docker-compose.yml -f docker-compose.override.yml up -d

# If you need the testing profile (mock-vllm)
docker compose -f docker-compose.yml -f docker-compose.override.yml --profile testing up -d
```

## 5. Troubleshooting

- Go modules still time out:
  - Run `docker compose config` and check that `semantic-router` builds from `Dockerfile.extproc.cn`, the file that sets `GOPROXY` and `GOSUMDB` in the go-builder stage.
  - Try a clean build: `docker compose build --no-cache`.

- Hugging Face models still download slowly:
  - Prefer Option A (local models).
  - Ensure the cache volume is mounted and `HF_ENDPOINT`/`HF_HUB_ENABLE_HF_TRANSFER` are set. `docker compose exec semantic-router env` shows the container’s environment.

- PyPI is slow for mock-vllm:
  - Confirm that the `mock-vllm` service builds from `Dockerfile.cn`, for example with `docker compose --profile testing config`.

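If downloads are still slow, it helps to know which endpoints your network can reach at all. This sketch fetches one small file from each upstream host and from each example mirror used in this guide. `probe` is a hypothetical helper. The probe URLs are assumptions, each chosen as a small file that should exist on that host, so substitute your own mirrors if you use different ones.

```bash
#!/usr/bin/env bash
# Compare reachability of upstream endpoints and the example mirrors in this guide.
# Each URL points at a small file assumed to exist on that host.

# probe LABEL URL: print "ok" or "unreachable" for URL, following redirects.
probe() {
  local label="$1" url="$2"
  if curl -fsSL --connect-timeout 5 --max-time 10 -o /dev/null "$url" 2>/dev/null; then
    echo "ok           $label"
  else
    echo "unreachable  $label"
  fi
}

probe "huggingface.co" "https://huggingface.co/sentence-transformers/all-MiniLM-L12-v2/resolve/main/config.json"
probe "hf-mirror.com"  "https://hf-mirror.com/sentence-transformers/all-MiniLM-L12-v2/resolve/main/config.json"
probe "pypi.org"       "https://pypi.org/simple/pip/"
probe "TUNA PyPI"      "https://pypi.tuna.tsinghua.edu.cn/simple/pip/"
```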