
Commit 32319c3

[CI/CD] Add static e2e test for prefixaware (#532)
* [CI] Refactor static discovery testing so that it can support multiple routing logics
* [CI] Add static e2e test for prefixaware
* [Bug] Fix prefixaware for chat completion
* [CI] Code refactor
* Reuse vLLM backend
* Refactor the code
* Modify
* Fix bug
* Fix upload

Signed-off-by: Rui Zhang <[email protected]>
Co-authored-by: Rui Zhang <[email protected]>
1 parent 406923c commit 32319c3

File tree: 6 files changed, +614 −100 lines

.github/template-chatml.jinja

Lines changed: 2 additions & 0 deletions

```diff
@@ -0,0 +1,2 @@
+{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content']}}{% if (loop.last and add_generation_prompt) or not loop.last %}{{ '<|im_end|>' + '\n'}}{% endif %}{% endfor %}
+{% if add_generation_prompt and messages[-1]['role'] != 'assistant' %}{{ '<|im_start|>assistant\n' }}{% endif %}
```
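For reference, the template's rendering rules can be mimicked in plain Python — a sketch of the same logic for illustration only (vLLM renders the Jinja file itself; `render_chatml` is a hypothetical helper, not code from this PR):

```python
def render_chatml(messages, add_generation_prompt=True):
    """Mimic .github/template-chatml.jinja: wrap each message in
    <|im_start|>/<|im_end|> markers and optionally open an assistant turn."""
    out = []
    last = len(messages) - 1
    for i, message in enumerate(messages):
        out.append("<|im_start|>" + message["role"] + "\n" + message["content"])
        # The template closes a turn for every non-last message, and for the
        # last one only when a generation prompt will follow.
        if i < last or add_generation_prompt:
            out.append("<|im_end|>" + "\n")
    if add_generation_prompt and messages[-1]["role"] != "assistant":
        out.append("<|im_start|>assistant\n")
    return "".join(out)
```

For a single user message with `add_generation_prompt=True`, this yields the turn closed with `<|im_end|>` followed by an opened `<|im_start|>assistant` turn, matching the template's two lines.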

.github/workflows/router-e2e-test.yml

Lines changed: 17 additions & 37 deletions

```diff
@@ -209,7 +209,8 @@ jobs:
     needs: e2e-test
     if: github.event.pull_request.draft == false
     env:
-      LOG_DIR: /tmp/debug-logs-${{ github.event.pull_request.number || 'main' }}
+      LOG_DIR: /tmp/static-discovery-e2e-test-${{ github.event.pull_request.number || 'main' }}
+
     steps:
       - name: Check out repository code
         uses: actions/checkout@v4
@@ -232,46 +233,26 @@ jobs:
         run: |
           echo "🚀 Starting vLLM serve backend"
           mkdir -p "$LOG_DIR"
-          CUDA_VISIBLE_DEVICES=0 vllm serve facebook/opt-125m --port 8001 --gpu-memory-utilization 0.7 > "$LOG_DIR/backend1.log" 2>&1 &
-          CUDA_VISIBLE_DEVICES=1 vllm serve facebook/opt-125m --port 8002 --gpu-memory-utilization 0.7 > "$LOG_DIR/backend2.log" 2>&1 &
-          sleep 3
+          CUDA_VISIBLE_DEVICES=0 vllm serve facebook/opt-125m --port 8001 --gpu-memory-utilization 0.7 --chat-template .github/template-chatml.jinja > "$LOG_DIR/backend1.log" 2>&1 &
+          CUDA_VISIBLE_DEVICES=1 vllm serve facebook/opt-125m --port 8002 --gpu-memory-utilization 0.7 --chat-template .github/template-chatml.jinja > "$LOG_DIR/backend2.log" 2>&1 &
 
       - name: Wait for backends to be ready
         run: |
          echo "⏳ Waiting for backends to be ready"
-          chmod +x .github/wait-for-backends.sh
-          ./.github/wait-for-backends.sh 180 "http://localhost:8001" "http://localhost:8002"
+          chmod +x tests/e2e/wait-for-backends.sh
+          ./tests/e2e/wait-for-backends.sh 180 "http://localhost:8001" "http://localhost:8002"
 
-      - name: Start Router with static discovery and roundrobin routing
+      - name: Run All Static Discovery Routing Tests
        env:
          PYTHONPATH: ${{ github.workspace }}/src
        run: |
-          echo "🔧 Starting router with static discovery and roundrobin routing"
-          echo "PYTHONPATH=$PYTHONPATH"
-          # Start router in background with log capture
-          python3 -m src.vllm_router.app --port 30080 \
-            --service-discovery static \
-            --static-backends "http://localhost:8001,http://localhost:8002" \
-            --static-models "facebook/opt-125m,facebook/opt-125m" \
-            --static-model-types "chat,chat" \
-            --log-stats \
-            --log-stats-interval 10 \
-            --engine-stats-interval 10 \
-            --request-stats-window 10 \
-            --routing-logic roundrobin > "$LOG_DIR/router.log" 2>&1 &
-          ROUTER_PID=$!
-          echo "Router started with PID: $ROUTER_PID"
-          # Check if router is running
-          timeout 30 bash -c 'until curl -s http://localhost:30080 > /dev/null 2>&1; do sleep 1; done' || {
-            echo "❌ Router failed to start within 30 seconds"
-            exit 1
-          }
-          echo "✅ Router started successfully"
-
-      - name: Run static discovery E2E test
-        run: |
-          echo "🧪 Running static discovery test"
-          python3 tests/e2e/test-static-discovery.py --num-requests 20 --verbose --log-file-path "$LOG_DIR/router.log" --router-url http://localhost:30080
+          echo "🧪 Running all static discovery routing tests sequentially"
+          chmod +x tests/e2e/run-static-discovery-routing-test.sh
+          ./tests/e2e/run-static-discovery-routing-test.sh all \
+            --pythonpath "$PYTHONPATH" \
+            --log-dir "$LOG_DIR" \
+            --num-requests 20 \
+            --verbose
        timeout-minutes: 5
 
      - name: Archive static discovery test results and logs
@@ -280,14 +261,13 @@ jobs:
        with:
          name: static-discovery-test-results-pr-${{ github.event.pull_request.number || 'main' }}
          path: |
-            /tmp/static-discovery-results-*
-            $LOG_DIR/
+            ${{ env.LOG_DIR }}/*
 
      - name: Cleanup processes
        if: always()
        run: |
          echo "🧹 Cleaning up processes"
-          pkill -f "vllm serve"
-          pkill -f "python3 -m src.vllm_router.app"
+          pkill -f "vllm serve" || true
+          pkill -f "python3 -m src.vllm_router.app" || true
 
      - run: echo "🍏 Static discovery e2e test job status is ${{ job.status }}."
```
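The workflow delegates readiness polling to `tests/e2e/wait-for-backends.sh`. The pattern it relies on — poll each backend until it answers or a deadline passes — can be sketched in Python with an injectable probe (a hypothetical helper for illustration, not code from this PR; a real probe would issue an HTTP GET against each backend):

```python
import time

def wait_for_backends(urls, timeout_s, probe, interval_s=1.0):
    """Poll probe(url) for every backend until all report ready or
    timeout_s elapses. Returns True if all became ready in time."""
    deadline = time.monotonic() + timeout_s
    pending = list(urls)
    while pending and time.monotonic() < deadline:
        # Keep only the backends that still fail their readiness probe
        pending = [u for u in pending if not probe(u)]
        if pending:
            time.sleep(interval_s)
    return not pending
```

Running the cleanup step's `pkill` with `|| true` is the shell-side analogue of this defensive style: the step must not fail just because a process already exited.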

src/vllm_router/routers/routing_logic.py

Lines changed: 28 additions & 2 deletions

```diff
@@ -398,14 +398,40 @@ async def route_request(
            longest prefix match)
         """
 
+        # Handle chat completions
+        if "messages" in request_json:
+            # Get the last message from the messages array
+            messages = request_json["messages"]
+            if messages:
+                # Concatenate all message content
+                prompt_parts = []
+                for message in messages:
+                    content = message.get("content", "")
+                    if isinstance(content, list):
+                        # Handle multimodal messages
+                        text_content = " ".join(
+                            part.get("text", "")
+                            for part in content
+                            if part.get("type") == "text"
+                        )
+                        prompt_parts.append(text_content)
+                    elif content is not None:
+                        prompt_parts.append(content)
+                prompt = "\n".join(prompt_parts)
+            else:
+                prompt = ""
+        else:
+            # Handle regular completions
+            prompt = request_json["prompt"]
+
         available_endpoints = set(endpoint.url for endpoint in endpoints)
         _, matched_endpoint = await self.hashtrie.longest_prefix_match(
-            request_json["prompt"], available_endpoints
+            prompt, available_endpoints
         )
 
         selected_endpoint = random.choice(list(matched_endpoint))
 
-        await self.hashtrie.insert(request_json["prompt"], selected_endpoint)
+        await self.hashtrie.insert(prompt, selected_endpoint)
 
         return selected_endpoint
```
