Conversation

@Jackmin801 commented on Jan 17, 2026

Note

Introduces token-in chat completions and adapts to vLLM nightly module/layout changes.

  • Adds OpenAIServingChatWithTokens and a /v1/chat/completions/tokens route; allows overriding prompt_token_ids with tokens from the request and supports both streaming and full responses (see the route sketch after this list)
  • Updates imports to new vLLM OpenAI modules (chat_completion.protocol/serving, engine.protocol) and adjusts server wiring
  • Monkeypatches (a generic patching sketch follows this list):
    • PrometheusStatLogger init to bypass DP-mode LoRA check while restoring LoRA metrics
    • OpenAIServingModels.load_lora_adapter to reuse/update existing adapters by name
    • LRUCacheWorkerLoRAManager to avoid redundant per-request loads and manage cache activation/eviction
  • Customizes init_app_state and worker proc to ensure patches are applied in multi-API-server setups
  • Test utils now consider only SUCCESS step lines that match the reward pattern for numeric checks (see the filtering sketch below)
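
The route sketch below illustrates the token-in flow from the first bullet: a thin wrapper around the regular chat serving object that, when the request carries pre-tokenized input, uses it as prompt_token_ids instead of re-tokenizing the chat messages. This is a minimal sketch only; the request model, field names (`token_ids`, `prompt_token_ids`), app-state wiring, and the wrapper's internals are assumptions, not the exact code in this PR or vLLM's actual classes.

```python
# Illustrative sketch only: class, field, and route names follow the summary
# above, but the exact vLLM types, imports, and override point are assumptions.
from typing import Optional

from fastapi import APIRouter, Request
from pydantic import BaseModel

router = APIRouter()


class TokenChatRequest(BaseModel):
    # Stand-in for vLLM's ChatCompletionRequest, extended with pre-tokenized input.
    model: str
    messages: list[dict] = []
    token_ids: Optional[list[int]] = None         # tokens supplied by the caller (hypothetical field)
    prompt_token_ids: Optional[list[int]] = None  # what the serving layer ultimately consumes


class OpenAIServingChatWithTokens:
    """Wraps the regular chat serving object. If the request carries token IDs,
    they override prompt_token_ids so the chat messages are not re-tokenized."""

    def __init__(self, serving_chat):
        self._serving_chat = serving_chat

    async def create_chat_completion(self, request, raw_request):
        if request.token_ids:
            request.prompt_token_ids = request.token_ids
        # Streaming and full responses are both handled by the wrapped object.
        return await self._serving_chat.create_chat_completion(request, raw_request)


@router.post("/v1/chat/completions/tokens")
async def create_chat_completion_with_tokens(request: TokenChatRequest, raw_request: Request):
    serving = raw_request.app.state.serving_chat_with_tokens  # assumed app-state wiring
    return await serving.create_chat_completion(request, raw_request)
```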
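For the load_lora_adapter patch, the general shape is wrapping the original coroutine so that an adapter whose name is already registered is updated rather than duplicated or rejected. The sketch below shows that pattern only; the `lora_requests` attribute, `lora_name` field, and method signature are assumptions about vLLM internals, not necessarily what the PR does.

```python
# Generic monkeypatch sketch. The `lora_requests` attribute, `lora_name` field,
# and method signature are assumptions about vLLM internals, not verified APIs.
import functools


def patch_load_lora_adapter(serving_models_cls):
    """Patch load_lora_adapter so that loading an adapter whose name is already
    registered replaces the existing entry instead of duplicating or rejecting it."""
    original = serving_models_cls.load_lora_adapter

    @functools.wraps(original)
    async def load_lora_adapter(self, request, *args, **kwargs):
        name = getattr(request, "lora_name", None)
        existing = getattr(self, "lora_requests", [])
        if name is not None and any(getattr(a, "lora_name", None) == name for a in existing):
            # Reuse/update path: drop the stale entry so the reload replaces it
            # (assumed behavior; the PR may instead update the entry in place).
            self.lora_requests = [a for a in existing if getattr(a, "lora_name", None) != name]
        return await original(self, request, *args, **kwargs)

    serving_models_cls.load_lora_adapter = load_lora_adapter
```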
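The test-utils change amounts to filtering log lines before parsing numbers. A hypothetical sketch follows, assuming a log format like `SUCCESS step=12 reward=0.85`; the actual pattern and format live in the repo's test utilities and may differ.

```python
# Hypothetical filtering sketch, assuming a log format like
# "SUCCESS step=12 reward=0.85"; the real pattern lives in the repo's test utils.
import re

REWARD_PATTERN = re.compile(r"SUCCESS\s+step=(\d+)\s+reward=([-+]?\d*\.?\d+)")


def extract_rewards(log_lines: list[str]) -> list[float]:
    """Return reward values only from SUCCESS step lines that match the pattern;
    failures and unrelated output do not feed the numeric checks."""
    rewards = []
    for line in log_lines:
        match = REWARD_PATTERN.search(line)
        if match:
            rewards.append(float(match.group(2)))
    return rewards
```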

Written by Cursor Bugbot for commit b16f34a. This will update automatically on new commits. Configure here.

@Jackmin801 marked this pull request as ready for review on January 18, 2026, 18:25