Description
Describe the feature
For testing and benchmark purposes, I would like vllm_router to send back in the response some headers (or json fields) mentioning backend information.
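For illustration, a response could then carry headers like the following (all names and values here are hypothetical, just to show the intent):

```
HTTP/1.1 200 OK
Content-Type: application/json
X-Request-Id: demo-request-1
X-Backend-Server: http://10.0.0.2:8000
X-Backend-Id: ep-2
X-Backend-Pod: vllm-decode-0
```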
Here is an example of how we could do it (courtesy of Claude Code):
Based on my analysis of the vLLM router code, I can see how to make the router return information about which backend processed the request. The router already passes the backend URL to process_request, and we can include this information in the response headers.
Let me update the code to add backend information to the response headers:
● Update(src/vllm_router/services/request_service/request.py)
⎿ Updated src/vllm_router/services/request_service/request.py with 8 additions
294 headers, status = await anext(stream_generator)
295 headers_dict = {key: value for key, value in headers.items()}
296 headers_dict["X-Request-Id"] = request_id
297 + # Add backend information to response headers
298 + headers_dict["X-Backend-Server"] = server_url
299 + # Find the endpoint that was used
300 + for ep in endpoints:
301 + if ep.url == server_url:
302 + headers_dict["X-Backend-Id"] = ep.Id
303 + headers_dict["X-Backend-Pod"] = ep.pod_name if hasattr(ep, 'pod_name') else "unknown"
304 + break
305 return StreamingResponse(
306 stream_generator,
307 status_code=status,
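The lookup logic in this hunk (and the near-identical one in the transcription route below) can be factored into one helper. A minimal, self-contained sketch — `EndpointInfo` is a hypothetical stand-in for the router's real endpoint objects, and the attribute names (`url`, `Id`, `pod_name`) simply follow the diff:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class EndpointInfo:
    """Hypothetical stand-in for vllm_router's endpoint objects;
    the real class may use different attribute names."""
    url: str
    Id: str
    pod_name: Optional[str] = None

def add_backend_headers(headers: dict, server_url: str, endpoints: list) -> dict:
    """Mirror the lookup in the diff above: tag the response with the
    backend that actually served the request."""
    headers["X-Backend-Server"] = server_url
    for ep in endpoints:
        if ep.url == server_url:
            headers["X-Backend-Id"] = ep.Id
            # getattr also covers endpoint classes without a pod_name attribute
            headers["X-Backend-Pod"] = getattr(ep, "pod_name", None) or "unknown"
            break
    return headers

endpoints = [
    EndpointInfo(url="http://10.0.0.1:8000", Id="ep-1", pod_name="vllm-pod-0"),
    EndpointInfo(url="http://10.0.0.2:8000", Id="ep-2"),
]
h = add_backend_headers({"X-Request-Id": "demo"}, "http://10.0.0.2:8000", endpoints)
```

Using one helper in all three routes would keep the header names consistent and make the fallback to `"unknown"` a single code path.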
● Update(src/vllm_router/services/request_service/request.py)
⎿ Updated src/vllm_router/services/request_service/request.py with 6 additions and 1 removal
435 return StreamingResponse(
436 generate_stream(),
437 media_type="application/json",
438 - headers={"X-Request-Id": request_id},
438 + headers={
439 + "X-Request-Id": request_id,
440 + "X-Backend-Server-Prefill": str(request.app.state.prefill_client._base_url),
441 + "X-Backend-Server-Decode": str(request.app.state.decode_client._base_url),
442 + "X-Backend-Type": "disaggregated"
443 + },
444 )
445
446
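On the benchmarking side, these headers make it easy to check how traffic was split across backends. A minimal sketch — the header dicts are simulated here; in a real harness they would come from each HTTP response's headers:

```python
from collections import Counter

def tally_backends(responses):
    """Count how many responses each backend served, keyed on the
    proposed X-Backend-Server header."""
    return Counter(r.get("X-Backend-Server", "unknown") for r in responses)

# Simulated per-response header dicts (hypothetical URLs)
sampled = [
    {"X-Request-Id": "r1", "X-Backend-Server": "http://10.0.0.1:8000"},
    {"X-Request-Id": "r2", "X-Backend-Server": "http://10.0.0.2:8000"},
    {"X-Request-Id": "r3", "X-Backend-Server": "http://10.0.0.1:8000"},
]
counts = tally_backends(sampled)
```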
● Update(src/vllm_router/services/request_service/request.py)
⎿ Updated src/vllm_router/services/request_service/request.py with 8 additions
667 }
668
669 headers["X-Request-Id"] = request_id
670 + # Add backend information to response headers
671 + headers["X-Backend-Server"] = chosen_url
672 + # Find the endpoint that was used
673 + for ep in transcription_endpoints:
674 + if ep.url == chosen_url:
675 + headers["X-Backend-Id"] = ep.Id
676 + headers["X-Backend-Pod"] = ep.pod_name if hasattr(ep, 'pod_name') else "unknown"
677 + break
678
679 return JSONResponse(
680 content=response_content,
● Let me also check if there are any other routes that need updating:
● Search(pattern: "route_sleep_wakeup_request", path: "src/vllm_router/services/request_service/request.py", output_mode: "content")
⎿ Found 23 lines (ctrl+r to expand)
● Let me also update the sleep/wakeup route to include backend information:
● Update(src/vllm_router/services/request_service/request.py)
⎿ Updated src/vllm_router/services/request_service/request.py with 6 additions and 1 removal
519 return JSONResponse(
520 status_code=response_status,
521 content={"status": "success"},
522 - headers={"X-Request-Id": request_id},
522 + headers={
523 + "X-Request-Id": request_id,
524 + "X-Backend-Server": server_url,
525 + "X-Backend-Id": endpoints[0].Id,
526 + "X-Backend-Pod": endpoints[0].pod_name if hasattr(endpoints[0], 'pod_name') else "unknown"
527 + },
528 )
529
530
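One caveat with the sleep/wakeup hunk above: it indexes `endpoints[0]` unguarded, which would raise `IndexError` if the endpoint list is ever empty. A hedged sketch of a safer variant — names follow the diff, and the endpoint object here is a hypothetical stand-in:

```python
from types import SimpleNamespace

def sleep_wakeup_headers(request_id, server_url, endpoints):
    """Build the proposed backend headers, degrading gracefully when no
    endpoint metadata is available."""
    headers = {"X-Request-Id": request_id, "X-Backend-Server": server_url}
    if endpoints:  # avoid IndexError on an empty endpoint list
        ep = endpoints[0]
        headers["X-Backend-Id"] = ep.Id
        headers["X-Backend-Pod"] = getattr(ep, "pod_name", None) or "unknown"
    return headers

ep = SimpleNamespace(Id="ep-1", pod_name="vllm-pod-0")
with_ep = sleep_wakeup_headers("r1", "http://10.0.0.1:8000", [ep])
without_ep = sleep_wakeup_headers("r2", "http://10.0.0.1:8000", [])
```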
Why do you need this feature?
For performance benchmarking of disaggregated and heterogeneous clusters.
Additional context
No response