feat: add forward_headers support to inference passthrough provider #5134
skamenan7 wants to merge 3 commits into llamastack:main
cc: @leseb I have addressed your safety PR #5004 here and refactored the logic into a common utility, so this functionality can be easily reused in follow-on PRs for other providers, as you suggested. Thanks! PS: I can also open a separate GitHub issue and PR for safety passthrough #5004 if keeping it separate from this PR makes sense.
cdoern left a comment:
looks pretty good. one question
cdoern left a comment:
looks reasonable now, one question
@skamenan7 unit tests are failing
src/llama_stack/providers/remote/inference/passthrough/__init__.py
@leseb looks like the unit-tests (3.12) failure is a pre-existing flake in test_remote_vllm.py::test_openai_chat_completion_is_async, unrelated to this PR. It's a timing-sensitive test that runs 4 parallel 0.5s coroutines and asserts the total time is <1.0s; on a loaded CI runner it occasionally exceeds that limit. The test passes consistently locally and isn't in our diff.
What does this PR do?
Adds per-request HTTP header forwarding to the `remote::passthrough` inference provider, following the pattern established by the safety passthrough provider (PR #5004, already merged).

A `forward_headers` config field maps provider-data keys to outbound HTTP header names. Only explicitly listed keys are forwarded from `X-LlamaStack-Provider-Data` to the downstream service (default-deny). An `extra_blocked_headers` field lets operators add custom blocked names on top of the core security list.

The shared utility `providers/utils/forward_headers.py` is used by both the inference and safety passthrough providers, keeping the forwarding logic and blocked-header policy in one place.

Closes #5040
Relates #4607
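The startup-time blocked-header check described above could look roughly like the sketch below. This is not the code in `providers/utils/forward_headers.py`: the function name `check_forward_headers`, the contents of `CORE_BLOCKED_HEADERS`, and the choice of `ValueError` are assumptions made purely for illustration.

```python
import re

# Hypothetical core blocklist; the PR's actual security list is not shown in this thread.
CORE_BLOCKED_HEADERS = frozenset({"host", "content-length", "transfer-encoding", "connection"})

# Header field names must be HTTP "token" characters (RFC 9110).
_HEADER_NAME_RE = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")


def check_forward_headers(forward_headers, extra_blocked_headers=()):
    """Raise ValueError if a configured outbound header name is invalid or blocked."""
    # Operator-supplied names extend the core list; comparison is case-insensitive.
    blocked = CORE_BLOCKED_HEADERS | {name.lower() for name in extra_blocked_headers}
    for key, header_name in forward_headers.items():
        if not _HEADER_NAME_RE.fullmatch(header_name):
            raise ValueError(f"invalid header name for key {key!r}: {header_name!r}")
        if header_name.lower() in blocked:
            raise ValueError(f"header {header_name!r} (key {key!r}) is blocked")
    return forward_headers


# Passes: the name is a valid token and is not on either blocklist.
check_forward_headers({"tenant_id": "X-Tenant-ID"}, extra_blocked_headers=["X-Internal-Secret"])
```

Running a check like this when the config is loaded is what lets a bad mapping fail at stack startup rather than on the first request.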
Test Plan
Unit tests cover the full path: config validation, header extraction, CRLF sanitization, blocked-header enforcement, auth priority chain, and concurrent request isolation.

Tests cover:
- `build_forwarded_headers()`: key mapping, default-deny, CRLF stripping, SecretStr unwrap, case-insensitive dedup
- `validate_forward_headers_config()`: blocked header rejection, operator extra blocklist, invalid names

Also tested end-to-end locally against a mock inference server and a mock `/v1/moderations` server. Headers land on the downstream exactly as configured and blocked headers are rejected at stack startup, not at request time.

Example config:
Callers pass credentials via `X-LlamaStack-Provider-Data`:

The downstream receives `Authorization: Bearer user-jwt` and `X-Tenant-ID: acme`. Only keys explicitly listed in `forward_headers` are forwarded to the downstream service. Any keys in `X-LlamaStack-Provider-Data` that don't have a mapping in `forward_headers` are ignored; they never leave the stack. This is the default-deny policy: if it's not in the config, it doesn't get forwarded.
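To make the default-deny mapping concrete, here is a minimal sketch of how provider-data keys might be turned into outbound headers. It is an illustration only, not the PR's `build_forwarded_headers()`: the function name `forward_headers_from_provider_data` and the provider-data keys `auth_token`, `tenant_id`, and `internal_note` are hypothetical, chosen so the result matches the headers described above. SecretStr unwrapping and case-insensitive dedup are left out.

```python
import json


def forward_headers_from_provider_data(raw_provider_data, forward_headers):
    """Map allow-listed provider-data keys to outbound headers (default-deny)."""
    provider_data = json.loads(raw_provider_data) if raw_provider_data else {}
    outbound = {}
    # Iterate over the config, not the request: unlisted keys are never looked at.
    for key, header_name in forward_headers.items():
        value = provider_data.get(key)
        if value is None:
            continue  # the caller did not supply this key
        # Strip CR/LF so a value cannot inject additional header lines.
        outbound[header_name] = str(value).replace("\r", "").replace("\n", "")
    return outbound


# Hypothetical config mapping and request header value.
forward_headers = {"auth_token": "Authorization", "tenant_id": "X-Tenant-ID"}
provider_data_header = json.dumps(
    {"auth_token": "Bearer user-jwt", "tenant_id": "acme", "internal_note": "not forwarded"}
)

downstream_headers = forward_headers_from_provider_data(provider_data_header, forward_headers)
print(downstream_headers)
```

Because the loop is driven by the configured mapping, `internal_note` never reaches the outbound dict even though the caller sent it.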