feat: add forward_headers support to inference passthrough provider#5134

Open
skamenan7 wants to merge 3 commits into llamastack:main from skamenan7:feat/5040-Inference-passthrough-provider

Conversation

@skamenan7
Contributor

@skamenan7 skamenan7 commented Mar 13, 2026

What does this PR do?

Adds per-request HTTP header forwarding to the remote::passthrough inference provider, following the pattern established by the safety passthrough provider (PR #5004, already merged).

A forward_headers config field maps provider-data keys to outbound HTTP header names. Only explicitly listed keys are forwarded from X-LlamaStack-Provider-Data to the downstream service (default-deny). An extra_blocked_headers field lets operators add custom blocked names on top of the core security list.

The shared utility providers/utils/forward_headers.py is used by both the inference and safety passthrough providers, keeping the forwarding logic and blocked-header policy in one place.
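The forwarding logic described above can be sketched roughly as follows. The function name matches the one mentioned in the test plan below, but the body and the blocklist contents here are illustrative assumptions, not the actual llama-stack implementation:

```python
# Illustrative sketch only -- the real logic lives in
# providers/utils/forward_headers.py; the blocklist contents are assumed.
BLOCKED_HEADERS = {"host", "content-length", "transfer-encoding", "connection"}

def build_forwarded_headers(
    forward_headers: dict[str, str],
    provider_data: dict[str, str],
) -> dict[str, str]:
    """Map provider-data keys to outbound HTTP headers (default-deny)."""
    out: dict[str, str] = {}
    seen: set[str] = set()
    for key, header_name in forward_headers.items():
        if key not in provider_data:
            continue  # caller didn't supply this key
        if header_name.lower() in BLOCKED_HEADERS:
            # the real provider rejects these at config-validation time
            raise ValueError(f"refusing to forward blocked header {header_name!r}")
        if header_name.lower() in seen:
            continue  # case-insensitive dedup: first mapping wins
        seen.add(header_name.lower())
        # strip CR/LF so provider data can't inject extra headers
        out[header_name] = provider_data[key].replace("\r", "").replace("\n", "")
    return out
```

Keys present in the provider data but absent from `forward_headers` are simply never looked at, which is what makes the policy default-deny.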

Closes #5040
Relates to #4607

Test Plan

Unit tests cover the full path: config validation, header extraction, CRLF sanitization, blocked-header enforcement, the auth priority chain, and concurrent-request isolation:

uv run pytest tests/unit/providers/inference/test_passthrough_forward_headers.py -v

Tests cover:

  • build_forwarded_headers() — key mapping, default-deny, CRLF stripping, SecretStr unwrap, case-insensitive dedup
  • validate_forward_headers_config() — blocked header rejection, operator extra blocklist, invalid names
  • Adapter auth priority — static api_key > passthrough_api_key > forwarded Authorization
  • Provider data validator — extra fields preserved for forwarding, reserved keys rejected
  • Concurrent request isolation — contextvars don't leak between parallel requests

Also tested end-to-end locally against a mock inference server and a mock /v1/moderations server. Headers land on the downstream service exactly as configured, and blocked headers are rejected at stack startup rather than at request time.

Example config:

providers:
  inference:
    - provider_id: maas-inference
      provider_type: remote::passthrough
      config:
        base_url: ${env.PASSTHROUGH_URL}
        forward_headers:
          maas_api_token: "Authorization"
          tenant_id: "X-Tenant-ID"

Callers pass credentials via X-LlamaStack-Provider-Data:

curl http://localhost:8321/v1/chat/completions \
  -H 'X-LlamaStack-Provider-Data: {"maas_api_token": "Bearer user-jwt", "tenant_id": "acme"}' \
  -d '{"model": "passthrough/my-model", "messages": [{"role": "user", "content": "hello"}]}'

The downstream receives Authorization: Bearer user-jwt and X-Tenant-ID: acme. Only keys explicitly listed in forward_headers are forwarded to the downstream service. Any keys in X-LlamaStack-Provider-Data that don't have a mapping in forward_headers are ignored — they never leave the stack. This is the default-deny policy: if it's not in the config, it doesn't get forwarded.
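To make the default-deny behavior concrete, here is a minimal illustration (plain Python, not llama-stack code) of how the example mapping filters the provider-data payload from the curl call above, including an unmapped `debug` key added for demonstration:

```python
import json

# Minimal illustration of default-deny filtering; not llama-stack code.
forward_headers = {"maas_api_token": "Authorization", "tenant_id": "X-Tenant-ID"}
provider_data = json.loads(
    '{"maas_api_token": "Bearer user-jwt", "tenant_id": "acme", "debug": "1"}'
)

# Only keys with an explicit mapping are forwarded; "debug" is silently dropped.
outbound = {
    header: provider_data[key]
    for key, header in forward_headers.items()
    if key in provider_data
}
print(outbound)  # {'Authorization': 'Bearer user-jwt', 'X-Tenant-ID': 'acme'}
```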

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Mar 13, 2026
@skamenan7 skamenan7 force-pushed the feat/5040-Inference-passthrough-provider branch 2 times, most recently from 699c2ff to b25b54a on March 13, 2026 15:41
@skamenan7 skamenan7 marked this pull request as ready for review March 13, 2026 15:45
@skamenan7
Contributor Author

skamenan7 commented Mar 13, 2026

cc: @leseb I have applied the approach from your safety PR #5004 here and refactored it into a common utility so this functionality can easily be reused in follow-on PRs for other providers, per your suggestion. Thanks!

PS: I can also open a separate GitHub issue and PR for the safety passthrough (#5004) if keeping it apart from this PR makes sense.

@skamenan7 skamenan7 force-pushed the feat/5040-Inference-passthrough-provider branch 2 times, most recently from 6c5a64b to f8ecbf6 on March 13, 2026 21:30
@skamenan7 skamenan7 force-pushed the feat/5040-Inference-passthrough-provider branch 4 times, most recently from 58ae1c4 to 18fa1ad on March 16, 2026 21:00
@skamenan7 skamenan7 force-pushed the feat/5040-Inference-passthrough-provider branch 2 times, most recently from c3fd868 to 714278a on March 17, 2026 21:37
@skamenan7 skamenan7 force-pushed the feat/5040-Inference-passthrough-provider branch 2 times, most recently from 907a43c to cfc16a8 on March 18, 2026 11:15
@skamenan7 skamenan7 requested a review from leseb March 18, 2026 11:29
@skamenan7 skamenan7 force-pushed the feat/5040-Inference-passthrough-provider branch from 952ca64 to d4097a6 on March 18, 2026 11:33
Collaborator

@cdoern cdoern left a comment

looks pretty good. one question

@skamenan7 skamenan7 force-pushed the feat/5040-Inference-passthrough-provider branch 7 times, most recently from c1f918b to 03d8dcb on March 19, 2026 19:46
Collaborator

@cdoern cdoern left a comment

looks reasonable now, one question

@skamenan7 skamenan7 force-pushed the feat/5040-Inference-passthrough-provider branch 3 times, most recently from f838254 to e18e4f4 on March 20, 2026 17:16
@skamenan7 skamenan7 requested a review from cdoern March 20, 2026 17:52
@skamenan7 skamenan7 force-pushed the feat/5040-Inference-passthrough-provider branch 2 times, most recently from 5c9b9be to 5c8c7e1 on March 23, 2026 11:49
@leseb
Collaborator

leseb commented Mar 23, 2026

@skamenan7 unit tests are failing

@skamenan7 skamenan7 force-pushed the feat/5040-Inference-passthrough-provider branch from 5c8c7e1 to 5b35b2e on March 23, 2026 13:31
@skamenan7 skamenan7 requested a review from leseb March 23, 2026 14:20
@skamenan7 skamenan7 force-pushed the feat/5040-Inference-passthrough-provider branch from 9bf5003 to 1b67e61 on March 23, 2026 14:31
@skamenan7
Contributor Author

@leseb looks like the unit-tests (3.12) failure is a pre-existing flake in test_remote_vllm.py::test_openai_chat_completion_is_async, unrelated to this PR. It's a timing-sensitive test that runs 4 parallel 0.5s coroutines and asserts the total time is under 1.0s; on a loaded CI runner it occasionally exceeds that threshold. The test passes consistently for me locally, and it isn't in our diff.

@skamenan7 skamenan7 force-pushed the feat/5040-Inference-passthrough-provider branch 3 times, most recently from 850056a to cd8018e on March 23, 2026 16:01
Labels

CLA Signed This label is managed by the Meta Open Source bot.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: inference passthrough provider for forwarding request headers

3 participants