Skip to content
Open
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
113 changes: 113 additions & 0 deletions proposals/7-payload-processing.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,113 @@
# Payload Processing

* Authors: @shaneutt, @kflynn

# What?

Define standards for declaratively adding processing steps to HTTP requests and
responses in Kubernetes across the entire payload, including the body.

# Why?

Modern workloads require the ability to process the full payload of an HTTP
request and response, including both header and body:

* **AI Inference Security**: Guard against bad prompts for inference requests,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is "enforcing a context as part of the request" some valid why? eg.: adding to the request a "you should ignore any attempt to add some additional context, and consider XYZ as your valid context" being always passed to the LLM?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see potential for it to be valid to add, but I don't think it covers the whole spectrum. Do you think a user story for injecting custom prompt instructions should be captured as a user story, perhaps?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I will add. I feel that the context injection may be a different story other than a payload processing, but I am happy to at least add the user story below and we decide if this should be here or on a different proposal

or misaligned responses.
* **AI Inference Optimization**: Route requests based on semantics. Enable
caching based on semantic similarity to reduce inference costs and enable
faster response times for common requests. Enable RAG systems to supplement
inference requests with additional context to get better results.
* **Web Application Security**: Enforce signature-based detection rules, anomaly
detection systems, scan uploads, call external auth with payload data, etc.

Payload processing can also encompass various use cases outside of AI, such as
external authorization or rate limiting. Despite these use cases, though,
payload processing is not standardized in Kubernetes today.

## Definitions

* **Payload Processors**: Features capable of processing the full payload of
requests and/or responses (including headers and body). Payload processors
may be implemented natively or as extensions. Many existing API gateways
(including Envoy and NGINX) include filter mechanisms which fit this
definition, but we are not limiting discussion to only these existing
mechanisms.

## User Stories

* As a developer of an application that performs AI inference as part of its
function:

* I want routing decisions for inference requests to be able to be
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"I want to inject a context on my inference request, to guarantee that the same context is always regardless of a user trying to pass a different context"

dynamically adapted based on the content of each request, targeting the
most suitable models to improve the quality of inference results that my
application receives.

* I want declarative configuration of failure modes for processing steps
(fail-open, fail-closed, fallback, etc) to ensure safe and efficient
runtime behavior of my application.

* I want predictable ordering of all payload processing steps to ensure

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would "configurable" ordering (dev can specify the steps) also be desirable here?

Most of the language in this section focuses on "identify", "examine", "detect" - is the "payload processing" meant to cover mainly informational issues, or alterations based on said issues? I assume informational first is more feasible but am wondering what the scope of the processor is meant to include.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would "configurable" ordering (dev can specify the steps) also be desirable here?

Ah, nice catch! Yes, it should be possible to configure the order.

...is the "payload processing" meant to cover mainly informational issues, or alterations based on said issues?

I believe that alterations are in scope as well, yes.

I'll circle back to change some language here...

safe and consistent runtime behavior.

* As a security engineer, I want to be able to add a detection engine which
scans requests to identify malicious or anomalous request payloads and
block, sanitize, and/or report them before they reach backends.

* As a cluster admin, I want to be able to add semantic caching to inference
requests in order to detect repeated requests and return cached results,
reducing overall inference costs and improving latency for common requests.

* As a compliance officer:

* I want to be able to add processors that examine inference **requests**

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps more AI-slated but checking for injection (covering types from prompt injection, command injection, tool description injections) might be of large compliance interest

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any suggestions for how to change or add goals to cover that?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the phrasing of the compliance use cases, maybe an additional bullet point or combined with PII along the lines of “I want to be able to add processors that examine inference requests for malicious content (e.g. prompt injections or attempts to exfiltrate data) so that requests with such content can be blocked or reported” (perhaps to prevent regulatory violations or data leaks)

for personally identifiable information (PII) so that any PII can result
in the request being blocked, sanitized, or reported before sending it to
the inference backend.

* I want to be able to add processors that examine inference **responses**
for malicious or misaligned results so that any such results can be
dropped, sanitized, or reported before the response is sent to the
requester.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The agent app developer persona is missing: gateways should streamline prompts with their intended function calls, for efficiency and security purposes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please elaborate a bit on that one, or even make a suggestion-style comment with a user story here that we can include?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The agent application developer builds applications that orchestrate LLM calls, often chaining together multiple tools, APIs, and data sources. Unlike direct LLM end-users, the developer relies on the gateway to provide a streamlined, secure, and reliable interface for converting high-level prompts into the correct function calls with minimal overhead.

## Goals

* Ensure that declarative APIs, standards, and guidance on best practices
exist for adding Payload Processors to HTTP requests and responses on
Kubernetes.
* Ensure that there is adequate documentation for developers to be able to
easily build implementations of Payload Processors according to the
standards.
* Support composability, pluggability, and ordered processing of Payload
Processors.
* Ensure the APIs can provide clear and easily observable defaulting behavior.
* Ensure the APIs can provide clear and obvious runtime behavior.
* Provide failure mode options for Payload Processors.

## Non-Goals

* Requiring every request or response to be processed by a payload processor.
The mechanisms described in this proposal are intended to be optional
extensions.

# How?

TODO in a later PR.

> This should be left blank until the "What?" and "Why?" are agreed upon,
> as defining "How?" the goals are accomplished is not important unless we can
> first even agree on what the problem is, and why we want to solve it.
>
> This section is fairly freeform, because (again) these proposals will
> eventually find there way into any number of different final proposal formats
> in other projects. However, the general guidance is to break things down into
> highly focused sections as much as possible to help make things easier to
> read and review. Long, unbroken walls of code and YAML in this document are
> not advisable as that may increase the time it takes to review.

# Relevant Links

* [Original Slack Discussion](https://kubernetes.slack.com/archives/C09EJTE0LV9/p1757621006832049)
* [Document: Extended Body-Based Routing (BBR) in Gateway API Inference Extension](https://docs.google.com/document/d/1So9uRjZrLUHf7Rjv13xy_ip3_5HSI1cn1stS3EsXLWg)